/webparser/parse

curl --request POST \
  --url https://api.anysite.io/api/webparser/parse \
  --header 'Content-Type: application/json' \
  --header 'access-token: <api-key>' \
  --data '
{
  "url": "<string>",
  "timeout": 300,
  "include_tags": [
    "article",
    ".content",
    "#main-content"
  ],
  "exclude_tags": [
    ".sidebar",
    ".advertisement",
    "*promo*"
  ],
  "only_main_content": false,
  "remove_comments": true,
  "resolve_srcset": true,
  "return_full_html": false,
  "min_text_block": 200,
  "remove_base64_images": true,
  "strip_all_tags": false,
  "extract_contacts": false,
  "same_origin_links": false,
  "social_links_only": false,
  "extract_minimal": false
}
'

[
  {
    "cleaned_html": "<string>",
    "url": "<string>",
    "@type": "@web_parser_result",
    "title": "<string>",
    "meta_description": "<string>",
    "metadata": {},
    "links": [
      "<string>"
    ],
    "emails": [
      "<string>"
    ],
    "phones": [
      "<string>"
    ]
  }
]
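
The same call can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names `build_parse_request` and `parse_page` are hypothetical, and the extra ten seconds of client-side slack is an assumption rather than something the API prescribes.

```python
import json
import urllib.request

API_URL = "https://api.anysite.io/api/webparser/parse"


def build_parse_request(page_url, api_key, **options):
    """Build (but do not send) a POST request for /webparser/parse.

    `options` are passed through as body fields, e.g. only_main_content=True.
    """
    payload = {"url": page_url, **options}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "access-token": api_key,
        },
        method="POST",
    )


def parse_page(page_url, api_key, **options):
    """Send the request and decode the JSON array returned by the endpoint."""
    request = build_parse_request(page_url, api_key, **options)
    # Assumption: wait slightly longer than the server-side timeout (seconds).
    client_timeout = options.get("timeout", 300) + 10
    with urllib.request.urlopen(request, timeout=client_timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `parse_page("https://www.example.com", "<api-key>", only_main_content=True)` would return a list of result objects shaped like the sample response above.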

Authorizations

access-token

string

header

required

Headers

access-token

string

required

Body

application/json

url

string<uri>

required

URL of the page to parse

Required string length: 10 - 2083

Examples:

"https://www.example.com"

"https://blog.example.com/article"

timeout

integer

default:300

Maximum scraping execution timeout (in seconds)

Required range: 20 <= x <= 1500
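
The documented limits for `url` and `timeout` can be checked on the client before a request is sent. The sketch below only mirrors the ranges stated above; the function name `validate_url_and_timeout` is hypothetical, and the server may apply further checks that are not covered here.

```python
def validate_url_and_timeout(url, timeout=300):
    """Return a list of problems; an empty list means both values are in range."""
    errors = []
    # Documented string length for `url`: 10 to 2083 characters.
    if not 10 <= len(url) <= 2083:
        errors.append("url length must be between 10 and 2083 characters")
    # Documented range for `timeout`: 20 to 1500 seconds.
    if not 20 <= timeout <= 1500:
        errors.append("timeout must be between 20 and 1500 seconds")
    return errors
```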

include_tags

string[] | null

CSS selectors of elements to include (keep only these)

Example:

["article", ".content", "#main-content"]

exclude_tags

string[] | null

CSS selectors or wildcard masks of elements to exclude. Examples: '.sidebar', '.advertisement', '*promo*', '*banner*'

Example:

[".sidebar", ".advertisement", "*promo*"]
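
The page does not say how wildcard masks are matched. As a rough illustration only, the sketch below assumes glob-style matching against an element's class names; both that assumption and the helper name `matches_mask` are hypothetical and may differ from what the service actually does.

```python
from fnmatch import fnmatchcase


def matches_mask(class_names, mask):
    """Return True if any class name matches the glob-style mask.

    Assumption: '*' stands for any run of characters, as in shell globs.
    """
    return any(fnmatchcase(name, mask) for name in class_names)


# An element with class "header-promo-box" would be excluded by "*promo*"
# under this assumption, while an element with class "content" would not.
excluded = matches_mask(["header-promo-box"], "*promo*")
kept = not matches_mask(["content"], "*promo*")
```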

only_main_content

boolean

default:false

Extract only main content of the page (heuristic algorithm)

remove_comments

boolean

default:true

Remove HTML comments

resolve_srcset

boolean

default:true

Convert image srcset to src (selects the largest image)
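
To make "selects the largest image" concrete, here is one way such a selection could work. It is a sketch under stated assumptions, not the service's implementation: the function name `largest_srcset_candidate` is hypothetical, candidates are assumed to use a single descriptor type (`w` or `x`), and URLs are assumed not to contain commas.

```python
def largest_srcset_candidate(srcset):
    """Return the URL with the largest width/density descriptor, or None."""
    best_url, best_size = None, -1.0
    # Assumption: candidates are comma-separated and URLs contain no commas.
    for candidate in srcset.split(","):
        parts = candidate.split()
        if not parts:
            continue
        url = parts[0]
        # A candidate without a descriptor is treated as "1x".
        descriptor = parts[1] if len(parts) > 1 else "1x"
        try:
            size = float(descriptor[:-1])  # strip the trailing "w" or "x"
        except ValueError:
            continue  # skip malformed descriptors
        if size > best_size:
            best_url, best_size = url, size
    return best_url
```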

return_full_html

boolean

default:false

Return the full HTML document (true) or only the body content (false)

min_text_block

integer

default:200

Minimum text block size for main content detection (in characters)

Required range: x >= 0

remove_base64_images

boolean

default:true

Remove base64-encoded images (reduces output size)
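
Inline base64 images can make a page many times larger than its markup. The sketch below shows the general idea with a regular expression that drops `<img>` tags whose `src` is a base64 `data:` URI. It is an illustration only: the name `remove_base64_images` is hypothetical here, and the service may handle more cases (for example CSS backgrounds) than this pattern does.

```python
import re

# Matches <img ...> tags whose src attribute is a quoted base64 data URI.
BASE64_IMG = re.compile(
    r"<img\b[^>]*\bsrc\s*=\s*(['\"])data:image/[^'\"]*;base64,[^'\"]*\1[^>]*>",
    re.IGNORECASE,
)


def remove_base64_images(html):
    """Return the HTML with base64-embedded <img> tags removed."""
    return BASE64_IMG.sub("", html)
```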

strip_all_tags

boolean

default:false

Remove all HTML tags and return plain text only
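
For comparison, plain-text extraction can be approximated locally with the standard library's `html.parser`. This is a sketch of the general technique, not the service's algorithm; the names `_TextCollector` and `html_to_text` are hypothetical, and skipping `script` and `style` content is an assumption about what "plain text" should exclude.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect visible text, ignoring the contents of script and style tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def html_to_text(html):
    """Return the text content of an HTML fragment, joined by single spaces."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return " ".join(collector.chunks)
```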

extract_contacts

boolean

default:false

Extract links, emails, and phone numbers from the page

same_origin_links

boolean

default:false

Only extract links from the same domain (used with extract_contacts)

social_links_only

boolean

default:false

Only extract social media links (LinkedIn, Twitter/X, Facebook, Instagram, etc.)
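
The two link filters can be pictured as simple host comparisons. The sketch below is an assumption-laden illustration: the helper names, the `SOCIAL_HOSTS` set, and the choice to treat `www.example.com` and `example.com` as the same domain are all hypothetical and may not match the service's rules.

```python
from urllib.parse import urlparse

# Hypothetical list; the service's actual set of social networks is not documented here.
SOCIAL_HOSTS = {"linkedin.com", "twitter.com", "x.com", "facebook.com", "instagram.com"}


def _host(url):
    """Lower-cased hostname with a leading 'www.' removed."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def same_origin(links, page_url):
    """Keep links whose host equals the page's host."""
    page_host = _host(page_url)
    return [link for link in links if _host(link) == page_host]


def social_only(links):
    """Keep links that point at a known social host or one of its subdomains."""
    return [
        link for link in links
        if any(_host(link) == h or _host(link).endswith("." + h) for h in SOCIAL_HOSTS)
    ]
```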

extract_minimal

boolean

default:false

Use minimal extraction (only links, title, emails, phones if set)

Response

Successful Response

cleaned_html

string

required

Cleaned HTML

url

string

required

URL of the original page

@type

string

default:@web_parser_result

title

string | null

Page title

meta_description

string | null

Meta description

metadata

Metadata · object

Additional metadata

links

string[] | null

Extracted URLs from the page

emails

string[] | null

Extracted email addresses

phones

string[] | null

Extracted phone numbers
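
Because `title`, `links`, `emails`, and `phones` may be null, client code should normalize them before use. The following sketch shows one way to do that; `summarize_results` is a hypothetical helper, and the sample values in the usage note are invented for illustration.

```python
import json


def summarize_results(response_text):
    """Decode the response array and replace null fields with empty values."""
    results = json.loads(response_text)
    summaries = []
    for item in results:
        summaries.append({
            "url": item["url"],                        # required field
            "html_length": len(item["cleaned_html"]),  # required field
            "title": item.get("title") or "",
            "links": item.get("links") or [],
            "emails": item.get("emails") or [],
            "phones": item.get("phones") or [],
        })
    return summaries
```

For instance, a response whose single result has `"links": null` yields a summary with `"links": []`, so callers can iterate over it without a null check.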
