POST /api/webparser/parse
/webparser/parse
curl --request POST \
  --url https://api.anysite.io/api/webparser/parse \
  --header 'Content-Type: application/json' \
  --header 'access-token: <access-token>' \
  --data '{
  "timeout": 300,
  "url": "https://www.example.com",
  "include_tags": [
    "article",
    ".content",
    "#main-content"
  ],
  "exclude_tags": [
    ".sidebar",
    ".advertisement",
    "*promo*"
  ],
  "only_main_content": false,
  "remove_comments": true,
  "resolve_srcset": true,
  "return_full_html": false,
  "min_text_block": 200,
  "remove_base64_images": true,
  "strip_all_tags": false,
  "extract_contacts": false,
  "same_origin_links": false,
  "social_links_only": false
}'
[
  {
    "@type": "WebParserResult",
    "cleaned_html": "<string>",
    "url": "<string>",
    "title": "<string>",
    "meta_description": "<string>",
    "metadata": {},
    "links": [
      "<string>"
    ],
    "emails": [
      "<string>"
    ],
    "phones": [
      "<string>"
    ]
  }
]
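
The same call can be made from Python. The following is a minimal sketch using only the standard library, not an official client: the helper names `build_parse_request` and `parse_page` are hypothetical, and the extra 10 seconds added to the client-side socket timeout is an assumption, not something the API specifies.

```python
import json
import urllib.request

API_URL = "https://api.anysite.io/api/webparser/parse"


def build_parse_request(access_token: str, page_url: str, **options) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {"url": page_url, **options}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "access-token": access_token,
        },
        method="POST",
    )


def parse_page(access_token: str, page_url: str, **options) -> list:
    """Send the request and decode the JSON array of WebParserResult objects."""
    request = build_parse_request(access_token, page_url, **options)
    # The `timeout` body field is enforced by the service. The socket timeout
    # below is a separate client-side limit; the +10 s margin is an assumption.
    client_timeout = options.get("timeout", 300) + 10
    with urllib.request.urlopen(request, timeout=client_timeout) as response:
        return json.load(response)
```

Calling `parse_page("<access-token>", "https://www.example.com", only_main_content=True)` would send the request; any body field documented below can be passed as a keyword argument.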

Headers

access-token
string
required

Body

application/json
url
string<uri>
required

URL of the page to parse

Required string length: 1 - 2083
Examples:

"https://www.example.com"

"https://blog.example.com/article"

timeout
integer
default:300

Maximum scraping execution timeout (in seconds)

Required range: 20 <= x <= 1500
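
The limits on `url` and `timeout` can be checked client-side before a request is sent. This is a sketch under stated assumptions: the function name `validate_url_and_timeout` is hypothetical, and the restriction to absolute http(s) URLs is an assumption (the schema only says `string<uri>`).

```python
from urllib.parse import urlsplit

URL_MIN_LEN, URL_MAX_LEN = 1, 2083      # documented string length for `url`
TIMEOUT_MIN, TIMEOUT_MAX = 20, 1500     # documented range for `timeout` (seconds)


def validate_url_and_timeout(url: str, timeout: int = 300) -> None:
    """Raise ValueError if the arguments fall outside the documented limits."""
    if not URL_MIN_LEN <= len(url) <= URL_MAX_LEN:
        raise ValueError(
            f"url length must be {URL_MIN_LEN}-{URL_MAX_LEN} characters, got {len(url)}"
        )
    parts = urlsplit(url)
    # Assumption: only absolute http(s) URLs are meaningful for page parsing.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"url must be an absolute http(s) URL: {url!r}")
    if not TIMEOUT_MIN <= timeout <= TIMEOUT_MAX:
        raise ValueError(
            f"timeout must be between {TIMEOUT_MIN} and {TIMEOUT_MAX} seconds, got {timeout}"
        )
```
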
include_tags
string[] | null

CSS selectors of elements to include (keep only these)

Examples:
["article", ".content", "#main-content"]
exclude_tags
string[] | null

CSS selectors or wildcard masks of elements to exclude. Examples: '.sidebar', '.advertisement', '*promo*', '*banner*'

Examples:
[".sidebar", ".advertisement", "*promo*"]
only_main_content
boolean
default:false

Extract only main content of the page (heuristic algorithm)

remove_comments
boolean
default:true

Remove HTML comments

resolve_srcset
boolean
default:true

Convert image srcset to src (selects the largest image)
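
To illustrate what "selects the largest image" means, the sketch below picks the candidate with the largest width (`w`) or density (`x`) descriptor from a `srcset` value. It is an illustration only, not the service's implementation: the function name `largest_srcset_candidate` is hypothetical, and splitting on commas is a simplification that breaks for URLs containing commas.

```python
def largest_srcset_candidate(srcset: str) -> str:
    """Return the URL with the largest width (w) or density (x) descriptor."""
    best_url, best_size = "", -1.0
    for candidate in srcset.split(","):
        parts = candidate.split()
        if not parts:
            continue  # skip empty entries such as a trailing comma
        url = parts[0]
        # A candidate without a descriptor is treated as 1x.
        descriptor = parts[1] if len(parts) > 1 else "1x"
        try:
            size = float(descriptor[:-1])  # drop the trailing "w" or "x"
        except ValueError:
            continue  # ignore malformed descriptors
        if size > best_size:
            best_url, best_size = url, size
    return best_url
```
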

return_full_html
boolean
default:false

Return the full HTML document (true) or only the body content (false)

min_text_block
integer
default:200

Minimum text block size for main content detection (in characters)

Required range: x >= 0
remove_base64_images
boolean
default:true

Remove base64-encoded images (reduces output size)

strip_all_tags
boolean
default:false

Remove all HTML tags and return plain text only

extract_contacts
boolean
default:false

Extract links, emails, and phone numbers from the page

same_origin_links
boolean
default:false

Only extract links from the same domain (used with extract_contacts)

social_links_only
boolean
default:false

Only extract social media links (LinkedIn, Twitter/X, Facebook, Instagram, etc.)
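
The body fields above can be assembled with a small helper that knows the defaults and sends only what differs from them. This is a sketch, not an official client: `build_parse_payload` and `DEFAULTS` are hypothetical names, the `None` defaults for `include_tags` and `exclude_tags` are assumed from their nullable types, and the `false` defaults for `same_origin_links` and `social_links_only` are taken from the request example. The check that `same_origin_links` requires `extract_contacts` is a client-side assumption based on the "used with extract_contacts" note.

```python
DEFAULTS = {
    "timeout": 300,
    "include_tags": None,          # assumed: nullable, no default stated
    "exclude_tags": None,          # assumed: nullable, no default stated
    "only_main_content": False,
    "remove_comments": True,
    "resolve_srcset": True,
    "return_full_html": False,
    "min_text_block": 200,
    "remove_base64_images": True,
    "strip_all_tags": False,
    "extract_contacts": False,
    "same_origin_links": False,    # assumed from the request example
    "social_links_only": False,    # assumed from the request example
}


def build_parse_payload(url: str, **overrides) -> dict:
    """Return a request body containing `url` plus any non-default options."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise TypeError(f"unknown option(s): {sorted(unknown)}")
    options = {**DEFAULTS, **overrides}
    # Assumption: same_origin_links has no effect unless contacts are extracted.
    if options["same_origin_links"] and not options["extract_contacts"]:
        raise ValueError("same_origin_links is documented as used with extract_contacts")
    payload = {"url": url}
    payload.update({k: v for k, v in options.items() if v != DEFAULTS[k]})
    return payload
```

For example, `build_parse_payload("https://www.example.com", extract_contacts=True, same_origin_links=True)` yields a body with just `url` and the two overridden flags.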

Response

Successful Response

cleaned_html
string
required

Cleaned HTML

url
string
required

URL of the original page

@type
string
default:WebParserResult
title
string | null

Page title (from <title> tag)

meta_description
string | null

Meta description

metadata
object

Additional metadata

links
string[] | null

Extracted URLs from the page

emails
string[] | null

Extracted email addresses

phones
string[] | null

Extracted phone numbers
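
The response is a JSON array of result objects, so it can be decoded into typed records. The sketch below assumes the field names listed above; the `WebParserResult` dataclass and `parse_response` function are hypothetical client-side helpers, and mapping `null` lists to empty lists is a convenience choice rather than documented behavior.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class WebParserResult:
    cleaned_html: str                      # required
    url: str                               # required
    title: Optional[str] = None
    meta_description: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)
    links: List[str] = field(default_factory=list)
    emails: List[str] = field(default_factory=list)
    phones: List[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, item: Dict[str, Any]) -> "WebParserResult":
        # Required fields raise KeyError if missing; nullable ones fall back.
        return cls(
            cleaned_html=item["cleaned_html"],
            url=item["url"],
            title=item.get("title"),
            meta_description=item.get("meta_description"),
            metadata=item.get("metadata") or {},
            links=item.get("links") or [],
            emails=item.get("emails") or [],
            phones=item.get("phones") or [],
        )


def parse_response(body: str) -> List[WebParserResult]:
    """Decode the JSON array returned by the endpoint."""
    return [WebParserResult.from_dict(item) for item in json.loads(body)]
```
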
