Headers
Body
URL of the page to parse
Length: 1 - 2083 characters
"https://www.example.com"
"https://blog.example.com/article"
Maximum scraping execution timeout (in seconds)
20 <= x <= 1500
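The two constraints above can be checked on the client before a request is sent. The following is a minimal sketch, assuming the URL bound refers to string length in characters; the function name `validate_request` and the argument names `url` and `timeout` are hypothetical and are not taken from this reference.

```python
from typing import List


def validate_request(url: str, timeout: int) -> List[str]:
    """Return a list of constraint violations (empty if both values are acceptable).

    Assumes the documented bounds: URL length 1-2083, timeout 20-1500 seconds.
    """
    errors = []
    if not 1 <= len(url) <= 2083:
        errors.append("url length must be between 1 and 2083 characters")
    if not 20 <= timeout <= 1500:
        errors.append("timeout must be between 20 and 1500 seconds")
    return errors


if __name__ == "__main__":
    print(validate_request("https://www.example.com", 60))
    print(validate_request("", 5))
```

Returning a list rather than raising on the first failure lets a caller report every problem at once.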
CSS selectors of elements to include (keep only these)
["article", ".content", "#main-content"]
CSS selectors or wildcard masks of elements to exclude. Examples: '.sidebar', '.advertisement', '*promo*', '*banner*'
[".sidebar", ".advertisement", "*promo*"]
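This reference does not say how a wildcard mask such as "*promo*" is matched against an element. One plausible reading, sketched below, is shell-style matching against the element's class names and id. The helper `matches_mask` and that matching rule are assumptions made for illustration, not the service's documented behaviour.

```python
from fnmatch import fnmatchcase
from typing import List


def matches_mask(mask: str, classes: List[str], element_id: str = "") -> bool:
    """Return True if any class name or the id matches the shell-style mask.

    Hypothetical rule: '*' matches any run of characters, case-sensitively.
    """
    candidates = list(classes)
    if element_id:
        candidates.append(element_id)
    return any(fnmatchcase(name, mask) for name in candidates)


if __name__ == "__main__":
    print(matches_mask("*promo*", ["header-promo-box", "wide"]))
    print(matches_mask("*promo*", ["content"]))
```

Under this reading, a mask without asterisks would only match an exact class name or id, which is why the example masks wrap the keyword in asterisks.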
Extract only main content of the page (heuristic algorithm)
Remove HTML comments
Convert image srcset to src (selects the largest image)
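To illustrate what selecting the largest image from a srcset might involve, here is a simplified sketch. It assumes candidates are separated by commas and carry a width (`w`) or pixel-density (`x`) descriptor. The function name `largest_from_srcset` is hypothetical, and the service's actual selection logic is not described in this reference.

```python
from typing import Optional


def largest_from_srcset(srcset: str) -> Optional[str]:
    """Return the URL of the candidate with the largest descriptor, or None.

    Simplification: URLs that themselves contain commas are not handled.
    """
    best_url, best_size = None, -1.0
    for candidate in srcset.split(","):
        parts = candidate.split()
        if not parts:
            continue  # empty fragment, e.g. from a trailing comma
        url = parts[0]
        size = 1.0  # a candidate with no descriptor is treated as 1x
        if len(parts) > 1 and parts[1][-1] in "wx":
            try:
                size = float(parts[1][:-1])
            except ValueError:
                continue  # malformed descriptor, skip this candidate
        if size > best_size:
            best_url, best_size = url, size
    return best_url


if __name__ == "__main__":
    print(largest_from_srcset("small.jpg 480w, medium.jpg 800w, large.jpg 1200w"))
```

The sketch compares descriptors numerically without distinguishing `w` from `x`, which is adequate because a single srcset is not expected to mix the two kinds.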
Return full HTML document (True) or only body content (False)
Minimum text block size for main content detection (in characters)
x >= 0
Remove base64-encoded images (reduces output size)
Remove all HTML tags and return plain text only
Extract links, emails, and phone numbers from the page
Only extract links from the same domain (used with extract_contacts)
Only extract social media links (LinkedIn, Twitter/X, Facebook, Instagram, etc.)
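The three contact-related options above can be pictured with a small standard-library sketch: collect anchor targets, split out `mailto:` and `tel:` links, then apply the same-domain and social-only filters. Everything here is an assumption made for illustration: the function `collect_contacts`, its keyword names, the email pattern, and the short list of social hosts. None of it describes how the service itself works.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Illustrative subset only; the service's list of social networks is not documented here.
SOCIAL_HOSTS = ("linkedin.com", "twitter.com", "x.com", "facebook.com", "instagram.com")


class _HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def _is_social(host):
    return any(host == s or host.endswith("." + s) for s in SOCIAL_HOSTS)


def collect_contacts(html, base_url, same_domain_only=False, social_only=False):
    """Return URLs, emails, and phone numbers found in the given HTML string."""
    collector = _HrefCollector()
    collector.feed(html)
    base_host = urlparse(base_url).hostname or ""
    urls, emails, phones = [], set(EMAIL_RE.findall(html)), set()
    for href in collector.hrefs:
        if href.startswith("mailto:"):
            emails.add(href[len("mailto:"):].split("?")[0])
            continue
        if href.startswith("tel:"):
            phones.add(href[len("tel:"):])
            continue
        absolute = urljoin(base_url, href)  # resolve relative links
        host = urlparse(absolute).hostname or ""
        if same_domain_only and host != base_host:
            continue
        if social_only and not _is_social(host):
            continue
        urls.append(absolute)
    return {"urls": urls, "emails": sorted(emails), "phones": sorted(phones)}
```

For example, calling `collect_contacts(page_html, "https://www.example.com", same_domain_only=True)` would keep only links whose host is exactly `www.example.com`; whether the real option also treats subdomains as the same domain is not stated in this reference.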
Response
Successful Response
Cleaned HTML
URL of the original page
Page title (from <title> tag)
Meta description
Additional metadata
Extracted URLs from the page
Extracted email addresses
Extracted phone numbers