Overview
The AnySite Web Parser node provides powerful web scraping capabilities within your n8n workflows. Extract data from any website, parse HTML content, and convert unstructured web data into structured information for analysis and automation.
Node Configuration
Authentication
AnySite API Credentials (credential, required): Select your AnySite API credentials from the dropdown or create new ones.
Available Operations
Parse URL
Bulk URL Parse
Monitor Changes
Parse URL extracts data from a specific web page URL. Parameters:
URL (required): Web page URL to scrape
Wait For Load: Wait time for dynamic content (0-30 seconds)
Extract Images: Include image URLs in the output
Extract Links: Include all links found on the page
Custom Selectors: CSS selectors for specific elements
Example Output:
{
  "page": {
    "url": "https://example.com/article",
    "title": "How to Build Scalable Web Applications",
    "description": "A comprehensive guide to building web applications...",
    "author": "John Developer",
    "publishDate": "2024-08-26",
    "content": "Building scalable web applications requires...",
    "images": [
      "https://example.com/images/architecture.png",
      "https://example.com/images/diagram.jpg"
    ],
    "links": [
      {
        "text": "Related Article",
        "url": "https://example.com/related"
      }
    ],
    "metadata": {
      "wordCount": 1250,
      "readingTime": "5 minutes",
      "tags": ["web development", "scalability", "architecture"]
    }
  }
}
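A downstream Function node will often want a flatter record than the full output. The following is a minimal sketch that assumes the node returns exactly the shape shown in the example output; summarizePage is a hypothetical helper name, not part of the AnySite node.

```javascript
// Hypothetical helper: flattens a Parse URL result into a compact record.
// Assumes the output shape from the example above; missing fields become
// null or empty arrays instead of throwing.
function summarizePage(output) {
  const page = output.page ?? {};
  const metadata = page.metadata ?? {};
  return {
    url: page.url ?? null,
    title: page.title ?? null,
    author: page.author ?? null,
    publishDate: page.publishDate ?? null,
    imageCount: (page.images ?? []).length,
    linkUrls: (page.links ?? []).map((link) => link.url),
    wordCount: metadata.wordCount ?? null,
    tags: metadata.tags ?? [],
  };
}
```

Inside a Function node this could be applied with return items.map(item => ({ json: summarizePage(item.json) })).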
Bulk URL Parse parses multiple URLs in a single request. Parameters:
URLs (required): Array of URLs to parse
Batch Size: Number of URLs to process simultaneously
Fail on Error: Stop processing if one URL fails
Include Screenshots: Capture page screenshots
Example Output:
{
  "results": [
    {
      "url": "https://site1.com",
      "status": "success",
      "title": "Site 1 Title",
      "content": "Page content...",
      "loadTime": 1200
    },
    {
      "url": "https://site2.com",
      "status": "error",
      "error": "Page not found",
      "loadTime": 800
    }
  ],
  "summary": {
    "total": 2,
    "successful": 1,
    "failed": 1,
    "avgLoadTime": 1000
  }
}
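Because a bulk result mixes successes and failures, it usually needs to be split before further processing. This is a sketch under the assumption that the output matches the example above; splitBulkResults is a hypothetical helper name.

```javascript
// Hypothetical helper: separates successful and failed entries of a
// Bulk URL Parse result and collects the URLs worth retrying.
// Assumes each entry has "url", "status" and "loadTime" as in the example.
function splitBulkResults(output) {
  const results = output.results ?? [];
  const succeeded = results.filter((r) => r.status === "success");
  const failed = results.filter((r) => r.status !== "success");
  const totalLoadTime = results.reduce((sum, r) => sum + (r.loadTime ?? 0), 0);
  return {
    succeeded,
    failed,
    retryUrls: failed.map((r) => r.url),
    // Average over all entries, matching "avgLoadTime" in the example summary.
    avgLoadTime: results.length ? totalLoadTime / results.length : 0,
  };
}
```

The retryUrls list can be fed back into another Bulk URL Parse call, or logged for manual review.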
Monitor Changes watches web pages for changes and sends notifications. Parameters:
URL (required): Web page to monitor
Check Interval: How often to check for changes (minutes)
Change Threshold: Minimum percentage change to trigger an alert
Monitor Elements: Specific CSS selectors to monitor
Notification Method: “webhook”, “email”, or “return_data”
Example Output:
{
  "monitoring": {
    "url": "https://competitor.com/pricing",
    "lastChecked": "2024-08-26T15:30:00Z",
    "changes": [
      {
        "element": "#pricing-table",
        "changeType": "content",
        "oldValue": "$99/month",
        "newValue": "$89/month",
        "changePercent": 11.1,
        "timestamp": "2024-08-26T15:30:00Z"
      }
    ],
    "screenshot": {
      "before": "https://cdn.hdw.ai/screenshots/before_123.png",
      "after": "https://cdn.hdw.ai/screenshots/after_123.png"
    }
  }
}
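In the example output, oldValue and newValue are strings and changePercent carries no sign, so the direction of a price change has to be derived from the values themselves. The sketch below shows one way to do that; parsePrice and describeChange are hypothetical helpers, and the price format is assumed to look like "$99/month".

```javascript
// Hypothetical helper: pulls the first number out of a string such as
// "$1,299.50/month". Returns null when no number is present.
function parsePrice(text) {
  const match = String(text ?? "").match(/\d[\d,]*(?:\.\d+)?/);
  return match ? parseFloat(match[0].replace(/,/g, "")) : null;
}

// Hypothetical helper: turns one entry of "changes" into a signed delta.
function describeChange(change) {
  const oldPrice = parsePrice(change.oldValue);
  const newPrice = parsePrice(change.newValue);
  if (oldPrice === null || newPrice === null || oldPrice === 0) {
    return { oldPrice, newPrice, delta: null, relativeChange: null, direction: "unknown" };
  }
  const delta = newPrice - oldPrice;
  return {
    oldPrice,
    newPrice,
    delta,
    // Signed percentage relative to the old price (negative means cheaper).
    relativeChange: (delta / oldPrice) * 100,
    direction: delta < 0 ? "decrease" : delta > 0 ? "increase" : "unchanged",
  };
}
```

The relative change computed here is measured against the old price, so it may differ from the changePercent the node reports if the node uses another baseline.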
Workflow Examples
Competitor Price Monitoring
Monitor Competitor Pages - Set up monitoring for competitor pricing pages and product announcements.
Detect Changes - Get automatic notifications when competitors change prices or launch new products.
Analysis & Alerts - Analyze pricing changes and send alerts to your team with actionable insights.
Strategy Updates - Use the data to adjust your own pricing strategy and competitive positioning.
Example Workflow:
{
  "nodes": [
    {
      "name": "Monitor Competitor Pricing",
      "type": "@horizondatawave/n8n-nodes-anysite.WebParser",
      "operation": "monitorChanges",
      "parameters": {
        "url": "https://competitor.com/pricing",
        "checkInterval": 60,
        "changeThreshold": 5,
        "monitorElements": ["#pricing-table", ".product-price"]
      }
    },
    {
      "name": "Filter Significant Changes",
      "type": "n8n-nodes-base.filter",
      "parameters": {
        "conditions": [
          {
            "field": "monitoring.changes[0].changePercent",
            "operation": "greaterThan",
            "value": 10
          }
        ]
      }
    },
    {
      "name": "Analyze Price Change",
      "type": "n8n-nodes-base.function",
      "parameters": {
        "functionCode": `
          // The changes array is nested under "monitoring" in the node output
          const change = items[0].json.monitoring.changes[0];
          // oldValue and newValue are strings such as "$99/month", so extract the number first
          const toNumber = (text) => parseFloat(String(text).replace(/[^0-9.]/g, ''));
          const oldPrice = toNumber(change.oldValue);
          const newPrice = toNumber(change.newValue);
          const analysis = {
            competitor: "Competitor Inc",
            product: "Enterprise Plan",
            oldPrice: change.oldValue,
            newPrice: change.newValue,
            changeAmount: newPrice - oldPrice,
            changePercent: change.changePercent,
            // A competitor price cut suggests a promotion; an increase suggests a pricing review
            recommendation: newPrice < oldPrice ?
              "Consider promotional pricing" :
              "Review our pricing strategy"
          };
          return [{ json: analysis }];
        `
      }
    },
    {
      "name": "Alert Team",
      "type": "n8n-nodes-base.slack",
      "parameters": {
        "channel": "#competitive-intel",
        "text": "=🚨 Competitor Price Change Alert\n📊 {{ $json.competitor }} changed {{ $json.product }} from {{ $json.oldPrice }} to {{ $json.newPrice }} ({{ $json.changePercent }}%)\n💡 Recommendation: {{ $json.recommendation }}"
      }
    }
  ]
}
Content Research & Analysis
Automatically research and analyze content from multiple sources:
Industry News Monitoring - Track news sites for industry developments
Competitor Content Analysis - Monitor competitor blogs and announcements
Trend Research - Extract trending topics from various publications
Content Gap Analysis - Find content opportunities in your niche
SEO Research - Analyze top-ranking pages for target keywords
Lead Generation from Websites
Extract leads and contact information from business websites:
Directory Scraping - Extract business listings from directories
Contact Page Parsing - Get contact information from company websites
Team Page Analysis - Extract employee information and roles
Technology Detection - Identify technologies used by target companies
CRM Integration - Automatically add qualified leads to your CRM
Advanced Parsing
Custom CSS Selectors
Extract specific elements using CSS selectors:
{
  "name": "Custom Data Extraction",
  "type": "@horizondatawave/n8n-nodes-anysite.WebParser",
  "operation": "parseUrl",
  "parameters": {
    "url": "https://news.ycombinator.com",
    "customSelectors": {
      "headlines": ".titleline > a",
      "scores": ".score",
      "comments": ".subtext a[href*='item']:last-child",
      "authors": ".hnuser"
    }
  }
}
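When several selectors each match many elements, the results usually have to be combined into one record per item. The sketch below assumes, purely for illustration, that each custom selector comes back as an array of matched text values under the selector's key; the actual output shape of the node is not documented here, and zipSelectorResults is a hypothetical helper.

```javascript
// Hypothetical helper: combines parallel arrays (one per selector key)
// into one object per row. Assumes the extracted data looks like
// { headlines: [...], scores: [...], ... }, which is an assumption about
// the node's output rather than a documented format.
function zipSelectorResults(extracted, keys) {
  const length = Math.max(0, ...keys.map((key) => (extracted[key] ?? []).length));
  return Array.from({ length }, (_, index) =>
    Object.fromEntries(
      // Missing positions are filled with null so every row has every key.
      keys.map((key) => [key, (extracted[key] ?? [])[index] ?? null])
    )
  );
}
```

Pairing by position only works when every item on the page has every element. If some items lack one (for example, a listing without a score), the arrays drift out of alignment, and selecting one container element per item is the safer approach.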
Dynamic Content Handling
Handle JavaScript-heavy websites:
{
  "name": "Parse SPA Website",
  "type": "@horizondatawave/n8n-nodes-anysite.WebParser",
  "operation": "parseUrl",
  "parameters": {
    "url": "https://spa-website.com",
    "waitForLoad": 10,
    "waitForSelector": "#dynamic-content",
    "executeJavaScript": "document.querySelector('#load-more').click()"
  }
}
Transform extracted data into a structured format:
// Clean and structure scraped data
{
  "name": "Transform Data",
  "type": "n8n-nodes-base.function",
  "parameters": {
    "functionCode": `
      // Collapse runs of whitespace; returns undefined for missing text
      const cleanText = (text) => text?.trim().replace(/\\s+/g, ' ');
      // Extract a dollar amount such as "$1,299.00" as a number, or null if absent
      const extractPrice = (text) => {
        const match = text?.match(/\\$([\\d,]+(?:\\.\\d{2})?)/);
        return match ? parseFloat(match[1].replace(/,/g, '')) : null;
      };
      const transformed = items.map(item => ({
        json: {
          title: cleanText(item.json.title),
          price: extractPrice(item.json.priceText),
          description: cleanText(item.json.description),
          url: item.json.url,
          extractedAt: new Date().toISOString()
        }
      }));
      return transformed;
    `
  }
}
Error Handling
Common Issues
Error: 408 - Page load timeout
Solution:
Increase wait time for slow-loading pages
Check if the website is experiencing issues
Consider parsing the page in multiple steps
Error: 403 - Forbidden
Solution:
Website may be blocking automated access
Try using different user agents
Respect robots.txt and terms of service
Consider reaching out to site owners
Error: 429 - Too many requests
Solution:
Add delays between requests
Reduce concurrent parsing operations
Implement exponential backoff
Consider upgrading your API plan
Error: 404 - Element not found
Solution:
Website structure may have changed
Update CSS selectors
Add fallback selectors
Implement graceful degradation
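The exponential backoff suggested for 429 responses can be sketched as follows. This is an illustrative helper for code you control (for example a script that calls the parser), not a feature of the AnySite node; the base delay, cap, and function names are assumptions.

```javascript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Calls `task` until it succeeds or `maxTries` attempts are used up.
// `sleep` is injectable so the helper can be exercised without real waiting.
async function retryWithBackoff(task, options = {}) {
  const {
    maxTries = 3,
    baseMs = 1000,
    maxMs = 30000,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;
  let lastError;
  for (let attempt = 0; attempt < maxTries; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      // Wait only if another attempt will follow.
      if (attempt < maxTries - 1) {
        await sleep(backoffDelay(attempt, baseMs, maxMs));
      }
    }
  }
  throw lastError;
}
```

With the defaults, the waits are 1 second and then 2 seconds before the third and final attempt. Adding a small random offset to each delay helps when many workflows retry at the same moment.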
Robust Parsing
{
  "name": "Robust Web Parser",
  "type": "@horizondatawave/n8n-nodes-anysite.WebParser",
  "continueOnFail": true,
  "retryOnFail": true,
  "maxTries": 3,
  "waitBetweenTries": 5000,
  "parameters": {
    "operation": "parseUrl",
    "url": "={{ $json.targetUrl }}",
    "fallbackSelectors": {
      "title": ["h1", ".title", ".headline", "title"],
      "content": [".content", ".article-body", "main", ".post"]
    }
  }
}
Data Quality & Validation
Content Validation
Validate extracted data quality:
// Data quality checks
{
  "name": "Validate Data Quality",
  "type": "n8n-nodes-base.function",
  "parameters": {
    "functionCode": `
      const validateData = (data) => {
        const quality = {
          score: 0,
          issues: [],
          valid: true
        };
        // Fall back to an empty string so later checks never throw on missing content
        const content = data.content || '';
        // Check title
        if (!data.title || data.title.length < 10) {
          quality.issues.push('Title too short or missing');
          quality.valid = false;
        } else {
          quality.score += 25;
        }
        // Check content
        if (content.length < 100) {
          quality.issues.push('Content too short or missing');
          quality.valid = false;
        } else {
          quality.score += 25;
        }
        // Check for duplicate content (only when a title is present)
        if (data.title && data.title === data.description) {
          quality.issues.push('Title and description are identical');
          quality.score -= 10;
        }
        // Check for extraction artifacts
        if (content.includes('javascript:') || content.includes('void(0)')) {
          quality.issues.push('Content contains JavaScript artifacts');
          quality.score -= 15;
        }
        quality.score = Math.max(0, quality.score);
        return { ...data, quality };
      };
      return items.map(item => ({ json: validateData(item.json) }));
    `
  }
}
Duplicate Detection
Remove duplicate content:
{
  "name": "Remove Duplicates",
  "type": "n8n-nodes-base.removeDuplicates",
  "parameters": {
    "compare": "selectedFields",
    "fieldsToCompare": ["title", "url"]
  }
}
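Comparing title and url verbatim treats pages that differ only by a trailing slash, fragment, or letter case as distinct. If that matters, the fields can be normalized in a Function node before deduplication. The sketch below is one possible approach; normalizeUrl and dedupe are hypothetical helpers.

```javascript
// Hypothetical helper: canonicalizes a URL for comparison by dropping the
// fragment and any trailing slashes. The URL class already lowercases the host.
function normalizeUrl(rawUrl) {
  try {
    const url = new URL(rawUrl);
    url.hash = "";
    url.pathname = url.pathname.replace(/\/+$/, "") || "/";
    return url.toString();
  } catch {
    // Not a parseable absolute URL: fall back to the trimmed raw value.
    return String(rawUrl ?? "").trim();
  }
}

// Keeps the first item for each (normalized url, normalized title) pair.
function dedupe(items) {
  const seen = new Set();
  return items.filter((item) => {
    const title = (item.title ?? "").trim().toLowerCase();
    const key = `${normalizeUrl(item.url)}|${title}`;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```

Query strings are left untouched here, since on many sites they select different content.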
Integration Examples
Database Storage
Store parsed data in database:
{
  "name": "Store Parsed Data",
  "type": "n8n-nodes-base.postgres",
  "parameters": {
    "operation": "insert",
    "table": "scraped_content",
    "columns": [
      "url",
      "title",
      "content",
      "author",
      "publish_date",
      "scraped_at"
    ],
    "values": [
      "={{ $json.url }}",
      "={{ $json.title }}",
      "={{ $json.content }}",
      "={{ $json.author }}",
      "={{ $json.publishDate }}",
      "={{ new Date().toISOString() }}"
    ]
  }
}
Content Management
Add to CMS or knowledge base:
{
  "name": "Add to Notion",
  "type": "n8n-nodes-base.notion",
  "parameters": {
    "operation": "create",
    "resource": "page",
    "databaseId": "your-database-id",
    "properties": {
      "Title": "={{ $json.title }}",
      "URL": "={{ $json.url }}",
      "Content": "={{ $json.content }}",
      "Source": "Web Scraping",
      "Date": "={{ new Date().toISOString() }}"
    }
  }
}
AI Analysis
Analyze extracted content with AI:
{
  "name": "AI Content Analysis",
  "type": "n8n-nodes-base.openAi",
  "parameters": {
    "operation": "analyze",
    "prompt": "=Analyze this article and provide: 1) Main topics, 2) Key insights, 3) Sentiment, 4) Target audience. Article: {{ $json.title }} - {{ $json.content }}"
  }
}
Performance Optimization
Parallel Processing
Process multiple URLs simultaneously:
{
  "name": "Parallel URL Processing",
  "type": "@horizondatawave/n8n-nodes-anysite.WebParser",
  "operation": "bulkUrlParse",
  "parameters": {
    "urls": [
      "https://site1.com",
      "https://site2.com",
      "https://site3.com"
    ],
    "batchSize": 3,
    "maxRetries": 2
  }
}
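For URL lists much longer than the batch size, one option is to split the list up front and issue one bulk request per chunk. The helper below is a generic sketch; whether the node also batches internally based on batchSize is not covered here, so treat the splitting as an assumption about how you want to drive it.

```javascript
// Splits a list into consecutive chunks of at most `batchSize` entries.
function chunk(urls, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches = [];
  for (let start = 0; start < urls.length; start += batchSize) {
    batches.push(urls.slice(start, start + batchSize));
  }
  return batches;
}
```

In a Function node, returning chunk(allUrls, 3).map(urls => ({ json: { urls } })) would emit one item per batch for the parser node to consume.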
Selective Parsing
Only parse essential elements to improve speed:
{
  "name": "Fast Essential Parsing",
  "type": "@horizondatawave/n8n-nodes-anysite.WebParser",
  "operation": "parseUrl",
  "parameters": {
    "url": "={{ $json.url }}",
    "extractImages": false,
    "extractLinks": false,
    "customSelectors": {
      "title": "h1",
      "price": ".price",
      "availability": ".stock-status"
    }
  }
}
Best Practices
Ethical Scraping
Always respect robots.txt files
Don’t overload servers with too many requests
Follow website terms of service
Consider reaching out to site owners for API access
Store only necessary data and respect privacy
Performance
Use batch operations for multiple URLs
Implement proper error handling and retries
Add appropriate delays between requests
Cache frequently accessed data
Monitor your API usage and quotas
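Caching frequently accessed data can be as simple as remembering each parsed result for a fixed time. The class below is a minimal in-memory sketch with a time-to-live per entry; TtlCache is a hypothetical name, and the injectable clock exists only to make the behavior easy to verify.

```javascript
// Minimal in-memory cache where each entry expires `ttlMs` after it is set.
class TtlCache {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, defaults to wall-clock time
    this.entries = new Map();
  }

  // Returns the cached value, or undefined if absent or expired.
  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```

A workflow would check cache.get(url) before calling the parser and call cache.set(url, result) afterwards. Since this cache lives in memory, it only helps within a single process; results that must survive between workflow executions need a persistent store such as a database.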
Data Quality
Validate extracted data before using it
Implement fallback extraction methods
Clean and normalize text content
Remove duplicate entries
Handle encoding and special characters properly