POST /api/webparser/sitemap

Example request:
curl --request POST \
  --url https://api.anysite.io/api/webparser/sitemap \
  --header 'Content-Type: application/json' \
  --header 'access-token: <access-token>' \
  --data '{
  "timeout": 300,
  "url": "https://www.example.com",
  "include_patterns": [
    "^/blog/",
    "^/docs/"
  ],
  "exclude_patterns": [
    "\\.pdf$",
    "/admin/"
  ],
  "same_host_only": true,
  "respect_robots": true,
  "count": 2,
  "return_details": false
}'
Example response:

[
  {
    "@type": "SitemapResult",
    "urls": [
      "<string>"
    ],
    "entries": [
      {
        "@type": "SitemapEntry",
        "loc": "<string>",
        "lastmod": "<string>",
        "changefreq": "<string>",
        "priority": 0.5
      }
    ],
    "total_found": 123,
    "sitemap_locations": [
      "<string>"
    ]
  }
]
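The curl request above can be reproduced with Python's standard library. This is a minimal sketch, not an official client: `build_sitemap_request` is a hypothetical helper name, and the request is only constructed here, not sent. Pass it to `urllib.request.urlopen` with a real token to call the API. The client-side socket timeout in the commented-out call (`timeout + 30`) is an arbitrary choice, not something the API specifies.

```python
import json
import urllib.request

API_URL = "https://api.anysite.io/api/webparser/sitemap"


def build_sitemap_request(token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl example."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=data,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "access-token": token,
        },
    )


body = {
    "timeout": 300,
    "url": "https://www.example.com",
    "include_patterns": ["^/blog/", "^/docs/"],
    # The curl body writes this regex as "\\.pdf$" because JSON escapes
    # backslashes; json.dumps produces the same escaping from r"\.pdf$".
    "exclude_patterns": [r"\.pdf$", "/admin/"],
    "same_host_only": True,
    "respect_robots": True,
    "count": 2,
    "return_details": False,
}

req = build_sitemap_request("<access-token>", body)

# To actually send it (requires a valid token and network access):
# with urllib.request.urlopen(req, timeout=body["timeout"] + 30) as resp:
#     result = json.load(resp)
```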

Headers

access-token
string
required

API access token used to authenticate the request.

Body

application/json
url
string<uri>
required

URL of the website whose sitemap(s) should be fetched

Required string length: 1 - 2083
Examples:

"https://www.example.com"

"https://blog.example.com"

timeout
integer
default:300

Maximum scraping execution time, in seconds

Required range: 20 <= x <= 1500
include_patterns
string[] | null

Regex patterns for URL paths to include. If not specified, all URLs are included.

Examples:
["^/blog/", "^/docs/"]
exclude_patterns
string[] | null

Regex patterns for URL paths to exclude

Examples:
["\\.pdf$", "/admin/"]
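The documentation does not say exactly how the patterns are applied. The sketch below assumes each pattern is tested with `re.search` against the URL path (not the full URL), that a URL must match at least one include pattern when any are given, and that exclude patterns take precedence over include patterns. `filter_by_patterns` is a hypothetical helper, mainly useful for checking locally what your regexes match before sending them. Note also that Python's regex dialect may differ in details from the engine the service uses.

```python
import re
from urllib.parse import urlsplit


def filter_by_patterns(urls, include_patterns=None, exclude_patterns=None):
    """Keep URLs whose path matches an include pattern and no exclude pattern.

    Assumption: patterns are matched with re.search against the URL path.
    """
    includes = [re.compile(p) for p in include_patterns or []]
    excludes = [re.compile(p) for p in exclude_patterns or []]
    kept = []
    for url in urls:
        path = urlsplit(url).path or "/"
        if includes and not any(rx.search(path) for rx in includes):
            continue  # an include list was given, but nothing matched
        if any(rx.search(path) for rx in excludes):
            continue  # explicitly excluded
        kept.append(url)
    return kept


urls = [
    "https://www.example.com/blog/post-1",
    "https://www.example.com/blog/guide.pdf",
    "https://www.example.com/docs/admin/setup",
    "https://www.example.com/about",
]
result = filter_by_patterns(urls, ["^/blog/", "^/docs/"], [r"\.pdf$", "/admin/"])
print(result)  # ['https://www.example.com/blog/post-1']
```

With the example patterns, `guide.pdf` and `/docs/admin/setup` pass an include pattern but are then dropped by an exclude pattern, and `/about` matches no include pattern.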
same_host_only
boolean
default:true

Only include URLs from the same host as the base URL
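How the service compares hosts is not documented. One plausible reading, assumed here, is an exact, case-insensitive comparison of hostnames, under which `blog.example.com` and `www.example.com` count as different hosts while scheme and port are ignored. `is_same_host` is a hypothetical helper illustrating that assumption.

```python
from urllib.parse import urlsplit


def is_same_host(base_url: str, candidate_url: str) -> bool:
    """Assumed semantics: hostnames must match exactly, ignoring case.

    urlsplit(...).hostname is already lower-cased and excludes any port.
    """
    return urlsplit(base_url).hostname == urlsplit(candidate_url).hostname


base = "https://www.example.com"
print(is_same_host(base, "https://www.example.com/blog/"))  # True
print(is_same_host(base, "https://WWW.EXAMPLE.COM/docs/"))  # True
print(is_same_host(base, "https://blog.example.com/post"))  # False
```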

respect_robots
boolean
default:true

Check robots.txt and skip URLs it disallows
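To preview which URLs a robots.txt file disallows, Python's standard `urllib.robotparser` can evaluate the rules offline. This only approximates what the service does: the user agent it identifies as and its exact matching rules are not documented, so the sketch assumes the `*` group applies. The robots.txt content below is illustrative, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content.
robots_txt = """\
User-agent: *
Disallow: /admin/
Disallow: /private
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

for url in [
    "https://www.example.com/blog/post-1",
    "https://www.example.com/admin/login",
    "https://www.example.com/private/report",
]:
    # "*" stands in for the service's undocumented user agent.
    print(url, parser.can_fetch("*", url))
```

Only the `/blog/post-1` URL is allowed here; the other two fall under a `Disallow` prefix.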

count
integer | null

Maximum number of URLs to return

Required range: x >= 1
return_details
boolean
default:false

Return detailed sitemap entries (lastmod, changefreq, priority) instead of just URLs
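The body constraints listed above can be checked client-side before spending a request. A minimal sketch: `build_sitemap_body` is a hypothetical helper that applies the documented defaults and ranges, and it leaves `count` and the pattern lists out of the body when they are not set, assuming the server treats an omitted field like `null`. The regex check uses Python's `re`, which may not accept exactly the same syntax as the service.

```python
import re


def build_sitemap_body(
    url: str,
    timeout: int = 300,
    include_patterns=None,
    exclude_patterns=None,
    same_host_only: bool = True,
    respect_robots: bool = True,
    count=None,
    return_details: bool = False,
) -> dict:
    """Validate parameters against the documented limits and build the JSON body."""
    if not 1 <= len(url) <= 2083:
        raise ValueError("url length must be between 1 and 2083 characters")
    if not 20 <= timeout <= 1500:
        raise ValueError("timeout must be between 20 and 1500 seconds")
    if count is not None and count < 1:
        raise ValueError("count must be >= 1")
    for pattern in [*(include_patterns or []), *(exclude_patterns or [])]:
        re.compile(pattern)  # raises re.error on an invalid regex
    body = {
        "url": url,
        "timeout": timeout,
        "same_host_only": same_host_only,
        "respect_robots": respect_robots,
        "return_details": return_details,
    }
    if include_patterns is not None:
        body["include_patterns"] = list(include_patterns)
    if exclude_patterns is not None:
        body["exclude_patterns"] = list(exclude_patterns)
    if count is not None:
        body["count"] = count
    return body


print(build_sitemap_body("https://www.example.com", count=2))
```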

Response

Successful Response

urls
string[]
required

List of URLs found in sitemap(s)

total_found
integer
required

Total number of URLs found

sitemap_locations
string[]
required

Sitemap URLs that were processed

@type
string
default:SitemapResult
entries
SitemapEntry · object[] | null

Detailed sitemap entries (when return_details is true)
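A response can be read into typed objects as sketched below. The example response is a JSON array containing one `SitemapResult`, so the sketch accepts either an array or a single object. `SitemapEntry`, `SitemapResult`, and `parse_sitemap_response` are hypothetical client-side names mirroring the documented fields; which `SitemapEntry` fields are always present is not documented, so all but `loc` are read defensively. The sample payload uses made-up values.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SitemapEntry:
    loc: str
    lastmod: Optional[str] = None
    changefreq: Optional[str] = None
    priority: Optional[float] = None


@dataclass
class SitemapResult:
    urls: List[str]
    total_found: int
    sitemap_locations: List[str]
    entries: Optional[List[SitemapEntry]] = None  # set when return_details is true


def parse_sitemap_response(raw: str) -> List[SitemapResult]:
    data = json.loads(raw)
    items = data if isinstance(data, list) else [data]
    results = []
    for item in items:
        entries = item.get("entries")
        results.append(
            SitemapResult(
                urls=item["urls"],
                total_found=item["total_found"],
                sitemap_locations=item["sitemap_locations"],
                entries=None if entries is None else [
                    SitemapEntry(
                        loc=e["loc"],
                        lastmod=e.get("lastmod"),
                        changefreq=e.get("changefreq"),
                        priority=e.get("priority"),
                    )
                    for e in entries
                ],
            )
        )
    return results


sample = """[{"@type": "SitemapResult",
  "urls": ["https://www.example.com/blog/post-1"],
  "entries": null,
  "total_found": 1,
  "sitemap_locations": ["https://www.example.com/sitemap.xml"]}]"""
result = parse_sitemap_response(sample)[0]
print(result.urls, result.total_found)
```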
