POST /api/webparser/sitemap
curl --request POST \
  --url https://api.anysite.io/api/webparser/sitemap \
  --header 'Content-Type: application/json' \
  --header 'access-token: <api-key>' \
  --data '
{
  "url": "<string>",
  "timeout": 300,
  "include_patterns": [
    "^/blog/",
    "^/docs/"
  ],
  "exclude_patterns": [
    "\\.pdf$",
    "/admin/"
  ],
  "same_host_only": true,
  "respect_robots": true,
  "count": 2,
  "return_details": false
}
'
[
  {
    "urls": [
      "<string>"
    ],
    "total_found": 123,
    "sitemap_locations": [
      "<string>"
    ],
    "@type": "SitemapResult",
    "entries": [
      {
        "loc": "<string>",
        "@type": "SitemapEntry",
        "lastmod": "<string>",
        "changefreq": "<string>",
        "priority": 0.5
      }
    ]
  }
]

Authorizations

access-token
string
header
required

Headers

access-token
string
required

Body

application/json
url
string<uri>
required

Website URL to fetch sitemap from

Required string length: 1 - 2083
timeout
integer
default:300

Maximum scraping execution timeout, in seconds

Required range: 20 <= x <= 1500
include_patterns
string[] | null

Regex patterns for URL paths to include. If omitted, all URLs are included.

Example:
["^/blog/", "^/docs/"]
exclude_patterns
string[] | null

Regex patterns for URL paths to exclude

Example:
["\\.pdf$", "/admin/"]
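The following is a minimal sketch of how include and exclude patterns could combine. It is not the service's implementation. It assumes each pattern is applied to the URL path with a regex search, that a URL is kept when it matches any include pattern and no exclude pattern, and that exclusion wins over inclusion. The helper name `filter_urls` is hypothetical. Note that the JSON string `"\\.pdf$"` decodes to the regex `\.pdf$`.

```python
import re
from urllib.parse import urlsplit


def filter_urls(urls, include_patterns=None, exclude_patterns=None):
    """Hypothetical helper: keep URLs whose path matches any include
    pattern (or all URLs when none are given) and no exclude pattern."""
    include = [re.compile(p) for p in include_patterns or []]
    exclude = [re.compile(p) for p in exclude_patterns or []]
    kept = []
    for url in urls:
        # Assumption: patterns are tested against the path component only.
        path = urlsplit(url).path or "/"
        if include and not any(rx.search(path) for rx in include):
            continue
        if any(rx.search(path) for rx in exclude):
            continue
        kept.append(url)
    return kept


sample = [
    "https://example.com/blog/a",
    "https://example.com/docs/guide.pdf",
    "https://example.com/admin/x",
    "https://example.com/about",
]
filtered = filter_urls(sample, ["^/blog/", "^/docs/"], [r"\.pdf$", "/admin/"])
```

With the sample list, only the blog URL survives: the PDF under `/docs/` is included and then excluded, and the other two never match an include pattern.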
same_host_only
boolean
default:true

Only include URLs from the same host as the base URL

respect_robots
boolean
default:true

Check robots.txt and respect disallowed URLs
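To illustrate what respecting robots.txt means for a list of URLs, here is a sketch using Python's standard `urllib.robotparser`. It is an illustration only: the helper name `allowed_urls` and the `*` user agent are assumptions, and the service may apply its own rules.

```python
from urllib.robotparser import RobotFileParser


def allowed_urls(robots_txt, urls, user_agent="*"):
    """Hypothetical helper: drop URLs that robots.txt disallows
    for the given user agent."""
    parser = RobotFileParser()
    # parse() accepts the robots.txt content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(user_agent, u)]


robots = "User-agent: *\nDisallow: /admin/\n"
candidates = [
    "https://example.com/blog/a",
    "https://example.com/admin/x",
]
permitted = allowed_urls(robots, candidates)
```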

count
integer | null

Maximum number of URLs to return

Required range: x >= 1
return_details
boolean
default:false

Return detailed sitemap entries (lastmod, changefreq, priority) instead of just URLs
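The body parameters above can be assembled and checked on the client before sending. The sketch below mirrors the documented defaults and ranges and prepares a POST request with the standard library without sending it. The function names `build_payload` and `build_request` are hypothetical, and omitting nullable fields when they are unset is an assumption rather than documented behaviour.

```python
import json
import urllib.request

API_URL = "https://api.anysite.io/api/webparser/sitemap"


def build_payload(url, timeout=300, include_patterns=None,
                  exclude_patterns=None, same_host_only=True,
                  respect_robots=True, count=None, return_details=False):
    """Hypothetical helper: validate arguments against the documented
    constraints and return the JSON body as a dict."""
    if not 1 <= len(url) <= 2083:
        raise ValueError("url must be 1 to 2083 characters long")
    if not 20 <= timeout <= 1500:
        raise ValueError("timeout must be between 20 and 1500 seconds")
    if count is not None and count < 1:
        raise ValueError("count must be at least 1")
    payload = {
        "url": url,
        "timeout": timeout,
        "same_host_only": same_host_only,
        "respect_robots": respect_robots,
        "return_details": return_details,
    }
    # Assumption: nullable fields are left out when not set.
    optional = {
        "include_patterns": include_patterns,
        "exclude_patterns": exclude_patterns,
        "count": count,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


def build_request(payload, api_key):
    """Prepare the POST request; pass it to urllib.request.urlopen to send."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "access-token": api_key},
        method="POST",
    )


payload = build_payload("https://example.com", count=2)
request = build_request(payload, "<api-key>")
```

Validating locally gives an early, readable error for out-of-range values instead of a round trip to the API.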

Response

Successful Response

urls
string[]
required

List of URLs found in sitemap(s)

total_found
integer
required

Total number of URLs found

sitemap_locations
string[]
required

Sitemap URLs that were processed

@type
string
default:SitemapResult
entries
SitemapEntry · object[] | null

Detailed sitemap entries (returned when return_details is true)
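As the response sample shows, the body is a JSON array of SitemapResult objects. The sketch below decodes such a body into small dataclasses. The Python class definitions and the `parse_response` helper are hypothetical, and treating `lastmod`, `changefreq`, and `priority` as optional is an assumption, since the reference does not state which entry fields are required.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SitemapEntry:
    loc: str
    # Assumption: these fields may be absent for some entries.
    lastmod: Optional[str] = None
    changefreq: Optional[str] = None
    priority: Optional[float] = None


@dataclass
class SitemapResult:
    urls: List[str]
    total_found: int
    sitemap_locations: List[str]
    entries: Optional[List[SitemapEntry]] = None


def parse_response(body):
    """Hypothetical helper: decode the JSON array into SitemapResult objects."""
    results = []
    for item in json.loads(body):
        raw_entries = item.get("entries")
        entries = None
        if raw_entries is not None:
            entries = [
                SitemapEntry(
                    loc=e["loc"],
                    lastmod=e.get("lastmod"),
                    changefreq=e.get("changefreq"),
                    priority=e.get("priority"),
                )
                for e in raw_entries
            ]
        results.append(
            SitemapResult(
                urls=item["urls"],
                total_found=item["total_found"],
                sitemap_locations=item["sitemap_locations"],
                entries=entries,
            )
        )
    return results


sample_body = json.dumps([{
    "urls": ["https://example.com/blog/a"],
    "total_found": 1,
    "sitemap_locations": ["https://example.com/sitemap.xml"],
    "@type": "SitemapResult",
    "entries": [{"loc": "https://example.com/blog/a", "priority": 0.5}],
}])
parsed = parse_response(sample_body)
```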