Overview
After collecting data with dataset pipelines, you can query it using SQL powered by DuckDB. This works directly on Parquet files, with no separate database needed.
Requires the data extra: `pip install "anysite-cli[data]"`
SQL Queries
Run SQL against your collected dataset by passing a query with `--sql`, as in `anysite dataset query <yaml> --sql "..."`.
Query a Specific Source
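To restrict a query to one source in the dataset, pass its ID with `--source`. The sketch below uses the command form listed in the Commands Reference; `dataset.yaml` and the source ID `companies` are placeholder names rather than values from this page, and whether `--source` is normally combined with `--sql` is not stated here.

```shell
# Placeholder names (assumptions): adjust to your own pipeline config and source ID.
DATASET="dataset.yaml"
SOURCE_ID="companies"

# Run the query only if the CLI is installed; otherwise print an install hint.
if command -v anysite >/dev/null 2>&1; then
  anysite dataset query "$DATASET" --source "$SOURCE_ID"
else
  echo 'anysite not found; install with: pip install "anysite-cli[data]"' >&2
fi
```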
Complex Queries
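Because queries run on DuckDB, the usual SQL constructs such as aggregates, grouping, and ordering should be available. The following is a sketch only: it assumes a source is exposed as a table named after its ID (`companies`) and that it has an `industry` column, neither of which is confirmed by this page.

```shell
# Placeholder config file name (assumption).
DATASET="dataset.yaml"

# Hypothetical table and column names: count records per industry, largest groups first.
SQL='
SELECT industry, COUNT(*) AS company_count
FROM companies
GROUP BY industry
ORDER BY company_count DESC
LIMIT 10
'

# Run the query only if the CLI is installed; otherwise print an install hint.
if command -v anysite >/dev/null 2>&1; then
  anysite dataset query "$DATASET" --sql "$SQL"
else
  echo 'anysite not found; install with: pip install "anysite-cli[data]"' >&2
fi
```

Keeping the SQL in a single-quoted shell variable avoids escaping issues when the query spans several lines or contains double quotes.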
Interactive Mode
Launch an interactive SQL shell with `anysite dataset query <yaml> --interactive`.
Dataset Statistics
Get a summary of collected data with `anysite dataset stats <yaml>`. The summary covers:
- Number of records collected
- Collection timestamp
- File size
- Column list with types
Source-Level Stats
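Statistics can also be limited to a single source with `--source`, following the command form in the Commands Reference. As a minimal sketch, with `dataset.yaml` and `companies` as placeholder names:

```shell
# Placeholder names (assumptions): adjust to your own pipeline config and source ID.
STATS_DATASET="dataset.yaml"
STATS_SOURCE="companies"

# Show stats for one source only if the CLI is installed; otherwise print an install hint.
if command -v anysite >/dev/null 2>&1; then
  anysite dataset stats "$STATS_DATASET" --source "$STATS_SOURCE"
else
  echo 'anysite not found; install with: pip install "anysite-cli[data]"' >&2
fi
```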
Dataset Profiling
Generate a statistical profile of your data with `anysite dataset profile <yaml>`. The profile covers:
- Column-level statistics (min, max, mean, median, null count)
- Value distributions for categorical columns
- Data quality indicators
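To make the column-level statistics concrete, the sketch below computes the same kind of figures by hand for one numeric column through `anysite dataset query --sql`. It illustrates what such a profile summarizes, not how `anysite dataset profile` is implemented; the table name `companies` and the column `employee_count` are hypothetical.

```shell
# Placeholder config file name (assumption).
PROFILE_DATASET="dataset.yaml"

# Hypothetical table/column. COUNT(col) skips NULLs, so the difference is the null count.
PROFILE_SQL='
SELECT
  MIN(employee_count)    AS min_value,
  MAX(employee_count)    AS max_value,
  AVG(employee_count)    AS mean_value,
  MEDIAN(employee_count) AS median_value,
  COUNT(*) - COUNT(employee_count) AS null_count
FROM companies
'

# Run the query only if the CLI is installed; otherwise print an install hint.
if command -v anysite >/dev/null 2>&1; then
  anysite dataset query "$PROFILE_DATASET" --sql "$PROFILE_SQL"
else
  echo 'anysite not found; install with: pip install "anysite-cli[data]"' >&2
fi
```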
Output Formats
Query results support the same output formats as API calls.
Commands Reference
| Command | Description |
|---|---|
| `anysite dataset query <yaml> --sql "..."` | Run SQL query on collected data |
| `anysite dataset query <yaml> --source <id>` | Query a specific source |
| `anysite dataset query <yaml> --interactive` | Open interactive SQL shell |
| `anysite dataset stats <yaml>` | Show dataset statistics |
| `anysite dataset stats <yaml> --source <id>` | Show stats for a specific source |
| `anysite dataset profile <yaml>` | Generate data profile with distributions |
Next Steps
- Examples: see complete end-to-end workflow examples
- Database Loading: load query results into a persistent database