Overview
After collecting data with dataset pipelines, you can query it with SQL powered by DuckDB. Queries run directly on the collected Parquet files, so no separate database is needed.

Requires the `data` extra: `pip install "anysite-cli[data]"`

SQL Queries
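Run SQL against your collected dataset with `anysite dataset query <yaml> --sql "..."`.

Query a Specific Source
Query a single source by passing its ID: `anysite dataset query <yaml> --source <id>`.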
Complex Queries
Interactive Mode
Launch an interactive SQL shell with `anysite dataset query <yaml> --interactive`.

Dataset Statistics
Get a summary of collected data with `anysite dataset stats <yaml>`. The summary includes:
- Number of records collected
- Collection timestamp
- File size
- Column list with types
Source-Level Stats
Show statistics for a single source with `anysite dataset stats <yaml> --source <id>`.
Dataset Profiling
Generate a statistical profile of your data with `anysite dataset profile <yaml>`. The profile includes:
- Column-level statistics (min, max, mean, median, null count)
- Value distributions for categorical columns
- Data quality indicators
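The profile metrics listed above can be illustrated with Python's standard library. This sketch runs on hypothetical in-memory column values rather than a real dataset. It computes column-level statistics for a numeric column and a value distribution for a categorical one, and uses the share of missing values as one simple data quality indicator. It shows what the metrics mean; it is not how `anysite dataset profile` computes or formats them.

```python
import statistics
from collections import Counter

# Hypothetical column values; None marks a missing value.
followers = [120, 80, None, 150, 40, None]
industry = ["Software", "Retail", "Software", None, "Software", "Retail"]


def profile_numeric(values):
    """Column-level statistics for a numeric column, ignoring nulls.

    Assumes at least one non-null value.
    """
    present = [v for v in values if v is not None]
    return {
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "null_count": len(values) - len(present),
    }


def profile_categorical(values, top=3):
    """Most common values of a categorical column, plus its null share."""
    present = [v for v in values if v is not None]
    return {
        "distribution": Counter(present).most_common(top),
        "null_ratio": (len(values) - len(present)) / len(values),
    }


numeric = profile_numeric(followers)
categorical = profile_categorical(industry)
print(numeric)
print(categorical)
```

Nulls are excluded before computing min, max, mean, and median, and counted separately, so missing data shows up as a quality signal instead of skewing the statistics.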
Output Formats
Query results support the same output formats as API calls.

Commands Reference
| Command | Description |
|---|---|
| `anysite dataset query <yaml> --sql "..."` | Run SQL query on collected data |
| `anysite dataset query <yaml> --source <id>` | Query a specific source |
| `anysite dataset query <yaml> --interactive` | Open interactive SQL shell |
| `anysite dataset stats <yaml>` | Show dataset statistics |
| `anysite dataset stats <yaml> --source <id>` | Show stats for a specific source |
| `anysite dataset profile <yaml>` | Generate data profile with distributions |
Next Steps
- Examples: see complete end-to-end workflow examples
- Database Loading: load query results into a persistent database