Insert Data

From Stdin

Pipe API results directly into a database:
# Single profile → database
anysite api /api/linkedin/user user=satyanadella -q --format jsonl | \
  anysite db insert mydb --table profiles --stdin --auto-create

# Batch results → database
anysite api /api/linkedin/user --from-file users.txt --input-key user \
  --parallel 5 -q --format jsonl | \
  anysite db insert mydb --table profiles --stdin --auto-create

From File

Load a previously saved file:
anysite db insert mydb --table profiles --file results.jsonl --auto-create

Auto-Create Tables

With --auto-create, the CLI automatically infers the schema from the JSON data and creates the table if it doesn’t exist. Column types are determined from the data values.
The --auto-create flag is safe to use repeatedly — it only creates the table on the first run, then inserts into the existing table on subsequent runs.
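For example, running the insert twice only creates the table once (file and connection names here are illustrative):
# First run: infers the schema from results.jsonl and creates the profiles table
anysite db insert mydb --table profiles --file results.jsonl --auto-create

# Second run: the table already exists, so the new rows are simply appended
anysite db insert mydb --table profiles --file more_results.jsonl --auto-create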

Upsert (Insert or Update)

Update existing records or insert new ones based on a unique key:
# Upsert with conflict handling
cat updates.jsonl | \
  anysite db upsert mydb --table profiles --conflict-columns id --stdin

# Or use the insert command with --upsert flag
cat updated_profiles.jsonl | \
  anysite db insert mydb --table profiles --stdin --auto-create --upsert --key urn_value
An upsert performs an INSERT for new records and an UPDATE for existing ones, matching rows on the specified key or conflict columns.
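Because matching rows are updated in place rather than duplicated, re-running the same upsert should leave the row count unchanged; a quick way to check, reusing the connection and table names from the examples above:
# Re-run the upsert, then confirm the row count did not grow
cat updates.jsonl | \
  anysite db upsert mydb --table profiles --conflict-columns id --stdin
anysite db query mydb --sql "SELECT COUNT(*) FROM profiles"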

Inspect Schema

View the table schema in your database:
anysite db schema mydb --table profiles

Query Data

Run SQL queries against your database:
# Query with table output
anysite db query mydb --sql "SELECT name, headline FROM profiles LIMIT 10" --format table

# Export query results to CSV
anysite db query mydb --sql "SELECT * FROM profiles WHERE industry = 'Technology'" \
  --format csv --output tech_profiles.csv

# Count records
anysite db query mydb --sql "SELECT COUNT(*) FROM profiles"

Dataset Loading

Load data from a collected dataset pipeline directly into a database:
# Load all sources into database (creates tables per source)
anysite dataset load-db dataset.yaml -c mydb

# Drop existing tables and reload
anysite dataset load-db dataset.yaml -c mydb --drop-existing

# Load a specific snapshot date
anysite dataset load-db dataset.yaml -c mydb --snapshot 2026-01-15

# Auto-load during collection
anysite dataset collect dataset.yaml --load-db mydb
Each source in the pipeline becomes a separate table in the database, named after the source ID.
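For example, if the pipeline defines sources with the IDs profiles and employees (the source IDs are illustrative), the load creates tables with those names, which can then be queried directly:
anysite db query mydb --sql "SELECT COUNT(*) FROM profiles"
anysite db query mydb --sql "SELECT COUNT(*) FROM employees"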

Diff-Based Sync

Compare collected data with what’s already in the database and apply incremental updates:
anysite dataset diff dataset.yaml --source profiles --key urn.value

# Diff with specific fields
anysite dataset diff dataset.yaml --source profiles --key urn.value --fields "name,headline"
This shows:
  • New records — present in dataset but not in database
  • Updated records — present in both but with different values
  • Deleted records — present in database but not in latest dataset
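One way to apply the new and updated records is an upsert keyed on the flattened key column. A minimal sketch, assuming the latest collected records are available as a JSONL file (the file path below is illustrative):
# Review what changed since the last load
anysite dataset diff dataset.yaml --source profiles --key urn.value

# Apply new and updated records; urn.value flattens to the urn_value column
cat data/profiles/latest.jsonl | \
  anysite db upsert mydb --table profiles --conflict-columns urn_value --stdin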

Auto-Schema and Foreign Keys

The CLI automatically:
  • Infers column types from JSON data (string, integer, float, boolean, timestamp)
  • Flattens nested objects using underscore notation (e.g., urn.value → urn_value); see the sketch after this list
  • Tracks provenance — links between parent and dependent source tables using foreign key references
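A quick way to see the flattening in action is to insert a single nested record into a scratch table and inspect the resulting schema (the table name and sample record are illustrative):
# The nested "urn.value" field becomes the flattened column urn_value
echo '{"urn": {"value": "ACoAAA123"}, "name": "Satya Nadella"}' | \
  anysite db insert mydb --table scratch_demo --stdin --auto-create
anysite db schema mydb --table scratch_demo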

Complete Pipeline-to-Database Example

# Step 1: Collect data
anysite dataset collect dataset.yaml

# Step 2: Load into PostgreSQL
anysite dataset load-db dataset.yaml -c pg

# Step 3: Query the results
anysite db query pg --sql "
  SELECT p.name, p.headline, e.company_name
  FROM profiles p
  JOIN employees e ON p.urn_value = e.urn_value
  LIMIT 20
" --format table

# Or do it all in one command:
anysite dataset collect dataset.yaml --load-db pg

Operations Reference

Command                                                           Description
anysite db insert <conn> --table <name> --stdin                   Insert data from stdin
anysite db insert <conn> --table <name> --file <path>             Insert data from file
anysite db insert <conn> ... --auto-create                        Auto-create table from data
anysite db insert <conn> ... --upsert --key <col>                 Upsert on unique key
anysite db upsert <conn> --table <name> --conflict-columns <col>  Upsert with conflict handling
anysite db schema <conn> --table <name>                           Inspect table schema
anysite db query <conn> --sql "..."                               Run SQL query
anysite db query <conn> --sql "..." --format csv                  Export query to CSV
anysite dataset load-db <yaml> -c <conn>                          Load dataset into database
anysite dataset load-db <yaml> -c <conn> --drop-existing          Reload with fresh tables
anysite dataset load-db <yaml> -c <conn> --snapshot <date>        Load a specific snapshot
anysite dataset diff <yaml> --source <id> --key <field>           Show diff between dataset and DB
anysite dataset diff <yaml> ... --fields "name,headline"          Diff with specific fields

Next Steps