## Overview

Anysite CLI integrates with LLM providers to add AI-powered analysis to your data workflows. Six operations are available: classify, summarize, enrich, generate, match, and deduplicate.

Requires the `llm` extra:

```bash
pip install "anysite-cli[llm]"
```
## Setup

Configure your LLM provider using the interactive setup, which guides you through selecting a provider and entering your API key.
## Supported Providers

| Provider | Default Model | Configuration |
|---|---|---|
| OpenAI | `gpt-4.1-mini` | Uses JSON Schema for structured output |
| Anthropic | `claude-sonnet-4-5-20250514` | Uses system prompts with JSON schema |

Provider settings are stored in `~/.anysite/config.yaml`.
## Operations

### Classify

Categorize records into predefined categories:

```bash
anysite llm classify dataset.yaml --source profiles \
  --categories "developer,recruiter,executive,other" \
  --fields "name,headline,summary"
```
If `--categories` is omitted, the LLM auto-detects 3-7 appropriate categories based on the data.
### Summarize

Generate concise summaries:

```bash
anysite llm summarize dataset.yaml --source profiles \
  --fields "name,headline,summary,experience" \
  --max-length 50 \
  --output-column bio_summary
```
### Enrich

Extract new structured attributes from text data:

```bash
anysite llm enrich dataset.yaml --source profiles \
  --add "seniority:junior/mid/senior/lead" \
  --add "is_technical:boolean" \
  --add "years_experience:number" \
  --add "primary_skill:string"
```
Supported attribute types:

- **Enum** (predefined choices): `"seniority:junior/mid/senior"`
- **Boolean** (true/false): `"is_technical:boolean"`
- **Number** (numeric value): `"years_experience:number"`
- **String** (free text): `"primary_skill:string"`
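The `--add` specs above follow a simple `name:type` grammar, where a slash-separated type is treated as an enum. As a rough sketch (the helper and its output shape are illustrative, not Anysite internals), parsing such a spec might look like:

```python
def parse_attr_spec(spec: str) -> dict:
    """Parse an illustrative "name:type" attribute spec into a descriptor.

    A slash-separated type like "junior/mid/senior" is read as an enum;
    "boolean", "number", and "string" stay as primitive type names.
    """
    name, _, type_part = spec.partition(":")
    if "/" in type_part:
        return {"name": name, "type": "enum", "choices": type_part.split("/")}
    return {"name": name, "type": type_part}

print(parse_attr_spec("seniority:junior/mid/senior"))
# {'name': 'seniority', 'type': 'enum', 'choices': ['junior', 'mid', 'senior']}
print(parse_attr_spec("is_technical:boolean"))
# {'name': 'is_technical', 'type': 'boolean'}
```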
### Generate

Create new text using templates with field placeholders:

```bash
anysite llm generate dataset.yaml --source profiles \
  --prompt "Write a 2-sentence professional intro for {name} who works as {headline}" \
  --temperature 0.7 \
  --output-column intro_text
```
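Conceptually, each record's field values are substituted into the `{field}` placeholders before the prompt is sent to the LLM. A minimal sketch of that substitution step (not the actual implementation; the sample record is invented):

```python
def fill_template(template: str, record: dict) -> str:
    # Substitute {field} placeholders with values from the record.
    return template.format_map(record)

record = {"name": "Ada Lovelace", "headline": "Analytical Engine Programmer"}
prompt = fill_template(
    "Write a 2-sentence professional intro for {name} who works as {headline}",
    record,
)
print(prompt)
# Write a 2-sentence professional intro for Ada Lovelace who works as Analytical Engine Programmer
```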
### Match

Compare records across two sources and find best matches:

```bash
anysite llm match dataset.yaml \
  --source-a profiles \
  --source-b companies \
  --top-k 3
```
Returns the top K matches for each record in source A, with relevance scores.
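In other words, every candidate in source B gets a relevance score for a given record in A, and only the K best are kept. A toy sketch of that selection step, with made-up precomputed scores (the scoring itself is done by the LLM):

```python
def top_k_matches(scores: dict[str, float], k: int) -> list[tuple[str, float]]:
    # Keep the k highest-scoring candidates, best first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

# Hypothetical relevance scores for one record in source A.
scores_for_record = {"Acme Corp": 0.91, "Globex": 0.40, "Initech": 0.77, "Umbrella": 0.12}
print(top_k_matches(scores_for_record, k=3))
# [('Acme Corp', 0.91), ('Initech', 0.77), ('Globex', 0.4)]
```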
### Deduplicate

Find and flag semantic duplicates within a source:

```bash
anysite llm deduplicate dataset.yaml --source profiles \
  --key name \
  --threshold 0.8
```
Records with similarity above the threshold are flagged as potential duplicates.
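The thresholding logic can be sketched with a stand-in similarity function, here `difflib`'s character-level ratio; the real comparison is semantic and done by the LLM, and the sample names are invented:

```python
from difflib import SequenceMatcher
from itertools import combinations

def flag_duplicates(values: list[str], threshold: float) -> list[tuple[str, str]]:
    # Flag every pair whose similarity exceeds the threshold.
    pairs = []
    for a, b in combinations(values, 2):
        if SequenceMatcher(None, a.lower(), b.lower()).ratio() > threshold:
            pairs.append((a, b))
    return pairs

names = ["Jon Smith", "John Smith", "Alice Chen"]
print(flag_duplicates(names, threshold=0.8))
# [('Jon Smith', 'John Smith')]
```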
## Using LLM in Dataset Pipelines

Add LLM processing directly in your pipeline YAML:

```yaml
sources:
  - id: profiles
    endpoint: /api/linkedin/user
    from_file: users.txt
    input_key: user

  - id: profiles_enriched
    type: llm
    dependency:
      from_source: profiles
      field: name
    llm:
      - type: classify
        categories: "developer,recruiter,executive,sales,other"
        output_column: role_type
      - type: enrich
        add:
          - "seniority:junior/mid/senior/lead"
          - "is_technical:boolean"
      - type: summarize
        max_length: 50
        output_column: bio_summary
```
Multiple LLM steps can be chained within a single LLM source. They execute in order, each adding new columns to the dataset.
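That execution model behaves like a fold over the record set: each step reads the current rows and appends its output column, so later steps can see earlier results. A minimal sketch, with step functions standing in for real LLM calls:

```python
def run_llm_steps(records: list[dict], steps: list) -> list[dict]:
    # Each step maps a record to {column_name: value}; steps run in
    # order, so later steps can read columns added by earlier ones.
    for step in steps:
        for record in records:
            record.update(step(record))
    return records

records = [{"name": "Ada", "headline": "Engineer"}]
steps = [
    lambda r: {"role_type": "developer"},                          # stand-in classify step
    lambda r: {"bio_summary": f"{r['name']}: {r['role_type']}"},   # reads the prior column
]
print(run_llm_steps(records, steps))
# [{'name': 'Ada', 'headline': 'Engineer', 'role_type': 'developer', 'bio_summary': 'Ada: developer'}]
```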
## Caching

LLM results are cached in a local SQLite database (`~/.anysite/llm_cache.db`) to avoid repeated API calls and reduce costs.

```bash
# View cache statistics
anysite llm cache-stats

# Clear the cache
anysite llm cache-clear

# Bypass cache for a single run
anysite llm classify dataset.yaml --source profiles \
  --categories "dev,recruiter,exec" --no-cache
```
Caching is especially useful when iterating on pipeline configurations — you only pay for LLM calls once per unique input.
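A cache like this only works if identical inputs always map to the same key. A common scheme, shown here as an illustration of the idea rather than Anysite's actual key format, hashes the operation, its parameters, and the input text together:

```python
import hashlib
import json

def cache_key(operation: str, params: dict, input_text: str) -> str:
    # Serialize with sorted keys so equal params always hash identically.
    payload = json.dumps(
        {"op": operation, "params": params, "input": input_text},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

k1 = cache_key("classify", {"categories": ["dev", "recruiter"]}, "Jane, CTO")
k2 = cache_key("classify", {"categories": ["dev", "recruiter"]}, "Jane, CTO")
print(k1 == k2)  # identical inputs produce identical keys, so this is a cache hit
```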
## Options Reference

| Option | Description | Applies To |
|---|---|---|
| `--fields` | Fields to include in LLM context (comma-separated) | classify, summarize |
| `--categories` | Comma-separated categories | classify |
| `--add` | Attribute to extract (repeatable) | enrich |
| `--prompt` | Template with `{field}` placeholders | generate |
| `--temperature` | LLM creativity (0.0-1.0) | generate |
| `--max-length` | Max words for output | summarize |
| `--output-column` | Name for the result column | all |
| `--top-k` | Number of matches per record | match |
| `--key` | Field to compare for duplicates | deduplicate |
| `--threshold` | Similarity threshold (0.0-1.0) | deduplicate |
| `--no-cache` | Skip the LLM cache | all |
## Next Steps