Server-Side Data Analysis

query_cache is one of the most powerful features of the MCP Server: it lets you filter, sort, and aggregate data server-side without consuming context window tokens.

The Problem

When you search for 50 LinkedIn profiles, all 50 results are returned into the LLM’s context. Filtering, sorting, or aggregating requires the LLM to process all data in-context — expensive and limited by context size.

The Solution: ClickHouse-Backed Cache

Every execute() call stores results in a ClickHouse cache (TTL: 7 days). The query_cache tool lets you query this cache server-side — only the filtered/aggregated results are returned to context.
execute("linkedin", "search", "search_people", {keywords: "CTO", count: 100})
  → Returns first 10 items + cache_key
  → All 100 results stored in ClickHouse cache

query_cache(cache_key, conditions=[{field: "followers", op: ">", value: 5000}])
  → Returns only matching profiles (server-side filtering)
  → Context receives only the filtered subset, not all 100

Filtering

Use conditions to filter cached results by any field:
query_cache(cache_key, conditions=[
  {"field": "location", "op": "contains", "value": "San Francisco"},
  {"field": "followers", "op": ">", "value": 500}
])
Multiple conditions are combined with AND logic.
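
The AND semantics can be sketched locally in Python. This is an illustration of the behavior, not the server implementation; `matches_all` is a hypothetical helper that handles only the two operators used in the example above:

```python
def matches_all(record, conditions):
    """A record passes only if every condition holds (AND logic)."""
    for cond in conditions:
        value = record.get(cond["field"])
        if cond["op"] == "contains":
            if cond["value"] not in str(value):
                return False
        elif cond["op"] == ">":
            if value is None or not value > cond["value"]:
                return False
    return True

profiles = [
    {"location": "San Francisco, CA", "followers": 1200},
    {"location": "San Francisco, CA", "followers": 300},
    {"location": "New York, NY", "followers": 900},
]
conditions = [
    {"field": "location", "op": "contains", "value": "San Francisco"},
    {"field": "followers", "op": ">", "value": 500},
]
filtered = [p for p in profiles if matches_all(p, conditions)]
# Only the first profile satisfies both conditions.
```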

Supported Filter Operators

Operator      Description          Example
=             Exact match          {"field": "country", "op": "=", "value": "US"}
!=            Not equal            {"field": "status", "op": "!=", "value": "inactive"}
>             Greater than         {"field": "followers", "op": ">", "value": 1000}
<             Less than            {"field": "age", "op": "<", "value": 30}
>=            Greater or equal     {"field": "score", "op": ">=", "value": 4.5}
<=            Less or equal        {"field": "price", "op": "<=", "value": 100}
contains      Substring match      {"field": "title", "op": "contains", "value": "Engineer"}
not_contains  Substring exclusion  {"field": "bio", "op": "not_contains", "value": "retired"}
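
The operator semantics map naturally onto a dispatch table. A minimal Python sketch of how each documented operator could evaluate against a record (`OPS` and `apply_condition` are illustrative names, not part of the MCP API):

```python
import operator

# Dispatch table mirroring the documented filter operators.
OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "contains": lambda field, value: value in str(field),
    "not_contains": lambda field, value: value not in str(field),
}

def apply_condition(record, cond):
    """Evaluate a single condition dict against one record."""
    return OPS[cond["op"]](record.get(cond["field"]), cond["value"])

row = {"country": "US", "title": "Staff Engineer", "followers": 1500}
assert apply_condition(row, {"field": "country", "op": "=", "value": "US"})
assert apply_condition(row, {"field": "title", "op": "contains", "value": "Engineer"})
assert not apply_condition(row, {"field": "followers", "op": "<", "value": 1000})
```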

Aggregation

Calculate summary statistics without loading individual records:
query_cache(cache_key, aggregate={"field": "followers", "op": "avg"})

Supported Aggregation Functions

Function  Description          Example
count     Count records        {"field": "id", "op": "count"}
sum       Sum values           {"field": "likes", "op": "sum"}
avg       Average value        {"field": "followers", "op": "avg"}
min       Minimum value        {"field": "price", "op": "min"}
max       Maximum value        {"field": "score", "op": "max"}
uniq      Count unique values  {"field": "country", "op": "uniq"}
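
A local sketch of what each function computes, to make the semantics concrete (the `aggregate` helper here is hypothetical; the real computation happens server-side in ClickHouse):

```python
def aggregate(records, field, op):
    """Compute one summary statistic over a field, mirroring the documented functions."""
    values = [r[field] for r in records if field in r]
    if op == "count":
        return len(values)
    if op == "sum":
        return sum(values)
    if op == "avg":
        return sum(values) / len(values) if values else None
    if op == "min":
        return min(values)
    if op == "max":
        return max(values)
    if op == "uniq":
        return len(set(values))  # number of distinct values
    raise ValueError(f"unsupported aggregation: {op}")

profiles = [{"followers": 100}, {"followers": 300}, {"followers": 300}]
```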

Group By

Combine aggregation with grouping to get breakdowns:
query_cache(cache_key, aggregate={"field": "followers", "op": "count"}, group_by="industry")
Result:
[
  {"industry": "Technology", "count": 45},
  {"industry": "Finance", "count": 23},
  {"industry": "Healthcare", "count": 12}
]
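
The grouping step can be sketched as a per-key tally. This is an illustration of the result shape shown above, assuming a hypothetical `count_by` helper:

```python
from collections import defaultdict

def count_by(records, group_field):
    """Count records per group, like aggregate={"op": "count"} with group_by."""
    counts = defaultdict(int)
    for r in records:
        counts[r[group_field]] += 1
    return [{group_field: k, "count": v} for k, v in counts.items()]

profiles = [
    {"industry": "Technology"},
    {"industry": "Technology"},
    {"industry": "Finance"},
]
breakdown = count_by(profiles, "industry")
# → [{"industry": "Technology", "count": 2}, {"industry": "Finance", "count": 1}]
```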

Sorting

Sort cached results by any field:
query_cache(cache_key, sort_by="followers", sort_order="desc", limit=10)
Returns only the top 10 results — the rest stay in cache, not in context.
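
Sort-plus-limit behaves like an ordinary sort followed by a slice. A minimal sketch of that semantics (`top_n` is an illustrative helper, not part of the MCP API):

```python
def top_n(records, sort_by, sort_order="desc", limit=10):
    """Sort by one field and return only the first `limit` records."""
    ordered = sorted(records, key=lambda r: r[sort_by],
                     reverse=(sort_order == "desc"))
    return ordered[:limit]

profiles = [
    {"name": "a", "followers": 50},
    {"name": "b", "followers": 900},
    {"name": "c", "followers": 300},
]
top2 = top_n(profiles, "followers", "desc", limit=2)
# → records for "b" (900) then "c" (300)
```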

Combined Example

A full workflow combining all features:
1. execute("linkedin", "search", "search_people", {keywords: "VP Engineering", count: 200})
   → 200 profiles cached, first 10 returned + cache_key

2. query_cache(cache_key, conditions=[
     {field: "location", op: "contains", value: "Bay Area"},
     {field: "followers", op: ">", value: 500}
   ])
   → Filter to Bay Area VPs with 500+ followers

3. query_cache(cache_key,
     aggregate={field: "followers", op: "avg"},
     group_by="company")
   → Average follower count by company

4. query_cache(cache_key,
     sort_by="followers", sort_order="desc", limit=5)
   → Top 5 most-followed VPs

5. export_data(cache_key, "csv")
   → Download all 200 profiles as CSV file
Key benefit: Steps 2-4 don’t make new API calls and don’t load all 200 profiles into context. Only the filtered/aggregated results are returned to the LLM.

Export Data

Download full cached datasets as files:
export_data(cache_key, "csv")

Supported Formats

Format  Description
json    JSON array
csv     Comma-separated values
jsonl   JSON Lines (one record per line)
Returns a download URL. Data stays cached for 7 days.
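
To make the three formats concrete, here is a local sketch of how records serialize in each. This is illustrative only: the real tool serializes server-side and hands back a download URL, and `export_records` is a hypothetical name:

```python
import csv
import io
import json

def export_records(records, fmt):
    """Serialize records in one of the documented export formats."""
    if fmt == "json":
        return json.dumps(records)          # one JSON array
    if fmt == "jsonl":
        return "\n".join(json.dumps(r) for r in records)  # one record per line
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt}")

rows = [
    {"name": "Ada", "followers": 5000},
    {"name": "Grace", "followers": 7000},
]
```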

Best Practices

Use Large Counts

Fetch more data in a single execute() call (count: 100-200), then filter with query_cache. One large request followed by server-side filtering is cheaper than many small requests.

Filter Before Reading

Always use query_cache to narrow results before loading them into context. Saves tokens and improves response quality.

Aggregate Server-Side

Use aggregation functions instead of asking the LLM to calculate averages, counts, or sums from raw data.

Export for External Use

Use export_data when you need the full dataset outside of the AI conversation (spreadsheets, databases, reports).