Server-Side Data Analysis

query_cache is one of the most powerful features of the MCP Server: it lets you filter, sort, and aggregate data server-side without consuming context window tokens.

The Problem

When you search for 50 LinkedIn profiles, all 50 results are returned into the LLM’s context. Filtering, sorting, or aggregating requires the LLM to process all data in-context — expensive and limited by context size.

The Solution: ClickHouse-Backed Cache

Every execute() call stores results in a ClickHouse cache (TTL: 7 days). The query_cache tool lets you query this cache server-side — only the filtered/aggregated results are returned to context.
execute("linkedin", "search", "search_people", {keywords: "CTO", count: 100})
  → Returns first 10 items + cache_key
  → All 100 results stored in ClickHouse cache

query_cache(cache_key, conditions=[{field: "followers", op: ">", value: 5000}])
  → Returns only matching profiles (server-side filtering)
  → Context receives only the filtered subset, not all 100

Filtering

Use conditions to filter cached results by any field:
query_cache(cache_key, conditions=[
  {"field": "location", "op": "contains", "value": "San Francisco"},
  {"field": "followers", "op": ">", "value": 500}
])
Multiple conditions are combined with AND logic.
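
The AND semantics can be sketched locally in Python. This is an illustration of the behavior, not the server implementation; `matches_all` is a hypothetical helper that handles only the two operators used in the example above:

```python
def matches_all(record, conditions):
    """A record passes only if every condition holds (AND logic)."""
    for cond in conditions:
        value = record.get(cond["field"])
        if cond["op"] == "contains":
            if cond["value"] not in str(value):
                return False
        elif cond["op"] == ">":
            if value is None or not value > cond["value"]:
                return False
    return True

profiles = [
    {"location": "San Francisco, CA", "followers": 1200},
    {"location": "San Francisco, CA", "followers": 300},
    {"location": "New York, NY", "followers": 900},
]
conditions = [
    {"field": "location", "op": "contains", "value": "San Francisco"},
    {"field": "followers", "op": ">", "value": 500},
]
filtered = [p for p in profiles if matches_all(p, conditions)]
# Only the first profile satisfies both conditions.
```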

Supported Filter Operators

Operator      Description          Example
=             Exact match          {"field": "country", "op": "=", "value": "US"}
!=            Not equal            {"field": "status", "op": "!=", "value": "inactive"}
>             Greater than         {"field": "followers", "op": ">", "value": 1000}
<             Less than            {"field": "age", "op": "<", "value": 30}
>=            Greater or equal     {"field": "score", "op": ">=", "value": 4.5}
<=            Less or equal        {"field": "price", "op": "<=", "value": 100}
contains      Substring match      {"field": "title", "op": "contains", "value": "Engineer"}
not_contains  Substring exclusion  {"field": "bio", "op": "not_contains", "value": "retired"}
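
The operator semantics map naturally onto a dispatch table. A minimal Python sketch of how each documented operator could evaluate against a record (`OPS` and `apply_condition` are illustrative names, not part of the MCP API):

```python
import operator

# Dispatch table mirroring the documented filter operators.
OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "contains": lambda field, value: value in str(field),
    "not_contains": lambda field, value: value not in str(field),
}

def apply_condition(record, cond):
    """Evaluate a single condition dict against one record."""
    return OPS[cond["op"]](record.get(cond["field"]), cond["value"])

row = {"country": "US", "title": "Staff Engineer", "followers": 1500}
assert apply_condition(row, {"field": "country", "op": "=", "value": "US"})
assert apply_condition(row, {"field": "title", "op": "contains", "value": "Engineer"})
assert not apply_condition(row, {"field": "followers", "op": "<", "value": 1000})
```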

Aggregation

Calculate summary statistics without loading individual records:
query_cache(cache_key, aggregate={"field": "followers", "op": "avg"})

Supported Aggregation Functions

Function  Description          Example
count     Count records        {"field": "id", "op": "count"}
sum       Sum values           {"field": "likes", "op": "sum"}
avg       Average value        {"field": "followers", "op": "avg"}
min       Minimum value        {"field": "price", "op": "min"}
max       Maximum value        {"field": "score", "op": "max"}
uniq      Count unique values  {"field": "country", "op": "uniq"}
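
A local sketch of what each function computes, to make the semantics concrete (the `aggregate` helper here is hypothetical; the real computation happens server-side in ClickHouse):

```python
def aggregate(records, field, op):
    """Compute one summary statistic over a field, mirroring the documented functions."""
    values = [r[field] for r in records if field in r]
    if op == "count":
        return len(values)
    if op == "sum":
        return sum(values)
    if op == "avg":
        return sum(values) / len(values) if values else None
    if op == "min":
        return min(values)
    if op == "max":
        return max(values)
    if op == "uniq":
        return len(set(values))  # number of distinct values
    raise ValueError(f"unsupported aggregation: {op}")

profiles = [{"followers": 100}, {"followers": 300}, {"followers": 300}]
```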

Group By

Combine aggregation with grouping to get breakdowns:
query_cache(cache_key, aggregate={"field": "followers", "op": "count"}, group_by="industry")
Result:
[
  {"industry": "Technology", "count": 45},
  {"industry": "Finance", "count": 23},
  {"industry": "Healthcare", "count": 12}
]
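
The grouping step can be sketched as a per-key tally. This is an illustration of the result shape shown above, assuming a hypothetical `count_by` helper:

```python
from collections import defaultdict

def count_by(records, group_field):
    """Count records per group, like aggregate={"op": "count"} with group_by."""
    counts = defaultdict(int)
    for r in records:
        counts[r[group_field]] += 1
    return [{group_field: k, "count": v} for k, v in counts.items()]

profiles = [
    {"industry": "Technology"},
    {"industry": "Technology"},
    {"industry": "Finance"},
]
breakdown = count_by(profiles, "industry")
# → [{"industry": "Technology", "count": 2}, {"industry": "Finance", "count": 1}]
```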

Sorting

Sort cached results by any field:
query_cache(cache_key, sort_by="followers", sort_order="desc", limit=10)
Returns only the top 10 results — the rest stay in cache, not in context.
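
Sort-plus-limit behaves like an ordinary sort followed by a slice. A minimal sketch of that semantics (`top_n` is an illustrative helper, not part of the MCP API):

```python
def top_n(records, sort_by, sort_order="desc", limit=10):
    """Sort by one field and return only the first `limit` records."""
    ordered = sorted(records, key=lambda r: r[sort_by],
                     reverse=(sort_order == "desc"))
    return ordered[:limit]

profiles = [
    {"name": "a", "followers": 50},
    {"name": "b", "followers": 900},
    {"name": "c", "followers": 300},
]
top2 = top_n(profiles, "followers", "desc", limit=2)
# → records for "b" (900) then "c" (300)
```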

Combined Example

A full workflow combining all features:
1. execute("linkedin", "search", "search_people", {keywords: "VP Engineering", count: 200})
   → 200 profiles cached, first 10 returned + cache_key

2. query_cache(cache_key, conditions=[
     {field: "location", op: "contains", value: "Bay Area"},
     {field: "followers", op: ">", value: 500}
   ])
   → Filter to Bay Area VPs with 500+ followers

3. query_cache(cache_key,
     aggregate={field: "followers", op: "avg"},
     group_by="company")
   → Average follower count by company

4. query_cache(cache_key,
     sort_by="followers", sort_order="desc", limit=5)
   → Top 5 most-followed VPs

5. export_data(cache_key, "csv")
   → Download all 200 profiles as CSV file
Key benefit: Steps 2-4 don’t make new API calls and don’t load all 200 profiles into context. Only the filtered/aggregated results are returned to the LLM.

Export Data

Download full cached datasets as files:
export_data(cache_key, "csv")

Supported Formats

Format  Description
json    JSON array
csv     Comma-separated values
jsonl   JSON Lines (one record per line)
Returns a download URL. Data stays cached for 7 days.
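
To make the three formats concrete, here is a local sketch of how records serialize in each. This is illustrative only: the real tool serializes server-side and hands back a download URL, and `export_records` is a hypothetical name:

```python
import csv
import io
import json

def export_records(records, fmt):
    """Serialize records in one of the documented export formats."""
    if fmt == "json":
        return json.dumps(records)          # one JSON array
    if fmt == "jsonl":
        return "\n".join(json.dumps(r) for r in records)  # one record per line
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt}")

rows = [
    {"name": "Ada", "followers": 5000},
    {"name": "Grace", "followers": 7000},
]
```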

Best Practices

Use Large Counts

Fetch more data in a single execute() call (count: 100-200), then filter with query_cache. One large request followed by server-side filtering is cheaper than many small requests.

Filter Before Reading

Always use query_cache to narrow results before loading them into context. Saves tokens and improves response quality.

Aggregate Server-Side

Use aggregation functions instead of asking the LLM to calculate averages, counts, or sums from raw data.

Export for External Use

Use export_data when you need the full dataset outside of the AI conversation (spreadsheets, databases, reports).