Overview
This is an open reference architecture, not an anonymized customer story. It shows what you can build on top of the Anysite MCP when the goal is factual accuracy with citations, not a quick web summary. The Scientific Research Agent is a Claude-Code-native research orchestrator over Anysite’s research data sources (OpenAlex, Crossref, arXiv, bioRxiv, Europe PMC, Semantic Scholar, PubMed, ClinicalTrials.gov, PubChem, Google Patents, ORCID, ROR, Unpaywall, and more). It mirrors the Claude Code pattern: a lead orchestrator routes the question and fans out specialist sub-agents per source family, an adversarial verifier confirms key facts against independent sources, and a synthesizer produces a cited report. The whole agent is plain Markdown: role, skills, and per-specialist prompts. The full source is published below so you can copy and fork it.
- 5 meta-tools: discover / execute / get_page / query_cache / export_data, with no hardcoded endpoints
- 20+ research sources
- Every claim cited
The Challenge
A flat web search returns plausible prose; scientific work needs traceable, cross-verified facts. The hard parts are:
- Picking the right source. A DOI, a PMID, an NCT id, an ORCID, a ROR id, a patent number: each points to a different authoritative source. Guessing endpoints and parameters wastes calls and produces wrong data.
- Connecting entities across databases. The value is not one lookup — it is walking the graph: a work to its open-access locations, its authors, their institutions, their funders, the patents that cite it. That requires linking by identifier, not by name.
- Coverage that does not stop at the top. Keyword search finds the surface. The canonical foundation and the frontier come from snowballing the citation graph and pivoting on entities — in waves, until saturation.
- Metrics that disagree. Citation counts, sample sizes, priority dates routinely differ between databases. A real answer states which source a number came from and flags disagreements instead of averaging them away.
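The first point above, routing by identifier type, can be sketched in a few lines. This is a minimal illustration, not the agent's actual routing: the regular expressions follow the commonly published formats for each identifier, the family labels reuse the specialist names from this page, and the function name `classify_identifier` is hypothetical.

```python
import re
from typing import Optional, Tuple

# (kind, pattern, source family). The mapping to families is an assumption
# made for illustration; the real agent routes via its source-map skill.
ID_PATTERNS = [
    ("doi", re.compile(r"^10\.\d{4,9}/\S+$"), "scholarly-works"),
    ("nct", re.compile(r"^NCT\d{8}$"), "clinical-chem"),
    ("orcid", re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$"), "identity-oa"),
    ("ror", re.compile(r"^0[0-9a-hjkmnp-tv-z]{6}\d{2}$"), "identity-oa"),
    ("patent", re.compile(r"^[A-Z]{2}\d{5,}[A-Z]\d?$"), "patents"),
    ("pmid", re.compile(r"^\d{1,8}$"), "clinical-chem"),
]

URL_PREFIXES = ("https://doi.org/", "https://orcid.org/", "https://ror.org/")


def classify_identifier(raw: str) -> Optional[Tuple[str, str]]:
    """Return (kind, family) for a recognized identifier, else None."""
    value = raw.strip()
    # Accept the resolver-URL form of an identifier as well as the bare form.
    for prefix in URL_PREFIXES:
        if value.lower().startswith(prefix):
            value = value[len(prefix):]
    for kind, pattern, family in ID_PATTERNS:
        if pattern.match(value):
            return kind, family
    return None  # Not an identifier: fall back to keyword search.


print(classify_identifier("NCT04280705"))
print(classify_identifier("https://orcid.org/0000-0002-1825-0097"))
```

Order matters in the pattern list: the bare-digits PMID pattern is checked last so that it cannot shadow a more specific format.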
The Architecture
The agent reproduces the Claude Code orchestration pattern on top of one MCP surface:

| Role | File | Responsibility |
|---|---|---|
| Orchestrator | orchestrator.md | Classifies the question, picks LINEAR vs FAN-OUT, runs the wave pipeline, owns the answer contract |
| Scholarly works | agents/scholarly-works.md | Publications, citation graph, metrics — OpenAlex / Crossref / arXiv / bioRxiv / Europe PMC / Semantic Scholar / CORE / OpenAIRE |
| Identity & OA | agents/identity-oa.md | Entity resolution and open-access — Unpaywall / ROR / ORCID / DOAJ / DataCite / Zenodo |
| Clinical & chemistry | agents/clinical-chem.md | Trials and compounds — ClinicalTrials.gov / PubMed / PubChem |
| Patents & IP | agents/patents.md | Patents and trademarks — Google Patents / PATENTSCOPE / EUIPO / WIPO Brands / USPTO Trademarks |
| Verifier | agents/verifier.md | Adversarially confirms a claim against an independent second source |
| Synthesizer | agents/synthesizer.md | Assembles findings into a cited report, optional dataset export |
| Method skill | skills/research-method.md | The executor loop: seed, paginate, snowball, pivot, dedup, saturate |
| Source map skill | skills/source-map.md | Source families + the ID crosswalk + the source-selection routine |
How It Works
Discover-first, no hardcoded endpoints. Before the first execute on an unfamiliar source/category pair, the agent calls discover to read endpoint names, parameter schemas, and response fields. New Anysite parsers are picked up automatically; nothing is guessed.
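The discover-first rule can be pictured as a thin guard around the two tools. The sketch below is an assumption-heavy illustration: the class `DiscoverFirstClient`, the callable signatures, and the schema shape (endpoint name mapped to a list of parameter names) are all hypothetical stand-ins, since the real tool arguments are not shown on this page.

```python
from typing import Any, Callable, Dict, List, Tuple

Schema = Dict[str, List[str]]  # assumed shape: endpoint name -> parameter names


class DiscoverFirstClient:
    """Ensures discover runs once per (source, category) before any execute."""

    def __init__(
        self,
        discover: Callable[[str, str], Schema],
        execute: Callable[[str, str, str, Dict[str, Any]], Any],
    ) -> None:
        # Both callables are hypothetical wrappers around the MCP tools.
        self._discover = discover
        self._execute = execute
        self._schemas: Dict[Tuple[str, str], Schema] = {}

    def call(self, source: str, category: str, endpoint: str,
             params: Dict[str, Any]) -> Any:
        key = (source, category)
        if key not in self._schemas:
            # First contact with this pair: read the schema instead of guessing.
            self._schemas[key] = self._discover(source, category)
        schema = self._schemas[key]
        if endpoint not in schema:
            raise ValueError(f"unknown endpoint {endpoint!r} for {source}/{category}")
        unknown = set(params) - set(schema[endpoint])
        if unknown:
            raise ValueError(f"unknown parameters: {sorted(unknown)}")
        return self._execute(source, category, endpoint, params)
```

Rejecting endpoints and parameters that the schema does not list is one way to turn "nothing is guessed" into a hard failure rather than a wasted call.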
Adaptive depth. A get-by-id lookup is answered linearly in a handful of execute calls. A multi-entity, comparative, or “map the landscape” question escalates to a parallel fan-out of specialists.
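As a rough illustration of the LINEAR versus FAN-OUT decision, the toy function below escalates when a question names more than one identifier or uses comparative wording. The cue list, the identifier pattern, and the name `choose_mode` are assumptions for this sketch; in the agent itself the classification is done by the orchestrator prompt, not by fixed rules.

```python
import re

# Hypothetical wording cues that suggest a comparative or landscape question.
COMPARATIVE_CUES = ("compare", "versus", " vs ", "landscape", "map the", "across")

# DOI, NCT id, or ORCID appearing inside free text.
ID_LIKE = re.compile(r"\b(10\.\d{4,9}/\S+|NCT\d{8}|\d{4}-\d{4}-\d{4}-\d{3}[\dX])\b")


def choose_mode(question: str) -> str:
    """Return "LINEAR" for a single lookup, "FAN-OUT" for multi-entity work."""
    text = f" {question.lower()} "  # pad so " vs " can match at the edges
    identifiers = ID_LIKE.findall(question)
    if len(identifiers) > 1 or any(cue in text for cue in COMPARATIVE_CUES):
        return "FAN-OUT"
    return "LINEAR"


print(choose_mode("What is the open-access status of 10.1038/nature12373?"))
print(choose_mode("Map the landscape of mRNA vaccine patents"))
```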
Search runs in waves, to saturation. Keyword seeding (across multiple engines) → snowball along the citation graph (backward references for the foundation, forward citations for the frontier, related works for the neighboring cluster) → entity pivots (top author, topic, institution, funder) → a coverage registry deduped by canonical ID → a completeness critic → top-up rounds until no new canonical works appear. The stop condition is saturation, not a call counter.
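The wave loop is easier to see on a toy citation graph. The sketch below covers only seeding, snowballing in both directions, deduplication by canonical ID, and the saturation stop; entity pivots and the completeness critic are left out. The graph, the work IDs, and the function name are made up for illustration, and `max_waves` is a safety cap added here, not the stop condition.

```python
from typing import Dict, Iterable, List, Set


def search_to_saturation(
    seeds: Iterable[str],
    references: Dict[str, List[str]],  # work -> works it cites (backward)
    citations: Dict[str, List[str]],   # work -> works citing it (forward)
    max_waves: int = 10,
) -> List[List[str]]:
    """Return the new canonical IDs found in each wave, until none appear."""
    registry: Set[str] = set()  # coverage registry keyed by canonical ID
    frontier: Set[str] = set(seeds)
    waves: List[List[str]] = []
    while frontier and len(waves) < max_waves:
        new = frontier - registry  # dedup against everything already seen
        if not new:
            break  # saturation: this wave surfaced nothing new
        registry |= new
        waves.append(sorted(new))
        frontier = set()
        for work in new:
            frontier.update(references.get(work, ()))  # the foundation
            frontier.update(citations.get(work, ()))   # the frontier
    return waves


refs = {"W3": ["W1", "W2"], "W4": ["W3"], "W5": ["W3"]}
cites = {"W1": ["W3"], "W2": ["W3"], "W3": ["W4", "W5"]}
print(search_to_saturation(["W3"], refs, cites))
# [['W3'], ['W1', 'W2', 'W4', 'W5']]
```

The loop ends on the third pass because every neighbor of the second wave is already in the registry, which is the saturation condition described above.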
ID graph-walking is the core value. Entities are linked by identifier — DOI ↔ OpenAlex W-id ↔ Unpaywall OA ↔ ORCID ↔ ROR ↔ Crossref funder ↔ PMID/PMC ↔ patent docId — never by title when an ID exists.
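One way to picture the ID crosswalk is as a merge step: records from different sources collapse into one entity whenever they share any identifier. The sketch below is an assumed, simplified version (it does not handle two conflicting values for the same identifier type), and all identifiers in the example are placeholders.

```python
from typing import Dict, List, Optional, Tuple


def merge_by_identifier(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Fold records that share any (id_type, value) pair into one entity."""
    entities: List[Optional[Dict[str, str]]] = []
    index: Dict[Tuple[str, str], int] = {}  # (id_type, lowercased value) -> slot
    for record in records:
        keys = [(id_type, value.lower()) for id_type, value in record.items()]
        hits = sorted({index[key] for key in keys if key in index})
        if hits:
            target = hits[0]
            for other in hits[1:]:
                # A record that bridges two known entities joins them.
                entities[target].update(entities[other])
                entities[other] = None
        else:
            target = len(entities)
            entities.append({})
        entities[target].update(record)
        for id_type, value in entities[target].items():
            index[(id_type, value.lower())] = target
    return [entity for entity in entities if entity is not None]


merged = merge_by_identifier([
    {"doi": "10.1000/xyz123", "openalex": "W111"},  # placeholder IDs
    {"pmid": "12345", "doi": "10.1000/XYZ123"},     # same DOI, different case
    {"orcid": "0000-0002-1825-0097"},
    {"pmid": "12345", "pmcid": "PMC999"},
])
print(len(merged))
```

Titles never enter the comparison, which matches the rule of linking by identifier whenever one exists.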
Adversarial verification. Key or contested facts (citation count, sample size, patent priority date) are confirmed against an independent second source; a disagreement is reported as a finding, not silently averaged. Retraction flags are checked on anything a conclusion leans on.
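The verification rule, report a disagreement instead of averaging it, might look like the following in miniature. The data classes, the 5% relative tolerance, and the example numbers are assumptions chosen for illustration; they are not values taken from the agent's verifier prompt.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    source: str
    value: float


@dataclass
class Verdict:
    status: str  # "confirmed", "disagreement", or "unverified"
    detail: str


def verify(claim: str, primary: Observation, second: Optional[Observation],
           rel_tol: float = 0.05) -> Verdict:
    """Compare a fact across two sources; never merge the two numbers."""
    if second is None or second.source == primary.source:
        return Verdict("unverified", f"{claim}: no independent second source")
    larger = max(abs(primary.value), abs(second.value))
    gap = abs(primary.value - second.value)
    if larger == 0 or gap / larger <= rel_tol:
        return Verdict("confirmed", f"{claim}: {primary.source} and {second.source} agree")
    # Keep both values with their sources so the report can cite each one.
    return Verdict(
        "disagreement",
        f"{claim}: {primary.source}={primary.value:g} vs {second.source}={second.value:g}",
    )


# Made-up numbers, for illustration only.
print(verify("citation count",
             Observation("OpenAlex", 120),
             Observation("Semantic Scholar", 4100)))
```

A check against the same source twice is treated as unverified, since the point of the second lookup is independence.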
Efficiency by construction. discover is cheap and always precedes a first execute; execute costs credits, so already-fetched data is never re-fetched — slices and aggregations go through query_cache, and further pages through get_page.
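The cost rule above amounts to a small decision: pay for execute only the first time a query is seen, then serve slices from query_cache and further pages from get_page. The ledger below is a hypothetical sketch of that decision; it only names which tool to use and does not call anything, and the class and method names are invented here.

```python
from typing import Dict, Tuple


class FetchLedger:
    """Remembers which (source, query) pairs were already fetched."""

    def __init__(self) -> None:
        self._pages: Dict[Tuple[str, str], int] = {}  # pages fetched so far

    def next_tool(self, source: str, query: str, want_more: bool = False) -> str:
        key = (source, query)
        if key not in self._pages:
            self._pages[key] = 1
            return "execute"      # first fetch: the only step that costs credits
        if want_more:
            self._pages[key] += 1
            return "get_page"     # continue the existing result set
        return "query_cache"      # slice or aggregate what is already fetched

    def pages_fetched(self, source: str, query: str) -> int:
        return self._pages.get((source, query), 0)


ledger = FetchLedger()
print(ledger.next_tool("openalex", "crispr base editing"))
print(ledger.next_tool("openalex", "crispr base editing"))
print(ledger.next_tool("openalex", "crispr base editing", want_more=True))
```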
For arXiv preprints (DOIs of the form 10.48550/arXiv.*), OpenAlex systematically undercounts citations by 1–2 orders of magnitude; on the ML/LLM frontier, take the citation count and graph from Semantic Scholar. For established journal DOIs, OpenAlex is reliable and richer. This kind of source-specific tuning lives in skills/source-map.md.
Sources Used
All sources are reached through the same five MCP meta-tools; the agent never talks to a source SDK directly.

| Family | Sources |
|---|---|
| Scholarly works | OpenAlex, Crossref, arXiv, bioRxiv, Europe PMC, Semantic Scholar, Google Scholar, CORE, OpenAIRE |
| Identity & Open-Access | Unpaywall, ROR, ORCID, DOAJ, DataCite, Zenodo |
| Clinical & Chemistry | ClinicalTrials.gov, PubMed, PubChem |
| Patents & IP | Google Patents, PATENTSCOPE, EUIPO, WIPO Brands, USPTO Trademarks |
Why Anysite
- One MCP surface, many sources. A single discover/execute contract spans publications, identity, clinical, chemistry, and patents, instead of integrating a dozen APIs with a dozen auth schemes and schemas.
- The graph is reachable. References and citations, related works, authorships with ORCID and ROR, funders, OA locations, patent citation families: the cross-source links that make a 360° picture possible are all exposed.
- Self-extending. Because the agent is discover-first, new Anysite parsers become available to it the moment they ship, with no code change.
Key Takeaways
- Build for citations, not summaries. Anchoring every claim to a source record by its ID is what separates a research agent from a chatbot with web access.
- The value is the graph, not the lookup. Linking entities by identifier across sources — and cross-verifying one fact against two — is the whole point.
- Discover-first keeps it durable. No hardcoded endpoints means the agent grows with the API surface instead of breaking on it.
- Plain Markdown is the whole agent. Role, skills, and specialist prompts are portable text — copy them, fork them, drop them into a standalone Agent SDK wrapper later.
Full Agent Source (copy & fork)
The complete agent below is the source of truth. Drop these files into a directory, point Claude Code at CLAUDE.md, and ask a research question.
CLAUDE.md — auto-loaded bootstrap (role + inviolable rules)
README.md
orchestrator.md — lead role, adaptive orchestration, report contract
skills/research-method.md — the executor loop & citation discipline
skills/source-map.md — source families & ID crosswalk
agents/scholarly-works.md
agents/identity-oa.md
agents/clinical-chem.md
agents/patents.md
agents/verifier.md
agents/synthesizer.md