feat: attach graph freshness provenance to MCP tool responses #604
Open
SHudici wants to merge 1 commit into
Agents consuming graph tools have no way to tell whether an answer
reflects the current tree: the graph may have been built hours ago,
on a different branch, at a different commit. Every response now
carries a compact `_graph` envelope:
    "_graph": {
      "updated_at": "2026-07-03T18:22:41",
      "age_seconds": 5121,
      "built_on_branch": "main",
      "built_at_sha": "b72413c9d0aa..."
    }
- `graph_provenance()` (tools/_common.py) reads `last_updated`,
`git_branch`, `git_head_sha` from the graph's metadata table via a
read-only SQLite connection. The db path is escaped with
`Path.as_uri()` so URI-significant characters (#, %) in repo paths
cannot derail the SQLite URI parser. Best-effort by design: any
failure (no graph, unreadable DB, missing metadata) returns None
and never fails the tool call.
- `with_provenance()` attaches the envelope to dict responses only,
skips results that already carry `_graph`, and passes everything
else through untouched.
- All 27 graph-backed MCP tools in main.py wrap their returns. For
the five async tools the wrap runs inside the asyncio.to_thread
worker (the envelope read opens the graph DB, and the event loop
must never touch disk — tirth8205#46, tirth8205#136). Excluded: get_docs_section_tool,
list_repos_tool, cross_repo_search_tool (not backed by a single
repo graph).
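The read path described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the function name `graph_provenance_sketch`, the `metadata(key, value)` schema, and the treatment of naive timestamps as local time are all assumptions made for the example.

```python
import sqlite3
from datetime import datetime
from pathlib import Path
from typing import Optional


def graph_provenance_sketch(db_path: Path) -> Optional[dict]:
    """Best-effort provenance read: returns None instead of raising."""
    try:
        # as_uri() percent-encodes '#', '%' and '?' so SQLite's URI parser
        # cannot mistake part of the path for a fragment or query string.
        uri = db_path.resolve().as_uri() + "?mode=ro"
        conn = sqlite3.connect(uri, uri=True)
        try:
            # Assumed schema: metadata(key TEXT, value TEXT).
            rows = conn.execute(
                "SELECT key, value FROM metadata WHERE key IN (?, ?, ?)",
                ("last_updated", "git_branch", "git_head_sha"),
            ).fetchall()
        finally:
            conn.close()
    except (sqlite3.Error, OSError, ValueError):
        return None  # no graph, unreadable DB, or missing table

    meta = dict(rows)
    updated_at = meta.get("last_updated")
    if not updated_at:
        return None

    envelope = {"updated_at": updated_at}
    try:
        built = datetime.fromisoformat(updated_at)
        # Assumption: a naive timestamp is local time; an aware one keeps its zone.
        now = datetime.now(built.tzinfo)
        envelope["age_seconds"] = max(0, int((now - built).total_seconds()))
    except (ValueError, TypeError):
        pass  # unparseable timestamp: omit age_seconds
    if meta.get("git_branch"):
        envelope["built_on_branch"] = meta["git_branch"]
    if meta.get("git_head_sha"):
        envelope["built_at_sha"] = meta["git_head_sha"]
    return envelope
```

Opening with `mode=ro` means a missing database file is reported as an error rather than silently created, which is what lets the sketch return `None` for a repo with no graph.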
13 new tests covering metadata read, URI-hostile paths (% and # in
the repo path), age clamping, unparseable timestamps, optional
fields, all no-op paths, and one end-to-end registered-tool response.
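A URI-hostile path test could look something like the sketch below. It is not taken from the PR's test suite; the helper `can_open_read_only` and the directory name are hypothetical, chosen only to show why the escaping matters.

```python
import sqlite3
import tempfile
from pathlib import Path, PurePosixPath

# Pure path, no filesystem access: shows the escaping on its own.
# Left raw, '#' would start a URI fragment and '%' an escape sequence.
hostile = PurePosixPath("/tmp/repo #1 100%/graph.db")
assert hostile.as_uri() == "file:///tmp/repo%20%231%20100%25/graph.db"


def can_open_read_only(db_path: Path) -> bool:
    """Return True if SQLite can read db_path through an escaped read-only URI."""
    uri = db_path.resolve().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
        try:
            # Reading the schema table forces SQLite to touch the file.
            conn.execute("SELECT count(*) FROM sqlite_master").fetchone()
        finally:
            conn.close()
    except sqlite3.Error:
        return False
    return True


def test_hostile_path_round_trip() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp) / "repo #1 100%"
        repo.mkdir()
        db = repo / "graph.db"
        conn = sqlite3.connect(db)
        conn.execute("CREATE TABLE metadata (key TEXT, value TEXT)")
        conn.commit()
        conn.close()
        assert can_open_read_only(db)
        assert not can_open_read_only(repo / "missing.db")
```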
Problem
Tool responses carry no freshness signal. An agent asking `query_graph(callers_of, X)` gets an answer that may describe a graph built hours ago, on a different branch, at a different commit, and has no way to tell. The build metadata already exists (`last_updated`, `git_branch`, `git_head_sha` in the metadata table); it's just never surfaced where decisions are made.
Fix
Every graph-backed tool response now carries a compact `_graph` provenance envelope.
- `graph_provenance()` (`tools/_common.py`) reads the three metadata keys via a read-only SQLite connection. The db path is escaped with `Path.as_uri()` so URI-significant characters (`#`, `%`) in repo paths cannot derail SQLite's URI parser. Best-effort by design: any failure (no graph, unreadable DB, missing metadata) returns `None` and never fails the tool call. `age_seconds` is clamped to ≥ 0 and omitted if the timestamp doesn't parse; branch/sha are included only when present (full SHA, not truncated: agents comparing against `git rev-parse HEAD` shouldn't need prefix semantics).
- `with_provenance()` attaches the envelope to dict responses only, skips results that already carry `_graph`, and passes everything else through untouched.
- All 27 graph-backed MCP tools in `main.py` wrap their returns at the return site (not via a decorator: the #46/#136 regression guards introspect tool source with `inspect.getsource`, and a decorator would defeat them). For the five async tools the wrap runs inside the `asyncio.to_thread` worker: the envelope read opens the graph DB, and the event loop must never touch disk (#46, #136). Excluded: `get_docs_section_tool`, `list_repos_tool`, `cross_repo_search_tool` (not backed by a single repo graph).
Cost: one read-only SQLite open + a 3-row SELECT per tool call (off-loop for async tools), ~4 lines of JSON per response.
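As a rough illustration of the attach step and the off-loop wrap for async tools, consider the sketch below. The names `with_provenance_sketch`, `fake_envelope`, `_example_query_sync` and `example_tool` are hypothetical, and passing the envelope reader in as a callable is a simplification for the example; the PR's real `with_provenance()` signature is not shown here and may differ.

```python
import asyncio
from typing import Any, Callable, Optional


def with_provenance_sketch(
    result: Any, read_envelope: Callable[[], Optional[dict]]
) -> Any:
    """Attach a `_graph` envelope to dict results; pass everything else through."""
    if not isinstance(result, dict) or "_graph" in result:
        return result  # non-dict, or already carries provenance
    envelope = read_envelope()
    if envelope is None:
        return result  # best-effort: no provenance available
    # Assumption: return a copy rather than mutating the caller's dict.
    return {**result, "_graph": envelope}


def fake_envelope() -> Optional[dict]:
    # Stand-in for the real metadata read, which opens the graph DB.
    return {"updated_at": "2026-07-03T18:22:41", "built_on_branch": "main"}


def _example_query_sync(pattern: str) -> dict:
    result = {"status": "ok", "pattern": pattern}
    # Wrapped at the return site, inside the function the worker thread runs.
    return with_provenance_sketch(result, fake_envelope)


async def example_tool(pattern: str) -> dict:
    # The query and the envelope read both happen in the worker thread,
    # so the event loop itself never touches disk.
    return await asyncio.to_thread(_example_query_sync, pattern)
```

Calling `asyncio.run(example_tool("callers_of"))` returns the original dict plus a `_graph` key, while a list result or a dict that already has `_graph` comes back unchanged.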
Testing
13 new tests: metadata read (all fields), URI-hostile repo paths (`%` and `#`), future-timestamp clamping, unparseable timestamp omits `age_seconds`, optional branch/sha, `None` for missing `last_updated` / missing graph DB / invalid repo root, envelope attach, no-provenance pass-through, non-dict pass-through, existing-`_graph` preservation, and one end-to-end registered-tool response. The #46/#136 async/`to_thread` guards still pass. Full suite passes.
🤖 Generated with Claude Code