feat: add Codebase Intelligence — repo map with PageRank-ranked structural summaries #543

Open
gnanam1990 wants to merge 3 commits into main from feat/codebase-intelligence-repo-map

Collaborator

gnanam1990 commented Apr 9, 2026

Summary

  • Adds a new module that builds a structural map of the repository by parsing source files with tree-sitter, building a cross-file reference graph weighted by IDF, ranking files with PageRank, and rendering a token-budgeted summary of the most important files and their signatures
  • Registers a RepoMap tool the model can call on-demand during sessions, with support for focus_files and focus_symbols to narrow the ranking
  • Adds a /repomap slash command for users to inspect, tune, and invalidate the map
  • Wires auto-injection of the map into the session system context behind a REPO_MAP feature flag (off by default)

What's included

| Component | Files | Purpose |
|---|---|---|
| Core module | src/context/repoMap/ (12 files) | Symbol extraction, graph building, PageRank, token-budgeted rendering, disk cache |
| Tree-sitter queries | src/context/repoMap/queries/ (3 .scm files) | Tag queries for TypeScript, JavaScript, Python (from Aider, MIT licensed) |
| Test fixtures | src/context/repoMap/__fixtures__/mini-repo/ (5 files) | 5-file TypeScript fixture with known import graph for deterministic test assertions |
| RepoMap tool | src/tools/RepoMapTool/ (4 files) | buildTool wrapper registered in src/tools.ts, read-only, concurrency-safe |
| Slash command | src/commands/repomap/ (3 files) | /repomap, --tokens, --focus, --stats, --invalidate |
| Context injection | src/context.ts | getRepoMapContext() memoized, gated behind feature('REPO_MAP') |
| Feature flag | scripts/build.ts | REPO_MAP: false (off by default) |
| Documentation | docs/repo-map.md, README.md | Full user-facing docs and README blurb |

How it works

git ls-files → tree-sitter WASM parse → extract defs/refs → IDF-weighted directed graph → PageRank → render top files with signatures → stop at token budget
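
The ranking steps in this pipeline (IDF-weighted graph, then PageRank) could look roughly like the sketch below. It is an illustration only: the `FileSymbols` shape, the `log((1 + N) / (1 + df))` IDF formula, and the plain power-iteration PageRank are assumptions made for the example, whereas the PR itself builds the graph with graphology and graphology-pagerank and does not show its exact weighting.

```typescript
// Hypothetical shape: the PR does not show the module's real types.
interface FileSymbols {
  path: string;
  defs: string[]; // symbols defined in this file
  refs: string[]; // symbols referenced in this file
}

// Assumed IDF formula: a symbol defined in every file gets weight 0.
function idf(symbol: string, files: FileSymbols[]): number {
  const definers = files.filter((f) => f.defs.includes(symbol)).length;
  return Math.log((1 + files.length) / (1 + definers));
}

// Directed edge "referencing file -> defining file", weighted by summed IDF.
function buildEdges(files: FileSymbols[]): Map<string, Map<string, number>> {
  const edges = new Map<string, Map<string, number>>();
  for (const f of files) edges.set(f.path, new Map<string, number>());
  for (const from of files) {
    const out = edges.get(from.path)!;
    for (const sym of from.refs) {
      const weight = idf(sym, files);
      if (weight === 0) continue; // ubiquitous symbol, carries no signal
      for (const to of files) {
        if (to.path === from.path || !to.defs.includes(sym)) continue;
        out.set(to.path, (out.get(to.path) ?? 0) + weight);
      }
    }
  }
  return edges;
}

// Weighted PageRank by power iteration; rank held by files with no
// outgoing edges is redistributed uniformly so the total stays at 1.
function pageRank(
  edges: Map<string, Map<string, number>>,
  damping = 0.85,
  iterations = 50,
): Map<string, number> {
  const nodes = [...edges.keys()];
  const n = nodes.length;
  let rank = new Map<string, number>();
  for (const p of nodes) rank.set(p, 1 / n);
  for (let i = 0; i < iterations; i++) {
    const next = new Map<string, number>();
    for (const p of nodes) next.set(p, (1 - damping) / n);
    let dangling = 0;
    for (const [from, out] of edges) {
      const r = rank.get(from)!;
      let total = 0;
      for (const w of out.values()) total += w;
      if (total === 0) {
        dangling += r;
        continue;
      }
      for (const [to, w] of out) {
        next.set(to, next.get(to)! + damping * r * (w / total));
      }
    }
    for (const p of nodes) next.set(p, next.get(p)! + (damping * dangling) / n);
    rank = next;
  }
  return rank;
}
```

In this sketch a file that many others reference accumulates rank from each referrer, which matches the behaviour described for hub files.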

Files imported by many others rank highest. Common symbol names (get, set, map, value) are down-weighted via IDF. Results are cached to disk keyed by (path, mtime, size) — only changed files are re-parsed.
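
A minimal sketch of the (path, mtime, size) cache lookup described above, assuming a composite string key and an in-memory `Map` standing in for the on-disk store. `FileStat`, `cacheKey`, and `splitByCache` are hypothetical names, not the PR's API; in a real implementation the stat values would come from something like `fs.statSync`.

```typescript
// Hypothetical stat record; mirrors the mtimeMs and size fields of Node's fs.Stats.
interface FileStat {
  path: string;
  mtimeMs: number;
  size: number;
}

// Composite key: any change to path, mtime, or size produces a different key.
function cacheKey(stat: FileStat): string {
  return JSON.stringify([stat.path, stat.mtimeMs, stat.size]);
}

// Split files into those whose parse result can be reused and those
// that must be parsed again because no entry matches their current key.
function splitByCache<T>(
  stats: FileStat[],
  cache: Map<string, T>,
): { fresh: Map<string, T>; toParse: string[] } {
  const fresh = new Map<string, T>();
  const toParse: string[] = [];
  for (const stat of stats) {
    const cached = cache.get(cacheKey(stat));
    if (cached !== undefined) {
      fresh.set(stat.path, cached);
    } else {
      toParse.push(stat.path);
    }
  }
  return { fresh, toParse };
}
```

Only the paths in `toParse` would need to go back through tree-sitter, which is consistent with the gap between cold and warm builds reported under Known limitations.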

Configuration

# Slash command (always available, no flag needed)
/repomap                        # Default 2048 token budget
/repomap --tokens 4096          # Larger map
/repomap --focus src/tools/     # Boost specific paths
/repomap --stats                # Cache info
/repomap --invalidate           # Clear cache and rebuild

# Auto-injection into session context (requires flag)
# Set REPO_MAP: true in scripts/build.ts and rebuild
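
To illustrate what the --tokens budget controls, here is a hedged sketch of budget-limited rendering: files are emitted in rank order until the next one would exceed the budget. The `RankedFile` shape, the output layout, and the whitespace-based `approxTokens` counter are assumptions for the example; the commit message says the real module counts tokens with js-tiktoken's cl100k_base encoding.

```typescript
// Hypothetical input shape for an already-ranked file.
interface RankedFile {
  path: string;
  rank: number;
  signatures: string[];
}

// Stand-in token counter (whitespace-separated words), NOT a real tokenizer.
const approxTokens = (text: string): number =>
  text.split(/\s+/).filter(Boolean).length;

// Emit the highest-ranked files first and stop before the budget is exceeded.
function renderWithinBudget(
  files: RankedFile[],
  maxTokens: number,
  countTokens: (text: string) => number = approxTokens,
): string {
  const sorted = [...files].sort((a, b) => b.rank - a.rank);
  const sections: string[] = [];
  let used = 0;
  for (const file of sorted) {
    const section = [`${file.path}:`, ...file.signatures.map((s) => `  ${s}`)].join("\n");
    const cost = countTokens(section);
    if (used + cost > maxTokens) break; // budget reached, drop the rest
    sections.push(section);
    used += cost;
  }
  return sections.join("\n");
}
```

Passing a larger `maxTokens` simply lets more of the ranked list through, which is the effect of `/repomap --tokens 4096` compared with the 2048 default.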

Supported languages

TypeScript, JavaScript, and Python. Additional grammars will come in a follow-up.

Dependencies added

web-tree-sitter, tree-sitter-wasms, graphology, graphology-pagerank, graphology-operators, js-tiktoken (~80MB in node_modules)

Test plan

  • bun install — clean
  • bun test — 621 pass, 0 fail (32 new tests)
  • bun run build — success
  • bun run smoke — 0.1.8 (Open Claude)
  • Manual CLI verification of /repomap, --tokens, --focus, --stats, --invalidate
  • Manual verification of flag-on auto-injection and flag-off regression

Known limitations

  • Cold build ~25s on 2100-file repos (WASM parsing). Warm cache <100ms.
  • TypeScript query captures type refs but not function calls — ranking favors type-heavy hub files
  • Feature flag defaults to off — flip to true after internal validation

Collaborator

Vasanthdev2004 left a comment

Thanks, this is a genuinely interesting addition, and I like that you kept the auto-injected path behind a flag while also adding a direct /repomap command and tool surface. The tests and green CI help here too.

I do still see one blocker on the current head though:

  • The PR documents and implies that repo-map auto-injection can be enabled with REPO_MAP=1 openclaude, but the actual gating in the open build is still compile-time via feature('REPO_MAP') from bun:bundle, and scripts/build.ts still hardcodes REPO_MAP: false. On the current head, setting the runtime env var alone will not make getRepoMapContext() start injecting anything into session context. So right now the user-facing docs and the shipped behavior disagree.

Concretely, the current surface looks like this:

  • src/context.ts only enables auto-injection when feature('REPO_MAP') is true
  • scripts/build.ts still sets REPO_MAP: false
  • docs/repo-map.md tells users to enable it with a runtime env var

I think this needs one of two fixes before approval:

  1. either wire the feature so the documented runtime enablement actually works in the open build, or
  2. narrow the docs and PR messaging so they clearly say only /repomap and the tool are available for now, and that auto-injection is not user-enableable in the current open build yet.

Once that mismatch is fixed on the current head, I’m happy to re-review.

@gnanam1990
Collaborator Author

@Vasanthdev2004 Good catch — you're right, the docs and the actual gate disagreed. Fixed in 5919dde.

getRepoMapContext now enables auto-injection when either the compile-time feature('REPO_MAP') flag is true or the runtime REPO_MAP env var is truthy. Chose option 1 (wire the runtime enablement) since it keeps the documented UX working without requiring users to edit scripts/build.ts and rebuild.

const runtimeEnabled = isEnvTruthy(process.env.REPO_MAP)
if (!feature('REPO_MAP') && !runtimeEnabled) return null

scripts/build.ts still keeps REPO_MAP: false so the compile-time default stays off, but users running the open build can now flip it on with REPO_MAP=1 openclaude as the docs advertise. Verified locally that auto-injection fires with the env var set and does nothing without it. 653 tests pass.
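
For readers following the gate logic, the two-line snippet above amounts to "inject if either switch is on". The sketch below restates that as a standalone function. `isEnvTruthySketch` is a hypothetical stand-in: the real `isEnvTruthy` helper is not shown in this PR, so the accepted values here ("1", "true", "yes", "on") are an assumption.

```typescript
// Hypothetical stand-in for the project's isEnvTruthy helper; the set of
// accepted values is assumed, not taken from the codebase.
function isEnvTruthySketch(value: string | undefined): boolean {
  if (value === undefined) return false;
  return ["1", "true", "yes", "on"].includes(value.trim().toLowerCase());
}

// Auto-injection is enabled when EITHER the compile-time flag is true
// OR the runtime REPO_MAP env var is truthy.
function shouldInjectRepoMap(
  compileTimeFlag: boolean,
  envValue: string | undefined,
): boolean {
  return compileTimeFlag || isEnvTruthySketch(envValue);
}
```

With the compile-time flag left at false, `shouldInjectRepoMap(false, process.env.REPO_MAP)` would return true only when the env var is set to a truthy value, mirroring the REPO_MAP=1 openclaude path.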

gnanam1990 and others added 3 commits April 10, 2026 14:29
…tural summaries

Add a new module that builds a structural map of the repository by parsing
source files with tree-sitter, building a cross-file reference graph
weighted by IDF, ranking files with PageRank, and rendering a
token-budgeted summary of the most important files and their signatures.

Stage 1 — Core module (src/context/repoMap/):
  Symbol extraction via web-tree-sitter WASM, IDF-weighted reference graph
  via graphology, PageRank ranking, token-budgeted rendering via js-tiktoken
  cl100k_base, disk cache with mtime invalidation. Supports TypeScript,
  JavaScript, and Python. 10 tests.

Stage 2 — RepoMap tool (src/tools/RepoMapTool/):
  buildTool wrapper registered in src/tools.ts. Read-only, concurrency-safe.
  Supports focus_files, focus_symbols, and max_tokens parameters. 9 tests.

Stage 3 — Integration:
  Auto-injection into session context behind REPO_MAP feature flag (off by
  default). /repomap slash command with --tokens, --focus, --stats, and
  --invalidate flags. User-facing docs in docs/repo-map.md. 13 tests.

With the flag off, the system context is byte-identical to previous behavior.

Dependencies: web-tree-sitter, tree-sitter-wasms, graphology,
graphology-pagerank, graphology-operators, js-tiktoken

Tests: 32 new, 621 total passing, 0 failures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addresses review feedback from @Vasanthdev2004: the docs advertised
REPO_MAP=1 openclaude as the enablement path, but the gate in
getRepoMapContext only checked feature('REPO_MAP'), which is compile-time
and hardcoded to false in the open build. The env var was effectively
a no-op.

Now getRepoMapContext enables auto-injection when EITHER the compile-time
flag is true OR the runtime env var REPO_MAP is truthy. This makes the
documented enablement path actually work without requiring users to edit
scripts/build.ts and rebuild.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
gnanam1990 force-pushed the feat/codebase-intelligence-repo-map branch from 5919dde to 43886cc April 10, 2026 08:59
Collaborator

Vasanthdev2004 left a comment

Thanks for the follow-up here. I rechecked the current head 43886ccbf8ab1605ea8d948e5218bd7c5af386e9 against the actual GitHub PR surface, the latest commits, the earlier review thread, and the current check state.

This is a targeted re-review of the earlier blocker around the repo-map enablement path.

What I rechecked:

  • src/context.ts now enables repo-map auto-injection when either the compile-time feature('REPO_MAP') flag is on or the runtime REPO_MAP env var is truthy
  • scripts/build.ts still keeps the compile-time default off (REPO_MAP: false)
  • docs/repo-map.md now matches the shipped open-build behavior for runtime enablement
  • current checks are green on this head

That fixes the blocker I raised earlier. The documented REPO_MAP=1 openclaude path now actually matches the gate in the open build, instead of being a no-op.

Verdict: Approve-ready

I do not see a remaining blocker on the current head.
