Local MCP server that indexes codebases and provides code-aware search, reducing token spend for AI agents by 80-90%. Scrooge parses source code into semantic chunks, compresses them into sketches, and serves them through hybrid retrieval (lexical + vector search with RRF fusion) — so agents get the context they need without paying for the tokens they don't.
- scrooge_search — Hybrid code search combining FTS5 lexical and sqlite-vec vector search with Reciprocal Rank Fusion
- scrooge_map — Repository map with directory tree and hierarchical summaries at repo, module, or file level
- scrooge_lookup — Symbol lookup: find definitions and all usages across the codebase
- scrooge_source — Exact chunk/symbol source retrieval without opening the whole file
- scrooge_context — Project patterns for a chunk kind: common annotations, tags, imports, and example sketches
- scrooge_deps — Compact dependency graph: forward (what a symbol uses) and reverse (who uses it)
- scrooge_reindex — Trigger full or incremental indexing of a repository
- scrooge_status — Check index freshness: last indexed commit, total chunks, staleness
- scrooge_statistics — Usage metrics and token savings breakdown over configurable time periods
- Execution hooks — Automatic context injection before Write/Edit, exploration nudges for Read/Grep/Glob, and session onboarding with index summary
- Multi-channel — Shared API layer supports Claude Code (MCP) and pi.dev (extension) with per-channel telemetry
- Node.js >= 20.0.0
- Git
- C++ compiler (required for native deps: better-sqlite3, tree-sitter)
| OS | Install command |
|---|---|
| macOS | xcode-select --install |
| Ubuntu/Debian | sudo apt install build-essential |
| Windows | Visual Studio Build Tools or npm install -g windows-build-tools |
git clone https://github.com/fabricio-costa/scrooge.git
cd scrooge
npm install
npm run setup
npm run setup builds the project, registers the MCP server with Claude Code (user scope), configures hooks (SessionStart onboarding, PreToolUse pattern injection + exploration nudges, PostToolUse observability), manages pi.dev AGENTS.md, and optionally installs the pi.dev extension.
To add Scrooge instructions to any project's CLAUDE.md manually:
cat ~/.scrooge/agent-instructions.md
To uninstall: npm run uninstall
Manual registration (advanced)
Register at user scope so Scrooge is available from any project directory:
# Build first:
npm run build
# Production (uses compiled JS):
claude mcp add -s user scrooge -- node /absolute/path/to/scrooge/bin/scrooge-mcp.mjs
# Development (uses tsx for live reload):
claude mcp add scrooge -- npx tsx /absolute/path/to/scrooge/src/index.ts
The launcher script (bin/scrooge-mcp.mjs) automatically detects when native modules (better-sqlite3, tree-sitter) were compiled against a different Node.js version and rebuilds them before starting the server.
pi install /path/to/scrooge/packages/pi-extension
Pi.dev loads TypeScript extensions via jiti — no build step needed for the extension itself. Hot-reload with /reload.
Once registered, Scrooge tools are available in your agent sessions. No manual indexing required — the index is created and maintained automatically.
1. Search the codebase
> Use scrooge_search to find authentication-related code
On the first query, Scrooge automatically indexes the repository. On subsequent queries, if the repo has new commits, an incremental reindex runs transparently before returning results. You never need to think about scrooge_reindex — the index stays fresh automatically.
Returns ranked results with sketch-compressed snippets, staying within a token budget.
2. Explore the repo map
> Use scrooge_map at repo level to see the project structure
Returns a directory tree with hierarchical summaries of each module.
3. Look up a symbol
> Use scrooge_lookup to find where LoginViewModel is defined and used
4. Check your savings
> Use scrooge_statistics to see token savings
Shows how much Scrooge saved by comparing compressed responses to raw content costs.
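As a rough illustration of that comparison, the sketch below assumes savings are computed as (raw tokens minus delivered tokens) divided by raw tokens. The `summarizeSavings` helper and its field names are hypothetical and not taken from Scrooge's code; the input numbers are illustrative.

```typescript
// Hypothetical helper: names are illustrative, not Scrooge's actual API.
interface SavingsSummary {
  tokensDelivered: number;
  rawEquivalent: number;
  tokensSaved: number;
  savingsPct: number; // rounded to one decimal place
}

function summarizeSavings(tokensDelivered: number, rawEquivalent: number): SavingsSummary {
  const tokensSaved = Math.max(0, rawEquivalent - tokensDelivered);
  // Guard against division by zero when nothing has been served yet.
  const pct = rawEquivalent > 0 ? (tokensSaved / rawEquivalent) * 100 : 0;
  return {
    tokensDelivered,
    rawEquivalent,
    tokensSaved,
    savingsPct: Math.round(pct * 10) / 10,
  };
}

const summary = summarizeSavings(45_200, 120_000);
console.log(`Saved: ${summary.tokensSaved} (${summary.savingsPct}%)`);
```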
scrooge_search: Hybrid code search combining query rewriting, FTS5 lexical search, sqlite-vec vector search, and a light heuristic reranker on top of RRF fusion.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | yes | — | Natural language or code identifier |
| repo_path | string | no | cwd | Absolute path to the repository |
| filters.module | string | no | — | Gradle module (e.g. ":app") |
| filters.language | string | no | — | Language: kotlin, typescript, dart, python, xml, gradle |
| filters.kind | string | no | — | Chunk kind: class, function, composable, etc. |
| filters.tags | string[] | no | — | Tags: ["hilt", "compose"] |
| view | string | no | "sketch" | "sketch" (compressed), "implementation" (focused code context), or "raw" (full source) |
| max_results | number | no | depends on view | Maximum number of results |
| token_budget | number | no | depends on view | Max tokens in response |
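For orientation, the parameters above could be modelled as a TypeScript argument object along these lines. This is only a sketch: the `ScroogeSearchArgs` type and `withDefaults` helper are hypothetical names, and only the one default the table states explicitly (`view`) is filled in.

```typescript
// Argument shape mirroring the parameter table; type names are illustrative.
type SearchView = "sketch" | "implementation" | "raw";

interface ScroogeSearchArgs {
  query: string;
  repo_path?: string;
  filters?: { module?: string; language?: string; kind?: string; tags?: string[] };
  view?: SearchView;
  max_results?: number;
  token_budget?: number;
}

// Hypothetical helper: validates the required field and applies the "sketch" default.
function withDefaults(args: ScroogeSearchArgs): ScroogeSearchArgs & { view: SearchView } {
  if (args.query.trim() === "") throw new Error("query is required");
  return { ...args, view: args.view ?? "sketch" };
}

const request = withDefaults({
  query: "authentication login flow",
  filters: { language: "kotlin", tags: ["hilt"] },
  token_budget: 2000,
});
console.log(JSON.stringify(request, null, 2));
```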
Example response:
{
  "results": [
    {
      "file": "src/auth/LoginViewModel.kt",
      "kind": "class",
      "name": "LoginViewModel",
      "sketch": "class LoginViewModel : ViewModel() { fun login(email, password) ... }",
      "score": 0.85
    }
  ],
  "totalTokens": 1076,
  "truncated": false
}
scrooge_map: Repository map providing directory tree and hierarchical summaries.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| repo_path | string | no | cwd | Path to the repository |
| level | string | no | "repo" | Detail: "repo", "modules", or "files" |
| module | string | no | — | Focus on a specific module |
scrooge_lookup: Find a symbol's definition and all usages across the codebase.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| symbol | string | yes | — | Symbol name (e.g. "LoginViewModel") |
| repo_path | string | no | cwd | Path to the repository |
| include_usages | boolean | no | true | Include usage locations |
scrooge_source: Fetch the exact raw source for a known chunk or symbol, without opening the whole file.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| chunk_id | string | no* | — | Exact chunk ID returned by scrooge_search or scrooge_lookup |
| symbol | string | no* | — | Symbol name to fetch raw source for |
| before | number | no | 0 | Extra lines of file context before the chunk |
| after | number | no | 0 | Extra lines of file context after the chunk |
| repo_path | string | no | cwd | Path to the repository |
* Provide at least one of chunk_id or symbol.
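The "at least one of" rule could be enforced with a small check like the one below. It is a sketch under assumptions: `normalizeSourceArgs` is a hypothetical helper, and rejecting negative or fractional `before`/`after` values is a guess at sensible behavior, not something the table specifies.

```typescript
interface ScroogeSourceArgs {
  chunk_id?: string;
  symbol?: string;
  before?: number;
  after?: number;
  repo_path?: string;
}

interface NormalizedSourceArgs extends ScroogeSourceArgs {
  before: number;
  after: number;
}

// Assumption: context line counts must be non-negative integers.
function assertLineCount(name: string, value: number): void {
  if (!Number.isInteger(value) || value < 0) {
    throw new Error(`${name} must be a non-negative integer`);
  }
}

// Hypothetical helper: checks the footnote rule and applies the documented defaults of 0.
function normalizeSourceArgs(args: ScroogeSourceArgs): NormalizedSourceArgs {
  if (!args.chunk_id && !args.symbol) {
    throw new Error("Provide at least one of chunk_id or symbol");
  }
  const before = args.before ?? 0;
  const after = args.after ?? 0;
  assertLineCount("before", before);
  assertLineCount("after", after);
  return { ...args, before, after };
}

console.log(normalizeSourceArgs({ symbol: "LoginViewModel", after: 5 }));
```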
Example response:
{
  "chunkId": "app/src/main/LoginViewModel.kt:1-30:abc123",
  "before": 0,
  "after": 0,
  "chunks": [
    {
      "id": "app/src/main/LoginViewModel.kt:1-30:abc123",
      "path": "app/src/main/LoginViewModel.kt",
      "lines": "1-30",
      "kind": "viewmodel",
      "symbol": "LoginViewModel",
      "signature": "class LoginViewModel @Inject constructor() : ViewModel()",
      "source": "class LoginViewModel @Inject constructor() : ViewModel() { ... }"
    }
  ]
}
scrooge_reindex: Trigger indexing of a repository.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| repo_path | string | no | cwd | Path to the repository |
| incremental | boolean | no | true | Only index files changed since last index |
scrooge_status: Get information about the current index state.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| repo_path | string | no | cwd | Path to the repository |
scrooge_context: Get project patterns for a given chunk kind — so the agent writes code matching existing conventions.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| kind | string | yes | — | Chunk kind (e.g. "viewmodel", "composable", "dao") |
| module | string | no | — | Filter to a specific module |
| repo_path | string | no | cwd | Path to the repository |
Example response:
{
  "kind": "viewmodel",
  "sampleCount": 5,
  "commonAnnotations": ["@HiltViewModel", "@Inject"],
  "commonTags": ["hilt", "viewmodel", "coroutine"],
  "commonImports": ["StateFlow", "MutableStateFlow", "viewModelScope"],
  "exampleSketches": [
    { "path": "feature/auth/LoginViewModel.kt", "sketch": "class LoginViewModel @Inject constructor(...)" }
  ]
}
scrooge_deps: Compact dependency graph for refactoring decisions: what a symbol depends on and what depends on it.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| symbol | string | yes | — | Symbol name (e.g. "AuthRepository") |
| direction | string | no | "both" | "forward", "reverse", or "both" |
| repo_path | string | no | cwd | Path to the repository |
Example response:
{
  "symbol": "AuthRepository",
  "definitions": [{ "symbol": "AuthRepository", "path": "data/AuthRepository.kt", "kind": "class", "module": ":data" }],
  "forward": [{ "symbol": "ApiService", "path": "api/ApiService.kt", "kind": "api_interface", "module": ":api" }],
  "reverse": [{ "symbol": "LoginViewModel", "path": "feature/auth/LoginViewModel.kt", "kind": "viewmodel", "module": ":feature:auth" }]
}
scrooge_statistics: Usage and token savings metrics.
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| repo_path | string | no | cwd | Path to the repository |
| period | string | no | "all" | "today", "week", "month", or "all" |
| format | string | no | "text" | "text" for the human report or "json" for structured dashboard-friendly output |
Example output:
## Scrooge Statistics — kotlin-pdv
Period: all time (since Feb 20, 2026)
### Token Savings
Tokens delivered: 45,200
Raw equivalent: 120,000
Saved: 74,800 (62.3%)
### Savings by Tool
search: 1,200 delivered / 8,500 raw (85.9% saved)
lookup: 600 delivered / 3,500 raw (82.9% saved)
map: 200 delivered / 2,000 raw (90.0% saved)
### Usage (70 total calls)
search: 42 | map: 15 | lookup: 8 | reindex: 3 | status: 2
### Models
claude-opus-4-6: 30 calls (25,000 tokens)
claude-sonnet-4-5: 40 calls (20,200 tokens)
### Search Insights
Avg results/query: 5.2 | Avg tokens/query: 1,076
Sources: lexical 30% | vector 25% | both 45%
For dashboards or automation, request format: "json":
{
  "repo": { "path": "/Users/alice/projects/kotlin-pdv", "name": "kotlin-pdv" },
  "period": { "key": "all", "label": "all time (since 2026-02-20)", "since": null, "firstCallAt": "2026-02-20 09:15:00" },
  "totals": { "totalCalls": 70, "tokensDelivered": 45200, "rawEquivalent": 120000, "tokensSaved": 74800, "savingsPct": 62.3 },
  "usageByTool": [{ "tool": "search", "callCount": 42, "tokensSent": 1200, "tokensRaw": 8500, "tokensSaved": 7300, "savingsPct": 85.9 }],
  "coverage": { "coveragePct": 81.4, "grepBypasses": [{ "selector": "AuthRepository", "count": 3 }], "bypassReasons": [{ "reasonCode": "known_path_regex", "count": 2 }] }
}
Environment variables:
| Variable | Description |
|---|---|
| SCROOGE_MODEL | AI model identifier (e.g., claude-opus-4-6). Recorded in telemetry for per-model usage breakdown in scrooge_statistics. |
| SCROOGE_NATIVE_EXPLORATION_POLICY | Guardrail mode for native Read/Grep/Glob on indexed repos: off, warn (default), or strict. strict blocks blind code exploration while still allowing non-code reads, regex on a known path, and guided follow-up reads. |
| SCROOGE_NATIVE_EXPLORATION_OVERRIDE_REASON | Optional operator override reason code for intentional native bypasses: known_raw_content, known_path_regex, non_code_file, or final_verification. Recorded in observed.jsonl diagnostics when applicable. |
Scrooge registers several hooks to integrate seamlessly with agent workflows. npm run setup configures all hooks automatically.
| Hook | Trigger | Purpose |
|---|---|---|
| SessionStart | Session begins | Injects index summary + tool preference directives for indexed repos |
| PreToolUse (Write\|Edit) | Before file writes | Injects project patterns (annotations, imports, sketches) |
| PreToolUse (Read\|Grep\|Glob) | Before exploration | Applies native-exploration guardrails: warn nudges toward Scrooge (rate-limited: 3/session), strict blocks blind code exploration |
| PostToolUse | After any tool call | Records tool usage to ~/.scrooge/observed.jsonl for coverage metrics |
All hooks return {} for non-indexed repos (zero overhead) and fail silently on timeout.
For native exploration guardrails, set SCROOGE_NATIVE_EXPLORATION_POLICY=off|warn|strict. The default is warn. In strict, blind Read/Grep/Glob calls are blocked on indexed repos, while non-code reads, regex on a known path, and guided follow-up reads remain allowed.
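One way to picture the three modes is as a small decision function. The sketch below is not the actual hook logic: the `ExplorationCall` fields and the `decide` function are hypothetical, and it additionally assumes that the exemptions listed for strict mode also suppress nudges in warn mode.

```typescript
type Policy = "off" | "warn" | "strict";
type Decision = "allow" | "nudge" | "block";

// Hypothetical description of one native Read/Grep/Glob call.
interface ExplorationCall {
  isCodeFile: boolean;         // false for docs, configs, and other non-code reads
  isRegexOnKnownPath: boolean; // regex search against an already-known path
  isGuidedFollowUp: boolean;   // read that follows up on a Scrooge result
  nudgesSoFar: number;         // nudges already issued in this session
}

// Unset or unknown values fall back to the documented default, "warn".
function parsePolicy(raw: string | undefined): Policy {
  return raw === "off" || raw === "strict" ? raw : "warn";
}

function decide(policy: Policy, call: ExplorationCall, maxNudges = 3): Decision {
  if (policy === "off") return "allow";
  // Assumption: these exemptions apply in both warn and strict modes.
  const exempt = !call.isCodeFile || call.isRegexOnKnownPath || call.isGuidedFollowUp;
  if (exempt) return "allow";
  if (policy === "strict") return "block";
  // warn mode: nudge until the per-session rate limit is reached.
  return call.nudgesSoFar < maxNudges ? "nudge" : "allow";
}

const blindRead: ExplorationCall = {
  isCodeFile: true,
  isRegexOnKnownPath: false,
  isGuidedFollowUp: false,
  nudgesSoFar: 0,
};
// In a hook script the raw value would come from SCROOGE_NATIVE_EXPLORATION_POLICY.
console.log(decide(parsePolicy(undefined), blindRead));
console.log(decide(parsePolicy("strict"), blindRead));
```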
Manual hook configuration (Claude Code)
Add to ~/.claude/settings.json (user scope) or your project's .claude/settings.json:
{
  "hooks": {
    "SessionStart": [{
      "hooks": [{ "type": "command", "command": "node /path/to/scrooge/bin/scrooge-session.mjs", "timeout": 3 }]
    }],
    "PreToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [{ "type": "command", "command": "node /path/to/scrooge/bin/scrooge-hook.mjs", "timeout": 3 }]
      },
      {
        "matcher": "Read|Grep|Glob",
        "hooks": [{ "type": "command", "command": "node /path/to/scrooge/bin/scrooge-nudge.mjs", "timeout": 2 }]
      }
    ],
    "PostToolUse": [{
      "matcher": "",
      "hooks": [{ "type": "command", "command": "node /path/to/scrooge/bin/scrooge-observe.mjs", "timeout": 3 }]
    }]
  }
}
The pi.dev extension handles all hooks automatically via tool_call and tool_result events — no additional configuration needed. During installation, npm run setup also appends Scrooge instructions to ~/.pi/agent/AGENTS.md (with HTML markers for safe updates/removal).
bin/
├── scrooge-mcp.mjs # MCP launcher (auto-rebuilds native modules if needed)
├── scrooge-session.mjs # SessionStart hook — injects index summary + directives
├── scrooge-hook.mjs # PreToolUse hook — injects project patterns for Write/Edit
├── scrooge-nudge.mjs # PreToolUse hook — warn/strict guardrails for Read/Grep/Glob
├── scrooge-observe.mjs # PostToolUse hook — records tool calls for coverage metrics
├── setup.mjs # One-command setup: build, register, configure hooks
└── uninstall.mjs # Clean removal of all registrations and hooks
templates/
└── agent-instructions.md # Reusable Scrooge tool preference template
packages/
└── pi-extension/ # pi.dev extension: tools + hooks via tool_call/tool_result events
src/
├── index.ts # Entry point — starts MCP server
├── api/ # Transport-agnostic API layer (shared by MCP + pi.dev)
│ ├── index.ts # Barrel export
│ ├── types.ts # Shared request/response interfaces, Channel type
│ ├── search.ts # search() — orchestrates hybrid search + telemetry
│ ├── lookup.ts # lookup() — symbol definitions + usages + telemetry
│ ├── source.ts # source() — exact chunk/symbol source + telemetry
│ ├── map.ts # map() — repo tree + summaries + telemetry
│ ├── context.ts # context() — project pattern aggregation + telemetry
│ ├── deps.ts # deps() — dependency graph extraction + telemetry
│ ├── reindex.ts # reindex() — pipeline trigger + telemetry
│ ├── status.ts # status() — index freshness check + telemetry
│ └── statistics.ts # statistics() + buildStatisticsReport()
├── server/
│ ├── mcp.ts # MCP server creation and tool registration
│ └── tools/ # Thin MCP adapters: Zod schema → API call → JSON response
├── indexer/
│ ├── pipeline.ts # Orchestrates: classify → chunk → sketch → embed → store
│ ├── classifier.ts # File type detection (Kotlin, TypeScript, Dart, Python, XML, Gradle, generic)
│ ├── chunkers/ # Language-specific chunkers (tree-sitter for Kotlin/TypeScript/Dart/Python, regex for others)
│ └── sketcher.ts # Compresses chunks into token-efficient sketches
├── retrieval/
│ ├── hybrid.ts # Orchestrates lexical + vector search with RRF fusion + heuristic reranking
│ ├── lexical.ts # FTS5 lexical search with query variants and symbol/path heuristics
│ ├── query.ts # Query planning: stop-word cleanup, alias expansion, exact-term extraction
│ ├── vector.ts # sqlite-vec cosine similarity search
│ └── packager.ts # Token-budgeted result packaging with diversity constraints
├── repomap/
│ ├── tree.ts # Directory tree generation
│ └── summaries.ts # Hierarchical module/file summaries from indexed data
├── storage/
│ └── db.ts # SQLite schema, migrations, CRUD operations
└── utils/
├── config.ts # Configuration with defaults
├── tokens.ts # Token count estimation
├── git.ts # Git operations (diff, log, file listing)
├── freshness.ts # Auto-reindex: ensures index is fresh before queries
└── embeddings.ts # Local embeddings via @xenova/transformers
Repository files
│
▼
┌─────────┐ ┌──────────┐ ┌──────────┐ ┌─────────┐ ┌─────────┐
│ Classify │────▶│ Chunk │────▶│ Sketch │────▶│ Embed │────▶│ Store │
│ (type) │ │ (parse) │ │(compress)│ │(384-dim)│ │(SQLite) │
└─────────┘ └──────────┘ └──────────┘ └─────────┘ └─────────┘
- Classify — Detect file type by extension (.kt → Kotlin, .ts/.tsx → TypeScript, .dart → Dart, .py → Python, .xml → XML, .gradle.kts → Gradle, everything else → generic)
- Chunk — Parse into semantic units. See Supported Languages below
- Sketch — Compress each chunk into a token-efficient summary. See Sketches below
- Embed — Compute vector embeddings for semantic search. See Embeddings below
- Store — Write chunks, sketches, and vectors to SQLite with FTS5 and sqlite-vec indexes
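The Classify step above amounts to a suffix lookup. A minimal sketch follows, assuming case-insensitive matching and using only the extensions listed; the `classify` function and `SUFFIX_RULES` table are illustrative names, not the contents of classifier.ts.

```typescript
type FileType = "kotlin" | "typescript" | "dart" | "python" | "xml" | "gradle" | "generic";

// The longest suffix comes first so ".gradle.kts" is tested before shorter suffixes.
const SUFFIX_RULES: ReadonlyArray<readonly [string, FileType]> = [
  [".gradle.kts", "gradle"],
  [".kt", "kotlin"],
  [".tsx", "typescript"],
  [".ts", "typescript"],
  [".dart", "dart"],
  [".py", "python"],
  [".xml", "xml"],
];

function classify(path: string): FileType {
  const lower = path.toLowerCase(); // assumption: extensions are matched case-insensitively
  for (const [suffix, type] of SUFFIX_RULES) {
    if (lower.endsWith(suffix)) return type;
  }
  return "generic"; // everything else falls through to the generic chunker
}

for (const file of ["app/build.gradle.kts", "LoginViewModel.kt", "App.tsx", "README.md"]) {
  console.log(`${file} -> ${classify(file)}`);
}
```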
| Language | Parser | Chunk kinds |
|---|---|---|
| Kotlin | tree-sitter AST | class, viewmodel, composable, function, method, api_interface, dao, entity |
| TypeScript/TSX | tree-sitter AST | class, function, method, interface, type_alias, enum |
| Dart/Flutter | tree-sitter AST | class, function, method, enum, mixin, extension, type_alias |
| Python | tree-sitter AST | class, dataclass, function, method |
| XML (Android) | Regex patterns | manifest_component, nav_destination, layout, values |
| Gradle | Regex patterns | gradle_plugins, gradle_android, gradle_dependencies, gradle_settings |
| Other | Line-based splitter | generic_block, generic_file |
Tree-sitter chunkers extract semantic boundaries (class/function/interface declarations), while regex-based chunkers match structural patterns. Large classes (>400 lines) are automatically split into a class-level chunk plus individual method chunks.
Sketches are compressed representations of code chunks that preserve structure while dropping implementation details. They typically reduce token count by 80-90% compared to raw source.
What a sketch preserves:
- Signatures — function/method signatures, class declarations, type annotations
- Doc comments — JSDoc (/** */) and KDoc comments
- Annotations — @HiltViewModel, @Composable, @GET, etc.
- Class skeleton — property declarations and method signatures (no bodies)
- Interface members — property and method signatures
- Enum members — member names and values
Each sketch is capped at 200 tokens (configurable via sketchMaxTokens). Longer sketches are truncated with a ... (truncated) marker.
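The cap could be applied roughly as follows. This is a sketch under assumptions: tokens are estimated with a crude four-characters-per-token heuristic (the real estimator in tokens.ts may differ), truncation happens on line boundaries, and `capSketch` is a hypothetical name.

```typescript
const TRUNCATION_MARKER = "... (truncated)";

// Assumption: roughly 4 characters per token; a real tokenizer would be more precise.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function capSketch(sketch: string, maxTokens = 200): string {
  if (estimateTokens(sketch) <= maxTokens) return sketch;

  // Reserve room for the marker, then keep whole lines while they fit.
  const budget = maxTokens - estimateTokens(TRUNCATION_MARKER);
  const kept: string[] = [];
  let used = 0;
  for (const line of sketch.split("\n")) {
    const cost = estimateTokens(line + "\n");
    if (used + cost > budget) break;
    kept.push(line);
    used += cost;
  }
  return [...kept, TRUNCATION_MARKER].join("\n");
}

const longSketch = Array.from({ length: 300 }, (_, i) => `fun method${i}(): Unit`).join("\n");
const cappedSketch = capSketch(longSketch);
console.log(estimateTokens(cappedSketch) <= 200, cappedSketch.endsWith(TRUNCATION_MARKER));
```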
Scrooge uses all-MiniLM-L6-v2 via @xenova/transformers for local embedding generation — no external API calls, no network dependency. The model is vendored in models/ and remote downloads are blocked at runtime, eliminating supply-chain risk from HuggingFace Hub.
| Property | Value |
|---|---|
| Model | Xenova/all-MiniLM-L6-v2 (quantized ONNX) |
| Dimensions | 384 |
| Pooling | Mean pooling over all output tokens |
| Normalization | L2-normalized (unit vectors), enabling cosine similarity via dot product |
| Runtime | In-process via ONNX Runtime (Node.js) |
| Storage | Vendored in models/ (~23MB, committed to git) |
During indexing, Scrooge embeds each chunk's sketch (not the raw source). This is intentional — the sketch contains the semantic essence (signatures, names, structure) without implementation noise, producing higher-quality embeddings for code search.
During search, the user's query is embedded with the same model and compared against all stored vectors using sqlite-vec's cosine distance.
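To make the pooling and normalization rows concrete, here is a small numeric sketch with toy 3-dimensional vectors instead of real 384-dimensional model output. The function names are illustrative; in Scrooge the model runtime and sqlite-vec perform these steps.

```typescript
// Mean pooling: average the per-token vectors into one chunk vector.
function meanPool(tokenVectors: number[][]): number[] {
  const first = tokenVectors[0];
  if (first === undefined) throw new Error("need at least one token vector");
  const sum = new Array<number>(first.length).fill(0);
  for (const vec of tokenVectors) {
    vec.forEach((value, i) => {
      sum[i] = (sum[i] ?? 0) + value;
    });
  }
  return sum.map((total) => total / tokenVectors.length);
}

// L2 normalization: scale to unit length (zero vectors are returned unchanged).
function l2Normalize(vec: number[]): number[] {
  const norm = Math.hypot(...vec);
  return norm === 0 ? vec.slice() : vec.map((v) => v / norm);
}

function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  return a.reduce((acc, v, i) => acc + v * (b[i] ?? 0), 0);
}

// For unit vectors, cosine similarity is just the dot product.
function cosineDistance(a: number[], b: number[]): number {
  return 1 - dot(a, b);
}

const chunkVec = l2Normalize(meanPool([[1, 0, 0], [3, 4, 0]])); // mean is [2, 2, 0]
const queryVec = l2Normalize([1, 1, 0]);
// Close to 0 because both vectors point in the same direction.
console.log(cosineDistance(chunkVec, queryVec));
```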
Scrooge automatically keeps the index fresh. Before every search, map, or lookup call, it compares the repository's current HEAD with the last indexed commit. If they differ, an incremental reindex runs transparently before returning results.
Tool called (search/map/lookup)
│
▼
HEAD == last_indexed_sha?
├─ yes → proceed normally
└─ no → incremental reindex → proceed
This means you never need to call scrooge_reindex manually — the index is always up to date when you query it. The first query on an unindexed repo triggers a full index automatically.
When auto-reindex occurs, a _note field is included in the response with timing and file count details.
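The freshness check can be sketched as a function with its Git and storage access injected. All names here (`FreshnessDeps`, `ensureFresh`) are hypothetical, and the note wording is illustrative rather than the actual `_note` format.

```typescript
// Hypothetical dependencies; a real implementation would call Git and the SQLite store.
interface FreshnessDeps {
  currentHead(repoPath: string): string;
  lastIndexedSha(repoPath: string): string | null; // null when the repo was never indexed
  reindex(repoPath: string, incremental: boolean): { filesIndexed: number; durationMs: number };
}

interface FreshnessResult {
  reindexed: boolean;
  note?: string;
}

function ensureFresh(repoPath: string, deps: FreshnessDeps): FreshnessResult {
  const head = deps.currentHead(repoPath);
  const last = deps.lastIndexedSha(repoPath);
  if (last === head) return { reindexed: false }; // index is already up to date

  // First query on an unindexed repo: full index. Otherwise: incremental.
  const incremental = last !== null;
  const stats = deps.reindex(repoPath, incremental);
  const mode = incremental ? "incremental" : "full";
  return {
    reindexed: true,
    note: `auto-reindexed (${mode}): ${stats.filesIndexed} files in ${stats.durationMs}ms`,
  };
}

// In-memory fake to exercise the logic without Git or a database.
let indexedSha: string | null = null;
const fakeDeps: FreshnessDeps = {
  currentHead: () => "abc123",
  lastIndexedSha: () => indexedSha,
  reindex: () => {
    indexedSha = "abc123";
    return { filesIndexed: 42, durationMs: 850 };
  },
};

console.log(ensureFresh("/repo", fakeDeps)); // first call triggers a full index
console.log(ensureFresh("/repo", fakeDeps)); // second call finds the index fresh
```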
Query
│
▼
┌──────────────────────┐
│ Query planner │
│ - stop-word cleanup │
│ - CamelCase splitting│
│ - alias expansion │
└──────────┬───────────┘
│
┌────────┴────────┬────────────────┐
▼ ▼ │
┌──────────┐ ┌──────────┐ │
│ FTS5 │ │sqlite-vec│ │
│ lexical │ │ vector │ │
└────┬─────┘ └────┬─────┘ │
│ │ │
▼ ▼ │
┌──────────────────────┐ │
│ RRF + reranker │ │
│ exact symbol/path │ │
│ + kind/lang hints │ │
└──────────┬───────────┘ │
▼ │
┌───────────────┐ ┌───────┴───────┐
│ Packager │◀─────│ Token Budget │
│ (diversity + │ │ (default │
│ dedup) │ │ 3000) │
└───────┬───────┘ └───────────────┘
▼
Ranked results
(sketch / implementation / raw)
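The query planner stage at the top of the diagram could look roughly like the sketch below. The stop-word list and alias table are small illustrative samples, and `planQuery` is a hypothetical function, not the contents of query.ts.

```typescript
// Illustrative samples only; the real lists are likely larger.
const STOP_WORDS = new Set(["the", "a", "an", "of", "for", "to", "in", "where", "is", "find"]);
const ALIASES = new Map<string, string>([
  ["vm", "viewmodel"],
  ["repo", "repository"],
  ["repositories", "repository"],
]);

function splitCamelCase(term: string): string[] {
  // Insert a space at each lower-to-upper boundary: "LoginViewModel" becomes "Login View Model".
  return term.replace(/([a-z0-9])([A-Z])/g, "$1 $2").split(/\s+/).filter(Boolean);
}

function planQuery(query: string): string[] {
  const terms = new Set<string>(); // a Set deduplicates and keeps insertion order
  for (const raw of query.split(/[^A-Za-z0-9_]+/).filter(Boolean)) {
    const lower = raw.toLowerCase();
    if (STOP_WORDS.has(lower)) continue;
    terms.add(lower); // keep the exact identifier variant
    for (const part of splitCamelCase(raw)) {
      const piece = part.toLowerCase();
      if (!STOP_WORDS.has(piece)) terms.add(ALIASES.get(piece) ?? piece);
    }
  }
  return [...terms];
}

console.log(planQuery("find the LoginViewModel in auth repo"));
```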
- Query planning — Scrooge strips stop words, splits CamelCase, preserves exact identifier variants, and expands common code aliases such as vm → viewmodel, repo → repository, and plural forms like repositories → repository
- Lexical retrieval — FTS5 full-text search still provides the main lexical ranking, but Scrooge now runs exact/strict/broad query variants and a symbol/path heuristic pass so known identifiers and filenames surface earlier
- Vector search — The query is embedded with MiniLM-L6-v2 and compared against all chunk vectors via sqlite-vec cosine distance. Results are ranked by similarity (1 - distance). Same filters apply post-query
- Fusion + reranking — Lexical and vector lists are merged using Reciprocal Rank Fusion: score(doc) = Σ 1/(k + rank) where k=60, then lightly reranked with exact symbol/path matches, term coverage, and kind/language hints
- Packaging — Results are packed within a token budget (default 3000). A diversity constraint limits each file to at most 3 chunks, preventing a single large file from dominating results. sketch is best for planning, implementation for focused code understanding, and raw for full source
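The fusion formula above is short enough to sketch directly. The version below assumes 1-based ranks and omits the heuristic reranking pass; `reciprocalRankFusion` is an illustrative name, not the function in hybrid.ts.

```typescript
interface FusedResult {
  id: string;
  score: number;
}

// score(doc) = sum over lists of 1 / (k + rank), with rank starting at 1 (assumption).
function reciprocalRankFusion(rankedLists: string[][], k = 60): FusedResult[] {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    // Highest score first; ties are broken by id to keep the order deterministic.
    .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id));
}

const lexicalHits = ["A", "B", "C"];
const vectorHits = ["B", "D", "A"];
const fused = reciprocalRankFusion([lexicalHits, vectorHits]);
console.log(fused.map((r) => r.id).join(", ")); // prints "B, A, D, C"
```

Documents found by both retrievers (A and B here) rise above documents found by only one, even when the single-list hit ranks higher in its own list.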
All settings have sensible defaults. Override via getConfig() in code:
| Setting | Default | Description |
|---|---|---|
| dbPath | ~/.scrooge/scrooge.db | SQLite database location |
| defaultTokenBudget | 3000 | Max tokens per search response |
| defaultMaxResults | 8 | Max results per search |
| maxChunksPerFile | 3 | Diversity limit: max chunks from one file |
| sketchMaxTokens | 200 | Max tokens per sketch |
| rrfK | 60 | RRF fusion constant (higher = more weight to lower ranks) |
| embeddingModel | Xenova/all-MiniLM-L6-v2 | Local embedding model |
| embeddingDims | 384 | Embedding vector dimensions |
| modelPath | <project>/models | Path to vendored ONNX model files |
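To show how defaultTokenBudget, defaultMaxResults, and maxChunksPerFile might interact during packaging, here is a greedy sketch. It is an assumption-laden illustration: `packResults` is a hypothetical function, and skipping an oversized chunk while continuing to look for smaller ones is one possible policy, not necessarily what packager.ts does.

```typescript
interface RankedChunk {
  id: string;
  path: string;
  tokens: number;
}

interface PackOptions {
  tokenBudget: number;
  maxResults: number;
  maxChunksPerFile: number;
}

const DEFAULTS: PackOptions = { tokenBudget: 3000, maxResults: 8, maxChunksPerFile: 3 };

// Input is assumed to be sorted best-first already.
function packResults(ranked: RankedChunk[], opts: PackOptions = DEFAULTS) {
  const results: RankedChunk[] = [];
  const perFile = new Map<string, number>();
  let totalTokens = 0;
  let truncated = false;

  for (const chunk of ranked) {
    if (results.length >= opts.maxResults) {
      truncated = true;
      break;
    }
    const fromThisFile = perFile.get(chunk.path) ?? 0;
    if (fromThisFile >= opts.maxChunksPerFile) continue; // diversity constraint
    if (totalTokens + chunk.tokens > opts.tokenBudget) {
      truncated = true; // does not fit; a later, smaller chunk still might
      continue;
    }
    results.push(chunk);
    perFile.set(chunk.path, fromThisFile + 1);
    totalTokens += chunk.tokens;
  }
  return { results, totalTokens, truncated };
}

const rankedChunks: RankedChunk[] = [
  ...Array.from({ length: 5 }, (_, i) => ({ id: `big-${i}`, path: "Big.kt", tokens: 100 })),
  { id: "small-0", path: "Small.kt", tokens: 50 },
];
const packed = packResults(rankedChunks);
console.log(packed.results.map((c) => c.id), packed.totalTokens);
```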
| Property | Value |
|---|---|
| Location | ~/.scrooge/scrooge.db |
| Engine | SQLite with WAL mode for concurrent reads |
| Extensions | FTS5 (full-text search), sqlite-vec (vector similarity) |
| Schema version | Managed via PRAGMA user_version with automatic migrations |
To force a full reindex, delete the database:
rm ~/.scrooge/scrooge.db
| Command | Description |
|---|---|
| npm run setup | Build, register MCP server, configure hooks |
| npm run uninstall | Remove all registrations and hooks |
| npm test | Run all tests (vitest) |
| npm run test:watch | Watch mode |
| npm run build | Compile TypeScript to dist/ |
| npm run dev | Run from source via tsx |
| npm run lint | ESLint check |
| npm run typecheck | Type check without emitting |
Test fixtures in test/fixtures/ include Kotlin source files, TypeScript/TSX modules, Dart/Flutter files, Python modules, Android XML layouts, and Gradle build scripts — covering the primary file types Scrooge indexes.
- TypeScript strict mode, ESM modules
- All communication in English — code, comments, commits, and conversation responses
- Conventional commits: feat:, fix:, refactor:, test:, docs:
- Tests with vitest in test/
| Dependency | Version | Purpose |
|---|---|---|
| @modelcontextprotocol/sdk | ^1.12.0 | MCP protocol implementation |
| better-sqlite3 | ^11.7.0 | SQLite database driver (native) |
| sqlite-vec | ^0.1.6 | Vector similarity search extension |
| tree-sitter | ^0.21.1 | Incremental parsing framework (native) |
| tree-sitter-kotlin | ^0.3.8 | Kotlin grammar for tree-sitter |
| tree-sitter-typescript | ^0.23.2 | TypeScript/TSX grammar for tree-sitter |
| tree-sitter-dart | github:UserNobody14#c1222f5 | Dart grammar for tree-sitter (ABI 14 compatible) |
| tree-sitter-python | ^0.21.0 | Python grammar for tree-sitter |
| @xenova/transformers | ^2.17.0 | Local ML embeddings (all-MiniLM-L6-v2) |
| zod | ^3.24.0 | Runtime schema validation |
| typescript | ^5.7.0 | Type system and compiler |
| vitest | ^4.0.18 | Test framework |
| @sinclair/typebox | ^0.34.0 | Schema validation for pi.dev extension |
| eslint | ^10.0.1 | Linting |
Not yet specified.