Sync main with release/0.2.579 (api.wiki.codes MCP, 1h MCP timeout, C parser, extension parsers, bench/snapshot perf) #315
Port the legacy HTTP endpoint (stubbed in 56ea465 / v0.2.578) to the new `std.Io.net` surface: bind 127.0.0.1 with `IpAddress.parse`/`listen`, accept in a loop, and hand each stream to a detached thread. The routes and JSON response shapes match the pre-0.16 implementation, so existing clients don't need changes.

Refs #307, #285
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
For `codedb mcp`, stdout is reserved for JSON-RPC messages. The root-policy failure path wrote `✗ refusing to index temporary root: …` (and the normal `✓ indexed` startup line) to stdout, which hosts reject with `invalid character 'â' looking for beginning of value` on the leading UTF-8 byte of the status glyph. Switch `out.file` to stderr once `cmd == "mcp"` is resolved, so every `out.p` call on that path goes to stderr while stdout stays clean for protocol messages.

Closes #304
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Read `CODEDB_PORT` from the environment; fall back to 7719 when it is absent or fails to parse. This unblocks running multiple instances on one host, reverse-proxy setups, and integration tests that need an ephemeral port.

Refs #308
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
`Explorer.findSymbol` now looks up the name in `self.symbol_index` and builds results from the cached locations. The full outline scan is kept as a fallback for safety.

For the index to be authoritative, `rebuildSymbolIndexFor` no longer skips `.import` / `.comment_block` kinds. Those were being missed by the O(1) path and forced callers into the slow scan. Indexing every kind makes results match the scan-based path exactly.

Refs #309
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Drop the hardcoded 7719 fallback. If `CODEDB_PORT` is unset, `codedb serve` exits with a clear message explaining how to enable it (suggesting 47719, since 7719 and 8080 tend to collide with other local processes). If it is set but not parseable as u16, exit with an error.

Rationale: the HTTP server opens a network port, and having it bind on a predictable default when someone runs `codedb serve` accidentally is worth avoiding. Treating the env var as the on/off switch keeps the surface area minimal and makes the enabled case explicit in shell history and process listings.

Refs #308
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Running `codedb serve` is itself the opt-in: codedb has no always-on daemon, so gating the listener behind an additional `CODEDB_PORT` requirement was belt-and-suspenders with no threat to block. Restore the previous UX: `codedb serve` starts listening on a default port, and `CODEDB_PORT` stays as an optional override for collisions. The default is now 6767 (picked off the beaten path; 7719 and 8080 collided with other local tooling).

Refs #308
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
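A minimal sketch of the resulting port selection, assuming the caller has already read `CODEDB_PORT` into an optional string (null when unset). `resolvePort` and `error.InvalidPort` are hypothetical names, and because this commit does not say what happens to an unparsable override, the sketch rejects it the way the previous commit did instead of silently falling back.

```zig
const std = @import("std");

/// Default from the commit above; CODEDB_PORT only overrides it.
pub const default_port: u16 = 6767;

/// Hypothetical helper. `env_value` is the raw CODEDB_PORT value, or null
/// when the variable is unset. Rejecting an unparsable value (instead of
/// falling back to the default) is an assumption of this sketch.
pub fn resolvePort(env_value: ?[]const u8) error{InvalidPort}!u16 {
    const raw = env_value orelse return default_port;
    // parseInt fails on non-digits and on values above 65535.
    return std.fmt.parseInt(u16, raw, 10) catch return error.InvalidPort;
}
```

Keeping the environment lookup outside the function means the override logic can be tested without touching the process environment: `codedb serve` alone maps to `resolvePort(null)`, and `CODEDB_PORT=47719 codedb serve` maps to `resolvePort("47719")`.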
`isPathSafe` only rejected absolute `/` paths and `..` segments split on `/`, so inputs like `..\..\secret.txt` passed through. On platforms where `\` is a real separator, this could reach files outside the indexed tree through `/file/read` and `/edit`. Null bytes could likewise truncate paths in downstream syscalls. Mirror `mcp.isPathSafe`: reject null bytes and backslashes up front, before the `/`-split loop.

Addresses Codex P1 on #310.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
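A sketch of the check in the order the commit describes: NUL bytes and backslashes are rejected up front, then the existing absolute-path and `..`-segment checks run on the `/`-split. This is not the actual `server.isPathSafe`; rejecting the empty path is an assumption added here.

```zig
const std = @import("std");

/// Sketch of the hardened path check (not the real server.isPathSafe).
pub fn isPathSafe(path: []const u8) bool {
    if (path.len == 0) return false; // assumption: empty paths are rejected
    // New up-front checks: NUL can truncate paths in syscalls, and '\'
    // is a separator on some platforms, so "..\.." must not slip through.
    if (std.mem.indexOfScalar(u8, path, 0) != null) return false;
    if (std.mem.indexOfScalar(u8, path, '\\') != null) return false;
    // Pre-existing checks: no absolute paths, no ".." segments.
    if (path[0] == '/') return false;
    var it = std.mem.splitScalar(u8, path, '/');
    while (it.next()) |segment| {
        if (std.mem.eql(u8, segment, "..")) return false;
    }
    return true;
}
```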
The HTTP handler was opening files via `std.Io.Dir.cwd()`, but `codedb <root> serve` indexes paths relative to the provided root. When the server was launched from any other directory, valid indexed paths hit the wrong base and returned false 404s (or worse, read the wrong file). Open via `explorer.root_dir` instead. Respond 500 with a clear error if the root was never configured (this shouldn't happen on the normal serve path, but it guards against a bare explorer).

Addresses Codex P2 on #310.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The earlier change early-returned after the `symbol_index` lookup, but that map can be incomplete after a fast-snapshot restore: `outlines` is populated before `rebuildSymbolIndexFor` has run on every file, and later watcher/edit updates only touch the files they saw change. Symbols present in untouched files were silently dropped from results once the index had any entry for the name.

Keep the O(1) path for the common case, but always fall through into the outline scan and dedupe against a per-call `(path, line_start)` set, so the scan fills gaps without duplicating index hits.

Addresses Codex P1 on #310.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
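The merge can be sketched as follows. `Hit` and `mergeHits` are hypothetical names; the commit dedupes with a per-call set keyed on `(path, line_start)`, while this sketch uses a linear scan over the results collected so far to stay allocation-free, which is only reasonable for small result sets.

```zig
const std = @import("std");

/// Hypothetical result type; real results carry more fields.
pub const Hit = struct {
    path: []const u8,
    line_start: u32,
};

/// Dedupe key is (path, line_start), as in the commit.
fn contains(hits: []const Hit, h: Hit) bool {
    for (hits) |seen| {
        if (seen.line_start == h.line_start and std.mem.eql(u8, seen.path, h.path)) return true;
    }
    return false;
}

/// Index hits first (O(1) fast path), then outline-scan hits the index
/// missed. Writes into `out` and returns how many results were written.
pub fn mergeHits(index_hits: []const Hit, scan_hits: []const Hit, out: []Hit) usize {
    var n: usize = 0;
    for ([_][]const Hit{ index_hits, scan_hits }) |source| {
        for (source) |h| {
            if (n == out.len) return n; // caller's buffer is full
            if (contains(out[0..n], h)) continue; // already reported
            out[n] = h;
            n += 1;
        }
    }
    return n;
}
```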
Rolls up:
- Restore `codedb serve --port` on Zig 0.16 (#307)
- Route MCP status output to stderr (closes #304)
- Default serve port 6767; `CODEDB_PORT` override (#308)
- O(1) findAllSymbols with safety merge-scan (#309)
- Harden `server.isPathSafe` against `\` and NUL
- `/file/read` resolves against indexed root

See PR #310.
`codedb_remote` today only hits codedb.codegraff.com (WASM-on-Workers indexer). Its sibling project codedb-cloud / wiki.codes is a Zig-native parquet router with a superset of actions and more repos indexed, but agents using codedb have no path to it.

Adds an optional `backend` field to `codedb_remote`. The default stays "codegraff", so every existing caller is unchanged on the wire.

Backends and supported actions:
- codegraff (default): tree, outline, search, meta
- wiki: tree, outline, search, symbol, policy

`symbol` (exact-identifier definition lookup across an indexed repo) and `policy` (hot-pin size class) are new capabilities from wiki that codegraff doesn't expose; `meta` stays codegraff-only.

Wiki requests go through the Vercel `/api/query` proxy at https://www.wiki.codes, which authenticates server-side to the Hetzner router. No client secrets, no API key.

The slug is derived from the repo arg by replacing '/' with '-' (matching wiki's canonical naming: rust-lang/rust → rust-lang-rust).

Per-backend action allowlists reject cross-backend mismatches with a clear error:
action 'meta' not supported on backend 'wiki' (wiki supports: tree, outline, search, symbol, policy)

Verified live against wiki.codes:
codedb_remote repo=rust-lang/rust backend=wiki action=symbol query=HashMap
→ 25 hits in 686ms, including library/std/src/collections/hash/map.rs:247 struct_def and impl_blocks, plus clippy and rust-analyzer test files.

Closes #311.
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
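A sketch of the two rules above, slug derivation and per-backend allowlists. The names (`Backend`, `isActionSupported`, `wikiSlug`) are hypothetical, and the action lists reflect this commit only; later work in this PR adds more wiki actions and accepts raw slugs.

```zig
const std = @import("std");

pub const Backend = enum { codegraff, wiki };

// Allowlists as listed in this commit message.
const codegraff_actions = [_][]const u8{ "tree", "outline", "search", "meta" };
const wiki_actions = [_][]const u8{ "tree", "outline", "search", "symbol", "policy" };

/// True if `action` is allowed on `backend`.
pub fn isActionSupported(backend: Backend, action: []const u8) bool {
    const allowed: []const []const u8 = switch (backend) {
        .codegraff => &codegraff_actions,
        .wiki => &wiki_actions,
    };
    for (allowed) |a| {
        if (std.mem.eql(u8, a, action)) return true;
    }
    return false;
}

/// Hypothetical helper: wiki slug = repo with '/' replaced by '-',
/// written into the caller's buffer.
pub fn wikiSlug(repo: []const u8, buf: []u8) error{NoSpaceLeft}![]const u8 {
    if (repo.len > buf.len) return error.NoSpaceLeft;
    for (repo, 0..) |c, i| buf[i] = if (c == '/') '-' else c;
    return buf[0..repo.len];
}
```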
Benchmark Regression Report (threshold: 10.00%)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1f04fad226
var req = edit_mod.EditRequest{
    .path = path,
    .agent_id = agent_id,
    .op = op,
    .content = content,
Resolve HTTP edit paths against project root
The `/edit` handler forwards the user-supplied relative path directly into `edit_mod.applyEdit`, but `applyEdit` reads from `std.Io.Dir.cwd()` (in `src/edit.zig`) rather than `explorer.root_dir`. When `codedb serve` is launched with an explicit root (for example `codedb /tmp/proj serve`), `/file/read` and indexing operate under `/tmp/proj` while `/edit` targets the process working directory, so edits can fail or mutate the wrong file if the same relative path exists there.
const path = extractQueryParam(request, "path") orelse {
    respondJson(&conn, "400 Bad Request", "{\"error\":\"missing ?path=\"}");
    return;
};
if (!isPathSafe(path)) {
Percent-decode /file/read paths before opening files
This route reads `path` from the query string and immediately validates/uses it, but never runs `percentDecode`. As a result, URL-encoded paths (for example `path=src%2Fmain.zig`, or filenames containing spaces) are treated as literal `%` sequences and typically return file not found. Other routes in this file already decode query values before lookup, so this creates an avoidable read regression for standard HTTP clients.
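A minimal sketch of decoding before validation, assuming only `%XX` escapes need handling (`+` is left as-is). `percentDecodeInto` is a hypothetical stand-in, not the file's existing `percentDecode`. The ordering matters: decoding first means an encoded traversal such as `%2E%2E%2F` reaches `isPathSafe` as `../` and is rejected, instead of being validated in its harmless-looking encoded form.

```zig
const std = @import("std");

/// Hypothetical decoder: expands %XX escapes into `buf`. Malformed
/// escapes are rejected; '+' is NOT turned into a space (assumption).
pub fn percentDecodeInto(src: []const u8, buf: []u8) error{ InvalidEscape, NoSpaceLeft }![]const u8 {
    var i: usize = 0;
    var n: usize = 0;
    while (i < src.len) : (n += 1) {
        if (n == buf.len) return error.NoSpaceLeft;
        if (src[i] == '%') {
            if (i + 3 > src.len) return error.InvalidEscape; // truncated "%X"
            const hi = std.fmt.charToDigit(src[i + 1], 16) catch return error.InvalidEscape;
            const lo = std.fmt.charToDigit(src[i + 2], 16) catch return error.InvalidEscape;
            buf[n] = hi * 16 + lo;
            i += 3;
        } else {
            buf[n] = src[i];
            i += 1;
        }
    }
    return buf[0..n];
}
```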
Before this change, calling codedb_remote with action=search (or
action=symbol/outline on the wiki backend) but no 'query' argument
silently sent `q=` to the remote. codegraff.com would return an empty
result set or an unhelpful error, and users couldn't tell whether
their search was genuinely empty or the request was malformed.
Fail fast with a pointer at the missing field:
error: action 'search' requires a non-empty 'query' (the search text)
error: action 'symbol' requires a non-empty 'query' (the identifier name to look up)
error: action 'outline' requires a non-empty 'query' (the file path to outline)
tree / meta / policy are unchanged (they legitimately take no query).
Verified via the MCP stdio interface:
tools/call codedb_remote {"repo":"x","action":"search"}
→ error with guidance
tools/call codedb_remote {"repo":"x","action":"symbol","backend":"wiki"}
→ error with guidance
tools/call codedb_remote {"repo":"x","action":"tree","backend":"wiki"}
→ succeeds (unchanged)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
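The validation can be sketched as a lookup from action to the hint used in the error text. `missingQueryHint` and `formatError` are hypothetical; the hint strings and message format are copied from the errors above, and whitespace-only queries are not treated as empty here (assumption).

```zig
const std = @import("std");

/// Actions that consume `query`, paired with the hint from the error text.
const query_hints = [_]struct { action: []const u8, hint: []const u8 }{
    .{ .action = "search", .hint = "the search text" },
    .{ .action = "symbol", .hint = "the identifier name to look up" },
    .{ .action = "outline", .hint = "the file path to outline" },
};

/// Returns the hint when `action` needs a query that is missing or empty,
/// or null when the request may proceed (tree / meta / policy take none).
pub fn missingQueryHint(action: []const u8, query: ?[]const u8) ?[]const u8 {
    for (query_hints) |entry| {
        if (std.mem.eql(u8, entry.action, action)) {
            const q = query orelse return entry.hint;
            return if (q.len == 0) entry.hint else null;
        }
    }
    return null;
}

/// Builds the user-facing message in the format shown above.
pub fn formatError(buf: []u8, action: []const u8, hint: []const u8) ![]u8 {
    return std.fmt.bufPrint(buf, "error: action '{s}' requires a non-empty 'query' ({s})", .{ action, hint });
}
```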
The idle watchdog in main.zig closes stdin when `now - last_activity > idle_timeout_ms` (10 min). But last_activity is only updated once per incoming message, right after readLineBuf (mcp.zig:474). Inside a single bundle call that takes longer than idle_timeout_ms (many slow sub-ops, or ops that shell out to codedb_remote / codedb_tree on big repos), the clock stays frozen at message-arrival time. At the 10-minute mark the watchdog closes stdin mid-processing. The main thread finishes the bundle and writes the response fine (stdout is untouched), but the client, whose write end of the stdin pipe just got EPIPE'd, reports "Transport closed" on its next tool call.

Fix: stamp last_activity at bundle start AND at the end of each sub-op iteration, so active processing keeps us marked live. Every sub-op takes a known-bounded time, so the watchdog can only fire when the main thread truly has nothing in flight.

No change to the idle path: a bundle that completes in under 10 min behaves as before (the extra stamps never change whether the watchdog fires), and sessions that actually go idle still get reaped.

Fixes #278 for the bundle case. If "Transport closed" still surfaces on non-bundle paths, we'll need a repro that doesn't go through handleBundle.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
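A sketch of the stamping scheme, assuming a millisecond clock shared between the main thread and the watchdog thread through an atomic. `Activity`, `runBundle`, and the simulated clock are hypothetical. The test runs a 25-minute bundle of five 5-minute sub-ops: it stays live under the 10-minute timeout and only counts as idle once nothing has stamped for longer than the timeout. As the commit notes, a single sub-op longer than the timeout would still trip the watchdog; the sketch relies on the same bounded-sub-op assumption.

```zig
const std = @import("std");

/// Last-activity timestamp shared by the MCP main thread and the watchdog.
pub const Activity = struct {
    last_ms: std.atomic.Value(i64),

    pub fn init(now_ms: i64) Activity {
        return .{ .last_ms = std.atomic.Value(i64).init(now_ms) };
    }

    /// Called on message arrival, at bundle start, and after each sub-op.
    pub fn touch(self: *Activity, now_ms: i64) void {
        self.last_ms.store(now_ms, .monotonic);
    }

    /// The watchdog's test: idle only if nothing stamped within the timeout.
    pub fn isIdle(self: *const Activity, now_ms: i64, timeout_ms: i64) bool {
        return now_ms - self.last_ms.load(.monotonic) > timeout_ms;
    }
};

/// Hypothetical bundle loop on a simulated clock; returns the end time.
pub fn runBundle(activity: *Activity, sub_op_costs_ms: []const i64, start_ms: i64) i64 {
    var now = start_ms;
    activity.touch(now); // stamp at bundle start
    for (sub_op_costs_ms) |cost| {
        now += cost; // simulate one sub-op taking `cost` ms
        activity.touch(now); // stamp at the end of each sub-op
    }
    return now;
}
```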
[codex] Add native C outline parser
codedb_remote: reject empty query on actions that consume it
mcp: refresh last_activity during long bundle processing (#278)
detect: add common extension language coverage
parse: add lightweight outlines for common extensions
test: add golden coverage for extension parsers
Fix benchmark noise status reporting
Benchmark Regression Report (thresholds: 10.00% and 50,000 ns absolute delta)
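The status rule behind these thresholds (a row fails only when both the 10% and the 50,000 ns thresholds are exceeded; large-percent but small-absolute swings are `NOISE`) can be sketched as a pure function. The PR's validation runs `scripts/compare-bench.py`, which presumably holds the real check; this Zig version only illustrates the decision table, and labeling improvements as `ok` is an assumption.

```zig
const std = @import("std");

pub const Status = enum { ok, noise, regression };

pub const pct_threshold: f64 = 10.0; // percent
pub const abs_threshold_ns: i64 = 50_000; // absolute delta in ns

/// Classify one benchmark row from base and head timings (ns).
pub fn classify(base_ns: i64, head_ns: i64) Status {
    const delta = head_ns - base_ns;
    // Improvements, unchanged rows, and degenerate bases never fail (assumption).
    if (delta <= 0 or base_ns <= 0) return .ok;
    const pct = @as(f64, @floatFromInt(delta)) * 100.0 / @as(f64, @floatFromInt(base_ns));
    if (pct <= pct_threshold) return .ok;
    // Percent threshold exceeded: only a real regression if the absolute
    // delta is also large; otherwise it is a tiny, noisy swing.
    return if (delta > abs_threshold_ns) .regression else .noise;
}
```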
Speed up snapshot JSON generation
Cache snapshot responses by store sequence
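A sketch of a single-entry cache keyed on the store sequence number, assuming the 16 MB cap means "responses larger than 16 MiB are not cached". `SnapshotCache` is hypothetical; the point it illustrates is that a hit requires an exact sequence match, so any edit or indexing pass that advances the sequence forces a rebuild on the next call.

```zig
const std = @import("std");

/// Hypothetical single-entry snapshot cache keyed by store sequence.
pub const SnapshotCache = struct {
    pub const max_bytes: usize = 16 * 1024 * 1024; // assumed reading of "16 MB cap"

    seq: ?u64 = null, // sequence the cached JSON was built at; null = empty
    json: []const u8 = "",

    /// Returns the cached JSON only if it was built at exactly `seq`.
    pub fn get(self: *const SnapshotCache, seq: u64) ?[]const u8 {
        if (self.seq) |cached| {
            if (cached == seq) return self.json;
        }
        return null;
    }

    /// Replaces the cached entry; oversized responses are not cached.
    pub fn put(self: *SnapshotCache, allocator: std.mem.Allocator, seq: u64, json: []const u8) !void {
        self.clear(allocator);
        if (json.len > max_bytes) return;
        self.json = try allocator.dupe(u8, json);
        self.seq = seq;
    }

    /// Frees the owned copy, if any, and empties the cache.
    pub fn clear(self: *SnapshotCache, allocator: std.mem.Allocator) void {
        if (self.seq != null) allocator.free(self.json);
        self.seq = null;
        self.json = "";
    }
};
```

Keeping a single entry bounds memory at the cap, at the cost of rebuilding whenever a caller asks for a sequence other than the latest one.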
Point codedb_remote at api.wiki.codes
Extend MCP idle timeout to one hour
Speed note for the #328 timeout change
I compared the current sync head (…
Detailed comparison is on #328: #328 (comment)
Summary
Syncs `main` to the current tip of `release/0.2.579`.

This release branch now includes the wiki/`api.wiki.codes` remote backend work, the restored local HTTP server path, MCP/server safety fixes, remote-query validation, the long-running bundle activity fix, the semver bump to `0.2.579`, the native C outline parser, extension/language detection coverage, lightweight outline parsers for the newly detected extension families, golden parser coverage for those extensions, clearer benchmark noise reporting, two snapshot performance improvements, the direct `api.wiki.codes` MCP remote update, and the 1-hour MCP idle timeout with prompt dead-client cleanup.

Included changes
- `c4cc763` Restore `codedb serve --port` on Zig 0.16.
- `2fbc66c` Route MCP status output to stderr so stdio MCP responses are not contaminated by status logs.
- `aaba92e` Make the `codedb serve` port configurable through `CODEDB_PORT`.
- `14c3160` Make `codedb serve` explicitly opt-in.
- `8b43e89` Keep `codedb serve` available by default on port `6767`.
- `c9d773c` Add O(1) `findSymbol` lookup through the complete symbol index.
- `74ba881` Harden `server.isPathSafe` against null bytes and backslash traversal.
- `6ef7185` Resolve `/file/read` paths against the indexed root instead of the process cwd.
- `fbb8b49` Make `findAllSymbols` merge indexed symbols with outline scan results so restored snapshots keep full coverage.
- `3233de4` Bump semver to `0.2.579`.
- `1f04fad` Add the `wiki` remote backend alongside the existing codegraff backend.
- `e7c9fd4` Add native C outline parsing for functions, structs, enums, unions, typedefs, macros, and common declarations.
- `56af2f6` Reject empty `codedb_remote` queries for actions that require user input.
- `3988c1f` Refresh MCP bundle `last_activity` during long-running bundle work so the idle watchdog does not close stdin mid-call.
- `ad52783` Add language detection coverage for `.mm`, `.java`, `.kt`, `.svelte`, `.vue`, `.astro`, `.sh`, `.css`, `.scss`, `.sql`, `.proto`, `.f90`, `.ll`, `.mlir`, and `.td`.
- `27b8d81` Add lightweight outline parsing for Java, Kotlin, Svelte/Vue/Astro, shell, CSS/SCSS, SQL, protobuf, Fortran, LLVM IR, MLIR, and TableGen.
- `3ca698b` Add per-extension golden outline checks and improve `.cc`/`.mm` parsing for C++ classes, `#import`, Objective-C interfaces/implementations/protocols, and ObjC method names.
- `5b76d9c` Make benchmark markdown status respect both the percentage threshold and the absolute-ns threshold, with `NOISE` for tiny high-percent swings.
- `f6f12b6` Speed up snapshot JSON generation by reusing sorted paths, serializing the maintained symbol index, writing tree JSON directly, and chunking JSON escaping.
- `d99041b` Cache `codedb_snapshot` responses by store sequence with a 16 MB cap, so repeated snapshot calls skip the JSON rebuild until edits/indexing advance the sequence.
- `32b5e3f` Point `codedb_remote` at `api.wiki.codes`, map native query params, add wiki security/history actions (`deps`, `score`, `cves`, `commits`, `branches`, `dep-history`), and accept raw wiki slugs such as `chromium`.
- `9a70fb4` Extend the MCP idle timeout from 10 minutes to 1 hour while polling for dead MCP clients every second.

Parser coverage
- `.cc` parses C++ includes, classes, member-like functions, and free functions.
- `.mm` parses `#import`, Objective-C `@interface`/`@implementation`/`@protocol`, Objective-C method names, C++ classes, and C-style functions.
- `CREATE ...` objects.

Performance notes
The intended symbol lookup improvement in this release is `c9d773c`: `findSymbol` now uses the complete symbol index for O(1) lookup instead of relying on slower scan-style lookup for the common path. `fbb8b49` keeps that indexed path while merging in outline scan results, so restored snapshots keep full coverage without losing the faster lookup path.

Snapshot performance was improved in two layers:
- `codedb_snapshot` JSON generation dropped from 2,835,681 ns to 1,351,605 ns on GitHub CI (-52.34%).
- `codedb_snapshot` calls now reuse a cached response by store seq, dropping from 1,339,408 ns to 256,474 ns on GitHub CI (-80.85%) against the already-optimized base.

Benchmark reporting now shows both the percent delta and the absolute ns delta. Rows only fail when both thresholds are exceeded; high-percent, tiny-absolute swings are labeled `NOISE`, matching the actual CI gate.

Validation
- `bench-regression / bench`: passed before merge into `release/0.2.579` (11 runs).
- `zig build test`, `zig build`, and an installed-binary smoke test confirmed MCP exits within ~285ms when stdin closes.
- `zig build`, `zig build test`, and live MCP smoke calls to `api.wiki.codes` for `justrach/codedb` symbol lookup, `axios/axios` CVEs, and raw-slug `chromium` policy.
- `zig build test`, `zig build`, and a local base/head benchmark comparison passed.
- `python3 -m unittest scripts/test_compare_bench.py` and `python3 -m py_compile scripts/compare-bench.py scripts/test_compare_bench.py`.
- `zig build test` and `zig build` passed.

Notes
This branch now points at merge commit `4e38a29`, the current `release/0.2.579` tip.

Supersedes #314, which only covered the wiki backend subset.