Shared REST backend with stdio proxy auto-start #15
When searching from a path inside an indexed root, find the parent index and apply path_prefix filtering so results are scoped to the subdirectory. Canonicalize cwd for consistent path matching.
Stdio sessions auto-detect or spawn a shared backend on `localhost:19100` so multiple MCP clients share one in-memory index. A `DataSource` enum routes tool calls to a `Local` store or a `Remote` `BackendClient`. `LLMX_PORT` and `LLMX_NO_AUTOSTART=1` control this behavior. Replaces `--http` with a `--serve` flag. Deps: add `bytes` + `http-body-util`, drop `tower-service`, enable the `mcp-http` feature by default, add `hyper` client support.
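A minimal sketch of the routing and configuration described above. Only the env var names, the default port `19100`, and the `Local`/`Remote` split come from this PR; `LocalStore`, `BackendClient { base_url }`, `parse_port`, `autostart_enabled`, and `route_search` are hypothetical stand-ins, not the crate's real API.

```rust
/// Default shared-backend port from the PR description.
const DEFAULT_PORT: u16 = 19100;

/// Interpret the raw `LLMX_PORT` value, falling back to the default when the
/// variable is unset or does not parse as a port number.
fn parse_port(raw: Option<&str>) -> u16 {
    raw.and_then(|v| v.trim().parse::<u16>().ok())
        .unwrap_or(DEFAULT_PORT)
}

/// Auto-start stays enabled unless `LLMX_NO_AUTOSTART=1` is set.
fn autostart_enabled(raw: Option<&str>) -> bool {
    raw != Some("1")
}

/// Hypothetical stand-ins for the real index store and HTTP client.
struct LocalStore;
struct BackendClient {
    base_url: String,
}

/// Every tool call is routed through one of these two variants.
enum DataSource {
    Local(LocalStore),
    Remote(BackendClient),
}

impl DataSource {
    /// Illustrative dispatch: report where a search would be served.
    fn route_search(&self, query: &str) -> String {
        match self {
            DataSource::Local(_) => format!("local index: {query}"),
            DataSource::Remote(client) => {
                format!("forward to {}/api/search: {query}", client.base_url)
            }
        }
    }
}
```

In a binary the raw values would come from `std::env::var("LLMX_PORT").ok()` passed through `.as_deref()`; keeping the parsers pure makes them testable without mutating the process environment.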
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 915cca7207
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```rust
let server = if let Some(client) = detect_or_start_backend(port).await {
    tracing::info!("Running in proxy mode (backend on port {port})");
    LlmxServer::new_remote(client)
} else {
```
Keep `--path` auto-indexing when connecting to shared backend
When a stdio instance finds or starts the shared backend, it immediately enters proxy mode and never runs `auto_index_paths(&args.paths)`. This breaks existing `llmx-mcp --path ...` workflows (including first-launch autostart), because startup paths are silently ignored and no index job is created until the user manually calls `llmx_index`.
```rust
DataSource::Remote(client) => {
    let result = client.search(&input).await
        .map_err(|e| McpError::internal_error(e.to_string(), None))?;
```
Send client location before proxying search requests
In proxy mode the request is forwarded unchanged, so if the caller omits `index_id`/`loc` (which is explicitly allowed), index resolution happens on the backend using the backend process's current directory rather than the MCP client's working directory. With a shared backend this causes cross-project sessions to query the wrong index or fail with "No index found" unless every call provides explicit location metadata.
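A later commit in this PR adds `fill_loc_from_cwd` for exactly this case. A minimal sketch of the idea, assuming a hypothetical `SearchInput` struct with optional `index_id`/`loc` string fields (the real input type and field types may differ):

```rust
use std::path::Path;

/// Hypothetical subset of the search input forwarded to the backend.
#[derive(Debug, Default)]
struct SearchInput {
    query: String,
    index_id: Option<String>,
    loc: Option<String>,
}

/// Before proxying, pin `loc` to the proxy's own working directory when the
/// caller left it unset, so the backend resolves the index against the
/// client's project rather than the backend process's cwd.
fn fill_loc_from_cwd(input: &mut SearchInput, cwd: &Path) {
    if input.loc.is_none() {
        input.loc = Some(cwd.to_string_lossy().into_owned());
    }
}
```

An explicit `loc` from the caller always wins; only the implicit "current directory" default is moved from the backend to the proxy.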
Pull request overview
This PR introduces a shared, persistent REST backend that multiple MCP stdio sessions can proxy to, allowing them to share one in-memory index instead of loading per-session copies from disk. It also tightens dynamic search behavior so searching from a subdirectory of an indexed root scopes results via path_prefix.
Changes:
- Add a `--serve <port>` REST backend mode plus stdio proxy mode with auto-start and an internal `BackendClient`.
- Implement "ancestor root" index resolution and auto-scoped `path_prefix` filtering for persistent-index searches from subdirectories.
- Update MCP tool/input types for HTTP forwarding, adjust default features/dependencies, and document new env vars.
Reviewed changes
Copilot reviewed 6 out of 9 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| ingestor-core/src/mcp/tools.rs | Adds serde derives needed to serialize/deserialize tool payloads over HTTP. |
| ingestor-core/src/handlers/types.rs | Makes search filters clonable for scoped-filter composition. |
| ingestor-core/src/handlers/storage.rs | Adds ancestor-root metadata lookup for subdirectory resolution. |
| ingestor-core/src/handlers/mod.rs | Uses ancestor lookup and injects scoped path_prefix when searching from subdirectories. |
| ingestor-core/src/bin/mcp_server.rs | Implements REST backend, proxy routing via DataSource, and backend auto-start logic. |
| ingestor-core/Cargo.toml | Enables mcp-http by default and updates HTTP-related dependencies/features. |
| README.md | Documents shared backend behavior and configuration env vars. |
| Cargo.lock | Updates resolved dependencies and bumps crate version. |
```rust
// Combine with any existing path_prefix filter
filters.path_prefix = Some(match filters.path_prefix {
    Some(existing) => format!("{}/{}", relative, existing),
    None => relative.clone(),
```
When scoping a persistent index search from a subdirectory, `path_prefix` is set to `relative` without a trailing `/` when there is no existing prefix. Because filtering is `starts_with`, this can incorrectly include sibling paths like `src/library/...` when the intended scope is `src/lib/...`. Consider normalizing the scoped prefix to a directory boundary (e.g., ensure `relative` ends with `/` when it represents a directory) before combining it with an existing prefix, and avoid introducing double slashes when joining.
Suggested change:

```rust
// Normalize the relative prefix to a directory boundary and
// combine with any existing path_prefix filter without introducing
// double slashes.
let mut normalized_relative = relative.clone();
if !normalized_relative.is_empty() && !normalized_relative.ends_with('/') {
    normalized_relative.push('/');
}
filters.path_prefix = Some(match filters.path_prefix {
    Some(existing) => {
        let trimmed_existing = existing.trim_start_matches('/');
        if normalized_relative.is_empty() {
            trimmed_existing.to_string()
        } else {
            format!("{}{}", normalized_relative, trimmed_existing)
        }
    }
    None => normalized_relative,
```
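The same normalization can be pulled out into a pure function, which makes the boundary case easy to unit-test. This is a sketch: `scoped_prefix` and `in_scope` are hypothetical helper names (the PR mutates `filters.path_prefix` inline), and trimming a leading `/` from `relative` is a small extra assumption beyond the suggestion.

```rust
/// Combine the subdirectory scope (`relative`, e.g. "src/lib") with an
/// optional user-supplied prefix. The scope always ends at a directory
/// boundary, and joining never produces "//".
fn scoped_prefix(relative: &str, existing: Option<&str>) -> String {
    let mut scope = relative.trim_matches('/').to_string();
    if !scope.is_empty() {
        scope.push('/');
    }
    match existing {
        Some(existing) => format!("{}{}", scope, existing.trim_start_matches('/')),
        None => scope,
    }
}

/// Filtering is a plain `starts_with`, as described in the review comment.
fn in_scope(path: &str, prefix: &str) -> bool {
    path.starts_with(prefix)
}
```

With the trailing `/`, `src/lib/mod.rs` stays in scope while `src/library/mod.rs` is excluded.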
```rust
/// Find a persistent index whose root is an ancestor of the given path.
///
/// Returns the metadata and the relative path from the index root to the given path.
/// Prefers the deepest (most specific) ancestor match.
pub fn find_metadata_containing_path(&self, path: &Path) -> Option<(&IndexMetadata, String)> {
    let normalized = path.to_string_lossy().replace('\\', "/");
    let mut best: Option<(&IndexMetadata, String)> = None;

    for meta in self.registry.indexes.values() {
        let root = meta.root_path.trim_end_matches('/');
        let prefix = format!("{}/", root);
        if normalized.starts_with(&prefix) {
```
`find_metadata_containing_path` introduces new ancestor-matching behavior but currently has no dedicated unit tests. Since this logic affects index selection (and is easy to regress on edge cases like nested roots and boundary matching), please add tests covering: (1) deepest-match selection when multiple roots are ancestors, and (2) non-matching for prefix collisions (e.g., `/proj/src` should not match `/proj/src2`).
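To make those cases concrete, here is a simplified, standalone sketch of the lookup. It is an assumption-laden stand-in: `find_containing_root` is a hypothetical name, plain root strings replace `IndexMetadata`, and `strip_prefix` replaces the `starts_with` check so the relative path falls out directly.

```rust
/// Return the deepest root that strictly contains `path`, together with
/// `path` relative to that root.
fn find_containing_root<'a>(roots: &[&'a str], path: &str) -> Option<(&'a str, String)> {
    let normalized = path.replace('\\', "/");
    let mut best: Option<(&'a str, String)> = None;
    for &root in roots {
        let trimmed = root.trim_end_matches('/');
        // The trailing '/' enforces a directory boundary: "/proj/src" must
        // not claim "/proj/src2/...", and the root itself is not a match.
        let prefix = format!("{trimmed}/");
        if let Some(rest) = normalized.strip_prefix(prefix.as_str()) {
            // Prefer the longest (deepest) matching root.
            let deeper = match &best {
                Some((current, _)) => trimmed.len() > current.trim_end_matches('/').len(),
                None => true,
            };
            if deeper {
                best = Some((root, rest.to_string()));
            }
        }
    }
    best
}
```

The accompanying assertions cover deepest-match selection, prefix collisions, the exact-root non-match, and the no-match case.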
Ensures 'src/lib' prefix won't match 'src/library/...' by normalizing the relative path with a trailing '/'.
- `fill_loc_from_cwd` sets `loc` to the proxy's cwd when unset so the backend resolves indexes against the client's directory
- Forward `--path` args to the backend via `/api/index` in proxy mode
- Proxy `read_resource` to the backend's `/api/status` in Remote mode
- `/api/roots` auto-indexes unindexed root paths on the backend
Covers: basic subdirectory match, deepest ancestor selection, prefix collision rejection (/proj/src vs /proj/src2), exact root non-match, and no-match cases.
@codex full code review please
- Add `spawn_backend_watcher` for `--serve` mode: watches indexed roots for changes and refreshes indexes automatically, with periodic re-scan for newly indexed roots
- Forward `--storage-dir` to the auto-started backend process so proxy sessions don't silently use the wrong storage location
- Canonicalize relative paths in `llmx_index` proxy mode against the client cwd before forwarding to the backend
💡 Codex Review
Reviewed commit: deac0e515f
```rust
if let DataSource::Remote(client) = &self.data_source {
    let result = client.index(&input).await
        .map_err(|e| McpError::internal_error(e.to_string(), None))?;
```
Resolve `llmx_index` paths before forwarding to backend
In proxy mode this branch forwards `IndexInput` as-is to the backend, but `run_index_work` resolves relative paths against the backend process's working directory, not the caller's working directory. That means requests like `{"paths":["."]}` or `"src"` from a second MCP session can index the wrong project (typically the project of the session that started the backend), which is a regression from local-mode semantics. Normalize paths on the proxy side (or send the caller's cwd explicitly) before calling `client.index`.
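A minimal proxy-side sketch of the first option, assuming the request carries paths as strings; `resolve_index_paths` is a hypothetical helper. It only joins relative paths onto the client's cwd. The follow-up commit canonicalizes instead, which additionally collapses `..` and symlinks but requires the path to exist (`std::fs::canonicalize` errors otherwise).

```rust
use std::path::{Path, PathBuf};

/// Resolve each requested path against the caller's working directory on
/// the proxy side, so the backend never interprets "." or "src" relative to
/// its own cwd. Absolute paths pass through unchanged.
fn resolve_index_paths(paths: &[String], client_cwd: &Path) -> Vec<PathBuf> {
    paths
        .iter()
        .map(|p| {
            let p = Path::new(p);
            if p.is_absolute() {
                p.to_path_buf()
            } else {
                client_cwd.join(p)
            }
        })
        .collect()
}
```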
```rust
cmd.arg("--serve")
    .arg(port.to_string())
    .stdin(std::process::Stdio::null())
```
Honor `--storage-dir` when auto-starting shared backend
Auto-start always spawns `llmx-mcp --serve <port>` without forwarding `--storage-dir`, so users who start stdio with a custom storage directory silently get a backend using default storage. In that scenario indexes are written to and read from the wrong location, which can mix data between projects and return unexpected search results. The spawn command should propagate the configured storage directory.
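A sketch of the fix, assuming the executable path and optional storage directory are already known (in practice `std::env::current_exe()` would supply the former). `backend_command` is a hypothetical helper. Redirecting stdout/stderr is an assumption beyond the excerpt, which only shows stdin; it matters because a stdio MCP server's stdout carries the protocol stream.

```rust
use std::path::Path;
use std::process::{Command, Stdio};

/// Build the auto-start command for the shared backend. When the stdio
/// session was given a storage directory, pass it through so both
/// processes read and write the same indexes.
fn backend_command(exe: &Path, port: u16, storage_dir: Option<&Path>) -> Command {
    let mut cmd = Command::new(exe);
    cmd.arg("--serve").arg(port.to_string());
    if let Some(dir) = storage_dir {
        cmd.arg("--storage-dir").arg(dir);
    }
    // Detach stdio: the parent's stdout carries the MCP protocol stream,
    // so the child must not write to it.
    cmd.stdin(Stdio::null()).stdout(Stdio::null()).stderr(Stdio::null());
    cmd
}
```

Building the `Command` separately from calling `.spawn()` lets the argument list be checked with `Command::get_args` without launching a process.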
```rust
let DataSource::Local { store, .. } = &self.data_source else {
    return Ok(()); // No-op in proxy mode
};
```
Keep root watchers active in proxy mode
This early return disables watcher setup whenever the server is in `Remote` mode. Since this commit makes proxy mode the common path, file edits no longer trigger `refresh_impacted_indexes`, so status/search can stay stale until someone manually re-runs indexing. Previously stdio sessions refreshed impacted indexes on filesystem events; proxy mode needs an equivalent path instead of no-oping watcher registration.
Replace per-event watcher callbacks in both stdio and --serve modes with a single spawn_debounced_watcher that collects changed paths into an mpsc channel and drains after a 500ms quiet window before calling refresh_impacted_indexes once. Stdio mode passes a PostRefreshFn callback for MCP peer notifications. Backend mode passes None. Both share the same debounce + batch refresh logic through one notify watcher creation point.
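The debounce-and-batch pattern described above can be sketched with the standard library alone. This is an analogue, not the PR's code: the real watcher runs inside the async server, while this version uses `std::sync::mpsc` and a blocking consumer. `next_batch`, `run_consumer`, and the `PostRefreshFn` signature shown here are assumptions.

```rust
use std::collections::BTreeSet;
use std::path::PathBuf;
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical shape for the optional post-refresh hook (stdio mode would
/// send MCP notifications here; backend mode passes `None`).
type PostRefreshFn = Box<dyn Fn(&BTreeSet<PathBuf>)>;

/// Block for the first change, then keep absorbing events until none arrive
/// for `quiet`. Duplicate paths collapse in the set. Returns `None` once the
/// channel is closed and empty.
fn next_batch(rx: &Receiver<PathBuf>, quiet: Duration) -> Option<BTreeSet<PathBuf>> {
    let first = rx.recv().ok()?;
    let mut batch = BTreeSet::from([first]);
    loop {
        match rx.recv_timeout(quiet) {
            Ok(path) => {
                batch.insert(path);
            }
            // Quiet window elapsed, or the producer went away: flush.
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => {
                return Some(batch)
            }
        }
    }
}

/// Consumer loop: one refresh per batch rather than one per event.
fn run_consumer(
    rx: Receiver<PathBuf>,
    quiet: Duration,
    mut refresh: impl FnMut(&BTreeSet<PathBuf>),
    post_refresh: Option<PostRefreshFn>,
) {
    while let Some(batch) = next_batch(&rx, quiet) {
        refresh(&batch);
        if let Some(hook) = &post_refresh {
            hook(&batch);
        }
    }
}
```

Each `recv_timeout` call restarts the quiet timer, so a burst of saves within 500ms yields a single `refresh_impacted_indexes` call.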
- Add `/api/config` endpoint returning the backend's `storage_dir`
- `detect_or_start_backend` checks storage-dir compatibility before attaching; falls back to standalone on mismatch
- `/api/roots` skips auto-indexing paths already covered by an ancestor index (`has_ancestor_index`) to avoid duplicate nested indexes that defeat subdirectory scoping
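A minimal sketch of the `/api/roots` filter, assuming indexed roots are available as paths. The body of `has_ancestor_index` and the helper `roots_to_index` are hypothetical; treating an exact match as "covered" is also an assumption. `Path::starts_with` compares whole components, so `/proj/src2` is not considered inside `/proj/src`.

```rust
use std::path::Path;

/// True when an existing index root equals or contains `candidate`.
fn has_ancestor_index(indexed_roots: &[&Path], candidate: &Path) -> bool {
    indexed_roots.iter().any(|root| candidate.starts_with(root))
}

/// Keep only client roots that no existing index already covers.
fn roots_to_index<'a>(indexed: &[&Path], requested: &[&'a Path]) -> Vec<&'a Path> {
    requested
        .iter()
        .copied()
        .filter(|r| !has_ancestor_index(indexed, r))
        .collect()
}
```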
Proxy mode cannot fire `tools/list_changed` or `resources/list_changed` notifications since the backend watcher runs without MCP peers. Avoid misleading clients by only advertising these capabilities when running with a local `DataSource` and an active file watcher.
Backend generates a random token on startup and writes it to
{storage_dir}/.backend-token with 0600 permissions. All API
requests require an Authorization: Bearer header carrying the
token; requests without it get a 401.
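A Unix-only sketch of writing and reading the token file as described, with the temp-file-then-rename refinement from the later security commit. `write_token_file`, `read_token_file`, and the `.backend-token.tmp` name are hypothetical; `OpenOptionsExt::mode` sets permissions at creation time, and the process umask can only tighten them further.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::os::unix::fs::OpenOptionsExt;
use std::path::Path;

/// Write `{storage_dir}/.backend-token` so the token is never world-readable:
/// the temp file gets mode 0600 at creation, then a rename replaces the
/// final file atomically.
fn write_token_file(storage_dir: &Path, token: &str) -> io::Result<()> {
    let final_path = storage_dir.join(".backend-token");
    let tmp_path = storage_dir.join(".backend-token.tmp");
    // A leftover temp file could carry looser permissions; start fresh.
    let _ = fs::remove_file(&tmp_path);
    let mut file = OpenOptions::new()
        .write(true)
        .create_new(true) // fail rather than reuse an existing file
        .mode(0o600)
        .open(&tmp_path)?;
    file.write_all(token.as_bytes())?;
    file.sync_all()?;
    fs::rename(&tmp_path, &final_path)
}

/// Client side: read the token back before connecting.
fn read_token_file(storage_dir: &Path) -> io::Result<String> {
    Ok(fs::read_to_string(storage_dir.join(".backend-token"))?.trim().to_string())
}
```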
Clients read the token from the file before connecting. The
auto-start wait loop re-reads the token after spawning since
the backend writes it during its startup. If a rogue process
holds the port, the auth check fails and the client falls back
to standalone mode instead of leaking code to it.
Previously a config check failure was treated as compatible
(older backend). Now it's fail-closed: any auth or config
failure rejects the backend.
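A minimal sketch of the fail-closed attach decision, assuming a hypothetical `BackendConfig` struct mirroring the `storage_dir` field that `/api/config` returns, with HTTP and auth errors abstracted into the `Err` arm. Real code may canonicalize both paths first; plain `Path` equality already ignores a trailing slash.

```rust
use std::path::Path;

/// Hypothetical shape of the backend's `/api/config` response.
struct BackendConfig {
    storage_dir: String,
}

/// Fail-closed: an auth or config error rejects the backend, and so does a
/// storage-dir mismatch; the caller then falls back to standalone mode.
fn backend_is_compatible<E>(config: Result<BackendConfig, E>, local_storage: &Path) -> bool {
    match config {
        Ok(cfg) => Path::new(&cfg.storage_dir) == local_storage,
        Err(_) => false,
    }
}
```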
Security (must-fix):
- Token generation: replace the `RandomState` hasher with the `getrandom` CSPRNG (32 bytes of OS entropy, hex-encoded via the `hex` crate)
- Token comparison: a constant-time XOR fold prevents timing side-channel attacks on bearer token validation
- Token file: atomic write via a temp file created with mode 0600 (`OpenOptionsExt`), then renamed, eliminating the TOCTOU window where the file was briefly world-readable

Code quality (should-fix):
- Replace `unreachable!()` in `llmx_index` with a proper `McpError` return to avoid panicking in async context
- Lock the store mutex once in `/api/roots` instead of per-path inside the filter closure
- Add a retry limit (10) with a counter to the backend watcher loop so it stops instead of spinning forever on persistent failure
- Store the consumer `JoinHandle` in `DebouncedWatcher` for task health observability
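A sketch of the constant-time comparison and bearer check described above; `constant_time_eq` and `authorized` are hypothetical names, and a `false` result would map to the 401 response. Note that an optimizing compiler is not formally prevented from short-circuiting such code, so a vetted crate such as `subtle` is usually preferable in production.

```rust
/// Compare two byte strings without exiting early on the first mismatch:
/// XOR each byte pair and OR the results together, so the running time
/// depends on the length, not on where the inputs differ.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false; // token length is fixed, so it is not a secret
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Check an `Authorization` header value against the expected token.
fn authorized(header: Option<&str>, expected: &str) -> bool {
    match header.and_then(|h| h.strip_prefix("Bearer ")) {
        Some(presented) => constant_time_eq(presented.as_bytes(), expected.as_bytes()),
        None => false,
    }
}
```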
@codex security review
Shared REST Backend — Multi-Session Index Sharing
Multiple MCP clients (Claude Code, Cursor, Codex) now share a single in-memory index instead of each loading its own copy from disk.
Important
This is a zero-config upgrade. Existing setups work unchanged: the first `llmx-mcp` session auto-starts a shared backend on `localhost:19100`, and subsequent sessions connect as lightweight proxies.

How It Works
```mermaid
graph LR
    A["Claude Code<br/><small>stdio</small>"] -->|proxy| D["llmx-mcp --serve 19100<br/><small>shared backend</small>"]
    B["Cursor<br/><small>stdio</small>"] -->|proxy| D
    C["Codex<br/><small>stdio</small>"] -->|proxy| D
    D --> E[("~/.local/share/llmx/indexes")]
```

Architecture detail
A new `DataSource` enum routes every MCP tool call to either a `Local` store (standalone mode) or a `Remote` `BackendClient` (proxy mode). The backend exposes a simple REST API (`/api/status`, `/api/search`, `/api/index`, …) over HTTP/1.1 on the loopback interface.

On startup, each stdio session:
- … `llmx-mcp --serve <port>` as a detached process

What Changed
- `--serve <port>` flag runs a persistent HTTP backend. All tool endpoints are mirrored as REST routes. Replaces the previous `--http` flag.
- `BackendClient` forwards tool calls over HTTP. Client roots are relayed via `/api/roots`.
- Searching from `/project/src/lib` when `/project` is the indexed root now automatically scopes results via `path_prefix` filtering. No more unrelated results from sibling directories.

| Variable | Effect |
|---|---|
| `LLMX_PORT` | Backend port (default `19100`) |
| `LLMX_NO_AUTOSTART=1` | Disables backend auto-start |

Dependency Changes
| Added | Removed |
|---|---|
| `bytes` | `tower-service` |
| `http-body-util` | |

The `mcp-http` feature is now enabled by default. `hyper` gains `client` support for the proxy's outbound requests.

Note
The `rmcp` crate also sheds several transitive dependencies (`sse-stream`, `uuid`, `rand`, `tokio-stream`) as part of an upstream cleanup reflected in the lockfile.