feat: blended search and indexing across multiple scopes (closes #337) #525
Open
1TommyCheung wants to merge 16 commits into zilliztech:main from
Add pointer comments above MemSearchConfig and _SECTION_CLASSES to clarify that ScopeConfig/DefaultScopeConfig are intentionally unwired and will be integrated in Task 2 of the multi-scope plan. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add frozen `Scope` dataclass with name, collection, paths, quota, uri, and token fields — first building block for multi-scope blended search.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add `default_scope_name`, `default_scope_quota`, and `extra_scopes` kwargs to `MemSearch.__init__`; build `self._stores: dict[str, MilvusStore]` with one entry per scope; keep `self._store` as a back-compat alias pointing at the default scope's store. Update `close()` to iterate all stores, with a `__new__`-safe fallback for test fixtures that bypass `__init__`. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace MemSearch.search body with single-scope fast path (no scope tag, backwards-compatible) and multi-scope path using asyncio.gather fan-out, _blend_scope_results dedup+quota logic, and only_scope restriction with ValueError on unknown names. Add _seed_scope helper, two_scope_mem fixture, and four integration tests covering: no-scope-field on single-scope, scope tagging on multi-scope, only_scope restriction, and ValueError on unknown scope names. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
MemSearch.index() now builds a plan from the default scope's _paths plus any extra_scopes with non-empty paths. Each file is indexed into the per-scope store via _index_file(scope_name=…). Read-only scopes (empty paths) are skipped entirely. _embed_and_store() also accepts an optional scope_name so it writes to the correct store. Backward-compat is preserved: objects constructed via __new__ without _default_scope_name / _stores fall back to the old _store attr; when scope_name is None the helpers use self._store as before. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add _resolve_scope_for_path() (longest-prefix match across all scopes) and index_file_for_scope() (scope-aware single-file indexer); update watch() to build a unified path list and route _on_change to the correct store via the resolver instead of hardcoding the default scope. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add _parse_extra_scope helper, two new Click options on the search command, and wire extra_scopes/only_scope through to MemSearch.search(). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ings

Three scenario-driven workflows that exercise multi-scope routing end-to-end without requiring any API key (uses the ONNX local embedding provider):

1. Solo dev (closes zilliztech#337): project + global personal scopes, blended retrieval with quota enforcement and only_scope restriction.
2. Chat agents shared memory: a "registrar" indexes shared canon once; multiple agents (Alice, Bob) attach to it as a read-only scope (empty paths) while each writes to their own private scope. Verifies cross-agent privacy.
3. Individual isolation: two independent MemSearch instances on separate Milvus DBs cannot cross-leak. Single-scope behavior unchanged.

Run via: uv run python scripts/scenario_validation.py
Closes #337. Builds on #408 and #514, which enabled scope-switching via `MEMSEARCH_DIR` at the storage layer (one scope at a time). This PR adds first-class multi-scope support: one `memsearch search` call fans out across N configured scopes (dedupes by `chunk_hash`, applies per-scope quotas, tags each result with its source scope); `memsearch index` routes files to the right collection by path-prefix match.

Example uses
1. Solo developer — project history + cross-project personal preferences
The motivating use case from #337. Project notes stay isolated per repo; personal style and habit notes follow you everywhere.
Open a different project and `memsearch search "python style"` still surfaces your personal preferences alongside whatever the new project knows: no env var swapping, no session restart.

2. Team subscribing to a shared knowledge base (read-only scope)
A team has a curated knowledge collection (architecture docs, runbooks) indexed once by a CI job. Individual contributors attach to it as a read-only scope — they search it but never index into it.
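The routing rule behind "search it but never index into it" can be sketched as follows. This is a minimal stand-in, not the PR's `_resolve_scope_for_path()`: the `Scope` class here carries only two of the real fields, and `resolve_scope`, the scope names, and the paths are hypothetical. It assumes a scope with no `paths` is read-only, and that when several scopes match, the one with the longest matching path prefix wins.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Optional


@dataclass(frozen=True)
class Scope:
    """Simplified stand-in; the PR's Scope also has collection, quota, uri, token."""
    name: str
    paths: tuple = ()  # empty tuple: read-only scope, searched but never indexed


def resolve_scope(file_path: str, scopes: list) -> Optional[str]:
    """Return the name of the scope whose path is the longest prefix of file_path."""
    target = PurePosixPath(file_path)
    best_name, best_depth = None, -1
    for scope in scopes:
        for root in scope.paths:
            root_path = PurePosixPath(root)
            # Match on whole path components, not raw string prefixes.
            if target == root_path or root_path in target.parents:
                depth = len(root_path.parts)
                if depth > best_depth:
                    best_name, best_depth = scope.name, depth
    return best_name


scopes = [
    Scope("project", ("/work/repo",)),
    Scope("project-docs", ("/work/repo/docs",)),
    Scope("team-knowledge"),  # no paths, so no file can ever route here
]

print(resolve_scope("/work/repo/src/main.py", scopes))    # project
print(resolve_scope("/work/repo/docs/setup.md", scopes))  # project-docs
print(resolve_scope("/elsewhere/file.md", scopes))        # None
```

Comparing path components rather than raw strings keeps `/work/repo-old` from matching a scope rooted at `/work/repo`.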
When the contributor runs `memsearch index`, only their own files are written. When they run `memsearch search`, results blend their project + team-knowledge with quotas.

3. Multi-agent system: shared facts + per-agent private context
A system runs N agents, each with its own private memory but all reading from a shared "facts" collection populated by an upstream registrar.
Agents read shared facts AND their own private notes in one search call. Agents cannot see each other's private collections — privacy is preserved by the per-collection isolation Milvus already provides; this PR just adds the orchestration layer above it.
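The single-call blend described above can be illustrated with a small self-contained sketch. It is not the PR's `_blend_scope_results`: `Hit`, `search_scope`, and `blend` are hypothetical names, the per-scope queries are faked in memory, and the quota semantics (an integer cap per scope, `None` for no cap, higher score is better) are assumptions. Tie-break and underfill handling are not modeled.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    chunk_hash: str
    score: float
    text: str


async def search_scope(name, hits):
    """Stand-in for one per-scope store query; returns (scope name, hits)."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return name, hits


def blend(per_scope, quotas, top_k):
    """Dedupe by chunk_hash, cap each scope at its quota, tag hits with their scope."""
    tagged = [(hit, name) for name, hits in per_scope for hit in hits]
    tagged.sort(key=lambda pair: pair[0].score, reverse=True)  # best score first
    seen, used, out = set(), {}, []
    for hit, name in tagged:
        quota = quotas.get(name)  # None means no cap (assumed semantics)
        if hit.chunk_hash in seen:
            continue  # the same chunk already came from a higher-scoring hit
        if quota is not None and used.get(name, 0) >= quota:
            continue  # this scope has used up its share
        seen.add(hit.chunk_hash)
        used[name] = used.get(name, 0) + 1
        out.append({"scope": name, "score": hit.score, "text": hit.text})
        if len(out) == top_k:
            break
    return out


async def main():
    fake_index = {
        "private": [Hit("a1", 0.9, "my note"), Hit("a2", 0.7, "another note"),
                    Hit("s1", 0.5, "shared fact (copy)")],
        "shared": [Hit("s1", 0.8, "shared fact"), Hit("s2", 0.6, "second fact")],
    }
    # Fan out: one concurrent query per scope.
    per_scope = await asyncio.gather(
        *(search_scope(name, hits) for name, hits in fake_index.items())
    )
    return blend(per_scope, quotas={"private": 1, "shared": None}, top_k=3)


results = asyncio.run(main())
for row in results:
    print(row)
```

With these inputs the private scope contributes one hit because of its quota of 1, and the duplicate `s1` chunk appears only once, tagged with the scope where it scored highest.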
4. Mixed local + cloud Milvus
Per-scope `uri`/`token` lets a personal Milvus Lite live alongside a cloud-hosted team scope without proxying or duplicating storage.

Backward compatibility
Single-scope behavior is byte-identical to today. No `[default_scope]` or `[[scopes]]` config and no `extra_scopes=` kwarg → identical output, no new fields on results, no behavior changes anywhere. The new APIs are all opt-in.

Surface
Python: `MemSearch(extra_scopes=[Scope(...)])` plus optional `default_scope_name` and `default_scope_quota` kwargs. Search adds `only_scope=[...]` for ad-hoc query restriction.

TOML: new `[default_scope]` table and repeatable `[[scopes]]` array-of-tables with fields `name`, `collection`, `paths`, `quota`, `uri`, `token`. Existing `env:VAR` interpolation works inside scope entries.

CLI:
Generality
`paths`) and read-only (no `paths`) scopes.

Test coverage
32 new tests, baseline 113 → 136 passed + 15 skipped (skips are existing `OPENAI_API_KEY`-gated integration tests, unchanged):

- `tests/test_core_unit.py` (new): `Scope` dataclass + the dedup+quota algorithm covering all 3 quota modes, tie-break, underfill (pure unit, no Milvus, no API key; runs in CI on every push).
- `tests/test_core.py`: multi-scope search, `only_scope`, index path routing, read-only-scope skip.
- `tests/test_watcher_multi_scope.py` (new): longest-prefix scope resolution + watcher event routing.
- `tests/test_config.py`: `[[scopes]]` round-trip, path-overlap.
- `tests/test_cli_*.py`: flag parsing, help text, scope-tagged output.

Scenario validation
`scripts/scenario_validation.py` (new) runs three real-world workflows end-to-end with the ONNX local embedder (no API key, no cost). All three pass.
Untouched code
`store.py`, `chunker.py`, `embeddings/`, `scanner.py`, `watcher.py` (the file watcher itself is generic; only the dispatch in `core.py` changed), `compact.py`, `reranker.py`. No plugin hooks touched: Claude Code, Codex, OpenClaw, and OpenCode plugins continue to work unchanged. They can opt into multi-scope in follow-up PRs.

Out of scope (separate PRs welcomed)
compact