Todo

This file tracks the active execution queue for this repository. Keep it current when starting, finishing, or reprioritizing work.

P0: Documentation And Policy Cleanup

  • Move root research source notes into docs/research/source-notes/.
  • Add a documentation map in docs/README.md.
  • Split cross-project memory rules into docs/memory-policy.md.
  • Split repository-specific strategy into docs/project-development-policy.md.
  • Move detailed MCP API reference out of README.md.
  • Add AGENTS.md as a short agent entrypoint.

P1: Query Normalization

  • Add deterministic query normalization for domain terms.
  • Make 待办, 待办项, todo, task, and 任务 retrieve work_item records.
  • Make decision/preference/procedure/evidence terms retrieve the matching object types or knowledge kinds.
  • Return normalized_terms, applied_filters, and useful no-match retry hints where practical.
  • Add tests for search and context behavior.
  • Update docs/mcp-api-reference.md and docs/agent-memory-mcp-usage.md after behavior lands.
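
The normalization items above can be illustrated with a minimal sketch. It assumes a simple alias table and substring matching; `OBJECT_TYPE_ALIASES`, `NormalizedQuery`, `normalize_query`, and the `object_type` filter key are hypothetical names, not the repository's actual API. Only the field names `normalized_terms` and `applied_filters` come from the list above.

```python
from dataclasses import dataclass, field

# Hypothetical alias table: the real term list and target names may differ.
OBJECT_TYPE_ALIASES = {
    "待办项": "work_item",
    "待办": "work_item",
    "todo": "work_item",
    "task": "work_item",
    "任务": "work_item",
}


@dataclass
class NormalizedQuery:
    normalized_terms: list[str] = field(default_factory=list)
    applied_filters: dict[str, list[str]] = field(default_factory=dict)


def normalize_query(text: str) -> NormalizedQuery:
    """Strip known domain aliases from the query and turn them into filters."""
    remaining = text.lower()
    object_types: list[str] = []
    # Longest aliases first so "待办项" is consumed before "待办".
    for alias in sorted(OBJECT_TYPE_ALIASES, key=len, reverse=True):
        # Plain substring matching keeps the sketch short; a real
        # implementation would likely respect word boundaries.
        if alias in remaining:
            remaining = remaining.replace(alias, " ")
            target = OBJECT_TYPE_ALIASES[alias]
            if target not in object_types:
                object_types.append(target)
    result = NormalizedQuery(normalized_terms=remaining.split())
    if object_types:
        result.applied_filters["object_type"] = object_types
    return result
```

Because the mapping is a fixed table, the same query always produces the same terms and filters, which is what makes the behavior testable without a model in the loop.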

P1: Unstructured Soft Duplicate Candidates

  • Add soft duplicate detection for title/summary-only knowledge.
  • Return possible_duplicates with reasons and scores.
  • Do not hard-reject unstructured semantic duplicates.
  • Preserve current hard duplicate/conflict behavior for structured facts.
  • Add tests for similar unstructured knowledge that should be flagged but still writable.
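
As a rough illustration of "flagged but still writable", the sketch below scores title/summary overlap with token-set Jaccard similarity. The similarity measure, the 0.6 threshold, the `accepted` flag, and the function names are assumptions made for this example; only `possible_duplicates` with reasons and scores comes from the list above.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercased word tokens; a CJK run counts as one token in this sketch."""
    return set(re.findall(r"\w+", text.lower()))


def find_possible_duplicates(candidate: dict, existing: list[dict],
                             threshold: float = 0.6) -> dict:
    """Return soft duplicate candidates without rejecting the write."""
    candidate_tokens = _tokens(candidate["title"] + " " + candidate.get("summary", ""))
    matches = []
    for record in existing:
        record_tokens = _tokens(record["title"] + " " + record.get("summary", ""))
        union = candidate_tokens | record_tokens
        if not union:
            continue
        score = len(candidate_tokens & record_tokens) / len(union)
        if score >= threshold:
            matches.append({
                "id": record["id"],
                "score": round(score, 3),
                "reasons": ["token_overlap"],
            })
    matches.sort(key=lambda match: (-match["score"], match["id"]))
    # The write is always accepted: soft duplicates are advisory only.
    return {"accepted": True, "possible_duplicates": matches}
```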

P1: Maintain Duplicate Review

  • Extend memory_maintain report to surface soft duplicate candidates.
  • Keep merge_duplicates limited to deterministic structured duplicates until review semantics are explicit.
  • Design a safe resolve/review path for soft duplicates.

P2: Agent Query Planning Guidance

  • Update agent usage docs so callers retry with expanded terms before concluding there is no memory.
  • Add response guidance fields to MCP docs after implementation.
  • Add MCP resources/prompts for policy and examples.

P3: Semantic Retrieval And Reasoner Adapter Evaluation

  • Re-evaluate embedding/vector/hybrid retrieval after deterministic query normalization is in place.
  • Continue Cognee and LlamaIndex spikes only if they fit behind Memory Substrate governance.
  • Keep hosted LLMs, local LLMs, Graphiti, and reasoner adapters optional.
  • Treat Neo4j as an optional production backend after local contracts and migrations are stable.

P1: LanceDB Semantic Retrieval

  • Spike LanceDB + BGE-M3 semantic retrieval against Chinese/English memory queries.
  • Confirm LanceDB remains a derived index, not canonical storage.
  • Add optional semantic dependencies for LanceDB and FlagEmbedding.
  • Project canonical memory objects into semantic chunks.
  • Rebuild the semantic index from memory_maintain reindex.
  • Merge lexical and semantic results in memory_query search.
  • Keep semantic search active when a graph backend is also configured.
  • Add regression coverage for the Codex dogfood MCP query miss.

P1: Repo Parser And Documentation Indexing

  • Adopt a single-primary parser stance: require tree_sitter_language_pack, then use local fallback parsing only when parser loading fails.
  • Index Markdown repository docs as source evidence with headings, excerpts, and line locators.
  • Make repo query summaries include documentation sections so theory-to-code questions can find design docs.
  • Add a locked parser dependency for tree-sitter-language-pack==1.6.0 and run a live parser smoke test.

P1: MCP Context Budget

  • Make memory_query page compact by default with explicit options.detail: "full" for complete stored objects.
  • Bound and truncate source segment excerpts returned by memory_query expand.
  • Shorten repo source summaries and MCP server instructions.

Active Execution Queue

MS-01: Retrieval Fusion And Query Matching

Status: completed

Goal: make memory_query search robust when users and agents do not phrase queries with exact stored keywords.

Boundary: memory-core retrieval only. Do not adopt llm_wiki desktop/wiki UI, web clipper, or knowledge-collection workflows.

Deliverables:

  • Replace lexical/semantic score max-merge with rank-based fusion such as Reciprocal Rank Fusion.
  • Improve lexical query planning with phrase, title, filename/id, and CJK bigram signals.
  • Return matched semantic chunks with source locators, excerpts, and chunk scores in query results.
  • Document retrieval scoring behavior in MCP docs.
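
For reference, a minimal sketch of the two techniques named above: Reciprocal Rank Fusion over ranked id lists, and overlapping CJK bigrams as an extra lexical signal. The function names and the constant `k=60` (a commonly used default) are assumptions for illustration; the repository's scoring may differ.

```python
import re


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked id lists: each list contributes 1 / (k + rank) per item."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] = scores.get(item_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


def cjk_bigrams(text: str) -> list[str]:
    """Emit overlapping two-character grams for each run of CJK ideographs."""
    grams: list[str] = []
    for run in re.findall(r"[\u4e00-\u9fff]+", text):
        if len(run) == 1:
            grams.append(run)
        else:
            grams.extend(run[i:i + 2] for i in range(len(run) - 1))
    return grams
```

Unlike a max-merge of raw scores, the fusion only looks at ranks, so lexical and semantic scores never need to be on the same scale.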

Verification:

  • Add focused unit tests for lexical phrase/title/id/CJK matching.
  • Add semantic merge tests using a fake semantic index.
  • Run uv run --group dev python -m pytest tests/test_query_normalization.py tests/test_semantic_index_service.py tests/test_mcp_server.py.
  • Run non-semantic main path: uv run --group dev python -m pytest -k 'not lance and not semantic'.

MS-02: Source Chunking And Evidence Quality

Status: completed

Goal: make source segments and semantic chunks preserve document structure and citeable locations.

Boundary: deterministic source/evidence preparation. Do not introduce mandatory LLM extraction or hosted services.

Deliverables:

  • Add a Markdown-aware document chunker for ingest and semantic indexing.
  • Preserve heading breadcrumbs, code fences, tables, frontmatter boundaries, overlap, and source offsets in chunks.
  • Reuse one chunking contract across memory_ingest, source segments, and semantic rebuild.
  • Include source locators and heading breadcrumbs in semantic chunks.
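
A minimal sketch of heading-aware chunking follows. It covers only heading breadcrumbs, code fences, and line locators; tables, frontmatter, overlap, and size limits are left out. `Chunk` and `chunk_markdown` are hypothetical names, not the repository's chunking contract.

```python
import re
from dataclasses import dataclass

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
# Built programmatically so the marker can be shown inside a fenced example.
FENCE_MARKERS = ("`" * 3, "~" * 3)


@dataclass
class Chunk:
    breadcrumbs: tuple[str, ...]  # heading trail, outermost first
    start_line: int               # 1-based, inclusive
    end_line: int                 # 1-based, inclusive
    text: str


def chunk_markdown(source: str) -> list[Chunk]:
    """Split on headings while keeping fenced code blocks intact."""
    chunks: list[Chunk] = []
    trail: list[tuple[int, str]] = []  # (heading level, heading title)
    buffer: list[str] = []
    start = 1
    last = 0
    in_fence = False

    def flush(end_line: int) -> None:
        body = "\n".join(buffer).strip()
        if body:
            titles = tuple(title for _, title in trail)
            chunks.append(Chunk(titles, start, end_line, body))
        buffer.clear()

    for number, line in enumerate(source.splitlines(), start=1):
        last = number
        if line.lstrip().startswith(FENCE_MARKERS):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match is None:
            buffer.append(line)
            continue
        flush(number - 1)
        level = len(match.group(1))
        # Drop headings at the same or deeper level before pushing this one.
        while trail and trail[-1][0] >= level:
            trail.pop()
        trail.append((level, match.group(2)))
        start = number
    flush(last)
    return chunks
```

Each chunk starts at its heading line, so `start_line` and `end_line` can serve as a citeable locator for the section.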

Verification:

  • Add tests for CJK text, code blocks, markdown tables, YAML frontmatter, and oversized sections.
  • Add ingest tests proving source segments carry stable locators and hashes.
  • Run uv run --group dev python -m pytest tests/test_phase1_acceptance.py tests/test_semantic_index_service.py.

MS-03: Source Robustness From llm_wiki Upstream

Status: completed

Goal: absorb upstream llm_wiki source-hardening lessons without shifting Memory Substrate into a desktop knowledge collector.

Boundary: adapters and projections only. Canonical memory objects remain independent of document extraction libraries.

Deliverables:

  • Add robust frontmatter parsing/sanitizing for LLM-generated or imported markdown projections.
  • Evaluate PDF/DOCX/XLSX extraction dependencies for source capture; keep them behind ingest adapters and not core storage.
  • Treat multimodal image extraction/captioning as optional evidence capture for document-heavy knowledge work, not a memory-core prerequisite.
  • Add source deletion/cascade cleanup semantics only after source manifests and provenance policies are explicit.

Verification:

  • Add projection tests for fenced YAML, misplaced frontmatter:, wikilink lists, and malformed frontmatter fallback.
  • Add dependency decision notes before adding document extraction packages.
  • Run uv run --group dev python -m pytest tests/test_obsidian_projection.py tests/test_structure_validation.py.

MS-04: Graph Maintenance Insights

Status: completed

Goal: make memory_maintain report surface graph-health issues that agents can act on.

Boundary: maintain/report output first. Defer visualization and UI concerns to product layers.

Deliverables:

  • Add deterministic graph health insights to memory_maintain report: isolated nodes, sparse clusters, bridge nodes, and weakly connected scopes.
  • Evaluate a local Python graph analysis library, such as networkx, before considering UI-oriented graphology/sigma dependencies.
  • Keep graph insights as maintain/report output for agents first; defer visualization to a separate product layer.
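
Two of the insights above, isolated nodes and disconnected groups, need nothing beyond the standard library. The sketch below is one possible shape, assuming an undirected view of the graph; `graph_health` and its output keys are hypothetical, and bridge-node detection is omitted.

```python
def graph_health(nodes: list[str], edges: list[tuple[str, str]]) -> dict:
    """Report isolated nodes and connected components of an undirected graph."""
    adjacency: dict[str, set[str]] = {node: set() for node in nodes}
    for left, right in edges:
        adjacency.setdefault(left, set()).add(right)
        adjacency.setdefault(right, set()).add(left)

    isolated = sorted(node for node, neighbors in adjacency.items() if not neighbors)

    seen: set[str] = set()
    components: list[list[str]] = []
    for node in sorted(adjacency):
        if node in seen:
            continue
        # Iterative depth-first traversal from each unseen node.
        seen.add(node)
        stack = [node]
        component = []
        while stack:
            current = stack.pop()
            component.append(current)
            for neighbor in adjacency[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))

    return {"isolated_nodes": isolated, "components": components}
```

Sorting both outputs keeps the report deterministic, which matters for the synthetic-graph tests listed under Verification.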

Verification:

  • Add graph-health report tests with a small synthetic memory graph.
  • Run uv run --group dev python -m pytest tests/test_maintain_service.py tests/test_graph_health_report.py.

MS-05: MCP Query Sanitizer And Diagnostics

Status: completed

Goal: prevent long agent prompts, system instructions, and scratchpads from polluting memory_query retrieval.

Boundary: query-service hardening only. Do not add LLM query rewriting or a new retrieval library for this slice.

Deliverables:

  • Review MemPalace design lessons and capture them in docs/research/2026-04-30-mempalace-design-review.md.
  • Sanitize long memory_query search text before query planning.
  • Sanitize long memory_query context task text before context building.
  • Return query_sanitizer diagnostics and warnings when sanitization occurs.
  • Update MCP usage and API docs.
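
The sanitization step can be sketched as follows. The labels (`Query:`, `Question:`, `Task:`), the 200-character limit, the last-line fallback, and the diagnostic keys are all assumptions for illustration; the actual `query_sanitizer` heuristics are not specified in this file.

```python
import re

# Hypothetical labels an agent prompt might use to mark the real question.
LABELED_LINE = re.compile(
    r"^\s*(?:query|question|task)\s*[:：]\s*(.+)$",
    re.IGNORECASE | re.MULTILINE,
)


def sanitize_query(text: str, max_chars: int = 200) -> tuple[str, dict]:
    """Reduce an over-long prompt to a short query and report what happened."""
    original_length = len(text)
    if original_length <= max_chars:
        return text.strip(), {"sanitized": False, "original_length": original_length}

    match = LABELED_LINE.search(text)
    if match:
        cleaned, strategy = match.group(1).strip(), "labeled_line"
    else:
        # Fall back to the last non-empty line, where questions often sit.
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        cleaned, strategy = (lines[-1] if lines else ""), "last_line"

    cleaned = cleaned[:max_chars]
    diagnostics = {
        "sanitized": True,
        "strategy": strategy,
        "original_length": original_length,
        "sanitized_length": len(cleaned),
    }
    return cleaned, diagnostics
```

Short queries pass through untouched, so the sanitizer only affects callers that paste whole prompts or scratchpads into the query field.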

Verification:

  • Add focused tests for labeled long-prompt sanitization in search and context.
  • Run uv run --group dev python -m pytest tests/test_query_normalization.py::QueryNormalizationTest::test_search_sanitizes_long_agent_prompt_before_planning_terms tests/test_query_normalization.py::QueryNormalizationTest::test_context_sanitizes_long_agent_prompt_and_reports_diagnostics.

MS-06: Source Adapter Metadata Contract

Status: completed

Goal: make memory_ingest outputs self-describing across repos, markdown, conversations, and future source adapters.

Boundary: adapter metadata and source payloads only. Do not add heavy document extraction dependencies in this slice.

Deliverables:

  • Define adapter metadata fields: adapter name, adapter version, supported mode, declared transformations, privacy class, and origin classification.
  • Attach adapter metadata to repo and markdown ingested sources.
  • Add deterministic freshness/currentness hints where available.
  • Update docs/mcp-api-reference.md and docs/agent-memory-mcp-usage.md.

Verification:

  • Add source ingest tests for repo and markdown adapter metadata.
  • Run focused source metadata tests in tests/test_phase1_acceptance.py.

MS-07: Tiered Context Pack Contract

Status: completed

Goal: evolve memory_query context into budgeted work-ready context instead of a flat item list.

Boundary: context pack contract and query output shape. Do not add UI or visualization.

Deliverables:

  • Define context tiers for policy, active task, decisions, procedures, evidence, open work, and deep-search hints.
  • Keep compact defaults and bounded excerpts.
  • Preserve existing fields during the transition where practical.
  • Update MCP resources so an agent with no repo context can still use the tiers correctly.

Verification:

  • Add context budget and tier-order tests.
  • Run focused context pack contract tests.

MS-08: Derived Index Repair And Retrieval Benchmark Harness

Status: completed

Goal: make semantic and graph indexes auditable, rebuildable, and measurable.

Boundary: local diagnostics and small deterministic benchmark data. Do not introduce hosted services.

Deliverables:

  • Add derived-index repair checks that compare index counts against canonical objects before destructive rebuilds.
  • Add planted-needle retrieval benchmark cases for lexical, semantic, and hybrid retrieval.
  • Report recall and latency separately per retrieval stream.
  • Document when to run benchmarks and how to interpret regressions.
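
A planted-needle benchmark reduces to checking whether a known id appears in the top results of each stream. The sketch below assumes each stream is a callable from query text to a ranked id list; `run_benchmark` and its report keys are hypothetical.

```python
import time
from typing import Callable


def run_benchmark(streams: dict[str, Callable[[str], list[str]]],
                  cases: list[tuple[str, str]], k: int = 5) -> dict:
    """Measure recall@k and wall-clock latency separately per retrieval stream.

    `cases` pairs a query with the id of the planted needle it should find.
    """
    report = {}
    for name, search in streams.items():
        hits = 0
        started = time.perf_counter()
        for query, needle_id in cases:
            if needle_id in search(query)[:k]:
                hits += 1
        elapsed = time.perf_counter() - started
        report[name] = {
            "recall": hits / len(cases) if cases else 0.0,
            "latency_seconds": elapsed,
        }
    return report
```

Keeping recall and latency per stream makes it visible when a hybrid result hides a regression in only one of its inputs.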

Verification:

  • Add repair-safety tests for missing or stale semantic index entries.
  • Add a small benchmark smoke test that runs without network access.

MS-09: Memory Fact-Checker And Lifecycle Lint

Status: completed

Goal: surface entity confusion, stale facts, and relationship mismatches without automatic mutation.

Boundary: advisory memory_maintain report output only. Do not auto-contest, supersede, or merge facts.

Deliverables:

  • Report similar entity names that may cause incorrect recall.
  • Report stale active facts using valid_until, last_verified_at, status, and evidence age where available.
  • Report relationship mismatches for structured claims with clear subject/predicate/object conflicts.
  • Add next-action guidance for promote, contest, supersede, or keep-both review.
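
The stale-fact check above can be sketched as an advisory function that returns reasons and mutates nothing. The field names `valid_until`, `last_verified_at`, and `status` come from the list above; the 180-day window, the reason strings, and the use of timezone-aware `datetime` values are assumptions.

```python
from datetime import datetime, timedelta, timezone


def stale_fact_reasons(fact: dict, now: datetime,
                       max_unverified_age: timedelta = timedelta(days=180)) -> list[str]:
    """Return advisory staleness reasons for an active fact; never mutate it."""
    if fact.get("status") != "active":
        return []
    reasons = []
    valid_until = fact.get("valid_until")
    if valid_until is not None and valid_until < now:
        reasons.append("valid_until_passed")
    last_verified = fact.get("last_verified_at")
    if last_verified is None:
        reasons.append("never_verified")
    elif now - last_verified > max_unverified_age:
        reasons.append("verification_expired")
    return reasons
```

Passing `now` explicitly keeps the report deterministic for tests that use synthetic fixtures.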

Verification:

  • Add maintain report tests with synthetic entity-confusion and stale-fact fixtures.
  • Run focused maintain report fact-check test.

MS-10: Context Payload Compression

Status: completed

Goal: reduce memory_query context response size so MCP callers spend less context on duplicated section data.

Boundary: response shape and documentation only. Do not remove compact item details or require an LLM summarizer.

Deliverables:

  • Measure context response field sizes and identify duplicated section payloads.
  • Convert context_tiers from copied section lists into compact directory metadata.
  • Convert top-level decisions, procedures, and open_work into id directories that point back into items.
  • Clip context item summaries to keep default context compact.
  • Update MCP docs and agent resources.
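
One way to picture the compressed shape: item details live once in `items`, and each section lists only ids. The sketch below assumes items carry `id`, `kind`, and `summary`, and that kinds map onto sections as shown; the kind names, the 160-character clip, and `build_compact_context` are hypothetical.

```python
# Hypothetical mapping from item kind to the top-level section that lists it.
KIND_TO_SECTION = {
    "decision": "decisions",
    "procedure": "procedures",
    "work_item": "open_work",
}


def build_compact_context(items: list[dict], max_summary_chars: int = 160) -> dict:
    """Store item details once and expose sections as id directories."""
    compact_items = []
    directories: dict[str, list[str]] = {section: [] for section in KIND_TO_SECTION.values()}
    for item in items:
        summary = item["summary"]
        if len(summary) > max_summary_chars:
            # Clip long summaries so the default response stays compact.
            summary = summary[: max_summary_chars - 3].rstrip() + "..."
        compact_items.append({"id": item["id"], "kind": item["kind"], "summary": summary})
        section = KIND_TO_SECTION.get(item["kind"])
        if section is not None:
            directories[section].append(item["id"])
    return {"items": compact_items, **directories}
```

A caller resolves a section entry by looking its id up in `items`, so no summary text is repeated across sections.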

Verification:

  • Add regression coverage that context tiers do not duplicate section summaries.
  • Add payload budget coverage for large context responses.
  • Measure sample context payload reduction from about 16.2 KB to about 7.2 KB.

MS-11: Advisory Concept Candidate Discovery

Status: completed

Goal: reconnect the LLM Wiki crystallization loop by surfacing repeated source concepts without automatically mutating durable memory.

Boundary: deterministic advisory discovery only. Do not add a required LLM API key and do not auto-promote candidates into canonical memory.

Deliverables:

  • Add reusable concept candidate discovery over source segments, headings, and existing memory text.
  • Surface global concept_candidates from memory_maintain report.
  • Surface current-source memory_suggestions.concept_candidates from memory_ingest.
  • Suppress candidates already represented by concept knowledge or concept nodes.
  • Document that candidates require agent/human review before memory_remember.

Verification:

  • Add maintain report tests for repeated uncrystallized concepts and existing concept suppression.
  • Add repo ingest test proving source-local advisory concept candidates are returned.
  • Run focused red-green tests for the new behavior.

MS-12: Candidate Review And Crystallization Flow

Status: completed

Goal: make advisory candidates actionable for agents without letting candidates become automatic canonical memory.

Boundary: response guidance and agent workflow only. Do not add an automatic write path, mandatory LLM key, or background agent.

Deliverables:

  • Add review_guidance outcomes for concept, procedure, decision, merge, and skip.
  • Add suggested_memory.input_data with reason, memory source, scope refs, evidence refs, status, confidence, and editable fields.
  • Infer candidate scope refs from repo/document nodes when available and fall back to source ids.
  • Document the candidate review flow in MCP docs, agent resources, and memory policy.
  • Dogfood candidate discovery on wiki-memory, llm_wiki, and mempalace using a temporary memory root.

Verification:

  • Add regression coverage for executable candidate review payloads.
  • Run focused red-green tests for candidate review payloads.

MS-13: Candidate Quality And Ranking

Status: completed

Goal: make candidate discovery more stable and useful by classifying, ranking, and diagnosing candidate quality.

Boundary: deterministic candidate quality only. Do not add a required LLM classifier or automatic durable writes.

Deliverables:

  • Add candidate_type hints for concept, procedure, decision, tool/library, and implementation detail candidates.
  • Add ranking_signals with score bonuses and penalties.
  • Rank stable concepts/procedures/decisions ahead of tool/library and version/package details.
  • Add candidate_diagnostics.skipped so filtered phrases are explainable.
  • Update MCP docs and agent resources.

Verification:

  • Add tests for classification, ranking, diagnostics, and ingest response shape.
  • Dogfood against wiki-memory, llm_wiki, and mempalace.

MS-14: Soft Duplicate Review Resolve

Status: completed

Goal: turn advisory soft duplicate candidates into an explicit reviewed maintenance workflow.

Boundary: explicit review outcomes only. Do not let merge_duplicates auto-merge unstructured soft duplicates.

Deliverables:

  • Add memory_maintain resolve_duplicates for reviewed soft duplicate candidates.
  • Support supersede, keep_both, and contest outcomes.
  • Require non-empty review reasons and current soft duplicate candidate ids.
  • Keep curated replacement as an explicit memory_remember knowledge write followed by memory_remember supersede.
  • Update MCP docs, agent usage docs, and built-in resources.
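
The guard conditions above (allowed outcomes, a non-empty reason, a current candidate id) can be sketched as a pure validation step that runs before any mutation. The request keys `outcome`, `reason`, and `candidate_id`, and the function name, are assumptions; the three outcome names come from the list above.

```python
ALLOWED_OUTCOMES = {"supersede", "keep_both", "contest"}


def validate_resolution(request: dict, current_candidate_ids: set[str]) -> list[str]:
    """Return validation errors for a reviewed soft duplicate resolution."""
    errors = []
    if request.get("outcome") not in ALLOWED_OUTCOMES:
        errors.append("outcome must be one of: contest, keep_both, supersede")
    if not str(request.get("reason") or "").strip():
        errors.append("reason must be non-empty")
    if request.get("candidate_id") not in current_candidate_ids:
        errors.append("candidate_id is not a current soft duplicate candidate")
    return errors
```

An empty list means the request may proceed; anything else is returned to the caller without touching stored memory.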

Verification:

  • Add lifecycle tests for supersede, keep_both, and rejecting non-candidate pairs.
  • Add MCP dispatch/schema/apply guard tests.

MS-15: Soft Duplicate Report Guidance

Status: completed

Goal: make soft duplicate report entries self-guiding so agents can safely review and resolve them.

Boundary: response guidance only. Do not auto-resolve duplicates and do not add an LLM reviewer.

Deliverables:

  • Add review_guidance to each soft duplicate candidate.
  • Add editable suggested_resolution payloads for memory_maintain resolve_duplicates.
  • Add next_actions for review, outcome selection, and explicit resolution.
  • Update MCP docs, agent usage docs, and built-in resources.

Verification:

  • Add maintain report coverage for guidance and suggested resolution payloads.

MS-16: Source Archive And Cascade Diagnostics

Status: completed

Goal: retire bad or untrusted sources without deleting canonical history, while making affected knowledge explicit.

Boundary: safe archive semantics only. Do not physically delete sources or automatically downgrade mixed-evidence knowledge.

Deliverables:

  • Add memory_maintain archive_source with required source_id, reason, and options.apply=true.
  • Mark the source archived with an audit reason.
  • Mark knowledge stale only when all evidence refs depend on the archived source.
  • Return partially_affected_knowledge_ids for mixed-evidence knowledge requiring review.
  • Update MCP docs, agent usage docs, policy, and built-in resources.
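
The cascade rule above separates knowledge that depends only on the archived source from knowledge with mixed evidence. A minimal sketch, assuming each knowledge record lists the source ids behind its evidence refs in a hypothetical `evidence_source_ids` field; `stale_knowledge_ids` is likewise an assumed key, while `partially_affected_knowledge_ids` comes from the list above.

```python
def archive_cascade(source_id: str, knowledge: list[dict]) -> dict:
    """Classify knowledge affected by archiving one source."""
    stale: list[str] = []
    partially_affected: list[str] = []
    for record in knowledge:
        refs = record["evidence_source_ids"]
        if source_id not in refs:
            continue
        if all(ref == source_id for ref in refs):
            # Every piece of evidence depends on the archived source.
            stale.append(record["id"])
        else:
            # Mixed evidence: leave status alone and ask for review.
            partially_affected.append(record["id"])
    return {
        "stale_knowledge_ids": stale,
        "partially_affected_knowledge_ids": partially_affected,
    }
```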

Verification:

  • Add lifecycle coverage for archive and cascade behavior.
  • Add MCP dispatch/schema/apply guard coverage.

MS-17: Relation Graph Schema Hardening

Status: completed

Goal: make graph relation edges explainable and backend-independent without promoting derived graph data to canonical storage.

Boundary: relation provenance contract only. Do not add graph-table migration complexity or make Kuzu the canonical store.

Deliverables:

  • Add payload.relation_schema.version to synced graph relations.
  • Record derivation kind: canonical relation, field reference, evidence ref, or structured payload.
  • Record origin object type/id, origin field, and endpoint canonical object types.
  • Preserve existing relation payload fields such as knowledge_id.
  • Document relation provenance for agents and MCP callers.

Verification:

  • Add graph sync coverage for relation provenance schema.
  • Run graph sync regression tests.

MS-18: Agent-Assisted Extraction Protocol

Status: completed

Goal: make ingest return an explicit handoff from captured evidence to agent-reviewed durable memory writes.

Boundary: protocol metadata only. Do not add a mandatory LLM dependency or let ingest decide what should be remembered.

Deliverables:

  • Add memory_suggestions.agent_extraction with protocol version and source id.
  • Document the boundary between ingest, agent analysis, and governed remember writes.
  • Include required steps for source inspection, existing-memory query, candidate preparation, and reviewed remember.
  • Include a remember_write_contract with required and recommended fields.
  • Update API docs, agent usage docs, policy, and MCP resources.

Verification:

  • Add acceptance coverage for the extraction protocol shape.
  • Run candidate suggestion regression coverage.

MS-19: Maintenance Dogfood Benchmark

Status: completed

Goal: provide a deterministic local benchmark for long-term maintenance signals.

Boundary: local read-only report benchmark only. Do not require network, optional embedding models, or real user memory roots.

Deliverables:

  • Add run_maintenance_dogfood_benchmark under packaged experiment helpers.
  • Seed synthetic cases for promotable candidates, low-evidence candidates, stale candidates, structured duplicate groups, and soft duplicate candidates.
  • Return expected counts, observed counts, per-case checks, reference time, and mutation flag.
  • Document the benchmark entrypoint in experiments/README.md.

Verification:

  • Add benchmark test coverage.
  • Run retrieval and maintenance benchmark tests.

MS-20: Context Budget Optimization

Status: completed

Goal: reduce accidental context consumption from common MCP calls while keeping explicit expansion paths available.

Boundary: response-size controls only. Do not remove compact repo indexes, evidence locators, or explicit caller overrides for bounded non-repo objects.

Deliverables:

  • Return an explicit page_unavailable / unsupported result when memory_query page is called with detail=full on a repo source.
  • Keep full detail available for bounded non-repo objects.
  • Compress memory_suggestions.agent_extraction into a compact protocol with a resource pointer.
  • Compress ingest memory_suggestions.concept_candidates into compact triage records and keep full write skeletons in memory_maintain report.
  • Lower MCP default search, recent, and graph max_items from 20 to 10 while preserving explicit overrides.
  • Update API docs, agent usage docs, policy, and MCP resources.

Verification:

  • Add regression coverage for repo full-page unsupported semantics.
  • Add regression coverage for compact ingest concept candidates.
  • Add regression coverage for compact extraction protocol.
  • Add MCP dispatch coverage for compact defaults and explicit max_items.

MS-21: End-To-End Dogfood Acceptance

Status: completed

Goal: provide a deterministic local acceptance signal for the core memory loop across ingest, query, remember, maintain, reindex, and context retrieval.

Boundary: MCP dispatch workflow only. Do not require network access, optional embedding models, hosted services, or a real user memory root.

Deliverables:

  • Add run_end_to_end_dogfood_acceptance under packaged experiment helpers.
  • Seed a small repo that produces a compact concept candidate.
  • Exercise memory_ingest, memory_query search, memory_query page, memory_remember knowledge, memory_maintain report, memory_maintain reindex, and memory_query context.
  • Return per-step checks, object ids, observed ids, and compact payload sizes.
  • Document the benchmark entrypoint in experiments/README.md.

Verification:

  • Add benchmark coverage for the end-to-end MCP memory loop.
  • Run retrieval benchmark tests and full test suite.

MS-22: Dogfood Findings Hardening

Status: completed

Goal: make the end-to-end dogfood helper actionable when it fails and enforce compact response budgets as part of acceptance.

Boundary: dogfood helper diagnostics only. Do not add new production MCP modes or optional backends.

Deliverables:

  • Add failed_checks and diagnostic next_actions to the dogfood acceptance result.
  • Add explicit payload_budgets for compact candidates and context payloads.
  • Promote compact candidate and context payload budgets into acceptance checks.
  • Document the diagnostic fields in experiments/README.md.

Verification:

  • Add benchmark assertions for failed check summaries, next actions, and payload budgets.
  • Run retrieval benchmark tests and full test suite.

MS-23: Dogfood Repeatability

Status: completed

Goal: make the end-to-end dogfood helper safe to run repeatedly under the same local parent directory.

Boundary: experiment helper isolation only. Do not change production MCP storage roots or canonical object identity semantics.

Deliverables:

  • Create an isolated dogfood-runs/run-NNNN directory for each helper invocation.
  • Return run_root in the dogfood acceptance result for diagnostics.
  • Document repeatable run behavior in experiments/README.md.
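
Isolated run directories can be allocated with the standard library alone. The sketch below assumes runs are numbered from 1 and that the next number is one past the highest existing run; `create_run_root` is a hypothetical helper name, and only the `dogfood-runs/run-NNNN` layout comes from the list above.

```python
import re
from pathlib import Path

RUN_NAME = re.compile(r"run-(\d{4,})")


def create_run_root(parent: Path) -> Path:
    """Create and return the next dogfood-runs/run-NNNN directory under parent."""
    runs = parent / "dogfood-runs"
    runs.mkdir(parents=True, exist_ok=True)
    highest = 0
    for entry in runs.iterdir():
        match = RUN_NAME.fullmatch(entry.name)
        if match:
            highest = max(highest, int(match.group(1)))
    number = highest + 1
    while True:
        candidate = runs / f"run-{number:04d}"
        try:
            # mkdir without exist_ok fails if another run claimed the name.
            candidate.mkdir()
            return candidate
        except FileExistsError:
            number += 1
```

Retrying on `FileExistsError` keeps two helpers started at the same time from sharing a directory.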

Verification:

  • Add coverage that the helper can run twice in the same parent root.

MS-24: Real MCP Host Smoke And Release Readiness

Status: completed

Goal: verify the package works from a real MCP stdio host shape and keep the release path testable without adopting Neo4j, Graphiti, or UI work.

Boundary: host smoke, release docs, and local verification only. Do not add new production backends, UI surfaces, hosted LLM providers, or mandatory optional dependencies.

Deliverables:

  • Add a packaged MCP host smoke helper that starts the server over stdio, initializes an MCP client session, lists tools/resources, and calls representative tools without passing a root in tool args.
  • Verify the smoke helper binds MEMORY_SUBSTRATE_ROOT at process startup and mutates only the supplied temporary/local root.
  • Document when to run the host smoke and release checks.
  • Keep README navigational and avoid duplicating the full MCP API reference.

Verification:

  • Add automated coverage for the host smoke helper.
  • Run focused MCP host smoke tests.
  • Run the full test suite.
  • Run uv build.

MS-25: External Wiki Projection Render And Reconcile

Status: pending

Goal: add a low-frequency maintenance flow for using one configured external wiki, such as an Obsidian vault folder, without making wiki files canonical memory.

Boundary: projection maintenance only. Do not add top-level MCP tools, do not restore wiki-first storage, do not make an internal wiki projection a required path, and do not automatically write wiki edits back into canonical memory.

Design decisions:

  • Use existing memory_maintain modes rather than new top-level tools.
  • Store configuration in the memory root at memory/config.json.
  • Support one configured wiki projection target with path and format.
  • Treat the external wiki as a projection target, not canonical storage.
  • Protect user notes with a projection manifest so render only manages files previously generated by Memory Substrate.
  • Default reconciliation to report-only output with candidates and conflicts; canonical writes still go through reviewed memory_remember or explicit apply modes.

Planned modes:

  • Extend memory_maintain configure with wiki_projection.path and wiki_projection.format.
  • Add memory_maintain render_projection for canonical memory -> configured external wiki.
  • Add memory_maintain reconcile_projection for configured external wiki -> diff report, conflicts, and remember candidates.
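
Since MS-25 is still pending, the following is only a sketch of the manifest rule from the design decisions: render may write a file if it is new or was previously generated, and must skip a file that exists but is absent from the manifest. `plan_render` and its arguments are hypothetical.

```python
def plan_render(generated: list[str], manifest: set[str],
                existing_files: set[str]) -> tuple[list[str], list[str]]:
    """Split generated paths into safe writes and protected user files.

    `manifest` holds relative paths written by earlier renders;
    `existing_files` holds relative paths currently present in the wiki folder.
    """
    to_write: list[str] = []
    skipped: list[str] = []
    for path in generated:
        if path in existing_files and path not in manifest:
            # The file exists but was not generated by us: treat it as a user note.
            skipped.append(path)
        else:
            to_write.append(path)
    return to_write, skipped
```

Planning the writes before touching the folder also gives reconcile and report modes something to show without mutating anything.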

Verification:

  • Add config repository tests for wiki projection settings.
  • Add render tests proving generated files are manifest-bound and canonical objects are unchanged.
  • Add reconcile tests proving report-only behavior does not mutate canonical memory.
  • Update MCP API docs and agent usage docs.

MS-26: MCP Tool Discovery Metadata

Status: completed

Goal: make deferred MCP hosts more likely to discover Memory Substrate as the persistent agent memory server before agents fall back to shell diagnostics.

Boundary: server/tool descriptions, MCP resources, and docs only. Do not change tool count, mode schemas, storage behavior, or Codex host configuration.

Deliverables:

  • Add discovery keywords to server instructions and tool descriptions.
  • Add a tool-discovery rule to the built-in agent playbook.
  • Document that hosts may defer MCP tools and agents should search for memory-substrate before using shell fallbacks.

Verification:

  • Add MCP server/resource tests for discovery text.
  • Run focused MCP server tests.
  • Run full tests.