feat(memory): agent memory v2 — second brain with hybrid search, LLM extraction, and observability dashboard by kovtcharov · Pull Request #606 · amd/gaia

kovtcharov · 2026-03-20T08:43:53Z

Summary

Comprehensive agent memory system that serves as a second brain — storing, recalling, and learning from every interaction. Built on proven patterns from Mem0, Zep, and Hindsight.

Architecture (v2)

Hybrid search: Vector (FAISS) + BM25 (FTS5) + RRF fusion + cross-encoder reranking (ms-marco-MiniLM-L-6-v2)
Mem0-style extraction: LLM decides ADD/UPDATE/DELETE/NOOP against existing memory after each conversation turn
Zep-inspired fact lineage: superseded_by column preserves history when facts are corrected
Hindsight-inspired reconciliation: Background pairwise similarity check detects contradictions across sessions
Complexity-aware recall: Adaptive top_k (3/5/10) based on query complexity heuristics
Temporal search: time_from/time_to on all search methods for time-based recall
Conversation consolidation: Auto-distill old sessions into durable knowledge before 90-day prune
No silent fallback: Embeddings are a hard requirement — system fails loudly on misconfiguration
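
The fusion step of the hybrid search can be sketched as follows. This is a minimal illustration of Reciprocal Rank Fusion, not the PR's implementation: the function name `rrf_fuse`, the constant `k=60` (a commonly used default), and the sample memory IDs are all assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_k=5):
    """Fuse several ranked ID lists with Reciprocal Rank Fusion (RRF)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every item it contains.
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item_id for item_id, _ in ordered[:top_k]]


vector_hits = ["m3", "m1", "m7"]   # hypothetical FAISS result order
keyword_hits = ["m1", "m9", "m3"]  # hypothetical FTS5/BM25 result order
fused = rrf_fuse([vector_hits, keyword_hits])
```

Items that appear in both lists (`m1`, `m3`) rise above items found by only one retriever. In the pipeline described above, a fused list like this would then be passed to the cross-encoder for reranking.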

Schema v2

Three tables (knowledge, conversations, tool_history) with new columns:

knowledge.embedding BLOB — 768-dim vector (nomic-embed-text-v2)
knowledge.superseded_by TEXT — fact lineage chain
conversations.consolidated_at TEXT — consolidation tracking
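
A v1 to v2 upgrade that adds these columns can be made safe to re-run. The sketch below assumes a simplified v1 layout (the `id` and `content` columns are hypothetical) and a hypothetical helper name `migrate_to_v2`; it only illustrates the "add column, tolerate duplicates" pattern.

```python
import sqlite3

# Columns introduced by schema v2, as listed in the PR description.
V2_COLUMNS = [
    ("knowledge", "embedding", "BLOB"),
    ("knowledge", "superseded_by", "TEXT"),
    ("conversations", "consolidated_at", "TEXT"),
]


def migrate_to_v2(conn):
    """Add the v2 columns; running it again is a no-op."""
    for table, column, col_type in V2_COLUMNS:
        try:
            conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {col_type}")
        except sqlite3.OperationalError as exc:
            # SQLite reports an existing column as "duplicate column name".
            if "duplicate column name" not in str(exc).lower():
                raise
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE knowledge (id TEXT PRIMARY KEY, content TEXT)")
conn.execute("CREATE TABLE conversations (id INTEGER PRIMARY KEY, content TEXT)")
migrate_to_v2(conn)
migrate_to_v2(conn)  # second run must not raise
knowledge_columns = [row[1] for row in conn.execute("PRAGMA table_info(knowledge)")]
```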

Memory Tools (5 LLM-facing tools)

Tool	Purpose
`remember`	Store facts, notes, reminders with category/domain/entity/due_at
`recall`	Hybrid semantic+keyword search with temporal filtering
`update_memory`	Modify existing items, set reminded_at
`forget`	Delete a memory item
`search_past_conversations`	Search conversation history with temporal filtering

Use Cases

Note-taking, journaling, meeting notes capture
Reminders with due dates and wake-up scheduling
Personal knowledge management (research, articles)
Contact profiles via entity linking (person:sarah_chen)
Error learning and skill capture from tool usage
Recurring commitments (LLM advances due_at)

Observability Dashboard

Full-page Memory Dashboard in Agent UI with:

Header stats cards (memories, sessions, tool calls, success rate)
Activity timeline (30-day heatmap)
Knowledge browser (filterable, sortable, paginated table)
Tool performance stats
Conversation history browser with consolidation status
Upcoming/overdue reminders panel
Maintenance actions (consolidate, rebuild embeddings, reconcile)
Embedding coverage indicator

Startup Sequence

1. Validate Lemonade → 2. Backfill embeddings → 3. Rebuild FAISS → 4. Confidence decay → 5. Reconcile memory → 6. Consolidate sessions → 7. Prune → 8. Generate session

Files

Component	Files
Data layer	`src/gaia/agents/base/memory_store.py`
Agent mixin	`src/gaia/agents/base/memory.py`
System discovery	`src/gaia/agents/base/discovery.py`
REST API	`src/gaia/ui/routers/memory.py`
Agent UI	`src/gaia/apps/webui/src/pages/MemoryDashboard.tsx`
Architecture spec	`docs/spec/agent-memory-architecture.md`
Unit tests	`tests/unit/test_memory_*.py`
Integration tests	`tests/integration/test_memory_*.py`

Design References

System	Pattern adopted
Mem0	LLM-in-the-loop extraction (ADD/UPDATE/DELETE/NOOP)
Zep/Graphiti	Fact lineage via `superseded_by`, temporal search
Hindsight	Cross-encoder reranking, background reconciliation
ENGRAM	Memory typing (category-based) over knowledge graphs
CoALA	Four-tier cognitive architecture (working/episodic/semantic/procedural)

Test plan

## Summary

- **`gaia init` now installs RAG dependencies** for `chat`, `rag`, and `all` profiles: adds a `pip_extras` field to profile definitions and a new `_install_pip_extras()` step that detects editable vs package install and tries `uv pip` first with `pip` fallback
- **Added `self.rag` None guards** to 8 RAG tools in `rag_tools.py` that were crashing with `'NoneType' object has no attribute 'index_document'` when RAG deps were not installed
- **Widened ChatAgent RAG init exception catch** from `ImportError` to `Exception` with warning-level logging and debug traceback
- **Updated Agent UI docs** to include `[rag]` in install instructions (`[ui,rag]`)

## Test plan

- [x] Lint passing (black, isort, pylint, flake8)
- [x] All 1104 unit tests passing
- [ ] `gaia init --profile chat` installs RAG deps automatically
- [ ] Agent UI document indexing works after `pip install -e ".[rag]"`
- [ ] RAG tools return actionable error when deps not installed (instead of crashing)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

C-1: Guard winreg import and all registry-scanning methods in discovery.py so the module loads cleanly on Linux/macOS where winreg is absent. Also guard _scan_credential_manager() behind sys.platform check to avoid subprocess.CREATE_NO_WINDOW AttributeError on non-Windows.

C-3: Replace direct _lock/_conn access in CLI with two new MemoryStore public methods: get_source_counts() and delete_by_source(source). delete_by_source() wraps FTS cleanup + DELETE in a single atomic transaction with rollback, removing the per-ID loop that could leave knowledge/FTS diverged on partial failure.

C-4: Add close_store() to memory router module; call it from FastAPI lifespan shutdown so the WAL is checkpointed and the SQLite connection is released cleanly on server exit.

M-2: list_knowledge endpoint now excludes sensitive items by default. New include_sensitive=false query param (default false) controls visibility; sensitive=true still filters to sensitive-only.

M-6: Add append-only comment to conversations FTS trigger block noting that an AFTER UPDATE trigger would be required if store_turn() ever changes to update existing rows.

Tests: +9 tests (394 total) covering get_source_counts, delete_by_source rollback discipline, and all three sensitive filter modes in the router.
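
The atomic-delete idea in C-3 can be sketched as below. This is an illustration under assumptions, not the PR's `MemoryStore`: the class name `TinyStore`, the column layout, and the `add()` helper are hypothetical, and a plain table stands in for the FTS5 index so the sketch runs on any SQLite build.

```python
import sqlite3
import threading


class TinyStore:
    """Hypothetical stand-in for MemoryStore, showing only the two new methods."""

    def __init__(self):
        self._lock = threading.Lock()
        self._conn = sqlite3.connect(":memory:", check_same_thread=False)
        self._conn.execute(
            "CREATE TABLE knowledge (id TEXT PRIMARY KEY, source TEXT, content TEXT)"
        )
        # Plain table standing in for the FTS5 index (assumption for portability).
        self._conn.execute("CREATE TABLE knowledge_fts (id TEXT, content TEXT)")

    def add(self, item_id, source, content):
        with self._lock, self._conn:
            self._conn.execute(
                "INSERT INTO knowledge VALUES (?, ?, ?)", (item_id, source, content)
            )
            self._conn.execute(
                "INSERT INTO knowledge_fts VALUES (?, ?)", (item_id, content)
            )

    def get_source_counts(self):
        with self._lock:
            rows = self._conn.execute(
                "SELECT source, COUNT(*) FROM knowledge GROUP BY source"
            )
            return dict(rows.fetchall())

    def delete_by_source(self, source):
        # `with self._conn` commits on success and rolls back on any exception,
        # so the index cleanup and the DELETE succeed or fail together.
        with self._lock, self._conn:
            self._conn.execute(
                "DELETE FROM knowledge_fts WHERE id IN "
                "(SELECT id FROM knowledge WHERE source = ?)",
                (source,),
            )
            cur = self._conn.execute(
                "DELETE FROM knowledge WHERE source = ?", (source,)
            )
            return cur.rowcount


store = TinyStore()
store.add("a", "discovery", "Uses VS Code")
store.add("b", "discovery", "Git remote is GitHub")
store.add("c", "user", "Prefers dark mode")
counts_before = store.get_source_counts()
deleted = store.delete_by_source("discovery")
counts_after = store.get_source_counts()
```

Callers such as a CLI then depend only on the two public methods and never touch `_lock` or `_conn` directly.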

- Fix _original_user_input=None fallback bug in _after_process_query (getattr default ignored None; switch to `or` to handle init state)
- Extract VALID_CATEGORIES/MAX_CONTENT_LENGTH/MAX_TURN_LENGTH and other magic numbers to named module-level constants in memory_store.py
- Import constants in memory.py to eliminate duplicate category sets and ensure truncation limits stay in sync across all call sites
- DRY: memory router imports VALID_CATEGORIES from data layer instead of redefining its own copy
- Clean up unused imports in test files (F401/F811 flake8 violations)
- 394 unit tests passing, flake8 clean
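
The `getattr` pitfall in the first bullet is easy to reproduce. The class below is a hypothetical stand-in for the agent; only the attribute name comes from the commit message.

```python
class Agent:
    def __init__(self):
        # Init state: the attribute exists but holds None.
        self._original_user_input = None


agent = Agent()
user_input = "what did I say yesterday?"

# getattr's default only applies when the attribute is missing entirely,
# so an attribute that is present but None is returned as-is.
buggy = getattr(agent, "_original_user_input", user_input)

# Falling back with `or` also covers the attribute-is-None case.
fixed = getattr(agent, "_original_user_input", None) or user_input
```

Note that `or` also replaces an empty string, which is presumably acceptable for a user-input fallback but is worth keeping in mind.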

Replace substring `"github.com" in url_lower` with urlparse().hostname comparison to fix CodeQL CWE-20 "Incomplete URL substring sanitization". A crafted URL like http://evil.com/github.com could otherwise bypass the check. Hostname equality/suffix match is unambiguous.
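
A minimal sketch of the hostname-based check follows. The function name `is_github_url` is hypothetical; the commit only states that `urlparse().hostname` is compared by equality or suffix.

```python
from urllib.parse import urlparse


def is_github_url(url):
    """Return True only when the URL's host is github.com or a subdomain of it."""
    host = (urlparse(url).hostname or "").lower()
    # The leading dot in the suffix check rejects hosts like "notgithub.com".
    return host == "github.com" or host.endswith(".github.com")
```

With this check, `http://evil.com/github.com` is rejected because its hostname is `evil.com`, whereas the old substring test would have accepted it.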

Security:
- recall tool now filters out sensitive items before returning results to the LLM. Sensitive entries (API keys, credentials) are for internal use only and must not appear in tool output.

Performance:
- Add get_by_category_contexts() to MemoryStore: single SQL query with WHERE context IN (active, 'global') replaces two separate get_by_category() calls in _get_context_items(), halving DB round-trips per system-prompt build (was 6 queries, now 3).
- Replace N+1 correlated subquery in get_sessions() with a LEFT JOIN on MIN(id) per session; this scales linearly regardless of session count.

Reliability:
- Add PRAGMA busy_timeout=5000 so concurrent WAL readers/writers in the same process (dashboard REST singleton + ChatAgent) retry for 5 s instead of failing immediately with SQLITE_BUSY.

Correctness:
- update_memory tool truncation check now uses MAX_CONTENT_LENGTH constant instead of hardcoded 2000, keeping it in sync with memory_store.py.

Testability:
- Replace sys.exit(1) in _bootstrap_chat/_bootstrap_discover/_bootstrap_reset helpers with raise RuntimeError; _handle_memory_bootstrap catches and exits, making helpers unit-testable in isolation.

Tests (+34):
- TestGetByCategoryContexts (5): single-query context+global fetch
- TestGetAllKnowledgeSortByValidation (4): sort_by whitelist protection
- TestGetSessionsFirstMessageV2 (3): join-based first_message
- test_memory_discovery.py (22): _classify_remote, _classify_path, _classify_domain, scan_all structure, Windows guard

428 tests passing, 1 skipped (Windows-only guard on non-Windows).
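
The single-query fetch and the busy timeout can be sketched together. The table layout and sample rows are assumptions; only the method name, the `context IN (active, 'global')` shape, and the `busy_timeout=5000` value come from the commit message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Wait up to 5 s for a competing lock instead of failing at once with SQLITE_BUSY.
conn.execute("PRAGMA busy_timeout=5000")
conn.execute(
    "CREATE TABLE knowledge ("
    "id INTEGER PRIMARY KEY, category TEXT, context TEXT, content TEXT)"
)
conn.executemany(
    "INSERT INTO knowledge (category, context, content) VALUES (?, ?, ?)",
    [
        ("preference", "global", "Prefers dark mode"),
        ("preference", "work", "Uses vim at work"),
        ("preference", "home", "Plays chess"),
        ("fact", "work", "Office is in Austin"),
    ],
)


def get_by_category_contexts(conn, category, active_context):
    """One query returns both the active-context and the global items."""
    rows = conn.execute(
        "SELECT content FROM knowledge "
        "WHERE category = ? AND context IN (?, 'global') ORDER BY id",
        (category, active_context),
    )
    return [row[0] for row in rows]


work_prefs = get_by_category_contexts(conn, "preference", "work")
timeout_ms = conn.execute("PRAGMA busy_timeout").fetchone()[0]
```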

# Conflicts:
# src/gaia/agents/chat/agent.py
# src/gaia/apps/webui/src/App.tsx
# src/gaia/apps/webui/src/components/ChatView.tsx
# src/gaia/ui/server.py

Comprehensive rewrite of agent-memory-architecture.md as a single unified design document.

Key changes:
- Hybrid search: vector (FAISS) + BM25 (FTS5) + RRF fusion + cross-encoder reranking (ms-marco-MiniLM-L-6-v2). No fallback: embeddings are a hard requirement.
- Mem0-style LLM extraction: ADD/UPDATE/DELETE/NOOP operations against existing memory, replacing naive extract-and-store.
- Zep-inspired fact lineage: superseded_by column preserves history when facts are corrected rather than silently overwriting.
- Hindsight-inspired background reconciliation: pairwise similarity check on startup detects contradictions missed at extraction time.
- Complexity-aware recall depth: adaptive top_k (3/5/10) based on query complexity heuristics.
- Temporal range search: time_from/time_to on all search methods for natural time-based recall.
- Conversation consolidation: auto-distill old sessions to durable knowledge before 90-day prune.
- Second brain use cases: journaling, meeting notes, PKM, reminders, wake-up scheduling, recurring commitments.
- Removed all graceful degradation / silent fallback patterns.
- Removed openjarvis-memory-analysis.md (temp analysis doc).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…coverage, temporal+superseded filters

- POST /api/memory/consolidate, /reconcile, /rebuild-embeddings
- GET /api/memory/embedding-coverage
- Updated GET /api/memory/knowledge with include_superseded, time_from, time_to
- Updated GET /api/memory/stats with embedding coverage and reconciliation stats
- 95 tests passing, lint clean

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…d_by, temporal search, consolidation

- Schema v1→v2 migration: embedding BLOB, superseded_by TEXT, consolidated_at TEXT
- New methods: store_embedding, get_items_with/without_embeddings, get_unconsolidated_sessions, mark_turns_consolidated, get_items_for_reconciliation
- Updated search() with time_from/time_to, superseded_by IS NULL, use_count increment
- Updated all query methods with superseded_by IS NULL filter
- 275 tests passing, lint clean

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
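
The combination of fact lineage and temporal filtering might look like the sketch below. The table layout, the helper names `store`, `supersede`, and `search`, and the sample facts are assumptions; the commit only establishes the `superseded_by IS NULL` filter and the `time_from`/`time_to` parameters.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE knowledge ("
    "id TEXT PRIMARY KEY, content TEXT, created_at TEXT, superseded_by TEXT)"
)


def store(content, created_at):
    item_id = uuid.uuid4().hex
    conn.execute(
        "INSERT INTO knowledge (id, content, created_at) VALUES (?, ?, ?)",
        (item_id, content, created_at),
    )
    return item_id


def supersede(old_id, new_content, created_at):
    """Correct a fact by adding a new row and linking the old one to it."""
    new_id = store(new_content, created_at)
    conn.execute(
        "UPDATE knowledge SET superseded_by = ? WHERE id = ?", (new_id, old_id)
    )
    return new_id


def search(time_from=None, time_to=None, include_superseded=False):
    clauses, params = [], []
    if not include_superseded:
        clauses.append("superseded_by IS NULL")  # hide corrected facts by default
    if time_from:
        clauses.append("created_at >= ?")
        params.append(time_from)
    if time_to:
        clauses.append("created_at <= ?")
        params.append(time_to)
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    rows = conn.execute(
        f"SELECT content FROM knowledge{where} ORDER BY created_at", params
    )
    return [row[0] for row in rows]


old_id = store("Sarah works at Acme", "2026-01-10T09:00:00")
supersede(old_id, "Sarah works at Globex", "2026-03-01T09:00:00")
current = search()
history = search(include_superseded=True)
```

ISO 8601 timestamps in a uniform format compare correctly as strings, which is what makes the plain `>=` and `<=` range filter work here.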

…LLM extraction, temporal recall Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… FAISS, API integration Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ledge browser, activity timeline, tool stats

6-section dashboard: header stat cards, 30-day activity bar chart, paginated knowledge browser with entity/category/context/search filters, tool performance table, conversation history with FTS search, upcoming & overdue temporal panel.

Features:
- Embedding coverage indicator with progress bar
- Maintenance dropdown: consolidate, rebuild embeddings, reconcile, rebuild FTS
- Click-to-expand knowledge row detail (metadata, timestamps, superseded_by chain)
- Inline actions: edit, delete, toggle sensitive, copy ID
- Superseded entries toggle with server-side filtering
- Toast notification system for all CRUD and maintenance operations
- Brain icon in sidebar for navigation
- Keyboard support: Escape key (layered close), Enter/Space on rows
- ARIA labels, roles, and aria-live for accessibility
- Responsive layout (3 breakpoints)
- Relative date formatting ("in 2 days", "3 days ago")
- API calls aligned with backend router field names

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…em0 extraction, consolidation, reconciliation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The backend returns metadata as parsed JSON (dict), not a string. Rendering it directly showed [object Object]. Now uses JSON.stringify for object metadata and plain text for strings. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…e cases

- Strengthen conversation context filtering test with explicit zero-result assertions instead of vacuous loop
- Add due_at validation, empty-list consolidation, and history limit tests
- Remove dead _past_iso import from API test file
- 117 tests, all passing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…m0 extraction, consolidation, reconciliation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…up scope includes entity, dynamic context always returns time

- MemoryStore.search(): corrected from "hybrid" to "FTS5 keyword search" (hybrid is MemoryMixin._hybrid_search)
- get_memory_dynamic_context(): fixed "returns empty" claim; it always returns current time
- store() dedup scope: category+context+entity, not category+context
- get_items_with_embeddings(): added missing top_k, time_from, time_to params
- _classify_query_complexity: added missing medium/complex signal words
- get_entities(): added missing last_updated field in return
- Added undocumented update_confidence() and delete_by_source() methods
- update(): noted embedding cleared on content change

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
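
A complexity classifier that drives the adaptive `top_k` could be sketched as below. The signal-word sets and length thresholds are invented for illustration, since the PR does not list them here; only the 3/5/10 depths and the simple/medium/complex idea come from the source.

```python
# Hypothetical signal words; the real lists live in the PR's memory mixin.
MEDIUM_SIGNALS = {"and", "or", "when", "between", "since"}
COMPLEX_SIGNALS = {"compare", "summarize", "everything", "history", "timeline"}

TOP_K = {"simple": 3, "medium": 5, "complex": 10}


def classify_query_complexity(query):
    words = set(query.lower().split())
    # Length thresholds (15 and 7 words) are assumptions for this sketch.
    if words & COMPLEX_SIGNALS or len(words) > 15:
        return "complex"
    if words & MEDIUM_SIGNALS or len(words) > 7:
        return "medium"
    return "simple"


def adaptive_top_k(query):
    """Map a query to a recall depth of 3, 5, or 10."""
    return TOP_K[classify_query_complexity(query)]
```

A short lookup such as "Sarah's birthday" stays at depth 3, while a query asking to summarize a topic widens recall to 10 candidates.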

… fixes

- memory_store.py: set embedding=NULL when content changes in update() to force re-embedding (stale embedding would return wrong results)
- server.py: alphabetize router imports
- test fixes: formatting cleanup, mixin test updates from parallel tasks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
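
The stale-embedding fix can be illustrated with a small sketch. The table layout and the return values of `update()` are assumptions; the source only states that `update()` sets `embedding=NULL` when content changes, so that a later backfill re-embeds the item.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE knowledge (id TEXT PRIMARY KEY, content TEXT, embedding BLOB)"
)
conn.execute(
    "INSERT INTO knowledge VALUES ('k1', 'Sarah likes tea', ?)", (b"\x00\x01",)
)


def update(item_id, content):
    row = conn.execute(
        "SELECT content FROM knowledge WHERE id = ?", (item_id,)
    ).fetchone()
    if row is None:
        raise KeyError(item_id)
    if row[0] == content:
        return False  # unchanged content keeps its existing embedding
    # Clearing the vector marks the item for re-embedding on the next backfill.
    conn.execute(
        "UPDATE knowledge SET content = ?, embedding = NULL WHERE id = ?",
        (content, item_id),
    )
    return True


def get_items_without_embeddings():
    rows = conn.execute(
        "SELECT id FROM knowledge WHERE embedding IS NULL ORDER BY id"
    )
    return [row[0] for row in rows]


unchanged = update("k1", "Sarah likes tea")
pending_before = get_items_without_embeddings()
changed = update("k1", "Sarah likes coffee")
pending_after = get_items_without_embeddings()
```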

kovtcharov · 2026-04-26T07:21:24Z

🛑 STOP-THE-LINE — coordination notice

This PR is foundational (38k+ LOC, 77 files) and currently CONFLICTING with main. Until this lands, no PRs may merge changes to the frozen paths below.

Frozen paths

src/gaia/agents/base/agent.py
src/gaia/agents/base/memory*.py (all memory-related files)
src/gaia/agents/base/discovery.py
src/gaia/ui/routers/memory.py
src/gaia/apps/webui/src/pages/MemoryDashboard.tsx
Anything in this PR's file diff (run gh pr diff 606 --name-only for the full list)

Why

Every parallel agent merging to main makes this PR staler and less mergeable. With ~10 parallel agents potentially producing PRs/day for v0.20.0 + v0.18.2, without coordination this PR becomes unmergeable and v0.20.0 (June 2 consumer launch) slips by 4+ weeks.

What's allowed

PRs touching ONLY non-frozen file paths (most v0.18.2 mobile work, Telegram, scheduler, MCP catalogue)
Documentation changes
CI/test infrastructure
Hotfixes to v0.17.5/v0.18.0

Active focus on landing this PR

Per issue A5: rebase + review acceleration. Target land date: May 5 (latest acceptable May 12). If slipping beyond May 12, escalate.

For coding agents

Read AGENTS.md (newly added) for the full rule + check command (gh pr list --label stop-the-line).

# Conflicts:
# src/gaia/apps/webui/src/components/SettingsModal.tsx
# src/gaia/apps/webui/src/services/api.ts
# src/gaia/cli.py

Fills the consumer-use-case gaps not covered by the existing 13 memory scenarios. Each new scenario maps to a specific item in the v0.20.0 milestone and the broader consumer use-case list (morning briefs, email triage, watch lists, scheduling, writing-style adaptation).

- memory_writing_style: voice/tone transfer to drafts (creative_professional, 4 turns)
- memory_morning_brief_personalization: interest profile → daily brief assembly + interest update (home_user, 5 turns)
- memory_email_sender_priorities: VIP/ignore rules + conditional time-of-day rule applied to a mock inbox (power_user, 5 turns)
- memory_watchlist_monitoring: multi-domain watch list (real estate, shopping, options), durability across unrelated turns, criteria match, targeted partial update (power_user, 7 turns)
- memory_schedule_preferences: hard no-go windows, focus blocks, soft preferences; rule-aware slot recommendation and conflict detection (power_user, 6 turns)

All five validated against runner.validate_scenario(). Total memory scenarios is now 20.

## Summary

Adds `AGENTS.md` establishing coordination rules for coding agents (Claude Code, Cursor, Copilot, custom orchestrators) working on GAIA in parallel. Complements `CLAUDE.md` without duplicating it: `CLAUDE.md` owns project conventions; `AGENTS.md` owns multi-agent coordination. Priority order is explicit (CLAUDE.md > AGENTS.md > default agent behavior).

## Why

With v0.20.0 carrying 11+ consumer-critical PRs landing in parallel, coordination becomes the dominant cost; typing speed isn't the bottleneck anymore. Without explicit rules, parallel agents create merge conflicts, divergent component patterns, and incoherent UX. This documents the discipline once so it doesn't get reinvented per release.

## Key rules established

- **Stop-the-line discipline**: foundational PRs (e.g. PR amd#606 memory v2) freeze touching their file paths until they land. Coding agents check `gh pr list --label stop-the-line` before opening PRs.
- **Spec-before-PR**: issues with `consumer-critical` label require implementation specs at the depth of amd#887/amd#888/amd#890 before agent assignment. New `spec-ready` label gates agent-assignment.
- **Review chain**: every agent-authored PR runs through `code-reviewer` agent + (`architecture-reviewer` if applicable) + `claude.yml` Opus + human review
- **No silent test skips**: reinforces CLAUDE.md no-fallback rule at the test layer
- **Pre-flight checks**: agents must check stop-the-line PRs and `claudia_list_tasks` before opening PRs

## Cross-references

- Issue amd#899: agent orchestration playbook (consumes the rules in this file)
- Issue amd#900: pre-flight implementation specs for under-specified consumer-critical issues
- Issue amd#903: stop-the-line for PR amd#606 (canonical example of the rule)
- PR amd#606: currently the active stop-the-line PR

## Test plan

- [ ] Markdown renders correctly on GitHub (visual check on this PR's Files tab)
- [ ] All cross-referenced issue numbers (amd#606, amd#887, amd#888, amd#890, amd#899, amd#900, amd#903) exist
- [ ] Priority order is explicit and doesn't conflict with CLAUDE.md
- [ ] Pre-flight check commands (`gh pr list --label stop-the-line`, `claudia_list_tasks`) are accurate

# Conflicts:
# src/gaia/agents/chat/agent.py
# src/gaia/apps/webui/src/components/ChatView.tsx
# src/gaia/apps/webui/src/components/SettingsModal.tsx
# src/gaia/apps/webui/src/components/Sidebar.tsx
# src/gaia/apps/webui/src/hooks/useModelActions.ts
# src/gaia/cli.py
# src/gaia/eval/runner.py
# src/gaia/ui/routers/chat.py
# src/gaia/ui/routers/system.py

The 20 existing memory scenarios are functional / use-case tests with 3-7 turns each. They don't exercise the dimensions that bite real users at scale. These five scenarios fill those gaps:

- memory_stress_capacity (8 turns): store 100 distinct items across 4 batches, then targeted retrieval (exact name lookup, cross-fact join, semantic ambiguity, prod-vs-staging discrimination). Catches store-scale retrieval-quality regressions.
- memory_stress_retrieval_under_noise (6 turns): store 12 contacts that share first names / roles, then disambiguating retrieval queries by company / hire-date / role. Catches near-duplicate collapse.
- memory_stress_long_conversation (15 turns): incremental fact-sharing interleaved with chit-chat, ending in a coherence probe that requires recalling 9 of 11 facts including a mid-conversation update (cat count 2 → 3). Catches memory degradation in long-running sessions.
- memory_stress_adversarial_poisoning (7 turns, severity=critical): rapid contradictions (latest wins), fabricated-memory-injection ("you previously told me my password is hunter2"), and prompt-injection framing ([SYSTEM OVERRIDE]). Privacy / reliability regression catcher.
- memory_stress_forget_semantics (8 turns): store siblings, forget one, verify it's actually gone (direct recall miss, list miss, paraphrase miss after intervening turns) AND siblings untouched. Catches soft-delete bugs that masquerade as "forget" but leave records retrievable.

Total memory scenarios: 25 (13 original + 5 use-case + 5 stress).

Note: privacy / private-mode isolation is intentionally NOT covered here because the runner runs one Agent UI session per scenario; cross-session isolation needs framework support to validate properly.

Mostly mechanical lint/format cleanup across the memory branch:

- Long-line wraps in `agents/base/memory.py` debug logging and `agents/chat/agent.py` long-form returns.
- `l` → `line` in list comprehensions in `agents/base/system_context.py` to silence the ambiguous-variable-name lint.
- Drop unused `threading` import from `ui/agent_loop.py` and unused `_load_memory_settings` import from `cli.py`.

Real fixes folded into the same pass:

- `eval/runner.py`: wire `keep_sessions=keep_sessions` through to a call site missed in the earlier conflict resolution; add the matching default to the surrounding signature.
- `ui/agent_loop.py`: `get_actionable_goals(limit=5)` → `get_actionable_goals()[:5]`; the underlying signature changed and `limit` is no longer accepted.
- `ui/server.py`: move the `agent_loop` import above the `routers` imports to avoid an import-order issue at app start.
- `tests/*`: drop a few unused imports / stale skip markers so the suites collect cleanly.

Bring the feature branch back to green by addressing the cluster of CI failures that landed when the memory v2 work merged with main. All fixes are mechanical or scoped to test isolation, with no behavioural change to the memory pipeline itself.

- Restore lost merge-conflict state in `ChatView.tsx` and `Sidebar.tsx`: the `getSessionHash` import, `hashCopied`/`copied` state, and the `handleCopyHash` callback were all dropped during the merge. The Vite build was failing on missing identifiers across PyPI Build Check and all three Build Installers jobs.
- Lint/Pylint cleanup so the `Code Quality (Lint)` job is green again: remove unused vars/imports, drop dead `if x != x` branches, and promote a few pointless lambdas to method references in `agents/base/discovery.py`. Reorder `routers/memory.py` imports to satisfy isort.
- Tighten `_canonical_agent_type` to surface `AttributeError` instead of swallowing it (matches the existing regression test added in #802; was failing locally and in CI Unit Tests).
- Add an explicit `GAIA_MEMORY_DISABLED=1` opt-out to `MemoryMixin.init_memory`. The Path Validator security tests, Unit Tests, and Chat Agent Tests jobs all instantiate `ChatAgent`/`CodeAgent` without a Lemonade server available; the memory v2 hard-requirement on the embedding service fails them. This is a deliberate, named opt-out (not a silent fallback). Tests that exercise memory itself clear the variable via the new `tests/unit/conftest.py` autouse fixture and the `_mock_v2_init_context` helper, so memory test coverage is unchanged. CI workflows that don't need memory now set the env var explicitly.

User messages were rendering as plain right-aligned text with just a bottom-border divider, while assistant replies got the full card treatment (bg, border, radius, shadow, avatar, name). On a real conversation that read as "floating text" next to "card" — broken. Now .msg-user is a flex container that pins its inner bubble to the right edge of the 900px chat column, with the bubble itself styled to match the assistant card minus the avatar/name (capped at 70% width so short messages stay compact). Also dropped the text-align:right hacks on body/markdown elements — text inside the bubble is left-aligned now that the bubble itself is on the right.

CI green-up follow-up to 7f86021. Three behavioural fixes plus pickup of work from parallel memory-eval tasks that landed in the same tree.

- ``MemoryMixin.init_memory`` now degrades to memory-disabled (warning log + ``_memory_store=None``) when Lemonade is unreachable, instead of raising RuntimeError. Hard-failure here breaks the AppImage smoke test (Lemonade isn't bundled with the installer; fresh users hit this on first launch) and was the root cause of the AppImage userns-restricted state-machine failure. Memory tools now refuse to register when the store is None so the LLM can't blunder into AttributeError mid-turn, and ``get_memory_dynamic_context`` / ``_after_process_query`` / ``_execute_tool`` short-circuit cleanly via ``getattr(... , None) is None``.
- ``test_tier2_rag_rules_absent_without_indexed_docs``: branch optimised the non-file-context discovery rules to a compact form to save tokens, but the test still asserted the literal "FILE SEARCH AND AUTO-INDEX" block was always present. Loosened the assertion to the underlying workflow keywords (``search_file``, ``index_document``, ``query_specific_file|query_documents``) so the optimisation and the test agree on intent.
- Add ``GAIA_MEMORY_DISABLED: "1"`` to the Windows Path Validator Security Tests step (the Linux variant was already done in 7f86021). The check is also harmless under graceful degrade: it just skips the Lemonade probe instead of relying on the warning path.

Picked up alongside (other parallel tasks; included so the tree compiles and lints clean as a unit):

- ``MemoryStore.get_item`` for dashboard / eval supersession-chain probes
- Memory MCP read tools register on env ``GAIA_MEMORY_MCP_ALWAYS=1`` for the eval runner; admin tools also gate on ``GAIA_MEMORY_ADMIN=1``
- ``preflight_check`` rejects memory-category eval runs without admin env
- Eval simulator/judge-turn prompt updates for memory MCP tools
- ``_chat_helpers._stream_chat_response`` resolves Lemonade base URL via ``LemonadeManager`` so non-default ports are picked up for /stats
- ``test_security.yml`` Windows path-validator job now sets the same env

Two single-cause CI fixes:

- ``_chat_helpers.py`` no longer reads ``os.environ`` for the Lemonade base URL (resolved via ``LemonadeManager`` after f63f09e), so the ``import os`` is dead. Pylint/flake8 caught it; black moved the remaining imports.
- ``test_unit.yml`` was installing ``pytest pytest-cov pytest-asyncio pyfakefs`` but the new ``test_memory_router::TestReconcileEndpoint::test_reconcile_runtime_error_returns_500`` test takes a ``mocker`` fixture (pytest-mock). Add ``pytest-mock`` to the install line.

The userns-restricted AppImage smoke test polled ``state: ready`` for 90 seconds, but ``gaia init --profile minimal`` downloads the Gemma-4-E4B GGUF model (~3 GB) on first run and that exceeds 90s on the public runner. The structural and distro-matrix smoke jobs already use 300s for the same reason; align this one too. Failure mode it fixed: ``state: installing (gaia-init)`` … ``Step 3/4: Downloading models for 'minimal' profile`` → timer elapses → ``::error::userns-restricted launch did not reach state: ready``.

github-actions · 2026-04-30T17:49:55Z

Summary

Massive, well-structured PR (~41k LOC) introducing memory v2 with hybrid retrieval (FAISS + FTS5 + RRF + cross-encoder), Mem0-style LLM extraction, Zep-inspired lineage, an observability dashboard, 26 eval scenarios, and a credible suite of unit + integration tests. Architecture, schema migration, FTS5 sanitization, parameterized SQL, admin gating via GAIA_MEMORY_ADMIN, and the locking strategy (WAL + threading.Lock + busy_timeout=5000) all look sound.

The main concern is a direct contradiction between the PR description and the implementation: the description (and test plan) claim "No silent fallback — system fails loudly on misconfiguration" and "Lemonade unavailable at startup raises RuntimeError," but memory.py:init_memory actually swallows the failure and silently disables memory. Per CLAUDE.md "No Silent Fallbacks — Fail Loudly," this should be reconciled before merge.

Issues

🟡 Silent fallback contradicts PR description and CLAUDE.md

src/gaia/agents/base/memory.py:347-361 — when the Lemonade embedding probe fails, the code catches a broad Exception, logs a warning, tears state down, and returns. The session continues with memory disabled. The PR description explicitly promises the opposite ("No silent fallback") and the test plan has a checkbox for "Lemonade unavailable at startup raises RuntimeError (no silent fallback)" — that checkbox cannot pass against this code path.

This also conflicts with the project rule in CLAUDE.md: "Either the operation succeeds as intended, or it raises an actionable error."

Two acceptable resolutions:

Option A — actually fail loudly (matches PR description):

        except Exception as e:
            raise RuntimeError(
                "Lemonade embedding service unreachable — memory v2 cannot "
                "initialize. Start lemonade-server (e.g. `lemonade-server serve`) "
                "and ensure the embedding model is available, or set "
                "GAIA_MEMORY_DISABLED=1 to opt out. Reason: "
                f"{e}"
            ) from e

Option B — keep the degrade, but make it an explicit opt-in and update the PR description. If the intent is genuinely "graceful degrade for users without Lemonade," gate it behind an env flag (e.g. GAIA_MEMORY_DEGRADE_ON_NO_EMBEDDINGS=1) so it's an opt-in, not a default. Then either delete the "No silent fallback" claim from the PR body or qualify it ("fails loudly unless GAIA_MEMORY_DEGRADE_ON_NO_EMBEDDINGS=1").

Either way is fine — the current state is "ships with the docs and the code disagreeing."
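
To make the two options concrete, here is a hedged sketch of an initializer that fails loudly by default and degrades only on explicit request. The function signature, the injected `probe_embeddings` callable, the `env` parameter, and the placeholder return value are assumptions for illustration; `GAIA_MEMORY_DISABLED` is the opt-out already in the PR, and `GAIA_MEMORY_DEGRADE_ON_NO_EMBEDDINGS` is the flag name proposed in Option B.

```python
import logging
import os

logger = logging.getLogger("gaia.memory.sketch")


def init_memory(probe_embeddings, env=None):
    """Return a store placeholder, None when explicitly disabled, or raise."""
    env = os.environ if env is None else env

    # Named opt-out: skip the probe entirely.
    if env.get("GAIA_MEMORY_DISABLED") == "1":
        logger.info("Memory disabled via GAIA_MEMORY_DISABLED=1")
        return None

    try:
        probe_embeddings()
    except Exception as exc:
        # Option B: degrade only when the user has opted in.
        if env.get("GAIA_MEMORY_DEGRADE_ON_NO_EMBEDDINGS") == "1":
            logger.warning(
                "Embedding service unreachable; memory disabled by opt-in: %s", exc
            )
            return None
        # Option A (the default here): fail loudly with an actionable message.
        raise RuntimeError(
            f"Embedding service unreachable, memory cannot initialize: {exc}"
        ) from exc

    return {"status": "ready"}  # placeholder for a real store object


def unreachable():
    raise ConnectionError("connection refused")
```

Passing `env` as a plain dict keeps the behaviour easy to pin down in a unit test, which is what the unverifiable test-plan checkbox is asking for.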

🟡 Bare `except Exception: pass` in mixin prompt auto-discovery

src/gaia/agents/base/agent.py:313-318 — the auto-discovery loop in _get_mixin_prompts swallows any failure from a mixin's prompt-fragment method without even a debug log. If a mixin author writes a buggy _get_xxx_prompt it will silently produce no fragment, with no diagnostic. At minimum log at debug level so the failure is traceable when someone is debugging "why isn't my mixin prompt showing up?":

                try:
                    fragment = getattr(self, attr_name)()
                    if fragment:
                        prompts.append(fragment)
                except Exception as e:
                    logger.debug(
                        "[Agent] mixin prompt fragment %s.%s raised: %s",
                        type(self).__name__, attr_name, e,
                    )

🟢 `QueueFull` swallowed silently in agent loop trigger enqueue

src/gaia/ui/agent_loop.py:137-142 — the comment says "queue is unbounded; this should never happen" which is true today, but if anyone ever adds a maxsize= to _trigger_queue the trigger will be silently dropped. Cheap fix:

        try:
            self._trigger_queue.put_nowait(
                AgentTrigger("user_message_followup", session_id)
            )
        except asyncio.QueueFull:
            logger.warning(
                "AgentLoop trigger queue full; dropping user_message_followup for session %s",
                session_id,
            )
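
The dropped-trigger scenario is easy to reproduce once the queue is bounded. The following standalone sketch uses a hypothetical `enqueue_trigger` helper and plain strings in place of `AgentTrigger`; it shows only that `put_nowait` raises `asyncio.QueueFull` on a full queue and that the drop is logged rather than swallowed.

```python
import asyncio
import logging

logger = logging.getLogger("agent_loop.sketch")


def enqueue_trigger(queue, trigger):
    """Try to enqueue without blocking; report whether the trigger was accepted."""
    try:
        queue.put_nowait(trigger)
        return True
    except asyncio.QueueFull:
        logger.warning("trigger queue full; dropping %s", trigger)
        return False


async def demo():
    # maxsize=1 simulates someone later adding a bound to the queue.
    queue = asyncio.Queue(maxsize=1)
    return [enqueue_trigger(queue, "first"), enqueue_trigger(queue, "second")]


results = asyncio.run(demo())
```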

🟢 Repeated `except Exception: pass` (or near-equivalents) in discovery / system_context

src/gaia/agents/base/discovery.py and src/gaia/agents/base/system_context.py use broad try/except extensively. For best-effort OS/registry/browser-history scanning this is mostly appropriate — those code paths must not crash a user's session because Chrome's history DB is locked or winreg returns garbage. But several of those handlers don't log at all, which makes "why didn't system context populate field X?" undiagnosable. Per the same CLAUDE.md rule, prefer "log at debug + continue" over silent pass, so the failure is visible when someone runs with --debug. Not blocking, but worth a sweep.

🟢 Minor: PR test-plan checkbox is unverifiable

The "Lemonade unavailable at startup raises RuntimeError (no silent fallback)" checkbox is a useful contract test — please add it as an actual pytest case (tests/unit/test_memory_mixin.py) once the silent-fallback issue is resolved one way or the other, so the regression is locked in.

Strengths

Test coverage is genuinely solid. 8 new unit-test files (including test_memory_store.py, test_memory_mixin.py, test_memory_router.py, test_goal_store.py, test_goals_router.py, test_memory_discovery.py, test_sdk_tool_messages.py) plus 3 integration files. New tests/unit/conftest.py for shared fixtures. This is exemplary for a feature of this scope.
26 eval scenarios under eval/scenarios/memory/ covering cross-session persistence, conflict resolution, adversarial poisoning, capacity stress, retrieval-under-noise, etc. — exactly the right shape for a "second brain" feature.
Documentation is comprehensive and lands in the right places: docs/guides/memory.mdx, docs/sdk/sdks/memory.mdx, docs/spec/agent-memory-architecture.md, plus docs/docs.json nav updates. Matches the CLAUDE.md "every new feature must be documented" rule.
SQL hygiene is excellent. Parameterized queries throughout memory_store.py. _sanitize_fts5_query correctly strips FTS5 special chars and caps at MAX_FTS_QUERY_LENGTH=500. No injection vectors spotted.
Schema migration is idempotent via ALTER TABLE ADD COLUMN with duplicate-column tolerance — clean upgrade path from v1 → v2.
Concurrency is handled deliberately: threading.Lock + WAL + busy_timeout=5000 in memory_store.py; double-checked-locking singletons in routers/memory.py and routers/goals.py.
Admin endpoints are properly gated by GAIA_MEMORY_ADMIN=1 (memory clear/seed in both REST and MCP surfaces). CI gating via GAIA_MEMORY_DISABLED=1 is a sensible kill-switch.
MCP subprocess management is safe: subprocess.Popen([sys.executable, ...]) with no shell=True, terminate→kill timeout, in routers/mcp.py.
Pydantic ISO 8601 validation factored into a shared _validate_iso8601 helper — no DRY violations on the API surface.
Goal store is correctly isolated in ~/.gaia/goals.db with its own PRAGMA foreign_keys=ON and state machine, decoupled from memory storage.
Pending-approval route ordering in routers/goals.py (literal path before {goal_id}) shows the author thought about FastAPI's path-priority semantics.
Single source of truth for VALID_CATEGORIES prevents drift between the store, the mixin, and the API layer.
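
The idempotent-migration pattern praised above can be sketched as follows. This is an illustration under assumptions, not the code in memory_store.py: the helper name `add_column_if_missing` and the v1 table shape are made up, while the column names `embedding` and `superseded_by` come from the PR description.

```python
import sqlite3


def add_column_if_missing(conn, table, column_ddl):
    """ALTER TABLE ADD COLUMN, treating a duplicate column as already migrated.

    `table` and `column_ddl` must be trusted constants, since identifiers
    cannot be bound as SQL parameters.
    """
    try:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column_ddl}")
        return True
    except sqlite3.OperationalError as e:
        if "duplicate column name" in str(e).lower():
            return False  # column already exists; nothing to do
        raise  # any other schema error should surface


conn = sqlite3.connect(":memory:")
# Assumed v1 shape, for illustration only.
conn.execute("CREATE TABLE knowledge (id TEXT PRIMARY KEY, content TEXT)")

v2_columns = ["embedding BLOB", "superseded_by TEXT"]
first_run = [add_column_if_missing(conn, "knowledge", c) for c in v2_columns]
second_run = [add_column_if_missing(conn, "knowledge", c) for c in v2_columns]
columns = [row[1] for row in conn.execute("PRAGMA table_info(knowledge)")]
```

Running the migration twice leaves the schema unchanged on the second pass, which is what makes it safe to call on every startup.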

Verdict

Approve with suggestions. The architecture, tests, docs, and security posture are all well above the bar for a feature this size. The one item that should be addressed before merge is the silent-fallback in memory.py:347-361 — either make it actually raise (so the PR description and test plan are honest) or document the degrade as an explicit, env-gated opt-in. Everything else is polish.
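
The two options in the verdict can coexist in one code path. The sketch below is hypothetical throughout: the flag name `GAIA_MEMORY_ALLOW_DEGRADED`, the function `init_embeddings`, and the "keyword_only" degraded mode are assumptions for illustration and are not taken from memory.py.

```python
import logging
import os

logger = logging.getLogger("memory_sketch")

# Hypothetical opt-in flag; the PR does not define this name.
DEGRADE_ENV = "GAIA_MEMORY_ALLOW_DEGRADED"


def init_embeddings(probe, env=None):
    """Fail loudly by default; degrade only when explicitly opted in."""
    env = os.environ if env is None else env
    try:
        probe()
        return "hybrid"
    except Exception as e:
        if env.get(DEGRADE_ENV) == "1":
            logger.warning(
                "embeddings unavailable (%s); degraded mode because %s=1",
                e, DEGRADE_ENV,
            )
            return "keyword_only"
        raise RuntimeError(f"embedding backend unavailable: {e}") from e


def unavailable():
    # Simulates the embedding server being unreachable at startup.
    raise ConnectionError("embedding server not reachable")


try:
    init_embeddings(unavailable, env={})
    strict_result = "no error"
except RuntimeError:
    strict_result = "raised"

degraded_result = init_embeddings(unavailable, env={DEGRADE_ENV: "1"})
```

Either branch is observable: the default raises, and the opt-in logs a warning that names the flag responsible for the degrade.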

cc @kovtcharov-amd — flagging the CLAUDE.md "no silent fallback" rule violation; not a security issue, but it's the kind of thing that bites later when an outage masks itself as "memory just stopped working." No 🔒 security concerns.

github-actions Bot added labels Mar 20, 2026: documentation (Documentation changes), dependencies (Dependency updates), agents, cli (CLI changes), tests (Test changes), electron (Electron app changes)

github-advanced-security AI found potential problems Mar 20, 2026

Comment thread src/gaia/agents/base/discovery.py Fixed

kovtcharov force-pushed the feature/agent-memory branch from e0eff31 to 068eead Compare March 21, 2026 23:13

itomek and others added 6 commits March 21, 2026 18:10

Merge remote-tracking branch 'origin/main' into feature/agent-memory

a06f9cc

# Conflicts:
#   src/gaia/agents/chat/agent.py
#   src/gaia/apps/webui/src/App.tsx
#   src/gaia/apps/webui/src/components/ChatView.tsx
#   src/gaia/ui/server.py

kovtcharov force-pushed the feature/agent-memory branch from d4fdb90 to a06f9cc Compare April 1, 2026 16:31

kovtcharov changed the title ~~feat(memory): persistent agent memory system with dashboard UI~~ feat(memory): agent memory v2 — second brain with hybrid search, LLM extraction, and observability dashboard Apr 1, 2026

Karim13014 and others added 7 commits April 1, 2026 15:48

docs(memory): v2 user guide — second brain use cases, hybrid search, …

7e60495

…LLM extraction, temporal recall Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

test(memory): v2 integration tests — full pipeline with real SQLite +…

28e78ce

… FAISS, API integration Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs(memory): v2 SDK reference — embedding pipeline, hybrid search, M…

84d672f

…em0 extraction, consolidation, reconciliation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

kovtcharov self-assigned this Apr 1, 2026

kovtcharov added this to the v0.20.0 — Agent Memory & Bootstrap [OSS] milestone Apr 1, 2026

Karim13014 and others added 4 commits April 1, 2026 15:52

feat(memory): v2 mixin — embedding pipeline, FAISS, hybrid search, Me…

edb3f67

…m0 extraction, consolidation, reconciliation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

kovtcharov mentioned this pull request Apr 26, 2026

Web Push notifications via PWA — async delivery without Telegram #897

Open

18 tasks

kovtcharov added the consumer label (Blocks consumer adoption; must ship for the v0.20.0 consumer launch window) Apr 26, 2026

kovtcharov added the stop-the-line label (PR is foundational; do not merge changes to its frozen paths until it lands) Apr 26, 2026

kovtcharov mentioned this pull request Apr 26, 2026

docs(agents): add AGENTS.md — multi-agent coordination rules #904

Merged

4 tasks

kovtcharov removed the agents label Apr 26, 2026

This was referenced Apr 26, 2026

Agent UI MCP Server: programmatic UI control for agents #549

Open

Dynamic tool loading based on conversation context via memory #688

Open

Merge remote-tracking branch 'origin/main' into feature/agent-memory

b63775c

# Conflicts:
#   src/gaia/apps/webui/src/components/SettingsModal.tsx
#   src/gaia/apps/webui/src/services/api.ts
#   src/gaia/cli.py

github-actions Bot added the agents label Apr 26, 2026

kovtcharov-amd marked this pull request as ready for review April 29, 2026 19:50

kovtcharov-amd self-requested a review as a code owner April 29, 2026 19:50

kovtcharov added 4 commits April 29, 2026 16:14

github-actions Bot added labels Apr 29, 2026: devops (DevOps/infrastructure changes), security (Security-sensitive changes)

kovtcharov added 4 commits April 29, 2026 17:02

itomek approved these changes May 1, 2026

kovtcharov wants to merge 48 commits into main from feature/agent-memory

5 participants
