Skip to content

feat(ui): policy alert cards, notifications, and durable receipts#952

Merged
itomek merged 31 commits into
amd:mainfrom
dislovelhl:feat/optional-governance-layer
May 8, 2026
Merged

feat(ui): policy alert cards, notifications, and durable receipts#952
itomek merged 31 commits into
amd:mainfrom
dislovelhl:feat/optional-governance-layer

Conversation

@dislovelhl

@dislovelhl dislovelhl commented May 4, 2026

Copy link
Copy Markdown
Contributor

Closes #925

Policy BLOCK decisions were already emitted as policy_alert SSE events, but Agent UI users could not reliably see, understand, or revisit those decisions. This PR turns those backend policy events into visible Policy Shield activity cards, critical notifications/toasts, and persisted receipt details so blocked tool calls stay clear across reloads.

Threads

  • Durable persistence: policy-only BLOCK streams are saved with decision, reason, rule IDs, policy version, and receipt ID so the messages API can restore them.
  • UI surfacing: chat stream handling renders Policy Shield cards, notification center filtering, and View receipt anchoring.
  • Coverage: unit, integration, and Electron assertions cover policy alert field preservation and reload visibility.

Test plan

  • cd src/gaia/apps/webui && npm run build
  • python -m pytest tests/unit/chat/ui/test_utils_helpers.py::TestMessageToResponse::test_agent_steps_preserves_policy_alert_fields tests/integration/test_chat_ui_integration.py -k policy_alert -q
  • git diff --check
  • cd tests/electron && CI=true npm test -- --runInBand

dislovelhl and others added 24 commits April 19, 2026 23:41
Adds `gaia.governance` — an opt-in, additive governance package that
wraps tool execution with an ACGS-lite-style action kernel and seams
for constitutional-swarm workflow checkpoints / receipts / policy-
version binding.

Key properties

- Zero edits to the base `Agent` class. `GovernedAgentMixin` overrides
  `_execute_tool` via `super()`; adding it to an agent costs nothing
  when no adapter is supplied.
- Canonical tool-name resolution before governance, so unprefixed MCP
  aliases cannot bypass risk tags on their canonical names.
- Fail-closed REVIEW: only an explicit `governance_reviewer` callback
  counts. The default `AgentConsole.confirm_tool_execution` auto-
  approves, so it is intentionally not consulted.
- Envelope-bound receipt hashes cover the full evidence set
  (receipt id, workflow, decision, policy version, constitution hash,
  timestamp, evidence) with strict canonical JSON.
- Workflow-bound checkpoint resolution and atomic check-and-set in
  the in-memory bridge.

Ergonomics

- `GaiaGovernanceAdapter.default(audit_log=...)` for one-line wiring
  with in-repo stubs.
- `GovernanceConfig` dataclass consolidates six governance kwargs;
  per-kwarg style preserved for back-compat.
- `@govern(risk="blocked", reason=...)` decorator colocates policy
  with tool source; explicit dict merges with decorator tags.

What's here (PR 1)

- Action-level governance via `GovernedAgentMixin`
- Protocol interfaces: `PolicyEngine`, `CheckpointRuntime`,
  `ReceiptServiceProtocol`, `PolicyBindingProtocol`
- In-memory + JSONL receipt services
- Reference stub policy engine (`RuleBasedPolicyEngine`)
- 55 unit + integration tests; pylint clean against repo `.pylintrc`
- Governed weather-agent example with CLI reviewer
- `src/gaia/governance/README.md` with quickstart and extension table

What's deferred (PR 2+)

- ACGS-lite-backed policy engine
- Persistent checkpoint bridge via constitutional-swarm
- Policy control plane wiring
- Plan-step / multi-agent workflow transitions (the
  `workflow_mapper` helper is a forward-compatibility seam)

Review: iterated against Codex (architecture / correctness / security)
and Gemini (DX / API ergonomics / docs) advisors. All HIGH / MEDIUM
findings addressed; regression tests added for each.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The new governance package was missing from setup.py packages list,
causing test_all_filesystem_packages_in_setup_py to fail. Black and
isort were not run on the new files before commit.

Constraint: Black line-length follows project default (88)
Constraint: isort profile follows project default
Tested: black --check, isort --check-only both pass locally
Not-tested: full unit test suite (requires project deps)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…cation

Fix thread safety in InMemoryCheckpointBridge.create_checkpoint by holding
_lock during _records write. Guard mapping lookup in resolve_checkpoint with
.get() to raise InvalidResolutionError instead of KeyError on unknown types.

Add Lock to InMemoryReceiptService for concurrent access. Harden _read_all
deserialization to filter to known ReceiptRecord fields, silently skipping
malformed or schema-mismatched lines instead of crashing.

Replace assert in _handle_review_checkpoint with GaiaGovernanceError raise
(asserts are stripped with -O). Eliminate type: ignore[union-attr] by passing
the already-resolved adapter as an explicit parameter.

Make handle_transition REVIEW branch explicit with elif + raise
GaiaGovernanceError on unknown decision types instead of implicit fallthrough.

Remove duplicate GovernanceCallback / GovernanceReviewer aliases: define once
in config.py with specific types (ActionRequest, GovernanceDecision) and
import in mixin.py.

Confidence: high
Scope-risk: narrow
Tested: black+isort clean, syntax verified
Not-tested: full test suite (no test runner available locally)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
test_tool_decorator.py has an autouse fixture that calls
_TOOL_REGISTRY.clear() before and after each test.  When it runs
before test_governance_dx.py in CI, _dx_decorated_blocked is no longer
in the registry so _lookup_tool_fn returns None and the two tests that
depend purely on decorated tags (test_mixin_reads_decorated_tags_from_registry
and test_explicit_dict_overrides_decorated_tags) see an ALLOW decision
instead of BLOCK.

Fix: add an autouse fixture in test_governance_dx.py that re-registers
the test tools if the registry was cleared between collection and
execution.

Constraint: _TOOL_REGISTRY is a module-level mutable global; test
isolation must be explicit when multiple suites share it.
Tested: test_tool_decorator.py + test_governance_dx.py sequentially (16 passed)
Not-tested: concurrent xdist workers (not used in this CI)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Black reformatted 5 test files that were introduced without running the
formatter first: test_governance_dx.py (also picks up the autouse-fixture
added in the previous commit), test_governance_adapter.py,
test_governance_jsonl_receipts.py, test_governance_schemas.py,
test_governance_receipts.py.

No logic changes.

Tested: 40 governance unit tests pass locally
Scope-risk: narrow

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remaining 6 files flagged by CI Black check:
- tests/integration/test_governed_*.py (5 governance integration tests)
- src/gaia/mcp/mcp_bridge.py

No logic changes.

Scope-risk: narrow

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Closes the merge blockers from PR review:

- Tighten exception scopes in mixin and receipt service. Replace blanket
  `except Exception: pass` with specific exception types and `logger.warning`
  for the unexpected case. Most importantly, `_resolve_canonical_tool_name`
  now logs unexpected resolver errors instead of silently falling through
  to the raw name — closing the alias-bypass risk where governance could
  check tags on the wrong key.
- Correct documentation: tag merge is additive (union, deduplicated), not
  "explicit dict wins". README, decorator docstring, and mixin comment now
  match the behavior tests have always asserted.
- Strict canonical JSON for BLOCK-receipt evidence: handle non-JSON tool
  args, complex types, and cycles without falling back to repr().
- Strict canonical JSON in JsonlReceiptService.issue_receipt: reject
  non-canonical metadata (NaN/Inf, opaque objects) at issue time instead
  of allowing tampered receipts to land in the audit log.
- Register the governance SDK in public docs: new
  `docs/sdk/sdks/governance.mdx` and an entry in `docs/docs.json`.
The API Tests job has no `timeout-minutes`, so a hung Lemonade server
or stalled model pull leaves the runner spinning indefinitely (a 4-hour
no-op happened on PR amd#921). 30 min comfortably covers the worst-case
sequential path: 60s server start + 10min model pull + 2min model load
+ 30s API server start + test run.
…p-copy tags

Final polish on top of the merge-blocker fixes. Reviewer feedback from a
parallel code/architecture audit converged on these items:

- Delete `workflow_mapper.py` and `StaticPolicyBindingService.bind_receipt`.
  Both are advertised as forward-compat seams but have zero callers in
  src/, tests/, examples/, or docs/. Re-introduce them in the PR that
  actually wires the new event surface, with the real signature in hand.
- Tighten `JsonlReceiptService.get_receipt`: cache reads and writes were
  unsynchronized while a concurrent `issue_receipt` was mutating the same
  dict under `_lock`. Move the cache check + install under the lock.
- Add a `logger.debug` breadcrumb for malformed-line skips in
  `_read_all` so an operator chasing a missing receipt has something to
  grep.
- Deep-copy inner risk-tag lists in `GovernedAgentMixin.__init__` so a
  caller cannot mutate the agent's tag table after construction by
  holding onto the original list reference.
- Add a comment in `_canonical_json_value` documenting why `bool` is
  checked before `int` (subclass relationship — without the ordering,
  `True` would canonicalize as `1`).
- README: drop the `workflow_mapper` mention from "What's not here yet"
  now that the seam is gone.
Adds tests for the previously-uncovered branches surfaced by the
test-coverage audit. Each test guards against a specific regression:

- `test_resolver_unexpected_exception_logs_and_governs_raw_name` — proves
  a buggy `_resolve_tool_name` that raises an unexpected exception still
  triggers governance on the raw name AND emits an operator-visible
  warning. Future regression where the warning is swapped for a silent
  fallback fails this test.
- `test_resolver_lookup_error_is_silent_and_governs_raw_name` — proves
  the expected "tool not in registry" case (LookupError) is absorbed
  silently with no log noise.
- `test_unknown_transition_outcome_fails_closed` — proves a custom
  `CheckpointRuntime` returning a status the mixin doesn't know is
  denied, not let through.
- `test_handle_transition_rejects_unknown_decision_type` — same idea at
  the adapter layer for an unknown `GovernanceDecision.decision`.
- `test_read_all_skips_malformed_lines` — proves a corrupt line in the
  middle of an audit log doesn't block readers from finding subsequent
  valid records.
- Existing callback-exception and reviewer-exception tests gain caplog
  assertions so a future silent-swallow regression is caught.

Plus two readability fixes:
- Rename `test_explicit_dict_overrides_decorated_tags` to
  `test_explicit_empty_dict_does_not_downgrade_decorator_tags` — the
  body asserts additive semantics; the old name said the opposite.
- Replace hardcoded `"test_governance_adapter.SlotOnlyEvidence"`
  qualname strings with `f"{Cls.__module__}.{Cls.__qualname__}"` so the
  tests survive a file rename.
… log

`_prompt_review` now returns `(approved, exception_or_None)` instead of just
`approved`. When a reviewer raises, `_handle_review_checkpoint` stamps the
exception type and message into `CheckpointResolution.reason` so the receipt
metadata records "reviewer raised RuntimeError: bad reviewer" rather than the
boilerplate "reviewer rejected" — which previously made the audit log unable
to tell a deliberate "no" from a crash.

The operator-facing `logger.warning` was already in place; this commit closes
the audit-trail gap so downstream consumers (compliance, forensics, retros)
can distinguish the two without grepping operator logs.

Adds two tests:
- `test_reviewer_exception_is_treated_as_reject` extended to assert the
  receipt's `metadata.evidence.resolution.reason` contains the exception
  type and message
- new `test_reviewer_explicit_no_keeps_plain_reason` — a reviewer that
  returns False produces a plain "reviewer rejected" reason, not an
  exception-flavored one (regression guard against false positives)
The Claude AI Assistant workflow runs the claude-code-action which
requires ANTHROPIC_API_KEY. That secret is only configured on the
canonical amd/gaia repo, so every fork without the secret hits a
hard failure on PR-review, issue-handler, and release-notes events.

Add `github.repository == 'amd/gaia'` to each job's `if:` so the
workflow no-ops on forks rather than failing red. Forks can still
opt-in by setting their own ANTHROPIC_API_KEY and removing the guard,
but the default is silent skip.

Tested by re-running PR #3 on dislovelhl/gaia-acgs after this commit:
all four jobs should report `Skipped` instead of failing.
Addresses architectural feedback on amd#921 (review 4197475871). Governance
REVIEW now reuses GAIA's existing blocking confirmation flow when the
active console advertises it, instead of running as a parallel
enforcement path that silently fails closed.

- OutputHandler grows a `blocking_confirmation: bool = False` capability
  flag; SSEOutputHandler sets it to True (it already blocks on the
  frontend permission modal).
- _prompt_review precedence: explicit governance_reviewer wins; else
  delegate to console.confirm_tool_execution iff the console advertises
  blocking_confirmation; else fail closed. The console is resolved per
  call, not captured at __init__.
- The default console still returns True immediately, so CLI without an
  explicit reviewer continues to fail closed (no auto-approve).

Test coverage:
- tests/integration/test_governed_review_flow.py — @govern(risk=review)
  + SSEOutputHandler emits permission_request, deny resolves the
  checkpoint, denied tool body never executes, non-blocking consoles
  fail closed, audit receipt distinguishes REVIEW_REJECTED from BLOCK.
- tests/unit/chat/ui/test_sse_confirmation.py — handshake coverage for
  approve/deny/timeout/cancel.

Documents the relationship to confirm_tool_execution in
docs/sdk/sdks/governance.mdx and src/gaia/governance/README.md so the
"which mechanism shows a UI prompt?" answer is no longer ambiguous.

The legacy TOOLS_REQUIRING_CONFIRMATION set is intentionally untouched
in this commit; unifying the pipeline is staged for follow-up PRs.
Today GovernedAgentMixin returns a denied dict on BLOCK and the end user
can't tell a policy refusal from a generic tool failure. This adds a
structured policy_alert event so the Agent UI can surface a "blocked by
policy" notification with the audit receipt id.

- OutputHandler grows print_policy_alert() (default no-op for headless
  consoles).
- SSEOutputHandler.print_policy_alert emits a typed event onto the SSE
  queue with tool, decision, reason, rule_ids, policy_version, and the
  audit receipt_id when present. Tool args are deliberately excluded —
  receipt_id is the safe correlator for deep-linking back to the audit.
- GovernedAgentMixin._emit_policy_alert is called immediately before the
  denied result is returned. The receipt_id surfaced in the SSE event is
  the same id stored by the receipt service, so the alert and the audit
  log link 1:1. Emission failures are logged (warning, exc_info) and
  swallowed so a UI bug can never break governance.
- Frontend StreamEvent type union grows policy_alert + the new optional
  fields. Rendering (toast, inline card, "view receipt" route) ships in
  PR-2 once the receipt-viewer UX is decided.

Tests:
- tests/unit/chat/ui/test_sse_handler.py — exact event shape, including
  the omits-receipt-id-when-None case.
- tests/integration/test_governed_review_flow.py — full BLOCK path
  through SSEOutputHandler asserts denied result, agent.calls == [] (tool
  body never executes), audit receipt persisted, and the alert event's
  receipt_id matches the denied result's receipt_id.

Docs:
- docs/sdk/sdks/governance.mdx + src/gaia/governance/README.md document
  the event shape and intended UI consumption.
- docs/sdk/sdks/agent-ui.mdx links the event into the SSE event
  reference.

Stacked on feat/governance-review-bridge (PR #3) which lands the Path A
capability bridge for governance REVIEW.
…ckend

feat(governance): emit policy_alert SSE event when BLOCK denies a tool
feat(governance): bridge REVIEW to existing console confirmation surface
Inherited via main merge from amd#919, which introduced both a regression
test asserting AttributeError must surface and a broad-except wrapper
that swallowed it. Same commit, opposite intent — removing the except
satisfies the test, the docstring ("Ratchets the Apr-20 review fix"),
and the project's no-silent-fallbacks rule.

Also drops one stray blank line in the test file to satisfy black.

Unblocks PR 921 CI (Code Quality + Unit Tests checks).
Flake8 E731 — "do not assign a lambda expression, use a def" — flagged
the inline reviewer fallback in mixin.py:348. Behaviour is identical;
just satisfies the project's lint contract.
Persist policy BLOCK steps through reload and disconnect so policy receipts remain visible in the Agent UI. Render alerts as non-actionable notifications, toast links, and inline Policy Shield activity cards.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions github-actions Bot added documentation Documentation changes dependencies Dependency updates devops DevOps/infrastructure changes tests Test changes electron Electron app changes agents labels May 4, 2026
@itomek itomek enabled auto-merge May 7, 2026 13:22
The email-agent merge changed OAuthPkceHandler.get_credential() to match get_or_refresh(), which returns a token string. The unit test still mocked the old tuple shape and asserted expires_at, causing the Run Unit Tests job to fail on PR amd#952.

Constraint: GitHub Actions failure was tests/unit/connectors/test_oauth_pkce.py::TestGetCredential::test_returns_token_dict_shape.

Rejected: Restore tuple unpacking in OAuthPkceHandler | get_or_refresh() is typed and implemented as returning a string, and get_access_token() callers depend on that contract.

Confidence: high

Scope-risk: narrow

Tested: PYTHONPATH=src:. python -m pytest tests/unit/connectors/test_oauth_pkce.py::TestGetCredential::test_returns_token_dict_shape -q

Tested: PYTHONPATH=src:. python -m pytest tests/unit/connectors/test_oauth_pkce.py tests/unit/connectors/test_tokens.py tests/unit/connectors/test_api.py -q

Tested: PYTHONPATH=src:. python -m pytest tests/unit/connectors/test_oauth_pkce.py -q

Tested: git diff --check
auto-merge was automatically disabled May 7, 2026 13:27

Head branch was pushed to by a user without write access

itomek
itomek previously approved these changes May 7, 2026

@itomek itomek left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified at HEAD (9c3005a):

  • The prior approval was auto-dismissed by the two merge commits today. Both are responsive resolutions to main moving (#965 email agent, #967 mac uv, #969 settings cards) and don't change behaviour outside the conflict resolution.
  • The merges also pick up 2 of the 3 optional follow-ups from the prior approval — the step["active"] = False redundancy in _persist_policy_block_if_needed() is gone, and the single-writer SQLite caveat is now noted around the delete-then-add path.
  • A new regression test was added for the "follow-up I'd write next" scenario from the prior review — policy_alert emitted, stream errors mid-response, persisted block survives. Exactly the right test to add here.
  • CI: build matrix green (windows/linux/macos installers, structural smoke, electron framework, all build-apps); Linux full-integration + AppImage matrices still in flight, no failures so far. pr-review skipped — known state per #967's commit message about the bot's API credits.

Re-approving. The remaining RTL/Playwright follow-up is correctly tracked in the author's last comment as out-of-scope for this PR.


Generated by Claude Code

@dislovelhl

Copy link
Copy Markdown
Contributor Author

Fixed the failed Run Unit Tests check from job 74823361099.

Root cause: the newer email-agent base changed OAuthPkceHandler.get_credential() to match get_or_refresh(), which returns a token string. The unit test still mocked the old tuple shape and asserted expires_at, so it compared ("tok-abc", 9999999999) to "tok-abc".

Fix pushed in 2ce9edf: update tests/unit/connectors/test_oauth_pkce.py to mock the current token-string contract and remove the stale expires_at assertion.

Local verification:

  • PYTHONPATH=src:. python -m pytest tests/unit/connectors/test_oauth_pkce.py::TestGetCredential::test_returns_token_dict_shape -q — passed
  • PYTHONPATH=src:. python -m pytest tests/unit/connectors/test_oauth_pkce.py tests/unit/connectors/test_tokens.py tests/unit/connectors/test_api.py -q — 39 passed
  • PYTHONPATH=src:. python -m pytest tests/unit/connectors/test_oauth_pkce.py -q — 18 passed
  • git diff --check

GitHub rerun status: Run Unit Tests is now passing (74825586664).

The userns-restricted AppImage smoke only needs to prove the packaged app launches with the Electron no-sandbox fallback and serves /api/health. Running gaia init inside that path pulled the test into Lemonade apt/model bootstrap, causing the job to time out before the UI readiness assertion.

Constraint: The failing CI job is a restricted AppImage launch smoke, not a Lemonade installation validation.

Rejected: Increase the 300s timeout | hides the unrelated Lemonade bootstrap dependency and leaves the smoke flaky.

Confidence: high

Scope-risk: narrow

Directive: Keep installer smoke jobs focused on packaged launch/readiness; validate full Lemonade bootstrap in Lemonade-specific CI lanes.

Tested: npm test -- test_electron_chat_installer.js --runInBand

Tested: node --check src/gaia/apps/webui/services/backend-installer.cjs

Tested: workflow YAML assertion for GAIA_SKIP_GAIA_INIT

Tested: git diff --check
@dislovelhl

Copy link
Copy Markdown
Contributor Author

@itomek Thanks for the careful re-review and re-approval.

Agreed on the remaining RTL/Playwright coverage being better handled as a follow-up rather than expanding this PR's scope. The main goal here was to preserve the policy block across the mid-stream error path and keep the persistence behaviour explicit and regression-covered.

I'll keep the optional UI-level coverage separate, so this PR stays focused on the policy/persistence fix.

@itomek

itomek commented May 7, 2026

Copy link
Copy Markdown
Collaborator

@copilot resolve the merge conflicts in this pull request

@dislovelhl

Copy link
Copy Markdown
Contributor Author

@itomek Thanks again for the careful review and re-review.

I agree with the remaining concern: this PR covers the core persistence/surfacing path, but the UI/UX side is not yet as strong as it should be. The backend and persistence behavior are now regression-covered, including the mid-stream error case, but the current Electron coverage is still mostly static/string-level. It does not fully prove the live user flow: policy toast → View receipt → notification anchor scroll.

I also recognize that the PR became larger than ideal. It now spans persistence, API response shape, React UI, notification behavior, receipt anchoring, docs, Electron assertions, and multiple base-refresh fixes. That made review harder and increased the risk of churn around shared UI files.

My plan from here:

  1. Keep this PR focused on the policy-alert durability and surfacing behavior already reviewed.
  2. Avoid expanding this PR further unless required for merge correctness.
  3. Track the remaining UI/UX quality work separately, especially RTL/Playwright coverage for the live receipt navigation flow.
  4. In the follow-up, also revisit the Policy Shield / toast / notification hierarchy so the UI feels intentional rather than just technically complete.

Appreciate the patience. The current PR is much better after your feedback, and I’ll keep the next pass smaller, cleaner, and more UI-behavior focused.

itomek
itomek previously approved these changes May 8, 2026
@itomek

itomek commented May 8, 2026

Copy link
Copy Markdown
Collaborator

@dislovelhl I tried to fix the merge conflicts but for some reason its not updating, most likely because its your fork. Please fix ASAP so we can get this merged in.

@dislovelhl dislovelhl closed this May 8, 2026
auto-merge was automatically disabled May 8, 2026 17:30

Pull request was closed

Resolve conflicts:

- tests/electron/test_electron_chat_app.js: keep PR's SettingsPage.css test (more comprehensive; merged file has all 5 symbols).

- tests/unit/connectors/test_oauth_pkce.py: take main's stricter assertion ('expires_at' not in result).
@itomek itomek added this pull request to the merge queue May 8, 2026
Merged via the queue into amd:main with commit 9149326 May 8, 2026
44 checks passed
@dislovelhl dislovelhl deleted the feat/optional-governance-layer branch May 9, 2026 05:53
@itomek itomek mentioned this pull request May 14, 2026
6 tasks
pull Bot pushed a commit to bhardwajRahul/gaia that referenced this pull request May 15, 2026
## Why this matters

v0.18.0 ships agent memory v2 (hybrid-search second brain with LLM
extraction and observability dashboard), ChatAgent split into three
composable agents (Chat/FileIO/DocumentQA), parallel tool calls, and a
Telegram adapter scaffold — plus fixes the RAG-on-PDF timeout with Gemma
4 that broke document Q&A since v0.17.6 and adds CI gates that enforce
RAG quality baselines on every future PR.

Full notes: `docs/releases/v0.18.0.mdx`.

## What's New

- **Agent memory v2** ([amd#606](amd#606)) —
Hybrid semantic + keyword search, LLM extraction, observability
dashboard via SSE streaming
([amd#1032](amd#1032)). Per-user isolation
enforced; extraction runs async so it doesn't add latency.
- **ChatAgent split** ([amd#979](amd#979)) —
`ChatAgent`, `FileIOAgent`, and `DocumentQAAgent` replace the monolithic
class; each composable via `tools=`. Backward-compatible shim preserved.
- **Parallel tool calls** ([amd#946](amd#946))
— Multiple `tool_calls` from a single LLM turn are executed
concurrently, cutting round-trips for multi-tool workflows.
- **Telegram adapter scaffold, Phase 0**
([amd#951](amd#951)) — `gaia telegram
start|stop|status`, per-user session isolation, `[telegram]` extras.
Phase 1 (message handling + allowed-users gate) tracked in
[amd#889](amd#889).
- **Connectors: per-MCP toggle + single-writer enforcement**
([amd#1018](amd#1018),
[amd#998](amd#998)) — Disable individual MCP
servers without removing them; concurrent writes serialised with
actionable errors on contention.
- **File navigation, web browsing, and write security**
([amd#495](amd#495)) — `FileSearchToolsMixin`,
web browsing tool, and scratchpad mixin in `KNOWN_TOOLS`; write tools
check `allowed_paths` before dispatch.
- **Email UI and policy alerts**
([amd#995](amd#995),
[amd#1039](amd#1039),
[amd#952](amd#952)) — Pre-scan triage card,
in-chat Connect, policy alert cards, and durable receipts for
confirmation-gated actions.

## Bug Fixes

- **RAG-on-PDF timeouts on Gemma 4**
([amd#1034](amd#1034), closes
[amd#1030](amd#1030)) — Prompt-size budget
check added at composition time; CI gates enforce it on every PR
([amd#1040](amd#1040)).
- **Envelope-level parse failure crashed SD recovery**
([amd#1047](amd#1047), closes
[amd#1023](amd#1023)) — Falls through to a
clean recovery path with step-1 context preserved.
- **Windows-path tool args corrupted**
([amd#1027](amd#1027)) — Backslash
normalisation now happens after argument parsing.
- **Blender `send_command` hung**
([amd#1026](amd#1026), closes
[amd#1022](amd#1022)) — Read timeout applied
to persistent-connection servers.
- **`gaia chat init` in post-install banner**
([amd#1029](amd#1029), closes
[amd#1024](amd#1024)) — Replaced with the
correct `gaia init`.
- **Keyring treated as required**
([amd#1028](amd#1028)) — Import guarded;
optional on systems without `keyring`.
- **electron-builder URLs stale**
([amd#953](amd#953)) — Three doc/installer
files updated to current download paths.

## Tooling & Docs

- **RAG eval CI gates** ([amd#1040](amd#1040),
closes [amd#1033](amd#1033)) — RAG quality
baselines + prompt-size budget enforced on every PR.
- **Fork-PR authors now receive Claude review**
([amd#932](amd#932)) —
`allowed_non_write_users: "*"` with prompt-injection mitigations
documented.
- **Eval runs mandated before merging**
([amd#1036](amd#1036)) — `CLAUDE.md` requires
`gaia eval agent` for LLM-affecting changes.
- **GAIA website** ([amd#369](amd#369)) —
[amd-gaia.ai](https://amd-gaia.ai) live.
- **Custom agent guide reorganised**
([amd#997](amd#997)), Lemonade PPA docs
([amd#801](amd#801)), broken Lemonade CLI URL
fixed ([amd#996](amd#996)), WhatsApp adapter
evaluation spec ([amd#950](amd#950)).

## Release checklist

- [x] `util/validate_release_notes.py docs/releases/v0.18.0.mdx --tag
v0.18.0` passes
- [x] `src/gaia/version.py` → `0.18.0`
- [x] `src/gaia/apps/webui/package.json` → `0.18.0`
- [x] Navbar label in `docs/docs.json` → `v0.18.0 · Lemonade 10.2.0`
- [x] All 28 commits in range (v0.17.6..HEAD) are represented in the
notes
- [ ] Review from @kovtcharov-amd addressed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents dependencies Dependency updates devops DevOps/infrastructure changes documentation Documentation changes electron Electron app changes tests Test changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Governance: surface BLOCK decisions to the user via a policy_alert SSE event

3 participants