feat(ui): policy alert cards, notifications, and durable receipts#952
Conversation
Adds `gaia.governance` — an opt-in, additive governance package that wraps tool execution with an ACGS-lite-style action kernel and seams for constitutional-swarm workflow checkpoints / receipts / policy- version binding. Key properties - Zero edits to the base `Agent` class. `GovernedAgentMixin` overrides `_execute_tool` via `super()`; adding it to an agent costs nothing when no adapter is supplied. - Canonical tool-name resolution before governance, so unprefixed MCP aliases cannot bypass risk tags on their canonical names. - Fail-closed REVIEW: only an explicit `governance_reviewer` callback counts. The default `AgentConsole.confirm_tool_execution` auto- approves, so it is intentionally not consulted. - Envelope-bound receipt hashes cover the full evidence set (receipt id, workflow, decision, policy version, constitution hash, timestamp, evidence) with strict canonical JSON. - Workflow-bound checkpoint resolution and atomic check-and-set in the in-memory bridge. Ergonomics - `GaiaGovernanceAdapter.default(audit_log=...)` for one-line wiring with in-repo stubs. - `GovernanceConfig` dataclass consolidates six governance kwargs; per-kwarg style preserved for back-compat. - `@govern(risk="blocked", reason=...)` decorator colocates policy with tool source; explicit dict merges with decorator tags. What's here (PR 1) - Action-level governance via `GovernedAgentMixin` - Protocol interfaces: `PolicyEngine`, `CheckpointRuntime`, `ReceiptServiceProtocol`, `PolicyBindingProtocol` - In-memory + JSONL receipt services - Reference stub policy engine (`RuleBasedPolicyEngine`) - 55 unit + integration tests; pylint clean against repo `.pylintrc` - Governed weather-agent example with CLI reviewer - `src/gaia/governance/README.md` with quickstart and extension table What's deferred (PR 2+) - ACGS-lite-backed policy engine - Persistent checkpoint bridge via constitutional-swarm - Policy control plane wiring - Plan-step / multi-agent workflow transitions (the `workflow_mapper` helper is a forward-compatibility seam) Review: iterated against Codex (architecture / correctness / security) and Gemini (DX / API ergonomics / docs) advisors. All HIGH / MEDIUM findings addressed; regression tests added for each. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The new governance package was missing from setup.py packages list, causing test_all_filesystem_packages_in_setup_py to fail. Black and isort were not run on the new files before commit. Constraint: Black line-length follows project default (88) Constraint: isort profile follows project default Tested: black --check, isort --check-only both pass locally Not-tested: full unit test suite (requires project deps) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…cation Fix thread safety in InMemoryCheckpointBridge.create_checkpoint by holding _lock during _records write. Guard mapping lookup in resolve_checkpoint with .get() to raise InvalidResolutionError instead of KeyError on unknown types. Add Lock to InMemoryReceiptService for concurrent access. Harden _read_all deserialization to filter to known ReceiptRecord fields, silently skipping malformed or schema-mismatched lines instead of crashing. Replace assert in _handle_review_checkpoint with GaiaGovernanceError raise (asserts are stripped with -O). Eliminate type: ignore[union-attr] by passing the already-resolved adapter as an explicit parameter. Make handle_transition REVIEW branch explicit with elif + raise GaiaGovernanceError on unknown decision types instead of implicit fallthrough. Remove duplicate GovernanceCallback / GovernanceReviewer aliases: define once in config.py with specific types (ActionRequest, GovernanceDecision) and import in mixin.py. Confidence: high Scope-risk: narrow Tested: black+isort clean, syntax verified Not-tested: full test suite (no test runner available locally) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
test_tool_decorator.py has an autouse fixture that calls _TOOL_REGISTRY.clear() before and after each test. When it runs before test_governance_dx.py in CI, _dx_decorated_blocked is no longer in the registry so _lookup_tool_fn returns None and the two tests that depend purely on decorated tags (test_mixin_reads_decorated_tags_from_registry and test_explicit_dict_overrides_decorated_tags) see an ALLOW decision instead of BLOCK. Fix: add an autouse fixture in test_governance_dx.py that re-registers the test tools if the registry was cleared between collection and execution. Constraint: _TOOL_REGISTRY is a module-level mutable global; test isolation must be explicit when multiple suites share it. Tested: test_tool_decorator.py + test_governance_dx.py sequentially (16 passed) Not-tested: concurrent xdist workers (not used in this CI) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Black reformatted 5 test files that were introduced without running the formatter first: test_governance_dx.py (also picks up the autouse-fixture added in the previous commit), test_governance_adapter.py, test_governance_jsonl_receipts.py, test_governance_schemas.py, test_governance_receipts.py. No logic changes. Tested: 40 governance unit tests pass locally Scope-risk: narrow Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remaining 6 files flagged by CI Black check: - tests/integration/test_governed_*.py (5 governance integration tests) - src/gaia/mcp/mcp_bridge.py No logic changes. Scope-risk: narrow Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Closes the merge blockers from PR review: - Tighten exception scopes in mixin and receipt service. Replace blanket `except Exception: pass` with specific exception types and `logger.warning` for the unexpected case. Most importantly, `_resolve_canonical_tool_name` now logs unexpected resolver errors instead of silently falling through to the raw name — closing the alias-bypass risk where governance could check tags on the wrong key. - Correct documentation: tag merge is additive (union, deduplicated), not "explicit dict wins". README, decorator docstring, and mixin comment now match the behavior tests have always asserted. - Strict canonical JSON for BLOCK-receipt evidence: handle non-JSON tool args, complex types, and cycles without falling back to repr(). - Strict canonical JSON in JsonlReceiptService.issue_receipt: reject non-canonical metadata (NaN/Inf, opaque objects) at issue time instead of allowing tampered receipts to land in the audit log. - Register the governance SDK in public docs: new `docs/sdk/sdks/governance.mdx` and an entry in `docs/docs.json`.
The API Tests job has no `timeout-minutes`, so a hung Lemonade server or stalled model pull leaves the runner spinning indefinitely (a 4-hour no-op happened on PR amd#921). 30 min comfortably covers the worst-case sequential path: 60s server start + 10min model pull + 2min model load + 30s API server start + test run.
…p-copy tags Final polish on top of the merge-blocker fixes. Reviewer feedback from a parallel code/architecture audit converged on these items: - Delete `workflow_mapper.py` and `StaticPolicyBindingService.bind_receipt`. Both are advertised as forward-compat seams but have zero callers in src/, tests/, examples/, or docs/. Re-introduce them in the PR that actually wires the new event surface, with the real signature in hand. - Tighten `JsonlReceiptService.get_receipt`: cache reads and writes were unsynchronized while a concurrent `issue_receipt` was mutating the same dict under `_lock`. Move the cache check + install under the lock. - Add a `logger.debug` breadcrumb for malformed-line skips in `_read_all` so an operator chasing a missing receipt has something to grep. - Deep-copy inner risk-tag lists in `GovernedAgentMixin.__init__` so a caller cannot mutate the agent's tag table after construction by holding onto the original list reference. - Add a comment in `_canonical_json_value` documenting why `bool` is checked before `int` (subclass relationship — without the ordering, `True` would canonicalize as `1`). - README: drop the `workflow_mapper` mention from "What's not here yet" now that the seam is gone.
Adds tests for the previously-uncovered branches surfaced by the
test-coverage audit. Each test guards against a specific regression:
- `test_resolver_unexpected_exception_logs_and_governs_raw_name` — proves
a buggy `_resolve_tool_name` that raises an unexpected exception still
triggers governance on the raw name AND emits an operator-visible
warning. Future regression where the warning is swapped for a silent
fallback fails this test.
- `test_resolver_lookup_error_is_silent_and_governs_raw_name` — proves
the expected "tool not in registry" case (LookupError) is absorbed
silently with no log noise.
- `test_unknown_transition_outcome_fails_closed` — proves a custom
`CheckpointRuntime` returning a status the mixin doesn't know is
denied, not let through.
- `test_handle_transition_rejects_unknown_decision_type` — same idea at
the adapter layer for an unknown `GovernanceDecision.decision`.
- `test_read_all_skips_malformed_lines` — proves a corrupt line in the
middle of an audit log doesn't block readers from finding subsequent
valid records.
- Existing callback-exception and reviewer-exception tests gain caplog
assertions so a future silent-swallow regression is caught.
Plus two readability fixes:
- Rename `test_explicit_dict_overrides_decorated_tags` to
`test_explicit_empty_dict_does_not_downgrade_decorator_tags` — the
body asserts additive semantics; the old name said the opposite.
- Replace hardcoded `"test_governance_adapter.SlotOnlyEvidence"`
qualname strings with `f"{Cls.__module__}.{Cls.__qualname__}"` so the
tests survive a file rename.
…islovelhl/gaia-acgs into feat/optional-governance-layer
… log `_prompt_review` now returns `(approved, exception_or_None)` instead of just `approved`. When a reviewer raises, `_handle_review_checkpoint` stamps the exception type and message into `CheckpointResolution.reason` so the receipt metadata records "reviewer raised RuntimeError: bad reviewer" rather than the boilerplate "reviewer rejected" — which previously made the audit log unable to tell a deliberate "no" from a crash. The operator-facing `logger.warning` was already in place; this commit closes the audit-trail gap so downstream consumers (compliance, forensics, retros) can distinguish the two without grepping operator logs. Adds two tests: - `test_reviewer_exception_is_treated_as_reject` extended to assert the receipt's `metadata.evidence.resolution.reason` contains the exception type and message - new `test_reviewer_explicit_no_keeps_plain_reason` — a reviewer that returns False produces a plain "reviewer rejected" reason, not an exception-flavored one (regression guard against false positives)
The Claude AI Assistant workflow runs the claude-code-action which requires ANTHROPIC_API_KEY. That secret is only configured on the canonical amd/gaia repo, so every fork without the secret hits a hard failure on PR-review, issue-handler, and release-notes events. Add `github.repository == 'amd/gaia'` to each job's `if:` so the workflow no-ops on forks rather than failing red. Forks can still opt-in by setting their own ANTHROPIC_API_KEY and removing the guard, but the default is silent skip. Tested by re-running PR #3 on dislovelhl/gaia-acgs after this commit: all four jobs should report `Skipped` instead of failing.
Addresses architectural feedback on amd#921 (review 4197475871). Governance REVIEW now reuses GAIA's existing blocking confirmation flow when the active console advertises it, instead of running as a parallel enforcement path that silently fails closed. - OutputHandler grows a `blocking_confirmation: bool = False` capability flag; SSEOutputHandler sets it to True (it already blocks on the frontend permission modal). - _prompt_review precedence: explicit governance_reviewer wins; else delegate to console.confirm_tool_execution iff the console advertises blocking_confirmation; else fail closed. The console is resolved per call, not captured at __init__. - The default console still returns True immediately, so CLI without an explicit reviewer continues to fail closed (no auto-approve). Test coverage: - tests/integration/test_governed_review_flow.py — @govern(risk=review) + SSEOutputHandler emits permission_request, deny resolves the checkpoint, denied tool body never executes, non-blocking consoles fail closed, audit receipt distinguishes REVIEW_REJECTED from BLOCK. - tests/unit/chat/ui/test_sse_confirmation.py — handshake coverage for approve/deny/timeout/cancel. Documents the relationship to confirm_tool_execution in docs/sdk/sdks/governance.mdx and src/gaia/governance/README.md so the "which mechanism shows a UI prompt?" answer is no longer ambiguous. The legacy TOOLS_REQUIRING_CONFIRMATION set is intentionally untouched in this commit; unifying the pipeline is staged for follow-up PRs.
Today GovernedAgentMixin returns a denied dict on BLOCK and the end user can't tell a policy refusal from a generic tool failure. This adds a structured policy_alert event so the Agent UI can surface a "blocked by policy" notification with the audit receipt id. - OutputHandler grows print_policy_alert() (default no-op for headless consoles). - SSEOutputHandler.print_policy_alert emits a typed event onto the SSE queue with tool, decision, reason, rule_ids, policy_version, and the audit receipt_id when present. Tool args are deliberately excluded — receipt_id is the safe correlator for deep-linking back to the audit. - GovernedAgentMixin._emit_policy_alert is called immediately before the denied result is returned. The receipt_id surfaced in the SSE event is the same id stored by the receipt service, so the alert and the audit log link 1:1. Emission failures are logged (warning, exc_info) and swallowed so a UI bug can never break governance. - Frontend StreamEvent type union grows policy_alert + the new optional fields. Rendering (toast, inline card, "view receipt" route) ships in PR-2 once the receipt-viewer UX is decided. Tests: - tests/unit/chat/ui/test_sse_handler.py — exact event shape, including the omits-receipt-id-when-None case. - tests/integration/test_governed_review_flow.py — full BLOCK path through SSEOutputHandler asserts denied result, agent.calls == [] (tool body never executes), audit receipt persisted, and the alert event's receipt_id matches the denied result's receipt_id. Docs: - docs/sdk/sdks/governance.mdx + src/gaia/governance/README.md document the event shape and intended UI consumption. - docs/sdk/sdks/agent-ui.mdx links the event into the SSE event reference. Stacked on feat/governance-review-bridge (PR #3) which lands the Path A capability bridge for governance REVIEW.
…ckend feat(governance): emit policy_alert SSE event when BLOCK denies a tool
feat(governance): bridge REVIEW to existing console confirmation surface
Inherited via main merge from amd#919, which introduced both a regression test asserting AttributeError must surface and a broad-except wrapper that swallowed it. Same commit, opposite intent — removing the except satisfies the test, the docstring ("Ratchets the Apr-20 review fix"), and the project's no-silent-fallbacks rule. Also drops one stray blank line in the test file to satisfy black. Unblocks PR 921 CI (Code Quality + Unit Tests checks).
Flake8 E731 — "do not assign a lambda expression, use a def" — flagged the inline reviewer fallback in mixin.py:348. Behaviour is identical; just satisfies the project's lint contract.
Persist policy BLOCK steps through reload and disconnect so policy receipts remain visible in the Agent UI. Render alerts as non-actionable notifications, toast links, and inline Policy Shield activity cards. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The email-agent merge changed OAuthPkceHandler.get_credential() to match get_or_refresh(), which returns a token string. The unit test still mocked the old tuple shape and asserted expires_at, causing the Run Unit Tests job to fail on PR amd#952. Constraint: GitHub Actions failure was tests/unit/connectors/test_oauth_pkce.py::TestGetCredential::test_returns_token_dict_shape. Rejected: Restore tuple unpacking in OAuthPkceHandler | get_or_refresh() is typed and implemented as returning a string, and get_access_token() callers depend on that contract. Confidence: high Scope-risk: narrow Tested: PYTHONPATH=src:. python -m pytest tests/unit/connectors/test_oauth_pkce.py::TestGetCredential::test_returns_token_dict_shape -q Tested: PYTHONPATH=src:. python -m pytest tests/unit/connectors/test_oauth_pkce.py tests/unit/connectors/test_tokens.py tests/unit/connectors/test_api.py -q Tested: PYTHONPATH=src:. python -m pytest tests/unit/connectors/test_oauth_pkce.py -q Tested: git diff --check
Head branch was pushed to by a user without write access
itomek
left a comment
There was a problem hiding this comment.
Verified at HEAD (9c3005a):
- The prior approval was auto-dismissed by the two merge commits today. Both are responsive resolutions to main moving (#965 email agent, #967 mac uv, #969 settings cards) and don't change behaviour outside the conflict resolution.
- The merges also pick up 2 of the 3 optional follow-ups from the prior approval — the
step["active"] = Falseredundancy in_persist_policy_block_if_needed()is gone, and the single-writer SQLite caveat is now noted around the delete-then-add path. - A new regression test was added for the "follow-up I'd write next" scenario from the prior review —
policy_alertemitted, stream errors mid-response, persisted block survives. Exactly the right test to add here. - CI: build matrix green (windows/linux/macos installers, structural smoke, electron framework, all
build-apps); Linux full-integration + AppImage matrices still in flight, no failures so far.pr-reviewskipped — known state per #967's commit message about the bot's API credits.
Re-approving. The remaining RTL/Playwright follow-up is correctly tracked in the author's last comment as out-of-scope for this PR.
Generated by Claude Code
|
Fixed the failed Root cause: the newer email-agent base changed Fix pushed in Local verification:
GitHub rerun status: |
The userns-restricted AppImage smoke only needs to prove the packaged app launches with the Electron no-sandbox fallback and serves /api/health. Running gaia init inside that path pulled the test into Lemonade apt/model bootstrap, causing the job to time out before the UI readiness assertion. Constraint: The failing CI job is a restricted AppImage launch smoke, not a Lemonade installation validation. Rejected: Increase the 300s timeout | hides the unrelated Lemonade bootstrap dependency and leaves the smoke flaky. Confidence: high Scope-risk: narrow Directive: Keep installer smoke jobs focused on packaged launch/readiness; validate full Lemonade bootstrap in Lemonade-specific CI lanes. Tested: npm test -- test_electron_chat_installer.js --runInBand Tested: node --check src/gaia/apps/webui/services/backend-installer.cjs Tested: workflow YAML assertion for GAIA_SKIP_GAIA_INIT Tested: git diff --check
|
@itomek Thanks for the careful re-review and re-approval. Agreed on the remaining RTL/Playwright coverage being better handled as a follow-up rather than expanding this PR's scope. The main goal here was to preserve the policy block across the mid-stream error path and keep the persistence behaviour explicit and regression-covered. I'll keep the optional UI-level coverage separate, so this PR stays focused on the policy/persistence fix. |
|
@copilot resolve the merge conflicts in this pull request |
|
@itomek Thanks again for the careful review and re-review. I agree with the remaining concern: this PR covers the core persistence/surfacing path, but the UI/UX side is not yet as strong as it should be. The backend and persistence behavior are now regression-covered, including the mid-stream error case, but the current Electron coverage is still mostly static/string-level. It does not fully prove the live user flow: policy toast → View receipt → notification anchor scroll. I also recognize that the PR became larger than ideal. It now spans persistence, API response shape, React UI, notification behavior, receipt anchoring, docs, Electron assertions, and multiple base-refresh fixes. That made review harder and increased the risk of churn around shared UI files. My plan from here:
Appreciate the patience. The current PR is much better after your feedback, and I’ll keep the next pass smaller, cleaner, and more UI-behavior focused. |
|
@dislovelhl I tried to fix the merge conflicts but for some reason its not updating, most likely because its your fork. Please fix ASAP so we can get this merged in. |
Resolve conflicts:
- tests/electron/test_electron_chat_app.js: keep PR's SettingsPage.css test (more comprehensive; merged file has all 5 symbols).
- tests/unit/connectors/test_oauth_pkce.py: take main's stricter assertion ('expires_at' not in result).
## Why this matters v0.18.0 ships agent memory v2 (hybrid-search second brain with LLM extraction and observability dashboard), ChatAgent split into three composable agents (Chat/FileIO/DocumentQA), parallel tool calls, and a Telegram adapter scaffold — plus fixes the RAG-on-PDF timeout with Gemma 4 that broke document Q&A since v0.17.6 and adds CI gates that enforce RAG quality baselines on every future PR. Full notes: `docs/releases/v0.18.0.mdx`. ## What's New - **Agent memory v2** ([amd#606](amd#606)) — Hybrid semantic + keyword search, LLM extraction, observability dashboard via SSE streaming ([amd#1032](amd#1032)). Per-user isolation enforced; extraction runs async so it doesn't add latency. - **ChatAgent split** ([amd#979](amd#979)) — `ChatAgent`, `FileIOAgent`, and `DocumentQAAgent` replace the monolithic class; each composable via `tools=`. Backward-compatible shim preserved. - **Parallel tool calls** ([amd#946](amd#946)) — Multiple `tool_calls` from a single LLM turn are executed concurrently, cutting round-trips for multi-tool workflows. - **Telegram adapter scaffold, Phase 0** ([amd#951](amd#951)) — `gaia telegram start|stop|status`, per-user session isolation, `[telegram]` extras. Phase 1 (message handling + allowed-users gate) tracked in [amd#889](amd#889). - **Connectors: per-MCP toggle + single-writer enforcement** ([amd#1018](amd#1018), [amd#998](amd#998)) — Disable individual MCP servers without removing them; concurrent writes serialised with actionable errors on contention. - **File navigation, web browsing, and write security** ([amd#495](amd#495)) — `FileSearchToolsMixin`, web browsing tool, and scratchpad mixin in `KNOWN_TOOLS`; write tools check `allowed_paths` before dispatch. - **Email UI and policy alerts** ([amd#995](amd#995), [amd#1039](amd#1039), [amd#952](amd#952)) — Pre-scan triage card, in-chat Connect, policy alert cards, and durable receipts for confirmation-gated actions. ## Bug Fixes - **RAG-on-PDF timeouts on Gemma 4** ([amd#1034](amd#1034), closes [amd#1030](amd#1030)) — Prompt-size budget check added at composition time; CI gates enforce it on every PR ([amd#1040](amd#1040)). - **Envelope-level parse failure crashed SD recovery** ([amd#1047](amd#1047), closes [amd#1023](amd#1023)) — Falls through to a clean recovery path with step-1 context preserved. - **Windows-path tool args corrupted** ([amd#1027](amd#1027)) — Backslash normalisation now happens after argument parsing. - **Blender `send_command` hung** ([amd#1026](amd#1026), closes [amd#1022](amd#1022)) — Read timeout applied to persistent-connection servers. - **`gaia chat init` in post-install banner** ([amd#1029](amd#1029), closes [amd#1024](amd#1024)) — Replaced with the correct `gaia init`. - **Keyring treated as required** ([amd#1028](amd#1028)) — Import guarded; optional on systems without `keyring`. - **electron-builder URLs stale** ([amd#953](amd#953)) — Three doc/installer files updated to current download paths. ## Tooling & Docs - **RAG eval CI gates** ([amd#1040](amd#1040), closes [amd#1033](amd#1033)) — RAG quality baselines + prompt-size budget enforced on every PR. - **Fork-PR authors now receive Claude review** ([amd#932](amd#932)) — `allowed_non_write_users: "*"` with prompt-injection mitigations documented. - **Eval runs mandated before merging** ([amd#1036](amd#1036)) — `CLAUDE.md` requires `gaia eval agent` for LLM-affecting changes. - **GAIA website** ([amd#369](amd#369)) — [amd-gaia.ai](https://amd-gaia.ai) live. - **Custom agent guide reorganised** ([amd#997](amd#997)), Lemonade PPA docs ([amd#801](amd#801)), broken Lemonade CLI URL fixed ([amd#996](amd#996)), WhatsApp adapter evaluation spec ([amd#950](amd#950)). ## Release checklist - [x] `util/validate_release_notes.py docs/releases/v0.18.0.mdx --tag v0.18.0` passes - [x] `src/gaia/version.py` → `0.18.0` - [x] `src/gaia/apps/webui/package.json` → `0.18.0` - [x] Navbar label in `docs/docs.json` → `v0.18.0 · Lemonade 10.2.0` - [x] All 28 commits in range (v0.17.6..HEAD) are represented in the notes - [ ] Review from @kovtcharov-amd addressed
Closes #925
Policy BLOCK decisions were already emitted as
policy_alertSSE events, but Agent UI users could not reliably see, understand, or revisit those decisions. This PR turns those backend policy events into visible Policy Shield activity cards, critical notifications/toasts, and persisted receipt details so blocked tool calls stay clear across reloads.Threads
Test plan
cd src/gaia/apps/webui && npm run buildpython -m pytest tests/unit/chat/ui/test_utils_helpers.py::TestMessageToResponse::test_agent_steps_preserves_policy_alert_fields tests/integration/test_chat_ui_integration.py -k policy_alert -qgit diff --checkcd tests/electron && CI=true npm test -- --runInBand