feat(ui): policy alert cards, notifications, and durable receipts#952
feat(ui): policy alert cards, notifications, and durable receipts#952dislovelhl wants to merge 26 commits intoamd:mainfrom
Conversation
Adds `gaia.governance` — an opt-in, additive governance package that wraps tool execution with an ACGS-lite-style action kernel and seams for constitutional-swarm workflow checkpoints / receipts / policy- version binding. Key properties - Zero edits to the base `Agent` class. `GovernedAgentMixin` overrides `_execute_tool` via `super()`; adding it to an agent costs nothing when no adapter is supplied. - Canonical tool-name resolution before governance, so unprefixed MCP aliases cannot bypass risk tags on their canonical names. - Fail-closed REVIEW: only an explicit `governance_reviewer` callback counts. The default `AgentConsole.confirm_tool_execution` auto- approves, so it is intentionally not consulted. - Envelope-bound receipt hashes cover the full evidence set (receipt id, workflow, decision, policy version, constitution hash, timestamp, evidence) with strict canonical JSON. - Workflow-bound checkpoint resolution and atomic check-and-set in the in-memory bridge. Ergonomics - `GaiaGovernanceAdapter.default(audit_log=...)` for one-line wiring with in-repo stubs. - `GovernanceConfig` dataclass consolidates six governance kwargs; per-kwarg style preserved for back-compat. - `@govern(risk="blocked", reason=...)` decorator colocates policy with tool source; explicit dict merges with decorator tags. What's here (PR 1) - Action-level governance via `GovernedAgentMixin` - Protocol interfaces: `PolicyEngine`, `CheckpointRuntime`, `ReceiptServiceProtocol`, `PolicyBindingProtocol` - In-memory + JSONL receipt services - Reference stub policy engine (`RuleBasedPolicyEngine`) - 55 unit + integration tests; pylint clean against repo `.pylintrc` - Governed weather-agent example with CLI reviewer - `src/gaia/governance/README.md` with quickstart and extension table What's deferred (PR 2+) - ACGS-lite-backed policy engine - Persistent checkpoint bridge via constitutional-swarm - Policy control plane wiring - Plan-step / multi-agent workflow transitions (the `workflow_mapper` helper is a forward-compatibility seam) Review: iterated against Codex (architecture / correctness / security) and Gemini (DX / API ergonomics / docs) advisors. All HIGH / MEDIUM findings addressed; regression tests added for each. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The new governance package was missing from setup.py packages list, causing test_all_filesystem_packages_in_setup_py to fail. Black and isort were not run on the new files before commit. Constraint: Black line-length follows project default (88) Constraint: isort profile follows project default Tested: black --check, isort --check-only both pass locally Not-tested: full unit test suite (requires project deps) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…cation Fix thread safety in InMemoryCheckpointBridge.create_checkpoint by holding _lock during _records write. Guard mapping lookup in resolve_checkpoint with .get() to raise InvalidResolutionError instead of KeyError on unknown types. Add Lock to InMemoryReceiptService for concurrent access. Harden _read_all deserialization to filter to known ReceiptRecord fields, silently skipping malformed or schema-mismatched lines instead of crashing. Replace assert in _handle_review_checkpoint with GaiaGovernanceError raise (asserts are stripped with -O). Eliminate type: ignore[union-attr] by passing the already-resolved adapter as an explicit parameter. Make handle_transition REVIEW branch explicit with elif + raise GaiaGovernanceError on unknown decision types instead of implicit fallthrough. Remove duplicate GovernanceCallback / GovernanceReviewer aliases: define once in config.py with specific types (ActionRequest, GovernanceDecision) and import in mixin.py. Confidence: high Scope-risk: narrow Tested: black+isort clean, syntax verified Not-tested: full test suite (no test runner available locally) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
test_tool_decorator.py has an autouse fixture that calls _TOOL_REGISTRY.clear() before and after each test. When it runs before test_governance_dx.py in CI, _dx_decorated_blocked is no longer in the registry so _lookup_tool_fn returns None and the two tests that depend purely on decorated tags (test_mixin_reads_decorated_tags_from_registry and test_explicit_dict_overrides_decorated_tags) see an ALLOW decision instead of BLOCK. Fix: add an autouse fixture in test_governance_dx.py that re-registers the test tools if the registry was cleared between collection and execution. Constraint: _TOOL_REGISTRY is a module-level mutable global; test isolation must be explicit when multiple suites share it. Tested: test_tool_decorator.py + test_governance_dx.py sequentially (16 passed) Not-tested: concurrent xdist workers (not used in this CI) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Black reformatted 5 test files that were introduced without running the formatter first: test_governance_dx.py (also picks up the autouse-fixture added in the previous commit), test_governance_adapter.py, test_governance_jsonl_receipts.py, test_governance_schemas.py, test_governance_receipts.py. No logic changes. Tested: 40 governance unit tests pass locally Scope-risk: narrow Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remaining 6 files flagged by CI Black check: - tests/integration/test_governed_*.py (5 governance integration tests) - src/gaia/mcp/mcp_bridge.py No logic changes. Scope-risk: narrow Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Closes the merge blockers from PR review: - Tighten exception scopes in mixin and receipt service. Replace blanket `except Exception: pass` with specific exception types and `logger.warning` for the unexpected case. Most importantly, `_resolve_canonical_tool_name` now logs unexpected resolver errors instead of silently falling through to the raw name — closing the alias-bypass risk where governance could check tags on the wrong key. - Correct documentation: tag merge is additive (union, deduplicated), not "explicit dict wins". README, decorator docstring, and mixin comment now match the behavior tests have always asserted. - Strict canonical JSON for BLOCK-receipt evidence: handle non-JSON tool args, complex types, and cycles without falling back to repr(). - Strict canonical JSON in JsonlReceiptService.issue_receipt: reject non-canonical metadata (NaN/Inf, opaque objects) at issue time instead of allowing tampered receipts to land in the audit log. - Register the governance SDK in public docs: new `docs/sdk/sdks/governance.mdx` and an entry in `docs/docs.json`.
The API Tests job has no `timeout-minutes`, so a hung Lemonade server or stalled model pull leaves the runner spinning indefinitely (a 4-hour no-op happened on PR amd#921). 30 min comfortably covers the worst-case sequential path: 60s server start + 10min model pull + 2min model load + 30s API server start + test run.
…p-copy tags Final polish on top of the merge-blocker fixes. Reviewer feedback from a parallel code/architecture audit converged on these items: - Delete `workflow_mapper.py` and `StaticPolicyBindingService.bind_receipt`. Both are advertised as forward-compat seams but have zero callers in src/, tests/, examples/, or docs/. Re-introduce them in the PR that actually wires the new event surface, with the real signature in hand. - Tighten `JsonlReceiptService.get_receipt`: cache reads and writes were unsynchronized while a concurrent `issue_receipt` was mutating the same dict under `_lock`. Move the cache check + install under the lock. - Add a `logger.debug` breadcrumb for malformed-line skips in `_read_all` so an operator chasing a missing receipt has something to grep. - Deep-copy inner risk-tag lists in `GovernedAgentMixin.__init__` so a caller cannot mutate the agent's tag table after construction by holding onto the original list reference. - Add a comment in `_canonical_json_value` documenting why `bool` is checked before `int` (subclass relationship — without the ordering, `True` would canonicalize as `1`). - README: drop the `workflow_mapper` mention from "What's not here yet" now that the seam is gone.
Adds tests for the previously-uncovered branches surfaced by the
test-coverage audit. Each test guards against a specific regression:
- `test_resolver_unexpected_exception_logs_and_governs_raw_name` — proves
a buggy `_resolve_tool_name` that raises an unexpected exception still
triggers governance on the raw name AND emits an operator-visible
warning. Future regression where the warning is swapped for a silent
fallback fails this test.
- `test_resolver_lookup_error_is_silent_and_governs_raw_name` — proves
the expected "tool not in registry" case (LookupError) is absorbed
silently with no log noise.
- `test_unknown_transition_outcome_fails_closed` — proves a custom
`CheckpointRuntime` returning a status the mixin doesn't know is
denied, not let through.
- `test_handle_transition_rejects_unknown_decision_type` — same idea at
the adapter layer for an unknown `GovernanceDecision.decision`.
- `test_read_all_skips_malformed_lines` — proves a corrupt line in the
middle of an audit log doesn't block readers from finding subsequent
valid records.
- Existing callback-exception and reviewer-exception tests gain caplog
assertions so a future silent-swallow regression is caught.
Plus two readability fixes:
- Rename `test_explicit_dict_overrides_decorated_tags` to
`test_explicit_empty_dict_does_not_downgrade_decorator_tags` — the
body asserts additive semantics; the old name said the opposite.
- Replace hardcoded `"test_governance_adapter.SlotOnlyEvidence"`
qualname strings with `f"{Cls.__module__}.{Cls.__qualname__}"` so the
tests survive a file rename.
…islovelhl/gaia-acgs into feat/optional-governance-layer
… log `_prompt_review` now returns `(approved, exception_or_None)` instead of just `approved`. When a reviewer raises, `_handle_review_checkpoint` stamps the exception type and message into `CheckpointResolution.reason` so the receipt metadata records "reviewer raised RuntimeError: bad reviewer" rather than the boilerplate "reviewer rejected" — which previously made the audit log unable to tell a deliberate "no" from a crash. The operator-facing `logger.warning` was already in place; this commit closes the audit-trail gap so downstream consumers (compliance, forensics, retros) can distinguish the two without grepping operator logs. Adds two tests: - `test_reviewer_exception_is_treated_as_reject` extended to assert the receipt's `metadata.evidence.resolution.reason` contains the exception type and message - new `test_reviewer_explicit_no_keeps_plain_reason` — a reviewer that returns False produces a plain "reviewer rejected" reason, not an exception-flavored one (regression guard against false positives)
The Claude AI Assistant workflow runs the claude-code-action which requires ANTHROPIC_API_KEY. That secret is only configured on the canonical amd/gaia repo, so every fork without the secret hits a hard failure on PR-review, issue-handler, and release-notes events. Add `github.repository == 'amd/gaia'` to each job's `if:` so the workflow no-ops on forks rather than failing red. Forks can still opt-in by setting their own ANTHROPIC_API_KEY and removing the guard, but the default is silent skip. Tested by re-running PR #3 on dislovelhl/gaia-acgs after this commit: all four jobs should report `Skipped` instead of failing.
Addresses architectural feedback on amd#921 (review 4197475871). Governance REVIEW now reuses GAIA's existing blocking confirmation flow when the active console advertises it, instead of running as a parallel enforcement path that silently fails closed. - OutputHandler grows a `blocking_confirmation: bool = False` capability flag; SSEOutputHandler sets it to True (it already blocks on the frontend permission modal). - _prompt_review precedence: explicit governance_reviewer wins; else delegate to console.confirm_tool_execution iff the console advertises blocking_confirmation; else fail closed. The console is resolved per call, not captured at __init__. - The default console still returns True immediately, so CLI without an explicit reviewer continues to fail closed (no auto-approve). Test coverage: - tests/integration/test_governed_review_flow.py — @govern(risk=review) + SSEOutputHandler emits permission_request, deny resolves the checkpoint, denied tool body never executes, non-blocking consoles fail closed, audit receipt distinguishes REVIEW_REJECTED from BLOCK. - tests/unit/chat/ui/test_sse_confirmation.py — handshake coverage for approve/deny/timeout/cancel. Documents the relationship to confirm_tool_execution in docs/sdk/sdks/governance.mdx and src/gaia/governance/README.md so the "which mechanism shows a UI prompt?" answer is no longer ambiguous. The legacy TOOLS_REQUIRING_CONFIRMATION set is intentionally untouched in this commit; unifying the pipeline is staged for follow-up PRs.
Today GovernedAgentMixin returns a denied dict on BLOCK and the end user can't tell a policy refusal from a generic tool failure. This adds a structured policy_alert event so the Agent UI can surface a "blocked by policy" notification with the audit receipt id. - OutputHandler grows print_policy_alert() (default no-op for headless consoles). - SSEOutputHandler.print_policy_alert emits a typed event onto the SSE queue with tool, decision, reason, rule_ids, policy_version, and the audit receipt_id when present. Tool args are deliberately excluded — receipt_id is the safe correlator for deep-linking back to the audit. - GovernedAgentMixin._emit_policy_alert is called immediately before the denied result is returned. The receipt_id surfaced in the SSE event is the same id stored by the receipt service, so the alert and the audit log link 1:1. Emission failures are logged (warning, exc_info) and swallowed so a UI bug can never break governance. - Frontend StreamEvent type union grows policy_alert + the new optional fields. Rendering (toast, inline card, "view receipt" route) ships in PR-2 once the receipt-viewer UX is decided. Tests: - tests/unit/chat/ui/test_sse_handler.py — exact event shape, including the omits-receipt-id-when-None case. - tests/integration/test_governed_review_flow.py — full BLOCK path through SSEOutputHandler asserts denied result, agent.calls == [] (tool body never executes), audit receipt persisted, and the alert event's receipt_id matches the denied result's receipt_id. Docs: - docs/sdk/sdks/governance.mdx + src/gaia/governance/README.md document the event shape and intended UI consumption. - docs/sdk/sdks/agent-ui.mdx links the event into the SSE event reference. Stacked on feat/governance-review-bridge (PR #3) which lands the Path A capability bridge for governance REVIEW.
…ckend feat(governance): emit policy_alert SSE event when BLOCK denies a tool
feat(governance): bridge REVIEW to existing console confirmation surface
Inherited via main merge from amd#919, which introduced both a regression test asserting AttributeError must surface and a broad-except wrapper that swallowed it. Same commit, opposite intent — removing the except satisfies the test, the docstring ("Ratchets the Apr-20 review fix"), and the project's no-silent-fallbacks rule. Also drops one stray blank line in the test file to satisfy black. Unblocks PR 921 CI (Code Quality + Unit Tests checks).
Flake8 E731 — "do not assign a lambda expression, use a def" — flagged the inline reviewer fallback in mixin.py:348. Behaviour is identical; just satisfies the project's lint contract.
Persist policy BLOCK steps through reload and disconnect so policy receipts remain visible in the Agent UI. Render alerts as non-actionable notifications, toast links, and inline Policy Shield activity cards. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Resolve PR amd#952 conflicts by keeping the SettingsPage migration from main while preserving policy alert notifications and tests. Sync Agent UI package metadata with the merged GAIA version. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Resolved the merge conflicts with What changed in the resolution:
Validation run locally:
GitHub now reports the PR as mergeable; remaining |
There was a problem hiding this comment.
Pull request overview
This PR adds a first-class “policy alert” (governance) event/step that can be streamed to the UI, persisted in chat history, and surfaced via dedicated UI affordances (agent activity cards, notifications, and a receipt-focused toast).
Changes:
- Persist
policy_alertagent steps (including decision/reason/rule IDs/policy version/receipt ID) and ensure policy-only BLOCK streams are reloadable from the messages API. - Update the web UI to recognize
policy_alertevents, render “Policy Shield” activity cards, and add a notification center with policy filtering + receipt anchoring. - Expand unit/integration/electron test coverage for policy alerts and update async test helpers to use
asyncio.run.
Reviewed changes
Copilot reviewed 21 out of 22 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/chat/ui/test_utils_helpers.py | Adds unit test ensuring policy alert fields survive message_to_response. |
| tests/unit/chat/ui/test_history_limits.py | Switches _run_sync helper to asyncio.run. |
| tests/unit/chat/ui/test_chat_helpers_model_resolution.py | Switches _run_sync helper to asyncio.run. |
| tests/integration/test_chat_ui_integration.py | Adds integration coverage for policy-only blocks, multi-block streams, disconnect persistence, and dedupe behavior. |
| tests/electron/test_electron_chat_app.js | Adds/updates string-based UI coverage assertions for policy alert routing and UI elements. |
| tests/electron/test_agent_process_manager.js | Adjusts test config and timer cleanup to avoid pending-RPC timer leaks/flakes. |
| src/gaia/ui/models.py | Extends AgentStepResponse with policy_alert fields for persistence/API responses. |
| src/gaia/ui/_chat_helpers.py | Captures policy_alert events during streaming and persists policy-only BLOCK responses for reloadability. |
| src/gaia/apps/webui/src/types/index.ts | Extends AgentStep type with policy_alert and related metadata fields. |
| src/gaia/apps/webui/src/types/agent.ts | Extends notification types/model to include policy_alert metadata. |
| src/gaia/apps/webui/src/styles/index.css | Adds global styles for the notification center trigger/popover. |
| src/gaia/apps/webui/src/services/api.ts | Routes policy_alert SSE events through the agent-event callback path. |
| src/gaia/apps/webui/src/components/NotificationCenter.tsx | Adds policy alert rendering and receipt anchoring/filter tab. |
| src/gaia/apps/webui/src/components/NotificationCenter.css | Styles policy notifications and policy detail blocks. |
| src/gaia/apps/webui/src/components/ChatView.tsx | Handles policy_alert events (steps + notifications + toast) and adds notification-center trigger UI. |
| src/gaia/apps/webui/src/components/ChatView.css | Adds styling for the policy alert toast. |
| src/gaia/apps/webui/src/components/AgentActivity.tsx | Renders policy alert “Policy Shield” cards and updates summary behavior. |
| src/gaia/apps/webui/src/components/AgentActivity.css | Styles policy alert cards and policy-aware summary bar state. |
| src/gaia/apps/webui/src/App.tsx | Mounts the notification center popover controlled by the notification store. |
| src/gaia/apps/webui/package.json | Bumps UI version and adds make script alias. |
| src/gaia/apps/webui/package-lock.json | Updates lockfile version metadata to match package version bump. |
| docs/guides/agent-ui.mdx | Documents policy alerts/receipts behavior in the Agent UI guide. |
Files not reviewed (1)
- src/gaia/apps/webui/package-lock.json: Language not supported
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ### Policy Alerts and Receipts | ||
|
|
||
| When a governance-enabled agent blocks a tool call, the Agent UI shows a | ||
| non-actionable policy alert instead of an approval prompt. Policy blocks appear | ||
| as inline **Policy Shield** activity cards, critical notifications, and a toast | ||
| with a **View receipt** link when a receipt ID is available. | ||
|
|
||
| Policy alerts are durable session history. If you reload the UI or reconnect | ||
| after a blocked request with no assistant text, the block reason, rule IDs, | ||
| policy version, and receipt ID remain attached to the assistant message. | ||
|
|
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 21 out of 22 changed files in this pull request and generated 1 comment.
Files not reviewed (1)
- src/gaia/apps/webui/package-lock.json: Language not supported
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
|
@dislovelhl — triage notes before doing a code-level pass:
Happy to do a code-level pass once the title and issue link are sorted. Copilot's two technical findings (Zustand selector to avoid full-store subscription in |
|
@itomek — prereqs you flagged are sorted (title, Scope at a glance. The backend
Test coverage, honestly.
Open question for you. Follow-up test I'd write next (not this PR): |
Closes #925
Policy BLOCK decisions were already emitted as
policy_alertSSE events, but Agent UI users could not reliably see, understand, or revisit those decisions. This PR turns those backend policy events into visible Policy Shield activity cards, critical notifications/toasts, and persisted receipt details so blocked tool calls stay clear across reloads.Threads
Test plan
cd src/gaia/apps/webui && npm run buildpython -m pytest tests/unit/chat/ui/test_utils_helpers.py::TestMessageToResponse::test_agent_steps_preserves_policy_alert_fields tests/integration/test_chat_ui_integration.py -k policy_alert -qgit diff --checkcd tests/electron && CI=true npm test -- --runInBand