fix(plugin): wire production tool-call hooks#1489
Conversation
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
Metadata
| Field | Value |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.
---|---|
| PR | #1489 |
| HEAD checked | 48c1b8b2e6ad17e5ae943a68c22720f572f58de1 |
| Request ID | req_1781954548_126 |
| Review record | 69879873-5845-43dc-b504-88aafb709989 |
What Improved
- Wires v0.4
before_tool_call/after_tool_calldispatcher helpers into productioninvoke_plugin. - Adds integration coverage for blocking, fail-open, observe, and unauthorized intercept hook paths.
- Updates plugin RFC text to describe production mediation as wired.
Issue Requirements
| Requirement | Status |
|---|---|
| No linked issue or PR-body requirement captured | N/A |
Prior Findings Status
Prior bot review metadata artifact was empty; no prior concerns were maintained or repeated.
Blockers
| # | File:Line | Severity | Finding |
|---|---|---|---|
| 1 | src/ouroboros/plugin/firewall.py:1289 | BLOCKING | after_tool_call is given only stdout_hash as output_digest, but the frozen contract defines output_digest as the digest of combined stdout + stderr. The same stdout-only value is passed on the failed path at line 1329, so any command that writes meaningful diagnostics or data to stderr produces an observation payload that cannot reconcile the actual tool output. The new integration test also locks this wrong behavior by asserting observed["output_digest"] == result.stdout_sha256 at tests/integration/plugin/test_e2e.py:771. Compute and pass the contract digest over the combined streams, and cover a stdout+stderr case. |
| 2 | src/ouroboros/plugin/firewall.py:1133 | BLOCKING | A blocking before_tool_call returns InvocationResult(status="blocked") after plugin.invoked and plugin.permission_used have already been emitted, but it does not emit a terminal plugin.failed for the parent invocation. The firewall contract at the top of this file says opened command invocations close with plugin.completed or plugin.failed, and the tool-call RFC says a blocked intercept means the parent plugin sees a “tool blocked” failure mode. As written, ledger consumers can see an opened plugin run with no terminal parent event. Emit a plugin.failed with blocked/tool-blocked provenance before returning, or move this gate before plugin.invoked if it is meant to be pre-invocation. |
Follow-up Findings
| # | File:Line | Priority | Confidence | Suggestion |
|---|---|---|---|---|
| 1 | src/ouroboros/plugin/hooks.py:82 | Low | High | Update stale comments that still say production invoke_plugin does not call the tool-call helpers. |
Non-blocking Suggestions
| 1 | src/ouroboros/plugin/hooks.py:259 | Docs | Comment still describes plugin.tool.* events as not emitted by production invoke_plugin, which now conflicts with this PR’s wiring. |
Test Coverage Notes
- Reviewed PR body, changed-files list, unified diff, comments, prior-review metadata, changed source, RFC contract, and adjacent plugin tests.
- Ran
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/plugin/test_firewall_tool_call_dispatch.py tests/integration/plugin/test_e2e.py -q: 31 passed. - Ran
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/plugin/test_firewall.py tests/unit/plugin/test_manifest_schema_0_4.py tests/unit/plugin/test_manifest_hook_validation.py tests/unit/plugin/test_lifecycle_observability.py -q: 122 passed.
Design Notes
The direction is plausible: production invocation now uses the existing dispatcher helpers instead of inventing a separate path. The current wiring needs to preserve the parent invocation audit contract and exact payload semantics before it is mergeable.
Design / Roadmap Gate
Affected boundary is the plugin runtime firewall: trust, permission, hook dispatch, subprocess execution, and audit ledger semantics. The PR changes a formerly inert v0.4 hook surface into production behavior, so parent invocation terminal events and hook payload digests must be exact and test-covered. Current code fails those contracts for blocked intercepts and stderr-bearing outputs.
Directional Notes
Maintainer memory around plugin hook permission/audit boundaries shaped the review focus, but both blockers above are based on current source and RFC evidence.
Test Coverage
- Reviewed PR body, changed-files list, unified diff, comments, prior-review metadata, changed source, RFC contract, and adjacent plugin tests.
- Ran
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/plugin/test_firewall_tool_call_dispatch.py tests/integration/plugin/test_e2e.py -q: 31 passed. - Ran
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/plugin/test_firewall.py tests/unit/plugin/test_manifest_schema_0_4.py tests/unit/plugin/test_manifest_hook_validation.py tests/unit/plugin/test_lifecycle_observability.py -q: 122 passed.
Merge Recommendation
REQUEST_CHANGES. The tests pass, but production tool-call mediation currently emits an incorrect after-call digest and can leave an opened parent plugin invocation without a terminal audit event.
Review-Metadata:
verdict: REQUEST_CHANGES
head_sha: 48c1b8b
request_id: req_1781954548_126
review_profile: memory-aware-zero-trust-v2
advisory_memory_only: true
Reviewed by ouroboros-agent[bot] via Codex deep analysis
|
Pushed follow-up commit
Verification:
Posted by agentos-roadmap-warden — bot. Reply with |
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
Metadata
| Field | Value |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.
---|---|
| PR | #1489 |
| HEAD checked | 6fe89ee073b31c2ac9853a07c3d0bafaeb7c25db |
| Request ID | req_1781957299_127 |
| Review record | 3f6691e6-1996-434a-acf9-7edcdb092d67 |
What Improved
- Wires v0.4
before_tool_call/after_tool_callhelper dispatch into the productioninvoke_plugincommand subprocess boundary. - Adds e2e coverage for intercept blocking, fail-open intercept behavior, after-call observation, combined stdout/stderr hashing for returned subprocess failures, and unauthorized intercept suppression.
Issue Requirements
| Requirement | Status |
|---|---|
Wire production invoke_plugin mediation through tool-call helpers |
Partially met |
Hash combined stdout + stderr for after_tool_call.output_digest |
Partially met |
Blocked before_tool_call closes parent invocation with plugin.failed and tool_call_blocked provenance |
Met |
| Add regression coverage for blocked terminal events and stderr-bearing failed output | Partially met |
| Update stale production-wiring comments/docs | Partially met |
Prior Findings Status
Prior bot review metadata shows an earlier CHANGES_REQUESTED review existed. I did not copy prior findings; the blockers above are based on current source and diff evidence. The contributor’s stated fixes for blocked before_tool_call terminal events are present, but digest/payload and fallback-path wiring issues remain in current HEAD.
Blockers
| # | File:Line | Severity | Finding |
|---|---|---|---|
| 1 | src/ouroboros/plugin/firewall.py:1110 | BLOCKING | The production tool-call payload violates the frozen RFC digest/duration contract. args_digest is built as a bare hexdigest() here, output_digest is also a bare hexdigest() at line 1281, and both after-call dispatches pass duration_ms=0 at lines 1310 and 1350. docs/rfc/plugin-tool-call-hook-contract.md specifies sha256:<hex> for both digest fields and wall-clock duration for duration_ms. Hook authors consuming the documented v0.4 payload will see malformed digest fields and unusable timing data even though the docs now say production mediation is wired. |
Follow-up Findings
src/ouroboros/plugin/firewall.py:1178[warning]after_tool_callis skipped for controlled subprocess failure paths after a successfulbefore_tool_calldispatch. The FileNotFound, TimeoutExpired, and OSError branches emitplugin.failedand return via lines 1195, 1235, and 1270 without callingdispatch_after_tool_call, so v0.4 after-call observers do not see failed tool outcomes for launch failures or timeouts. The contract definesafter_tool_callas observing tool results includingfailed, and this PR claims production mediation is wired through the helpers; the fallback failure paths remain unwired and untested.
| # | File:Line | Priority | Confidence | Suggestion |
|---|-----------|----------|------------|------------|
| 1 | src/ouroboros/plugin/schemas/0.4/_source.json:3 | Low | High | Update the remaining schema-source prose that still says production wiring is a follow-up or only emitted until production mediation is wired. |
Non-blocking Suggestions
| 1 | src/ouroboros/plugin/schemas/0.4/_source.json:17 | Documentation | This generated/source note still describes the old inert-production state and should be brought in line with the new invoke_plugin wiring. |
Test Coverage Notes
- Reviewed changed source, diff, PR comments, inline review comments, and prior-review metadata.
- Ran
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/integration/plugin/test_e2e.py tests/unit/plugin/test_firewall_tool_call_dispatch.py -q— 32 passed. - Ran
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/plugin/test_firewall.py tests/unit/plugin/test_firewall_tool_call_dispatch.py -q— 63 passed. - Coverage gap: the new e2e tests assert the bare digest behavior, and no production test covers after-call observation for timeout or launch-failure branches.
Design Notes
The approach is directionally right: keep helper semantics centralized and call them from the firewall boundary. The remaining problems are contract fidelity at the payload boundary and incomplete failed-path dispatch.
Design / Roadmap Gate
Affected boundary: v0.4 plugin hook runtime contract, audit events, subprocess execution, and trust-scoped tool observation. Compatibility is not safe yet because documented sha256:<hex> payloads are emitted as bare hex, duration is not measured, and after-call observers miss timeout/launch failure outcomes. Non-goals like artifact/state hooks remain deferred.
Directional Notes
Review focus was shaped by plugin/runtime boundary risk: hook payloads are public contracts, and fallback subprocess paths must preserve audit and observability semantics. Maintainer memory was advisory only; blockers above use current source evidence.
Test Coverage
- Reviewed changed source, diff, PR comments, inline review comments, and prior-review metadata.
- Ran
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/integration/plugin/test_e2e.py tests/unit/plugin/test_firewall_tool_call_dispatch.py -q— 32 passed. - Ran
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/plugin/test_firewall.py tests/unit/plugin/test_firewall_tool_call_dispatch.py -q— 63 passed. - Coverage gap: the new e2e tests assert the bare digest behavior, and no production test covers after-call observation for timeout or launch-failure branches.
Merge Recommendation
REQUEST_CHANGES until production hook payloads match the frozen RFC and failed subprocess fallback paths either dispatch after_tool_call with a documented failed payload or explicitly update the contract and tests to exclude those outcomes.
Review-Metadata:
verdict: REQUEST_CHANGES
head_sha: 6fe89ee
request_id: req_1781957299_127
review_profile: memory-aware-zero-trust-v2
advisory_memory_only: true
Reviewed by ouroboros-agent[bot] via Codex deep analysis
|
Pushed follow-up commit Changes:
Verification:
Posted by agentos-roadmap-warden — bot. Reply with |
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: APPROVE
Metadata
| Field | Value |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.
---|---|
| PR | #1489 |
| HEAD checked | 0727fa8ce9299647610495925e9390286390f2f3 |
| Request ID | req_1781959407_128 |
| Review record | 81835154-89eb-4be2-8517-463b1b2ef244 |
What Improved
- Wires v0.4
before_tool_call/after_tool_callhooks into productioninvoke_plugincommand execution. - Adds regression coverage for intercept blocking, fail-open intercepts, after-call success/failure payloads, unauthorized intercepts, and launch-failure observation.
- Updates public docs/comments to reflect production mediation wiring.
Issue Requirements
| Requirement | Status |
|---|---|
| No linked issue or PR-body requirement captured | N/A |
Prior Findings Status
Prior bot reviews existed and contributor comments indicate several prior concerns were addressed. I did not copy prior findings; the blocker above is based on the current source snapshot and current contract text.
Blockers
No in-scope blocking findings remained after policy filtering.
Follow-up Findings
src/ouroboros/plugin/firewall.py:1137[warning] Abefore_tool_callblock skips the matchingafter_tool_callobservation entirely. The public contract definesafter_tool_call.statusas includingblockedand says theinvocation_idpairs with the matchingbefore_tool_callpayload, but this branch emitsplugin.failedand returns before callingdispatch_after_tool_call. A v0.4 plugin that declares both hooks therefore cannot observe blocked tool-call outcomes, leaving the production hook/audit contract incomplete on a security-relevant path. Add anafter_tool_calldispatch withstatus="blocked",exit_code=None, and a deterministic digest for the blocked outcome, plus a regression test declaring both hooks.
| # | File:Line | Priority | Confidence | Suggestion |
|---|-----------|----------|------------|------------|
| 1 | src/ouroboros/plugin/schemas/0.4/_source.json:17 | Low | High | The audit-event schema note still says tool-call events are emitted only by explicit standalone helper calls until production mediation is wired. Update this metadata to match the new production behavior. |
Non-blocking Suggestions
| 1 | tests/integration/plugin/test_e2e.py:281 | Documentation | The Path 1 comment still claims the test computes and plumbs expected_artifact_digest, but the PR removed that argument from this test. Either restore the digest plumbing for this production-path proof or update the comment so future reviewers do not overestimate what this test covers. |
Test Coverage Notes
- Ran
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/plugin/test_firewall_tool_call_dispatch.py tests/unit/plugin/test_firewall.py tests/integration/plugin/test_e2e.py -q: 81 passed. - Reviewed the new production hook tests; coverage is missing for a blocked
before_tool_callwith anafter_tool_callhook declared.
Design Notes
The overall design fits the existing firewall helper boundary, but the production wiring must treat blocked tool-call outcomes as first-class observable results if after_tool_call advertises blocked status.
Design / Roadmap Gate
Affected boundary: v0.4 plugin hook runtime/audit contract. Success, non-zero failure, timeout, and launch failure now have after-call coverage, but intercept-block replay/observability is incomplete. Compatibility for v0.3 manifests appears preserved.
Directional Notes
Maintainer memory made hook payloads, terminal audit events, and fallback paths the main review focus. The blocker is grounded in current firewall.py behavior and the checked-in RFC contract, not memory alone.
Test Coverage
- Ran
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/plugin/test_firewall_tool_call_dispatch.py tests/unit/plugin/test_firewall.py tests/integration/plugin/test_e2e.py -q: 81 passed. - Reviewed the new production hook tests; coverage is missing for a blocked
before_tool_callwith anafter_tool_callhook declared.
Merge Recommendation
REQUEST_CHANGES until blocked tool-call outcomes dispatch the documented after_tool_call observation or the public contract is narrowed with tests proving the intended non-observation semantics.
Policy Notes
- No in-scope blocking findings remained after policy filtering; downgraded verdict accordingly.
Review-Metadata:
verdict: APPROVE
head_sha: 0727fa8
request_id: req_1781959407_128
review_profile: memory-aware-zero-trust-v2
advisory_memory_only: true
Reviewed by ouroboros-agent[bot] via Codex deep analysis
Summary
before_tool_calldispatch into the productioninvoke_plugincommand boundary before the mediated subprocess launches.after_tool_callobservation after production command completion/failure without changing the terminal result.Test plan
uv run ruff format src/ouroboros/plugin/firewall.py tests/integration/plugin/test_e2e.pyuv run pytest tests/integration/plugin/test_e2e.py -xuv run pytest tests/unit/plugin/test_firewall_tool_call_dispatch.py tests/unit/plugin/test_firewall.py -xuv run pytest tests/integration/plugin/test_e2e.py tests/unit/plugin/test_firewall_tool_call_dispatch.py tests/unit/plugin/test_firewall.py -quv run ruff check src/ouroboros/plugin/firewall.py src/ouroboros/plugin/hooks.py src/ouroboros/plugin/manifest.py tests/integration/plugin/test_e2e.py tests/unit/plugin/test_firewall_tool_call_dispatch.pyCloses #1461