fix(evaluate): resolve project working dir fallback#1492
Conversation
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
Metadata
| Field | Value |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.
---|---|
| PR | #1492 |
| HEAD checked | b4e0f2461028e20639e64e53f82147575e2ef9dd |
| Request ID | req_1781961952_131 |
| Review record | f0b79dca-54ca-4743-9e06-1fc2a0f0f60f |
What Improved
- Adds evaluation
working_dirfallback resolution so Stage 1 mechanical checks and Stage 2 artifact collection can use a project root even when the caller omitsworking_dir. - Documents that
working_dircontrols both command execution and source-file visibility.
Issue Requirements
| Requirement | Status |
|---|---|
Add/advertise working_dir as controlling Stage 1 commands and Stage 2 source visibility |
Met |
If working_dir is omitted, fall back through brownfield default, seed project metadata, then MCP server cwd |
Partially met |
| Preserve safe project-boundary behavior for seed-derived paths | Not met |
Prior Findings Status
Prior bot review metadata was present only as history. I did not copy prior findings; the blocker above is based on current source evidence.
Blockers
| # | File:Line | Severity | Finding |
|---|---|---|---|
| 1 | src/ouroboros/mcp/tools/evaluation_handlers.py:107 | BLOCKING | _seed_project_dir() resolves seed-encoded metadata.project_dir / working_directory / context_references[].path with resolve_path_against_base(... ) but does not enable containment. The shared project-path helper explicitly says callers handling untrusted seed paths must opt into containment, and this new fallback now feeds the resolved path into artifact writes/collection and mechanical command execution. A seed can therefore redirect an omitted-working_dir evaluation outside the intended project root, causing Stage 2 to scan/read another tree and Stage 1 to load or run .ouroboros/mechanical.toml from that location. Use the central seed resolver/containment contract or reject escaped seed paths instead of silently trusting them. |
Follow-up Findings
| # | File:Line | Priority | Confidence | Suggestion |
|---|---|---|---|---|
| 1 | tests/unit/mcp/tools/test_evaluate_multi_ac.py:77 | Medium | Medium | Add a regression test for the conflict case where both a registered brownfield default and seed metadata are present, since that precedence is now part of the documented contract. |
Non-blocking Suggestions
| 1 | tests/unit/mcp/tools/test_evaluate_multi_ac.py:77 | Tests | The new tests cover each fallback source independently, but not seed path containment/rejection or default-vs-seed precedence conflicts. |
Test Coverage Notes
- Ran
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/mcp/tools/test_evaluate_multi_ac.pysuccessfully: 16 passed. - Reviewed source paths for resolver, artifact collection, mechanical command cwd, and existing shared project-path containment contract.
Design Notes
The feature belongs in the evaluation boundary, but it duplicates seed project-path resolution instead of reusing the existing containment-aware contract.
Design / Roadmap Gate
The changed boundary controls stateful artifact writes, source-file visibility, and subprocess cwd. Explicit working_dir can remain caller-controlled, but seed-derived fallback paths are not equivalent: seeds may be generated/imported and must not silently escape the intended stable base. Current HEAD lacks that gate.
Directional Notes
I focused on resolver precedence, stale/global defaults, seed trust boundaries, artifact locations, and mechanical command cwd because this PR changes evaluation runtime boundaries.
Test Coverage
- Ran
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/mcp/tools/test_evaluate_multi_ac.pysuccessfully: 16 passed. - Reviewed source paths for resolver, artifact collection, mechanical command cwd, and existing shared project-path containment contract.
Merge Recommendation
Do not merge until seed-derived project paths are contained or explicitly rejected, with a regression test proving escaped seed metadata cannot redirect evaluation.
Review-Metadata:
verdict: REQUEST_CHANGES
head_sha: b4e0f24
request_id: req_1781961952_131
review_profile: memory-aware-zero-trust-v2
advisory_memory_only: true
Reviewed by ouroboros-agent[bot] via Codex deep analysis
|
Updated this PR in 5b99937 to address the review blocker: seed-derived working-dir fallback now reuses the shared containment-aware seed project-path resolver, and regression coverage now covers brownfield-default precedence and escaped seed metadata fallback.\n\nVerification run locally:\n- |
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
Metadata
| Field | Value |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.
---|---|
| PR | #1492 |
| HEAD checked | 5b999371ec2d9e83b981610ab73f5008b227af1a |
| Request ID | req_1781964380_133 |
| Review record | e4e71516-2cbc-434a-a0bb-532950ac505f |
What Improved
- Documents that
working_dircontrols both Stage 1 command execution and Stage 2 source visibility. - Adds shared working-directory resolution for evaluation with explicit, brownfield default, seed metadata, and cwd fallback order.
- Adds regression coverage for explicit/default/seed/cwd precedence and escaped seed metadata fallback.
Issue Requirements
| Requirement | Status |
|---|---|
| PR body linked issue requirement | N/A |
working_dir controls both Stage 1 command execution and Stage 2 source-file visibility |
Partially met |
If working_dir is omitted, fallback uses registered brownfield default, then seed project metadata, then MCP server cwd |
Partially met |
| Regression coverage covers brownfield-default precedence and escaped seed metadata fallback | Met |
Prior Findings Status
A prior bot review existed and the contributor says this revision addressed the earlier seed-derived fallback blocker. I did not reuse prior findings; the current blocker is based on the current source and changed fallback boundary.
Blockers
| # | File:Line | Severity | Finding |
|---|---|---|---|
| 1 | src/ouroboros/mcp/tools/evaluation_handlers.py:76 | BLOCKING | _default_brownfield_project_dir() returns the persisted default path after .resolve() without verifying that it still exists and is a directory, and _resolve_evaluate_working_dir() then returns that default before trying seed metadata at lines 110-114. A stale brownfield default is enough to block fallback to a valid seed project root. Downstream, evaluation writes/collects artifacts and runs mechanical discovery from that chosen root (evaluation_handlers.py:594, :601, :615); on a nonexistent root, artifact writes fail, directory scanning returns nothing, detector sees no manifests, and build_mechanical_config() falls back to all commands None, which Stage 1 treats as skipped (src/ouroboros/evaluation/languages.py:257-260). This can silently evaluate the wrong/empty project boundary and approve without the intended mechanical checks. Validate the brownfield default with is_dir() before accepting it, or fall through/error so seed metadata/cwd can be used. |
Follow-up Findings
| # | File:Line | Priority | Confidence | Suggestion |
|---|---|---|---|---|
| None. |
Non-blocking Suggestions
| None. |
Test Coverage Notes
- Ran
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/mcp/tools/test_evaluate_multi_ac.py -q: 18 passed. - Ran
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/core/test_project_paths.py tests/unit/core/test_seed_project_path_callers.py -q: 41 passed. - Missing critical coverage for stale/nonexistent brownfield default path falling through to seed metadata or cwd.
Design Notes
The centralization around the shared project-path resolver is the right direction, but the new brownfield default boundary needs stale-path validation because it now gates both mechanical verification and semantic source visibility.
Design / Roadmap Gate
Affected boundary: MCP evaluate working-dir resolution now feeds plugin/subagent payloads and in-process Stage 1/Stage 2 evaluation. Explicit paths can be trusted as caller intent, but persisted brownfield defaults are operator state and can become stale after repo moves/deletion. Accepting stale state before seed metadata breaks compatibility with seed-scoped evaluation and can replay/evaluate against an empty boundary.
Directional Notes
Review focused on runtime boundary safety: fallback order, stale persisted state, artifact locations, and whether Stage 1 can silently skip checks. Maintainer memory was advisory only; the blocker is from current HEAD evidence.
Test Coverage
- Ran
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/mcp/tools/test_evaluate_multi_ac.py -q: 18 passed. - Ran
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/core/test_project_paths.py tests/unit/core/test_seed_project_path_callers.py -q: 41 passed. - Missing critical coverage for stale/nonexistent brownfield default path falling through to seed metadata or cwd.
Merge Recommendation
Do not merge until stale or non-directory brownfield defaults are rejected or skipped with regression coverage proving fallback to seed metadata/cwd still preserves Stage 1 and Stage 2 visibility.
Review-Metadata:
verdict: REQUEST_CHANGES
head_sha: 5b99937
request_id: req_1781964380_133
review_profile: memory-aware-zero-trust-v2
advisory_memory_only: true
Reviewed by ouroboros-agent[bot] via Codex deep analysis
|
Pushed follow-up commit Verification run locally:
Posted by agentos-roadmap-warden — bot. Reply with |
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: APPROVE
Metadata
| Field | Value |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.
---|---|
| PR | #1492 |
| HEAD checked | 571f3b91fc140f591f529f4ff96fd58600051e8b |
| Request ID | req_1781967196_136 |
| Review record | 5573b1b5-5d39-4ddb-a322-e635e1bfad21 |
What Improved
ouroboros_evaluatenow resolves a project root before plugin dispatch and in-process evaluation, so Stage 1 commands and Stage 2 artifact collection use the same working directory.- The evaluate skill docs now recommend passing an absolute
working_dirand document fallback order. - Regression coverage now covers explicit working-dir precedence, brownfield default precedence, stale/non-directory defaults, seed metadata fallback, containment escape fallback, and cwd fallback.
Issue Requirements
| Requirement | Status |
|---|---|
| No linked issue / PR body requirement captured | N/A |
Prior Findings Status
Prior bot reviews existed and requested changes. I did not reuse prior findings. Current source shows the latest concerns described in contributor comments are addressed: stale/non-directory brownfield defaults fall through to seed metadata, escaped seed metadata falls back to cwd, and the handler uses the shared containment-aware resolver.
Blockers
No in-scope blocking findings remained after policy filtering.
Follow-up Findings
| # | File:Line | Priority | Confidence | Suggestion |
|---|---|---|---|---|
| 1 | tests/unit/mcp/tools/test_evaluate_multi_ac.py:148 | Low | High | Minor formatting nit: there is an extra blank line inside TestEvaluateWorkingDirResolution. Ruff accepts it, so this is non-blocking. |
Non-blocking Suggestions
| 1 | tests/unit/mcp/tools/test_evaluate_multi_ac.py:148 | Style | Extra blank line inside the test class; cosmetic only. |
Test Coverage Notes
- Reviewed changed tests in
tests/unit/mcp/tools/test_evaluate_multi_ac.py, especially working-dir resolution coverage at lines 77-224. - Ran
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/mcp/tools/test_evaluate_multi_ac.py -q: 20 passed. - Ran resolver-adjacent coverage:
tests/unit/core/test_project_paths.py tests/unit/core/test_seed_project_path_callers.py -q: 41 passed. - Ran
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run ruff check src/ouroboros/mcp/tools/evaluation_handlers.py tests/unit/mcp/tools/test_evaluate_multi_ac.py: passed.
Design Notes
The design is scoped and coherent: one handler-level resolver establishes the project root, then both plugin dispatch and in-process evaluation receive that resolved path. It reuses the shared seed path resolver instead of creating a parallel containment implementation.
Design / Roadmap Gate
Affected boundary is ouroboros_evaluate working-directory selection. Current HEAD evidence supports the intended contract: explicit working_dir wins, usable brownfield default is next, seed metadata/context is resolved through resolve_seed_project_path, and cwd remains the final compatibility fallback. The resolved path is passed to subagent context and used for artifact collection plus mechanical config in the in-process path.
Directional Notes
I focused on runtime boundary safety: working-dir precedence, stale brownfield defaults, seed containment, Stage 1 mechanical config, Stage 2 artifact visibility, and plugin dispatch payload shape. Maintainer memory was advisory only; no blocker was raised from memory.
Test Coverage
- Reviewed changed tests in
tests/unit/mcp/tools/test_evaluate_multi_ac.py, especially working-dir resolution coverage at lines 77-224. - Ran
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/mcp/tools/test_evaluate_multi_ac.py -q: 20 passed. - Ran resolver-adjacent coverage:
tests/unit/core/test_project_paths.py tests/unit/core/test_seed_project_path_callers.py -q: 41 passed. - Ran
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run ruff check src/ouroboros/mcp/tools/evaluation_handlers.py tests/unit/mcp/tools/test_evaluate_multi_ac.py: passed.
Merge Recommendation
Approve. I found no blocking runtime, persistence, plugin-dispatch, or test-coverage issue in the changed boundary, and targeted verification passed.
Review-Metadata:
verdict: APPROVE
head_sha: 571f3b9
request_id: req_1781967196_136
review_profile: memory-aware-zero-trust-v2
advisory_memory_only: true
Reviewed by ouroboros-agent[bot] via Codex deep analysis
|
Pushed follow-up commit
Posted by agentos-roadmap-warden — bot. Reply with |
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: APPROVE
Metadata
| Field | Value |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.
---|---|
| PR | #1492 |
| HEAD checked | 8734e4535f7567da8e2e9e301539a240edde748e |
| Request ID | req_1781967773_137 |
| Review record | 80c5bf03-ec45-4a4e-a932-8dd8a3a2bf3f |
What Improved
- Evaluation now resolves one project root for both Stage 1 commands and Stage 2 source visibility.
- Omitted
working_dirnow falls back through brownfield default, seed metadata, then MCP server cwd. - Tests cover explicit override, brownfield precedence, stale/non-directory brownfield fallback, seed metadata, escaped seed metadata, and cwd fallback.
Issue Requirements
| Requirement | Status |
|---|---|
Ensure omitted evaluate working_dir can fall back to the registered brownfield default |
Met |
| Fall back from unusable brownfield defaults to seed metadata | Met |
| Use containment-aware seed project-path resolution for seed-derived fallback | Met |
Document that working_dir controls Stage 1 and Stage 2 visibility |
Met |
| Add regression coverage for precedence and fallback behavior | Met |
Prior Findings Status
Prior bot reviews existed, including earlier changes-requested states and a later approval. I rechecked the current source independently; the stale/non-directory brownfield fallback and seed containment cases are now covered by current tests, so no prior blocker is maintained.
Blockers
No in-scope blocking findings remained after policy filtering.
Follow-up Findings
| # | File:Line | Priority | Confidence | Suggestion |
|---|---|---|---|---|
| 1 | skills/evaluate/SKILL.md:76 | Low | Medium | The skill example still documents only singular acceptance_criterion; documenting acceptance_criteria would make the newer checklist path easier to discover. |
Non-blocking Suggestions
| 1 | skills/evaluate/SKILL.md:76 | Documentation | Consider adding the plural acceptance_criteria argument to the usage block. |
Test Coverage Notes
- Reviewed changed source, diff, PR comments, prior review metadata, and affected path-resolution contracts.
- Ran
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/mcp/tools/test_evaluate_multi_ac.py -q: 20 passed. - Ran
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run ruff check src/ouroboros/mcp/tools/evaluation_handlers.py tests/unit/mcp/tools/test_evaluate_multi_ac.py: passed. - Ran
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run ruff format --check src/ouroboros/mcp/tools/evaluation_handlers.py tests/unit/mcp/tools/test_evaluate_multi_ac.py: passed.
Design Notes
The change keeps the fallback logic centralized in the MCP handler and reuses the shared seed path resolver instead of adding a parallel path parser. The runtime split between plugin dispatch and in-process evaluation now receives the same resolved root.
Design / Roadmap Gate
Affected boundary: ouroboros_evaluate working-directory resolution across plugin/subagent dispatch, mechanical verification, artifact collection, and seed metadata recovery. Current HEAD preserves explicit operator override, validates brownfield defaults as existing directories, applies contained seed metadata, and keeps cwd as the final compatibility fallback. The added tests exercise the critical precedence and stale-path cases.
Directional Notes
Review focus followed the maintainer posture around runtime boundaries, fallback ordering, stale project state, and duplicate path-resolution logic. Memory was advisory only; no blocker was raised from memory.
Test Coverage
- Reviewed changed source, diff, PR comments, prior review metadata, and affected path-resolution contracts.
- Ran
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/mcp/tools/test_evaluate_multi_ac.py -q: 20 passed. - Ran
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run ruff check src/ouroboros/mcp/tools/evaluation_handlers.py tests/unit/mcp/tools/test_evaluate_multi_ac.py: passed. - Ran
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run ruff format --check src/ouroboros/mcp/tools/evaluation_handlers.py tests/unit/mcp/tools/test_evaluate_multi_ac.py: passed.
Merge Recommendation
Approve. I found no current HEAD blocker, and the targeted runtime and formatting checks pass.
Review-Metadata:
verdict: APPROVE
head_sha: 8734e45
request_id: req_1781967773_137
review_profile: memory-aware-zero-trust-v2
advisory_memory_only: true
Reviewed by ouroboros-agent[bot] via Codex deep analysis
|
Pushed follow-up commit Additional local verification:
Posted by agentos-roadmap-warden — bot. Reply with |
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
Metadata
| Field | Value |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.
---|---|
| PR | #1492 |
| HEAD checked | 621f9224d73294e3b2a9feae932643dfd344ef64 |
| Request ID | req_1781968981_140 |
| Review record | 50bf5252-8d4e-4d97-99e8-339b2b110e1a |
What Improved
ouroboros_evaluatenow resolves omittedworking_dirthrough explicit arg, brownfield default, seed metadata, then MCP cwd for the direct/blocking path.- The evaluate skill documents that
working_diraffects both Stage 1 commands and Stage 2 source visibility. - Regression tests cover direct evaluate fallback precedence and capability metadata expectations.
Issue Requirements
| Requirement | Status |
|---|---|
Document working_dir as controlling Stage 1 command execution and Stage 2 source visibility |
Met |
Omitted direct ouroboros_evaluate working_dir falls back to brownfield default, seed project metadata, then MCP cwd |
Met |
| Ignore stale/non-directory brownfield defaults before seed fallback | Met |
Keep capability metadata aware of evaluate working_dir context |
Met |
Apply the same omitted-working_dir fallback across evaluate MCP entrypoints, including start/plugin dispatch |
Not met |
Prior Findings Status
Prior bot review text was not reused. Human comments indicate earlier direct-evaluate fallback blockers were addressed; current source and tests support that for EvaluateHandler. This review maintains a blocker only for the sibling StartEvaluateHandler plugin boundary still bypassing the new resolver.
Blockers
| # | File:Line | Severity | Finding |
|---|---|---|---|
| 1 | src/ouroboros/mcp/tools/evaluation_handlers.py:1871 | BLOCKING | StartEvaluateHandler still bypasses the new fallback resolver in plugin mode. Its definition reuses EvaluateHandler().definition.parameters, so ouroboros_start_evaluate now exposes the same working_dir contract, but the plugin-dispatch branch returns before the inner EvaluateHandler.handle() runs and forwards only arguments.get("working_dir"). When callers omit working_dir, plugin-mode start-evaluate payloads get None instead of the brownfield default, seed project metadata, or cwd that direct ouroboros_evaluate now computes at lines 519 and 528. This leaves Stage 2 source visibility wrong for the documented long-running/plugin evaluate boundary. |
Follow-up Findings
| # | File:Line | Priority | Confidence | Suggestion |
|---|---|---|---|---|
| 1 | src/ouroboros/mcp/tools/evaluation_handlers.py:113 | Medium | Medium | Consider adding a regression for explicit working_dir that points to a missing or non-directory path, so the intended behavior is locked down instead of inherited from downstream best-effort failures. |
Non-blocking Suggestions
| 1 | src/ouroboros/mcp/tools/evaluation_handlers.py:118 | Cleanup | _default_brownfield_project_dir() already filters unusable defaults, so the repeated is_dir() branch in _resolve_evaluate_working_dir() is defensive but currently redundant. |
Test Coverage Notes
- Ran
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/mcp/tools/test_evaluate_multi_ac.py -q: 20 passed. - Ran targeted capability metadata tests: 31 passed.
- Ran
ruff checkandruff format --checkon changed source/tests: passed. - Missing coverage:
ouroboros_start_evaluateplugin-mode fallback path.
Design Notes
The resolver is placed correctly for direct evaluation before plugin and in-process dispatch, but the start-evaluate plugin shortcut now diverges from the shared contract.
Design / Roadmap Gate
Affected boundary: evaluate working-directory resolution feeds mechanical config, artifact collection, semantic source visibility, and plugin payload context. Direct and background non-plugin paths use the new resolver, but ouroboros_start_evaluate plugin mode returns before that shared resolution and preserves a stale raw-argument path.
Directional Notes
Review focus was on runtime boundaries, fallback precedence, plugin dispatch, and capability metadata. Maintainer memory was advisory only; the blocker is based on current source evidence.
Test Coverage
- Ran
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/mcp/tools/test_evaluate_multi_ac.py -q: 20 passed. - Ran targeted capability metadata tests: 31 passed.
- Ran
ruff checkandruff format --checkon changed source/tests: passed. - Missing coverage:
ouroboros_start_evaluateplugin-mode fallback path.
Merge Recommendation
REQUEST_CHANGES until StartEvaluateHandler plugin dispatch resolves working_dir with the same fallback logic, with a regression test covering omitted working_dir plus brownfield or seed-derived project root.
Review-Metadata:
verdict: REQUEST_CHANGES
head_sha: 621f922
request_id: req_1781968981_140
review_profile: memory-aware-zero-trust-v2
advisory_memory_only: true
Reviewed by ouroboros-agent[bot] via Codex deep analysis
|
Pushed follow-up commit Additional local verification:
Posted by agentos-roadmap-warden — bot. Reply with |
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: APPROVE
Metadata
| Field | Value |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.
---|---|
| PR | #1492 |
| HEAD checked | d55f5c02f2e5a9ce62abf732de4bb2353d54dd1d |
| Request ID | req_1781969887_142 |
| Review record | 92b4cead-e816-4199-b861-f96f8fa9c29b |
What Improved
ouroboros_evaluatenow resolves omittedworking_dirfrom brownfield default, seed metadata, then MCP cwd.ouroboros_start_evaluateplugin dispatch now uses the same fallback before building the subagent payload.- Evaluate skill docs now state that
working_dirgates both Stage 1 commands and Stage 2 source visibility.
Issue Requirements
| Requirement | Status |
|---|---|
Document that evaluate working_dir controls Stage 1 command execution and Stage 2 source visibility |
Met |
When evaluate working_dir is omitted, fall back to registered brownfield default before cwd |
Met |
| If brownfield default is stale or not a directory, continue to seed metadata/cwd fallback | Met |
Apply omitted-working_dir fallback to ouroboros_start_evaluate plugin dispatch payloads |
Met |
Keep capability metadata aware of evaluate working_dir context |
Met |
| Linked issue requirement captured | N/A |
Prior Findings Status
Prior review concerns are not re-raised. Current HEAD shows the seed-derived fallback now uses the shared containment-aware resolver, stale/non-directory brownfield defaults fall through, and start_evaluate plugin dispatch resolves working_dir before payload construction.
Blockers
No in-scope blocking findings remained after policy filtering.
Follow-up Findings
| # | File:Line | Priority | Confidence | Suggestion |
|---|---|---|---|---|
| None. |
Non-blocking Suggestions
| - | - | - | None. |
Test Coverage Notes
- Reviewed changed tests for evaluate working-dir fallback, stale/non-directory brownfield defaults, seed metadata containment fallback, start_evaluate plugin payload, and capability metadata.
- Ran
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/mcp/tools/test_evaluate_multi_ac.py tests/unit/mcp/tools/test_start_evaluate.py -q— 36 passed. - Ran targeted capability metadata pytest — 31 passed.
- Ran Ruff check and Ruff format check on changed source/tests — passed.
Design Notes
The change centralizes evaluate project-root fallback in a small helper and reuses existing project-path containment logic for seed-derived paths. The MCP handlers remain thin enough, with behavior guarded by focused unit tests.
Design / Roadmap Gate
Affected boundary is evaluate project-root selection. Explicit working_dir remains highest precedence; omitted values resolve through brownfield default, seed metadata, then cwd. Brownfield defaults are required to exist as directories. Seed metadata resolution is containment-aware through resolve_seed_project_path. Plugin and in-process evaluate paths now receive the resolved directory string/path before dispatch or pipeline execution.
Directional Notes
Review focused on runtime boundary safety: stale operator defaults, seed metadata containment, plugin dispatch payload parity, and Stage 1/Stage 2 project-root consistency. Maintainer memory was advisory only; no blocker was based on memory alone.
Test Coverage
- Reviewed changed tests for evaluate working-dir fallback, stale/non-directory brownfield defaults, seed metadata containment fallback, start_evaluate plugin payload, and capability metadata.
- Ran
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/mcp/tools/test_evaluate_multi_ac.py tests/unit/mcp/tools/test_start_evaluate.py -q— 36 passed. - Ran targeted capability metadata pytest — 31 passed.
- Ran Ruff check and Ruff format check on changed source/tests — passed.
Merge Recommendation
Approve. Current HEAD covers the previously risky evaluate working-dir fallback and plugin dispatch boundary with passing targeted tests and lint/format checks.
Review-Metadata:
verdict: APPROVE
head_sha: d55f5c0
request_id: req_1781969887_142
review_profile: memory-aware-zero-trust-v2
advisory_memory_only: true
Reviewed by ouroboros-agent[bot] via Codex deep analysis
Summary
working_dirin the evaluate skill and clarify that it controls Stage 2 source-file visibility as well as Stage 1 commands.working_dir, registered brownfield default, seed metadata/context, then cwd.Test plan
uv run ruff check src/ouroboros/mcp/tools/evaluation_handlers.py tests/unit/mcp/tools/test_evaluate_multi_ac.pyuv run pytest tests/unit/mcp/tools/test_evaluate_multi_ac.py tests/unit/mcp/tools/test_definitions.py -q— 148 passeduv run pytest tests/unit/mcp/tools/test_evaluate_multi_ac.py tests/unit/mcp/tools/test_definitions.py tests/unit/mcp/tools/test_start_evaluate.py tests/unit/mcp/tools/test_checklist_verify.py tests/unit/mcp/tools/test_handler_subagent_wiring.py -q— 210 passedCloses #1490
Refs #1491