fix(evaluate): resolve project working dir fallback by Q00 · Pull Request #1492 · Q00/ouroboros

Q00 · 2026-06-20T13:25:48Z

Summary

Document working_dir in the evaluate skill and clarify that it controls Stage 2 source-file visibility as well as Stage 1 commands.
Resolve evaluate working directories from explicit working_dir, registered brownfield default, seed metadata/context, then cwd.
Pass the resolved project root through plugin/subagent dispatch and in-process evaluation.
Add regression coverage for explicit, brownfield-default, seed metadata, and cwd fallback paths.

Test plan

uv run ruff check src/ouroboros/mcp/tools/evaluation_handlers.py tests/unit/mcp/tools/test_evaluate_multi_ac.py
uv run pytest tests/unit/mcp/tools/test_evaluate_multi_ac.py tests/unit/mcp/tools/test_definitions.py -q — 148 passed
uv run pytest tests/unit/mcp/tools/test_evaluate_multi_ac.py tests/unit/mcp/tools/test_definitions.py tests/unit/mcp/tools/test_start_evaluate.py tests/unit/mcp/tools/test_checklist_verify.py tests/unit/mcp/tools/test_handler_subagent_wiring.py -q — 210 passed

Closes #1490
Refs #1491

ouroboros-agent

Review — ouroboros-agent[bot]

Verdict: REQUEST_CHANGES

Metadata

| Field | Value |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.

---|---|
| PR | #1492 |
| HEAD checked | b4e0f2461028e20639e64e53f82147575e2ef9dd |
| Request ID | req_1781961952_131 |
| Review record | f0b79dca-54ca-4743-9e06-1fc2a0f0f60f |

What Improved

Adds evaluation working_dir fallback resolution so Stage 1 mechanical checks and Stage 2 artifact collection can use a project root even when the caller omits working_dir.
Documents that working_dir controls both command execution and source-file visibility.

Issue Requirements

Requirement	Status
Add/advertise `working_dir` as controlling Stage 1 commands and Stage 2 source visibility	Met
If `working_dir` is omitted, fall back through brownfield default, seed project metadata, then MCP server cwd	Partially met
Preserve safe project-boundary behavior for seed-derived paths	Not met

Prior Findings Status

Prior bot review metadata was present only as history. I did not copy prior findings; the blocker above is based on current source evidence.

Blockers

#	File:Line	Severity	Finding
1	src/ouroboros/mcp/tools/evaluation_handlers.py:107	BLOCKING	`_seed_project_dir()` resolves seed-encoded `metadata.project_dir` / `working_directory` / `context_references[].path` with `resolve_path_against_base(... )` but does not enable containment. The shared project-path helper explicitly says callers handling untrusted seed paths must opt into containment, and this new fallback now feeds the resolved path into artifact writes/collection and mechanical command execution. A seed can therefore redirect an omitted-`working_dir` evaluation outside the intended project root, causing Stage 2 to scan/read another tree and Stage 1 to load or run `.ouroboros/mechanical.toml` from that location. Use the central seed resolver/containment contract or reject escaped seed paths instead of silently trusting them.

Follow-up Findings

#	File:Line	Priority	Confidence	Suggestion
1	tests/unit/mcp/tools/test_evaluate_multi_ac.py:77	Medium	Medium	Add a regression test for the conflict case where both a registered brownfield default and seed metadata are present, since that precedence is now part of the documented contract.

Non-blocking Suggestions

| 1 | tests/unit/mcp/tools/test_evaluate_multi_ac.py:77 | Tests | The new tests cover each fallback source independently, but not seed path containment/rejection or default-vs-seed precedence conflicts. |

Test Coverage Notes

Ran SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/mcp/tools/test_evaluate_multi_ac.py successfully: 16 passed.
Reviewed source paths for resolver, artifact collection, mechanical command cwd, and existing shared project-path containment contract.

Design Notes

The feature belongs in the evaluation boundary, but it duplicates seed project-path resolution instead of reusing the existing containment-aware contract.

Design / Roadmap Gate

The changed boundary controls stateful artifact writes, source-file visibility, and subprocess cwd. Explicit working_dir can remain caller-controlled, but seed-derived fallback paths are not equivalent: seeds may be generated/imported and must not silently escape the intended stable base. Current HEAD lacks that gate.

Directional Notes

I focused on resolver precedence, stale/global defaults, seed trust boundaries, artifact locations, and mechanical command cwd because this PR changes evaluation runtime boundaries.

Test Coverage

Ran SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/mcp/tools/test_evaluate_multi_ac.py successfully: 16 passed.
Reviewed source paths for resolver, artifact collection, mechanical command cwd, and existing shared project-path containment contract.

Merge Recommendation

Do not merge until seed-derived project paths are contained or explicitly rejected, with a regression test proving escaped seed metadata cannot redirect evaluation.

Review-Metadata:
verdict: REQUEST_CHANGES
head_sha: b4e0f24
request_id: req_1781961952_131
review_profile: memory-aware-zero-trust-v2
advisory_memory_only: true

Reviewed by ouroboros-agent[bot] via Codex deep analysis

Q00 · 2026-06-20T14:01:37Z

Updated this PR in 5b99937 to address the review blocker: seed-derived working-dir fallback now reuses the shared containment-aware seed project-path resolver, and regression coverage now covers brownfield-default precedence and escaped seed metadata fallback.\n\nVerification run locally:\n- SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 uv run python -m pytest tests/unit/mcp/tools/test_evaluate_multi_ac.py -q (18 passed)\n- SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 uv run ruff check src/ouroboros/mcp/tools/evaluation_handlers.py tests/unit/mcp/tools/test_evaluate_multi_ac.py (passed)\n\n_Posted by agentos-roadmap-warden — bot. Reply with /warden ignore to suppress further comments on this thread._

ouroboros-agent

Review — ouroboros-agent[bot]

Verdict: REQUEST_CHANGES

Metadata

| Field | Value |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.

---|---|
| PR | #1492 |
| HEAD checked | 5b999371ec2d9e83b981610ab73f5008b227af1a |
| Request ID | req_1781964380_133 |
| Review record | e4e71516-2cbc-434a-a0bb-532950ac505f |

What Improved

Documents that working_dir controls both Stage 1 command execution and Stage 2 source visibility.
Adds shared working-directory resolution for evaluation with explicit, brownfield default, seed metadata, and cwd fallback order.
Adds regression coverage for explicit/default/seed/cwd precedence and escaped seed metadata fallback.

Issue Requirements

Requirement	Status
PR body linked issue requirement	N/A
`working_dir` controls both Stage 1 command execution and Stage 2 source-file visibility	Partially met
If `working_dir` is omitted, fallback uses registered brownfield default, then seed project metadata, then MCP server cwd	Partially met
Regression coverage covers brownfield-default precedence and escaped seed metadata fallback	Met

Prior Findings Status

A prior bot review existed and the contributor says this revision addressed the earlier seed-derived fallback blocker. I did not reuse prior findings; the current blocker is based on the current source and changed fallback boundary.

Blockers

#	File:Line	Severity	Finding
1	src/ouroboros/mcp/tools/evaluation_handlers.py:76	BLOCKING	`_default_brownfield_project_dir()` returns the persisted default path after `.resolve()` without verifying that it still exists and is a directory, and `_resolve_evaluate_working_dir()` then returns that default before trying seed metadata at lines 110-114. A stale brownfield default is enough to block fallback to a valid seed project root. Downstream, evaluation writes/collects artifacts and runs mechanical discovery from that chosen root (`evaluation_handlers.py:594`, `:601`, `:615`); on a nonexistent root, artifact writes fail, directory scanning returns nothing, detector sees no manifests, and `build_mechanical_config()` falls back to all commands `None`, which Stage 1 treats as skipped (`src/ouroboros/evaluation/languages.py:257-260`). This can silently evaluate the wrong/empty project boundary and approve without the intended mechanical checks. Validate the brownfield default with `is_dir()` before accepting it, or fall through/error so seed metadata/cwd can be used.

Follow-up Findings

#	File:Line	Priority	Confidence	Suggestion
None.

Non-blocking Suggestions

| None. |

Test Coverage Notes

Ran SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/mcp/tools/test_evaluate_multi_ac.py -q: 18 passed.
Ran SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/core/test_project_paths.py tests/unit/core/test_seed_project_path_callers.py -q: 41 passed.
Missing critical coverage for stale/nonexistent brownfield default path falling through to seed metadata or cwd.

Design Notes

The centralization around the shared project-path resolver is the right direction, but the new brownfield default boundary needs stale-path validation because it now gates both mechanical verification and semantic source visibility.

Design / Roadmap Gate

Affected boundary: MCP evaluate working-dir resolution now feeds plugin/subagent payloads and in-process Stage 1/Stage 2 evaluation. Explicit paths can be trusted as caller intent, but persisted brownfield defaults are operator state and can become stale after repo moves/deletion. Accepting stale state before seed metadata breaks compatibility with seed-scoped evaluation and can replay/evaluate against an empty boundary.

Directional Notes

Review focused on runtime boundary safety: fallback order, stale persisted state, artifact locations, and whether Stage 1 can silently skip checks. Maintainer memory was advisory only; the blocker is from current HEAD evidence.

Test Coverage

Ran SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/mcp/tools/test_evaluate_multi_ac.py -q: 18 passed.
Ran SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/core/test_project_paths.py tests/unit/core/test_seed_project_path_callers.py -q: 41 passed.
Missing critical coverage for stale/nonexistent brownfield default path falling through to seed metadata or cwd.

Merge Recommendation

Do not merge until stale or non-directory brownfield defaults are rejected or skipped with regression coverage proving fallback to seed metadata/cwd still preserves Stage 1 and Stage 2 visibility.

Review-Metadata:
verdict: REQUEST_CHANGES
head_sha: 5b99937
request_id: req_1781964380_133
review_profile: memory-aware-zero-trust-v2
advisory_memory_only: true

Reviewed by ouroboros-agent[bot] via Codex deep analysis

Q00 · 2026-06-20T14:48:21Z

Pushed follow-up commit 571f3b91 to address the latest review blocker: evaluate fallback now ignores stale or non-directory brownfield defaults and continues to seed metadata before falling back to the MCP server cwd.

Verification run locally:

SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 uv run python -m pytest tests/unit/mcp/tools/test_evaluate_multi_ac.py -q (20 passed)
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 uv run ruff check src/ouroboros/mcp/tools/evaluation_handlers.py tests/unit/mcp/tools/test_evaluate_multi_ac.py (passed)

Posted by agentos-roadmap-warden — bot. Reply with /warden ignore to suppress further comments on this thread.

ouroboros-agent

Review — ouroboros-agent[bot]

Verdict: APPROVE

Metadata

| Field | Value |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.

---|---|
| PR | #1492 |
| HEAD checked | 571f3b91fc140f591f529f4ff96fd58600051e8b |
| Request ID | req_1781967196_136 |
| Review record | 5573b1b5-5d39-4ddb-a322-e635e1bfad21 |

What Improved

ouroboros_evaluate now resolves a project root before plugin dispatch and in-process evaluation, so Stage 1 commands and Stage 2 artifact collection use the same working directory.
The evaluate skill docs now recommend passing an absolute working_dir and document fallback order.
Regression coverage now covers explicit working-dir precedence, brownfield default precedence, stale/non-directory defaults, seed metadata fallback, containment escape fallback, and cwd fallback.

Issue Requirements

Requirement	Status
No linked issue / PR body requirement captured	N/A

Prior Findings Status

Prior bot reviews existed and requested changes. I did not reuse prior findings. Current source shows the latest concerns described in contributor comments are addressed: stale/non-directory brownfield defaults fall through to seed metadata, escaped seed metadata falls back to cwd, and the handler uses the shared containment-aware resolver.

Blockers

No in-scope blocking findings remained after policy filtering.

Follow-up Findings

#	File:Line	Priority	Confidence	Suggestion
1	tests/unit/mcp/tools/test_evaluate_multi_ac.py:148	Low	High	Minor formatting nit: there is an extra blank line inside `TestEvaluateWorkingDirResolution`. Ruff accepts it, so this is non-blocking.

Non-blocking Suggestions

Test Coverage Notes

Reviewed changed tests in tests/unit/mcp/tools/test_evaluate_multi_ac.py, especially working-dir resolution coverage at lines 77-224.
Ran SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/mcp/tools/test_evaluate_multi_ac.py -q: 20 passed.
Ran resolver-adjacent coverage: tests/unit/core/test_project_paths.py tests/unit/core/test_seed_project_path_callers.py -q: 41 passed.
Ran SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run ruff check src/ouroboros/mcp/tools/evaluation_handlers.py tests/unit/mcp/tools/test_evaluate_multi_ac.py: passed.

Design Notes

The design is scoped and coherent: one handler-level resolver establishes the project root, then both plugin dispatch and in-process evaluation receive that resolved path. It reuses the shared seed path resolver instead of creating a parallel containment implementation.

Design / Roadmap Gate

Affected boundary is ouroboros_evaluate working-directory selection. Current HEAD evidence supports the intended contract: explicit working_dir wins, usable brownfield default is next, seed metadata/context is resolved through resolve_seed_project_path, and cwd remains the final compatibility fallback. The resolved path is passed to subagent context and used for artifact collection plus mechanical config in the in-process path.

Directional Notes

I focused on runtime boundary safety: working-dir precedence, stale brownfield defaults, seed containment, Stage 1 mechanical config, Stage 2 artifact visibility, and plugin dispatch payload shape. Maintainer memory was advisory only; no blocker was raised from memory.

Test Coverage

Reviewed changed tests in tests/unit/mcp/tools/test_evaluate_multi_ac.py, especially working-dir resolution coverage at lines 77-224.
Ran SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/mcp/tools/test_evaluate_multi_ac.py -q: 20 passed.
Ran resolver-adjacent coverage: tests/unit/core/test_project_paths.py tests/unit/core/test_seed_project_path_callers.py -q: 41 passed.
Ran SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run ruff check src/ouroboros/mcp/tools/evaluation_handlers.py tests/unit/mcp/tools/test_evaluate_multi_ac.py: passed.

Merge Recommendation

Approve. I found no blocking runtime, persistence, plugin-dispatch, or test-coverage issue in the changed boundary, and targeted verification passed.

Review-Metadata:
verdict: APPROVE
head_sha: 571f3b9
request_id: req_1781967196_136
review_profile: memory-aware-zero-trust-v2
advisory_memory_only: true

Reviewed by ouroboros-agent[bot] via Codex deep analysis

Q00 · 2026-06-20T14:58:04Z

Pushed follow-up commit 8734e453 after CI's Ruff format job failed on the new regression-test block. No behavior changes; local verification now includes format check:

SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 uv run python -m pytest tests/unit/mcp/tools/test_evaluate_multi_ac.py -q (20 passed)
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 uv run ruff check src/ouroboros/mcp/tools/evaluation_handlers.py tests/unit/mcp/tools/test_evaluate_multi_ac.py (passed)
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 uv run ruff format --check src/ouroboros/mcp/tools/evaluation_handlers.py tests/unit/mcp/tools/test_evaluate_multi_ac.py (passed)

Posted by agentos-roadmap-warden — bot. Reply with /warden ignore to suppress further comments on this thread.

ouroboros-agent

Review — ouroboros-agent[bot]

Verdict: APPROVE

Metadata

| Field | Value |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.

---|---|
| PR | #1492 |
| HEAD checked | 8734e4535f7567da8e2e9e301539a240edde748e |
| Request ID | req_1781967773_137 |
| Review record | 80c5bf03-ec45-4a4e-a932-8dd8a3a2bf3f |

What Improved

Evaluation now resolves one project root for both Stage 1 commands and Stage 2 source visibility.
Omitted working_dir now falls back through brownfield default, seed metadata, then MCP server cwd.
Tests cover explicit override, brownfield precedence, stale/non-directory brownfield fallback, seed metadata, escaped seed metadata, and cwd fallback.

Issue Requirements

Requirement	Status
Ensure omitted evaluate `working_dir` can fall back to the registered brownfield default	Met
Fall back from unusable brownfield defaults to seed metadata	Met
Use containment-aware seed project-path resolution for seed-derived fallback	Met
Document that `working_dir` controls Stage 1 and Stage 2 visibility	Met
Add regression coverage for precedence and fallback behavior	Met

Prior Findings Status

Prior bot reviews existed, including earlier changes-requested states and a later approval. I rechecked the current source independently; the stale/non-directory brownfield fallback and seed containment cases are now covered by current tests, so no prior blocker is maintained.

Blockers

No in-scope blocking findings remained after policy filtering.

Follow-up Findings

#	File:Line	Priority	Confidence	Suggestion
1	skills/evaluate/SKILL.md:76	Low	Medium	The skill example still documents only singular `acceptance_criterion`; documenting `acceptance_criteria` would make the newer checklist path easier to discover.

Non-blocking Suggestions

Test Coverage Notes

Reviewed changed source, diff, PR comments, prior review metadata, and affected path-resolution contracts.
Ran SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/mcp/tools/test_evaluate_multi_ac.py -q: 20 passed.
Ran SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run ruff check src/ouroboros/mcp/tools/evaluation_handlers.py tests/unit/mcp/tools/test_evaluate_multi_ac.py: passed.
Ran SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run ruff format --check src/ouroboros/mcp/tools/evaluation_handlers.py tests/unit/mcp/tools/test_evaluate_multi_ac.py: passed.

Design Notes

The change keeps the fallback logic centralized in the MCP handler and reuses the shared seed path resolver instead of adding a parallel path parser. The runtime split between plugin dispatch and in-process evaluation now receives the same resolved root.

Design / Roadmap Gate

Affected boundary: ouroboros_evaluate working-directory resolution across plugin/subagent dispatch, mechanical verification, artifact collection, and seed metadata recovery. Current HEAD preserves explicit operator override, validates brownfield defaults as existing directories, applies contained seed metadata, and keeps cwd as the final compatibility fallback. The added tests exercise the critical precedence and stale-path cases.

Directional Notes

Review focus followed the maintainer posture around runtime boundaries, fallback ordering, stale project state, and duplicate path-resolution logic. Memory was advisory only; no blocker was raised from memory.

Test Coverage

Reviewed changed source, diff, PR comments, prior review metadata, and affected path-resolution contracts.
Ran SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/mcp/tools/test_evaluate_multi_ac.py -q: 20 passed.
Ran SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run ruff check src/ouroboros/mcp/tools/evaluation_handlers.py tests/unit/mcp/tools/test_evaluate_multi_ac.py: passed.
Ran SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run ruff format --check src/ouroboros/mcp/tools/evaluation_handlers.py tests/unit/mcp/tools/test_evaluate_multi_ac.py: passed.

Merge Recommendation

Approve. I found no current HEAD blocker, and the targeted runtime and formatting checks pass.

Review-Metadata:
verdict: APPROVE
head_sha: 8734e45
request_id: req_1781967773_137
review_profile: memory-aware-zero-trust-v2
advisory_memory_only: true

Reviewed by ouroboros-agent[bot] via Codex deep analysis

Q00 · 2026-06-20T15:18:13Z

Pushed follow-up commit 621f9224 after the full test matrix exposed the missing capability metadata expectation for evaluate working_dir context.

Additional local verification:

SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 uv run python -m pytest tests/unit/orchestrator/test_capabilities.py::test_each_ouroboros_owned_tool_metadata_is_tool_specific_not_generic_inference tests/unit/orchestrator/test_capabilities.py::test_ouroboros_owned_tools_have_explicit_capability_metadata tests/unit/orchestrator/test_capabilities.py::test_ouroboros_capability_registry_covers_every_owned_tool_identifier -q (31 passed)
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 uv run python -m pytest tests/unit/mcp/tools/test_evaluate_multi_ac.py -q (20 passed)
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 uv run ruff check src/ouroboros/mcp/tools/evaluation_handlers.py tests/unit/mcp/tools/test_evaluate_multi_ac.py tests/unit/orchestrator/test_capabilities.py (passed)
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 uv run ruff format --check src/ouroboros/mcp/tools/evaluation_handlers.py tests/unit/mcp/tools/test_evaluate_multi_ac.py tests/unit/orchestrator/test_capabilities.py (passed)

Posted by agentos-roadmap-warden — bot. Reply with /warden ignore to suppress further comments on this thread.

ouroboros-agent

Review — ouroboros-agent[bot]

Verdict: REQUEST_CHANGES

Metadata

| Field | Value |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.

---|---|
| PR | #1492 |
| HEAD checked | 621f9224d73294e3b2a9feae932643dfd344ef64 |
| Request ID | req_1781968981_140 |
| Review record | 50bf5252-8d4e-4d97-99e8-339b2b110e1a |

What Improved

ouroboros_evaluate now resolves omitted working_dir through explicit arg, brownfield default, seed metadata, then MCP cwd for the direct/blocking path.
The evaluate skill documents that working_dir affects both Stage 1 commands and Stage 2 source visibility.
Regression tests cover direct evaluate fallback precedence and capability metadata expectations.

Issue Requirements

Requirement	Status
Document `working_dir` as controlling Stage 1 command execution and Stage 2 source visibility	Met
Omitted direct `ouroboros_evaluate` `working_dir` falls back to brownfield default, seed project metadata, then MCP cwd	Met
Ignore stale/non-directory brownfield defaults before seed fallback	Met
Keep capability metadata aware of evaluate `working_dir` context	Met
Apply the same omitted-`working_dir` fallback across evaluate MCP entrypoints, including start/plugin dispatch	Not met

Prior Findings Status

Prior bot review text was not reused. Human comments indicate earlier direct-evaluate fallback blockers were addressed; current source and tests support that for EvaluateHandler. This review maintains a blocker only for the sibling StartEvaluateHandler plugin boundary still bypassing the new resolver.

Blockers

#	File:Line	Severity	Finding
1	src/ouroboros/mcp/tools/evaluation_handlers.py:1871	BLOCKING	`StartEvaluateHandler` still bypasses the new fallback resolver in plugin mode. Its definition reuses `EvaluateHandler().definition.parameters`, so `ouroboros_start_evaluate` now exposes the same `working_dir` contract, but the plugin-dispatch branch returns before the inner `EvaluateHandler.handle()` runs and forwards only `arguments.get("working_dir")`. When callers omit `working_dir`, plugin-mode start-evaluate payloads get `None` instead of the brownfield default, seed project metadata, or cwd that direct `ouroboros_evaluate` now computes at lines 519 and 528. This leaves Stage 2 source visibility wrong for the documented long-running/plugin evaluate boundary.

Follow-up Findings

#	File:Line	Priority	Confidence	Suggestion
1	src/ouroboros/mcp/tools/evaluation_handlers.py:113	Medium	Medium	Consider adding a regression for explicit `working_dir` that points to a missing or non-directory path, so the intended behavior is locked down instead of inherited from downstream best-effort failures.

Non-blocking Suggestions

| 1 | src/ouroboros/mcp/tools/evaluation_handlers.py:118 | Cleanup | _default_brownfield_project_dir() already filters unusable defaults, so the repeated is_dir() branch in _resolve_evaluate_working_dir() is defensive but currently redundant. |

Test Coverage Notes

Ran SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/mcp/tools/test_evaluate_multi_ac.py -q: 20 passed.
Ran targeted capability metadata tests: 31 passed.
Ran ruff check and ruff format --check on changed source/tests: passed.
Missing coverage: ouroboros_start_evaluate plugin-mode fallback path.

Design Notes

The resolver is placed correctly for direct evaluation before plugin and in-process dispatch, but the start-evaluate plugin shortcut now diverges from the shared contract.

Design / Roadmap Gate

Affected boundary: evaluate working-directory resolution feeds mechanical config, artifact collection, semantic source visibility, and plugin payload context. Direct and background non-plugin paths use the new resolver, but ouroboros_start_evaluate plugin mode returns before that shared resolution and preserves a stale raw-argument path.

Directional Notes

Review focus was on runtime boundaries, fallback precedence, plugin dispatch, and capability metadata. Maintainer memory was advisory only; the blocker is based on current source evidence.

Test Coverage

Ran SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/mcp/tools/test_evaluate_multi_ac.py -q: 20 passed.
Ran targeted capability metadata tests: 31 passed.
Ran ruff check and ruff format --check on changed source/tests: passed.
Missing coverage: ouroboros_start_evaluate plugin-mode fallback path.

Merge Recommendation

REQUEST_CHANGES until StartEvaluateHandler plugin dispatch resolves working_dir with the same fallback logic, with a regression test covering omitted working_dir plus brownfield or seed-derived project root.

Review-Metadata:
verdict: REQUEST_CHANGES
head_sha: 621f922
request_id: req_1781968981_140
review_profile: memory-aware-zero-trust-v2
advisory_memory_only: true

Reviewed by ouroboros-agent[bot] via Codex deep analysis

Q00 · 2026-06-20T15:33:22Z

Pushed follow-up commit d55f5c02 to address the latest review blocker: ouroboros_start_evaluate plugin dispatch now uses the same omitted-working_dir fallback resolver before building the subagent payload, with a regression for brownfield-default fallback.

Additional local verification:

SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 uv run python -m pytest tests/unit/mcp/tools/test_start_evaluate.py tests/unit/mcp/tools/test_evaluate_multi_ac.py -q (36 passed)
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 uv run python -m pytest tests/unit/orchestrator/test_capabilities.py::test_each_ouroboros_owned_tool_metadata_is_tool_specific_not_generic_inference tests/unit/orchestrator/test_capabilities.py::test_ouroboros_owned_tools_have_explicit_capability_metadata tests/unit/orchestrator/test_capabilities.py::test_ouroboros_capability_registry_covers_every_owned_tool_identifier -q (31 passed)
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 uv run ruff check src/ouroboros/mcp/tools/evaluation_handlers.py tests/unit/mcp/tools/test_start_evaluate.py tests/unit/mcp/tools/test_evaluate_multi_ac.py tests/unit/orchestrator/test_capabilities.py (passed)
SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 uv run ruff format --check src/ouroboros/mcp/tools/evaluation_handlers.py tests/unit/mcp/tools/test_start_evaluate.py tests/unit/mcp/tools/test_evaluate_multi_ac.py tests/unit/orchestrator/test_capabilities.py (passed)

Posted by agentos-roadmap-warden — bot. Reply with /warden ignore to suppress further comments on this thread.

ouroboros-agent

Review — ouroboros-agent[bot]

Verdict: APPROVE

Metadata

| Field | Value |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.

---|---|
| PR | #1492 |
| HEAD checked | d55f5c02f2e5a9ce62abf732de4bb2353d54dd1d |
| Request ID | req_1781969887_142 |
| Review record | 92b4cead-e816-4199-b861-f96f8fa9c29b |

What Improved

ouroboros_evaluate now resolves omitted working_dir from brownfield default, seed metadata, then MCP cwd.
ouroboros_start_evaluate plugin dispatch now uses the same fallback before building the subagent payload.
Evaluate skill docs now state that working_dir gates both Stage 1 commands and Stage 2 source visibility.

Issue Requirements

Requirement	Status
Document that evaluate `working_dir` controls Stage 1 command execution and Stage 2 source visibility	Met
When evaluate `working_dir` is omitted, fall back to registered brownfield default before cwd	Met
If brownfield default is stale or not a directory, continue to seed metadata/cwd fallback	Met
Apply omitted-`working_dir` fallback to `ouroboros_start_evaluate` plugin dispatch payloads	Met
Keep capability metadata aware of evaluate `working_dir` context	Met
Linked issue requirement captured	N/A

Prior Findings Status

Prior review concerns are not re-raised. Current HEAD shows the seed-derived fallback now uses the shared containment-aware resolver, stale/non-directory brownfield defaults fall through, and start_evaluate plugin dispatch resolves working_dir before payload construction.

Blockers

No in-scope blocking findings remained after policy filtering.

Follow-up Findings

#	File:Line	Priority	Confidence	Suggestion
None.

Non-blocking Suggestions

| - | - | - | None. |

Test Coverage Notes

Reviewed changed tests for evaluate working-dir fallback, stale/non-directory brownfield defaults, seed metadata containment fallback, start_evaluate plugin payload, and capability metadata.
Ran SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/mcp/tools/test_evaluate_multi_ac.py tests/unit/mcp/tools/test_start_evaluate.py -q — 36 passed.
Ran targeted capability metadata pytest — 31 passed.
Ran Ruff check and Ruff format check on changed source/tests — passed.

Design Notes

The change centralizes evaluate project-root fallback in a small helper and reuses existing project-path containment logic for seed-derived paths. The MCP handlers remain thin enough, with behavior guarded by focused unit tests.

Design / Roadmap Gate

Affected boundary is evaluate project-root selection. Explicit working_dir remains highest precedence; omitted values resolve through brownfield default, seed metadata, then cwd. Brownfield defaults are required to exist as directories. Seed metadata resolution is containment-aware through resolve_seed_project_path. Plugin and in-process evaluate paths now receive the resolved directory string/path before dispatch or pipeline execution.

Directional Notes

Review focused on runtime boundary safety: stale operator defaults, seed metadata containment, plugin dispatch payload parity, and Stage 1/Stage 2 project-root consistency. Maintainer memory was advisory only; no blocker was based on memory alone.

Test Coverage

Reviewed changed tests for evaluate working-dir fallback, stale/non-directory brownfield defaults, seed metadata containment fallback, start_evaluate plugin payload, and capability metadata.
Ran SETUPTOOLS_SCM_PRETEND_VERSION=0.0.0 /root/.local/bin/uv run python -m pytest tests/unit/mcp/tools/test_evaluate_multi_ac.py tests/unit/mcp/tools/test_start_evaluate.py -q — 36 passed.
Ran targeted capability metadata pytest — 31 passed.
Ran Ruff check and Ruff format check on changed source/tests — passed.

Merge Recommendation

Approve. Current HEAD covers the previously risky evaluate working-dir fallback and plugin dispatch boundary with passing targeted tests and lint/format checks.

Review-Metadata:
verdict: APPROVE
head_sha: d55f5c0
request_id: req_1781969887_142
review_profile: memory-aware-zero-trust-v2
advisory_memory_only: true

Reviewed by ouroboros-agent[bot] via Codex deep analysis

fix(evaluate): resolve project working dir fallback

b4e0f24

This was referenced Jun 20, 2026

evaluate: working_dir undocumented, brownfield default not used as fallback #1491

Closed

Meta SSOT: AgentOS roadmap sequencing (#920–#960) #961

Open

ouroboros-agent Bot requested changes Jun 20, 2026

View reviewed changes

fix(evaluate): contain seed working-dir fallback

5b99937

ouroboros-agent Bot requested changes Jun 20, 2026

View reviewed changes

fix(evaluate): skip unusable brownfield defaults

571f3b9

ouroboros-agent Bot approved these changes Jun 20, 2026

View reviewed changes

style(evaluate): format fallback regression tests

8734e45

ouroboros-agent Bot approved these changes Jun 20, 2026

View reviewed changes

test(evaluate): declare working dir capability context

621f922

ouroboros-agent Bot requested changes Jun 20, 2026

View reviewed changes

fix(evaluate): resolve start plugin working dir fallback

d55f5c0

ouroboros-agent Bot approved these changes Jun 20, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(evaluate): resolve project working dir fallback#1492

fix(evaluate): resolve project working dir fallback#1492
Q00 wants to merge 6 commits into
mainfrom
warden/issue-1490-evaluate-working-dir

Q00 commented Jun 20, 2026

Uh oh!

ouroboros-agent Bot left a comment

Uh oh!

Q00 commented Jun 20, 2026

Uh oh!

ouroboros-agent Bot left a comment

Uh oh!

Q00 commented Jun 20, 2026

Uh oh!

ouroboros-agent Bot left a comment

Uh oh!

Q00 commented Jun 20, 2026

Uh oh!

ouroboros-agent Bot left a comment

Uh oh!

Q00 commented Jun 20, 2026

Uh oh!

ouroboros-agent Bot left a comment

Uh oh!

Q00 commented Jun 20, 2026

Uh oh!

ouroboros-agent Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Q00 commented Jun 20, 2026

Summary

Test plan

Uh oh!

ouroboros-agent Bot left a comment

Choose a reason for hiding this comment

Review — ouroboros-agent[bot]

Metadata

What Improved

Issue Requirements

Prior Findings Status

Blockers

Follow-up Findings

Non-blocking Suggestions

Test Coverage Notes

Design Notes

Design / Roadmap Gate

Directional Notes

Test Coverage

Merge Recommendation

Uh oh!

Q00 commented Jun 20, 2026

Uh oh!

ouroboros-agent Bot left a comment

Choose a reason for hiding this comment

Review — ouroboros-agent[bot]

Metadata

What Improved

Issue Requirements

Prior Findings Status

Blockers

Follow-up Findings

Non-blocking Suggestions

Test Coverage Notes

Design Notes

Design / Roadmap Gate

Directional Notes

Test Coverage

Merge Recommendation

Uh oh!

Q00 commented Jun 20, 2026

Uh oh!

ouroboros-agent Bot left a comment

Choose a reason for hiding this comment

Review — ouroboros-agent[bot]

Metadata

What Improved

Issue Requirements

Prior Findings Status

Blockers

Follow-up Findings

Non-blocking Suggestions

Test Coverage Notes

Design Notes

Design / Roadmap Gate

Directional Notes

Test Coverage

Merge Recommendation

Uh oh!

Q00 commented Jun 20, 2026

Uh oh!

ouroboros-agent Bot left a comment

Choose a reason for hiding this comment

Review — ouroboros-agent[bot]

Metadata

What Improved

Issue Requirements

Prior Findings Status

Blockers

Follow-up Findings

Non-blocking Suggestions

Test Coverage Notes

Design Notes

Design / Roadmap Gate

Directional Notes

Test Coverage

Merge Recommendation

Uh oh!

Q00 commented Jun 20, 2026