evaluate: working_dir undocumented, brownfield default not used as fallback #1491

@thisisjun786

Bug: evaluate cannot access project source files without explicit working_dir

Summary

OOO evaluate's Stage 2 sandbox cannot access project source files unless working_dir is explicitly set. Without it, .ouroboros_eval_artifact.md contains only a file path pointer, and the evaluator can see only the ~/.hermes/ config files, so Stage 2 rejects every AC with "source code not present."

Root Causes (3 issues)

  1. evaluate SKILL.md missing working_dir parameter: The documented MCP tool signature in skills/evaluate/SKILL.md omits working_dir entirely, even though in practice the parameter is needed for the sandbox to see project files.

  2. working_dir described as "Stage 1 only": In the tool definition, working_dir is documented as affecting only Stage 1 (mechanical verification), but in practice it controls the entire sandbox's file visibility, including what Stage 2 can see.

  3. Brownfield default / seed metadata not used as fallback: A brownfield-registered project that has a default, together with a seed that carries project metadata, should be enough for evaluate to set working_dir automatically, but this fallback path is missing.

Reproduction

# Greenfield project, brownfield-registered
ouroboros_start_evaluate(
  session_id="...",
  artifact="eval_output.txt",
  seed_content="...",
  acceptance_criteria=[...]
  # working_dir NOT set
)
# Result: 0/6 ACs — evaluator can't find project files
# Fix: add working_dir="/path/to/project"

Impact

  • Greenfield projects: evaluate returns REJECTED (0/6) regardless of actual code quality
  • Users resort to trial-and-error (9 attempts in our case) to discover working_dir
  • The timeout/retry loop burns significant LLM tokens

Suggested Fix

  1. Add working_dir to the evaluate SKILL.md documentation with a clear description
  2. Correct the tool definition to reflect that working_dir affects ALL stages, not just Stage 1
  3. Implement fallback: if working_dir is unset, use brownfield default → seed metadata project_dir → session cwd
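
The fallback order in item 3 could be sketched roughly as below. This is only an illustration, not the actual OOO implementation: the function name resolve_working_dir and its four parameters are hypothetical, and how the brownfield default, the seed metadata project_dir, and the session cwd are actually looked up is left to the caller.

```python
from pathlib import Path
from typing import Optional


def resolve_working_dir(
    explicit: Optional[str] = None,
    brownfield_default: Optional[str] = None,
    seed_project_dir: Optional[str] = None,
    session_cwd: Optional[str] = None,
) -> Path:
    """Pick the sandbox root using the proposed fallback order.

    All parameter names are hypothetical; they stand in for values that
    evaluate would read from its own arguments, the brownfield registry,
    the seed metadata, and the session.
    """
    # An explicit working_dir always wins, and a bad one is an error
    # rather than something to silently fall through.
    if explicit:
        path = Path(explicit).expanduser()
        if not path.is_dir():
            raise ValueError(f"working_dir is not a directory: {explicit}")
        return path.resolve()

    # Otherwise try: brownfield default -> seed project_dir -> session cwd.
    for candidate in (brownfield_default, seed_project_dir, session_cwd):
        if not candidate:
            continue
        path = Path(candidate).expanduser()
        if path.is_dir():
            return path.resolve()

    raise ValueError("no usable working_dir: set it explicitly")
```

Failing loudly when nothing resolves, instead of starting a sandbox that can only see config files, would surface the problem before any Stage 2 tokens are spent.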

Environment

  • OOO version: v0.42.4
  • Runtime: GJC
  • Project type: greenfield, Python 3.12, FastMCP

Metadata

Labels

bug (Reproducible defect or broken behavior), needs-human (Automated fix failed, needs human implementation)
