Problem
The eval framework works well for GAIA's built-in scenarios, but third parties need clean extension points to add their own use cases without modifying framework internals.
What Third Parties Need
- Custom scenario directory: drop YAML files in a user directory (e.g., ~/.gaia/eval/scenarios/) and they are automatically discovered alongside built-in scenarios
- Custom corpus directory: add documents to ~/.gaia/eval/corpus/ with a local manifest
- Custom scoring dimensions: add domain-specific scoring beyond the 7 built-in dimensions (e.g., "medical_accuracy" for healthcare, "code_correctness" for developer tools)
- Custom personas: define new personas beyond the 5 built-in ones
- Scenario tags/filters: tag scenarios with metadata (use_case, difficulty, provider) for selective runs
- Result export formats: JSON (exists), CSV, JUnit XML (for CI), HTML report
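As a rough illustration of the JUnit XML export item, the sketch below serializes a list of results into a single testsuite element using only the standard library. The ScenarioResult record and the to_junit_xml function are hypothetical names invented for this example; the framework's real result schema may look quite different.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class ScenarioResult:
    # Hypothetical result record; the real framework's schema may differ.
    scenario_id: str
    passed: bool
    seconds: float
    message: str = ""


def to_junit_xml(results: List[ScenarioResult], suite_name: str = "gaia-eval") -> str:
    """Render results as one JUnit-style <testsuite> with a <testcase> per scenario."""
    failures = sum(1 for r in results if not r.passed)
    suite = ET.Element(
        "testsuite",
        name=suite_name,
        tests=str(len(results)),
        failures=str(failures),
        time=f"{sum(r.seconds for r in results):.3f}",
    )
    for r in results:
        case = ET.SubElement(suite, "testcase", name=r.scenario_id, time=f"{r.seconds:.3f}")
        if not r.passed:
            # CI systems typically surface the failure message and body text.
            failure = ET.SubElement(case, "failure", message=r.message)
            failure.text = r.message
    return ET.tostring(suite, encoding="unicode")
```

Most CI systems accept this minimal shape, but the exact attributes a given CI server expects should be checked against its documentation.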
Architecture
Built-in scenarios: eval/scenarios/*.yaml (shipped with GAIA)
User scenarios: ~/.gaia/eval/scenarios/*.yaml (auto-discovered)
Built-in corpus: eval/corpus/ (shipped with GAIA)
User corpus: ~/.gaia/eval/corpus/ (auto-discovered)
Built-in prompts: eval/prompts/ (shipped with GAIA)
User prompts: ~/.gaia/eval/prompts/ (overrides built-in)
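One possible way to implement the layered lookup above is sketched here. The discover function is a hypothetical helper, not an existing GAIA API. It keys files by name so that a later directory replaces an earlier one, which matches the stated override behavior for prompts. Applying the same rule to scenarios with colliding file names is an assumption that the issue does not settle.

```python
from pathlib import Path
from typing import Dict, Iterable


def discover(dirs: Iterable[Path], pattern: str = "*.yaml") -> Dict[str, Path]:
    """Map file name -> path across directories; later directories win on collisions."""
    found: Dict[str, Path] = {}
    for directory in dirs:
        if not directory.is_dir():
            continue  # a missing user directory is not an error
        for path in sorted(directory.glob(pattern)):
            found[path.name] = path  # assumption: same-name user file replaces built-in
    return found


if __name__ == "__main__":
    builtin_dir = Path("eval/scenarios")
    user_dir = Path.home() / ".gaia" / "eval" / "scenarios"
    for name, path in discover([builtin_dir, user_dir]).items():
        print(name, "->", path)
```

Passing the built-in directory first and the user directory last keeps the precedence explicit at the call site, and a directory given via --scenario-dir could simply be appended to the list.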
CLI Extensions
gaia eval agent --scenario-dir ~/my-project/eval/scenarios/
gaia eval agent --corpus-dir ~/my-project/eval/corpus/
gaia eval agent --output-format junit # For CI integration
gaia eval agent --tag healthcare # Run only tagged scenarios
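The flags above could be wired up roughly as follows. This is a sketch under several assumptions that the issue does not specify: --tag is repeatable, a scenario matches if it carries any requested tag, tags live in a list under a "tags" key, JSON is the default output format, and the format names other than junit are "json", "csv", and "html". build_parser and filter_by_tags are hypothetical names.

```python
import argparse
from pathlib import Path
from typing import Dict, List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="gaia eval agent")
    parser.add_argument("--scenario-dir", type=Path, default=None,
                        help="extra directory of scenario YAML files")
    parser.add_argument("--corpus-dir", type=Path, default=None,
                        help="extra directory of corpus documents")
    # Assumed choice names; only "junit" appears in the proposal itself.
    parser.add_argument("--output-format", choices=["json", "csv", "junit", "html"],
                        default="json")
    # Assumed repeatable: --tag healthcare --tag finance
    parser.add_argument("--tag", action="append", default=[])
    return parser


def filter_by_tags(scenarios: List[Dict], tags: List[str]) -> List[Dict]:
    """Keep scenarios that carry at least one requested tag; no tags means no filtering."""
    if not tags:
        return scenarios
    wanted = set(tags)
    return [s for s in scenarios if wanted & set(s.get("tags", []))]
```

Any-match semantics is the simpler choice. If all-match is preferred, the set intersection check can be swapped for a subset test.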
Acceptance Criteria
- Scenarios in ~/.gaia/eval/scenarios/ are auto-discovered
- --scenario-dir and --corpus-dir CLI flags work
- --tag