Problem
The legacy eval framework (eval.py at 159.7 KB, groundtruth.py, batch_experiment.py, transcript_generator.py, email_generator.py) still exists alongside the new agent eval framework. This is 9,200+ lines of dead code that:
- Confuses new contributors ("which eval framework do I use?")
- Inflates the codebase
- Has stale CLI commands (gaia eval, gaia groundtruth, gaia report) that conflict with the new gaia eval agent
What to Remove
Per #573 (the replacement plan):
Files to Remove
src/gaia/eval/eval.py (3,336 lines) — old Evaluator class
src/gaia/eval/groundtruth.py (~1,000 lines) — old ground truth generator
src/gaia/eval/batch_experiment.py (2,367 lines) — old batch runner
src/gaia/eval/transcript_generator.py — not needed
src/gaia/eval/email_generator.py — not needed
src/gaia/eval/fix_code_testbench/ — replaced by eval scenarios
src/gaia/eval/configs/ — old config format
src/gaia/eval/webapp/ — old Express.js visualizer (if superseded)
Files to Keep
src/gaia/eval/runner.py — new AgentEvalRunner ✅
src/gaia/eval/scorecard.py — new scorecard ✅
src/gaia/eval/audit.py — new architecture audit ✅
src/gaia/eval/claude.py — ClaudeClient (Anthropic SDK wrapper) ✅
src/gaia/eval/config.py — MODEL_PRICING + DEFAULT_CLAUDE_MODEL ✅
src/gaia/eval/pdf_document_generator.py → rename to pdf_generator.py ✅
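The two lists above can be turned into a quick check that runs after the cleanup. The following is a minimal sketch, not part of the existing codebase: `check_cleanup`, its `include_conditional` flag, and the constant names are hypothetical. It assumes the paths are exactly as listed in this issue and that the rename to pdf_generator.py has already happened. Because webapp/ is only removed "if superseded", it is checked only on request.

```python
from pathlib import Path

# Directory named in this issue, relative to the repository root.
EVAL_DIR = Path("src/gaia/eval")

# Entries from "Files to Remove" (files and directories).
LEGACY_PATHS = [
    "eval.py",
    "groundtruth.py",
    "batch_experiment.py",
    "transcript_generator.py",
    "email_generator.py",
    "fix_code_testbench",
    "configs",
]

# webapp/ is only removed "if superseded", so it is tracked separately.
CONDITIONAL_LEGACY_PATHS = ["webapp"]

# Entries from "Files to Keep".
KEPT_PATHS = [
    "runner.py",
    "scorecard.py",
    "audit.py",
    "claude.py",
    "config.py",
    "pdf_generator.py",  # assumed name after the planned rename
]


def check_cleanup(repo_root, include_conditional=False):
    """Return (leftover_legacy, missing_kept) for the eval directory.

    Hypothetical helper: it only inspects the filesystem and does not
    import or run anything from the package.
    """
    eval_dir = Path(repo_root) / EVAL_DIR
    legacy = list(LEGACY_PATHS)
    if include_conditional:
        legacy += CONDITIONAL_LEGACY_PATHS
    leftover = [name for name in legacy if (eval_dir / name).exists()]
    missing = [name for name in KEPT_PATHS if not (eval_dir / name).exists()]
    return leftover, missing
```

Calling `check_cleanup(".")` from the repository root should return two empty lists once the removal is complete. Anything in the first list still needs deleting, and anything in the second was removed or renamed by mistake.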
CLI Changes
- Remove: gaia eval (old), gaia groundtruth, gaia report, gaia create-template, gaia visualize
- Keep: gaia eval agent (new framework)
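The CLI changes can also be expressed as a small regression check. This is a sketch under stated assumptions: it does not inspect the real gaia CLI, and `find_cli_violations` and the nested-dict "command tree" format are hypothetical. How the tree is extracted from the actual parser is left open. It also assumes that removing "gaia eval (old)" means that eval keeps agent as its only subcommand.

```python
# Top-level commands this issue lists for removal.
REMOVED_TOP_LEVEL = {"groundtruth", "report", "create-template", "visualize"}

# Assumption: after the cleanup, "agent" is the only subcommand of "gaia eval".
ALLOWED_EVAL_SUBCOMMANDS = {"agent"}


def find_cli_violations(command_tree):
    """Return human-readable violations for a command tree.

    command_tree is a hypothetical nested dict mapping each command name
    to a dict of its subcommands, e.g. {"eval": {"agent": {}}}.
    """
    violations = []

    # Removed top-level commands must no longer be registered.
    for name in sorted(REMOVED_TOP_LEVEL.intersection(command_tree)):
        violations.append(f"gaia {name} should be removed")

    # "gaia eval agent" must survive, and nothing else may live under eval.
    eval_subcommands = command_tree.get("eval", {})
    if "agent" not in eval_subcommands:
        violations.append("gaia eval agent is missing")
    for sub in sorted(set(eval_subcommands) - ALLOWED_EVAL_SUBCOMMANDS):
        violations.append(f"gaia eval {sub} should be removed")

    return violations
```

For example, `find_cli_violations({"eval": {"agent": {}}})` returns an empty list, while a tree that still registers report yields one violation for it.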
Acceptance Criteria
- gaia eval agent remains the single entry point