feat(eval): retrieval-quality eval harness for code_index #868
Open
Labels
code-agent, domain:quality, enhancement, eval, track:platform
Context
Reviewer on #721 asked for testing evidence on large repos and retrieval quality. The PR ships unit tests covering parser correctness and SDK behaviour, but there is no end-to-end retrieval-quality eval.
Goal
A `gaia eval code-index` subcommand that runs a curated query set against a reference repository (e.g. gaia itself), measures recall@k / MRR, and outputs a report consumable by `gaia report`.
Acceptance
- [...] `src/gaia/eval/` alongside existing `fix-code`/`agent` evals.
- [...] `tests/fixtures/`.
- [...] `docs/` so regressions are visible.
- [...] `gaia report`.
References
- `src/gaia/eval/`: existing eval framework
- `docs/reference/eval.mdx`: framework docs
- `src/gaia/code_index/sdk.py`: SDK under test

Deferred from #721.
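
For reference, the two metrics named in the Goal could be computed roughly as below. This is a minimal sketch, not the harness itself: the function names, the query IDs, and the file paths in the example are all hypothetical, and it assumes each query maps to a ranked list of result paths plus a hand-curated set of relevant paths.

```python
from typing import Mapping, Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(
    results: Mapping[str, Sequence[str]],
    gold: Mapping[str, Set[str]],
    k: int = 5,
) -> dict:
    """Average recall@k and MRR over every query in the gold set."""
    if not gold:
        return {"k": k, "queries": 0, "recall_at_k": 0.0, "mrr": 0.0}
    recalls, rrs = [], []
    for query_id, relevant in gold.items():
        # A query with no results counts as a miss on both metrics.
        ranked = results.get(query_id, [])
        recalls.append(recall_at_k(ranked, relevant, k))
        rrs.append(reciprocal_rank(ranked, relevant))
    n = len(gold)
    return {"k": k, "queries": n, "recall_at_k": sum(recalls) / n, "mrr": sum(rrs) / n}


# Hypothetical query set: IDs and paths are made up for illustration.
gold = {"q1": {"pkg/b.py"}, "q2": {"pkg/z.py", "pkg/w.py"}}
results = {
    "q1": ["pkg/a.py", "pkg/b.py", "pkg/c.py"],
    "q2": ["pkg/x.py", "pkg/y.py", "pkg/z.py"],
}
report = evaluate(results, gold, k=2)
```

Returning a plain dict keeps the result easy to serialize; the actual report format expected by `gaia report` is not specified here and would need to follow the existing eval framework.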