03 Eval Harness

Learn a lightweight eval loop for AI outputs.

This example grades candidate responses against simple criteria and prints a pass/fail summary.

What this example teaches

  • Define eval cases as data.
  • Apply deterministic scoring rules.
  • Track pass rate over time.
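
The three ideas above can be sketched in a few lines. This is a minimal illustration, not the contents of run.py: the `EvalCase` fields (`case_id`, `response`, `must_include`, `max_chars`), the `grade` and `summarize` helpers, and the inline sample cases are all assumptions made for the example, and the real eval_cases.json may use a different shape.

```python
import json
from dataclasses import dataclass


@dataclass
class EvalCase:
    # Hypothetical schema: the real eval_cases.json may use other field names.
    case_id: str
    response: str
    must_include: list[str]
    max_chars: int


def grade(case: EvalCase) -> bool:
    """Deterministic rule: all required phrases present and length within the limit."""
    text = case.response.lower()
    has_phrases = all(phrase.lower() in text for phrase in case.must_include)
    return has_phrases and len(case.response) <= case.max_chars


def summarize(cases: list[EvalCase]) -> dict:
    """Grade every case and compute the overall pass rate."""
    results = {case.case_id: grade(case) for case in cases}
    passed = sum(results.values())
    return {
        "total": len(cases),
        "passed": passed,
        "pass_rate": passed / len(cases) if cases else 0.0,
        "results": results,
    }


# Eval cases kept as plain data (inline here so the sketch is self-contained).
RAW_CASES = """
[
  {"case_id": "greeting", "response": "Hello, thanks for reaching out!",
   "must_include": ["hello"], "max_chars": 80},
  {"case_id": "refund", "response": "We cannot help with that.",
   "must_include": ["refund"], "max_chars": 80}
]
"""

cases = [EvalCase(**row) for row in json.loads(RAW_CASES)]
report = summarize(cases)
print(json.dumps(report, indent=2))
```

Because the rules are pure functions of the case data, the same input always produces the same summary, which is what makes pass rates comparable between runs.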

Run

python3 run.py --cases sample_input/eval_cases.json

Run tests

python3 -m unittest discover -s tests -p "test_*.py"
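
For orientation, a test file matched by the `test_*.py` pattern might look roughly like the sketch below. The `contains_all` function and the `ContainsAllTests` class are hypothetical stand-ins, not the repository's actual tests; the function is defined inline only so the snippet runs on its own.

```python
import unittest


def contains_all(response: str, required: list[str]) -> bool:
    """Hypothetical scoring rule: every required phrase appears, ignoring case."""
    text = response.lower()
    return all(phrase.lower() in text for phrase in required)


class ContainsAllTests(unittest.TestCase):
    def test_passes_when_all_phrases_present(self):
        self.assertTrue(contains_all("Hello there, welcome!", ["hello", "welcome"]))

    def test_fails_when_a_phrase_is_missing(self):
        self.assertFalse(contains_all("Hello there!", ["hello", "refund"]))

    def test_empty_requirements_always_pass(self):
        self.assertTrue(contains_all("anything", []))


# Run the suite programmatically so the sketch works outside test discovery.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ContainsAllTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real test file the last two lines would usually be dropped, since `unittest discover` collects and runs the `TestCase` classes itself.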

Sample artifacts

  • Input: sample_input/eval_cases.json
  • Output: sample_output/report.json
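
One way to track pass rate over time is to append each run's rate to a small history file and compare the latest two entries. The sketch below assumes a hypothetical `pass_rate_history.json` file and hypothetical `append_pass_rate` and `regressed` helpers. None of these come from the repository, and the structure of report.json is not assumed here.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_pass_rate(history_path: Path, pass_rate: float) -> list:
    """Append one timestamped entry to a JSON history file and return the full history."""
    history = json.loads(history_path.read_text()) if history_path.exists() else []
    history.append({
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "pass_rate": pass_rate,
    })
    history_path.write_text(json.dumps(history, indent=2))
    return history


def regressed(history: list, tolerance: float = 0.0) -> bool:
    """True when the latest pass rate dropped below the previous one by more than tolerance."""
    if len(history) < 2:
        return False
    return history[-1]["pass_rate"] < history[-2]["pass_rate"] - tolerance


# A temporary directory keeps the demo from writing into the repository.
with tempfile.TemporaryDirectory() as tmp:
    history_file = Path(tmp) / "pass_rate_history.json"  # hypothetical file name
    append_pass_rate(history_file, 0.75)
    history = append_pass_rate(history_file, 0.5)
    print(len(history), regressed(history))
```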