- The canonical benchmark surface is the fixed 35-episode surrogate
receptor-response corpus generated by
src/zpe_smell/evaluation.py. validation/results/reference_public_eval.jsonis the committed reference replay for exact-match verification.proofs/artifacts/public_smell_surrogate_scope.jsonis the committed public proof artifact for the bounded smell surface.
This will be populated by the receipt-bundle.yml workflow in Wave 3.
python3 -m venv .venv
. .venv/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install -e '.[dev]'
python3 -m zpe_smell.reproduce --output validation/results/latest_public_eval.json
python3 -m pytest -q- CPython 3.11 or newer
- Local POSIX shell environment with
python3andvenv