README.md
__init__.py
eval_llm_response.py
eval_spec.py
harness.py


Spec-Harness

Spec-Harness evaluates four metrics on verifier-accepted specifications:

| Metric | What it measures |
| --- | --- |
| PostCorrectness | Postcondition holds on valid test pairs |
| PostCompleteness | Postcondition catches failing test pairs |
| PreCorrectness | Precondition accepts valid inputs |
| PreCompleteness | Precondition rejects invalid inputs |
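To make the four definitions concrete, here is a minimal sketch that scores a toy specification written as plain Python predicates. It is not how Spec-Harness itself works: the function names, the ratio-based scoring, and the integer square root example are all assumptions made for illustration.

```python
from typing import Any, Callable, Sequence, Tuple

Pair = Tuple[Any, Any]  # (input, output)


def fraction(hits: int, total: int) -> float:
    """Return hits / total; an empty set scores 1.0 (arbitrary choice for this sketch)."""
    return hits / total if total else 1.0


def post_correctness(post: Callable[[Any, Any], bool], valid_pairs: Sequence[Pair]) -> float:
    """Share of valid (input, output) pairs on which the postcondition holds."""
    return fraction(sum(1 for x, y in valid_pairs if post(x, y)), len(valid_pairs))


def post_completeness(post: Callable[[Any, Any], bool], failing_pairs: Sequence[Pair]) -> float:
    """Share of failing (input, wrong output) pairs that the postcondition rejects."""
    return fraction(sum(1 for x, y in failing_pairs if not post(x, y)), len(failing_pairs))


def pre_correctness(pre: Callable[[Any], bool], valid_inputs: Sequence[Any]) -> float:
    """Share of valid inputs that the precondition accepts."""
    return fraction(sum(1 for x in valid_inputs if pre(x)), len(valid_inputs))


def pre_completeness(pre: Callable[[Any], bool], invalid_inputs: Sequence[Any]) -> float:
    """Share of invalid inputs that the precondition rejects."""
    return fraction(sum(1 for x in invalid_inputs if not pre(x)), len(invalid_inputs))


# Hypothetical spec for integer square root (illustration only).
pre = lambda x: x >= 0
strong_post = lambda x, y: y * y <= x < (y + 1) * (y + 1)
weak_post = lambda x, y: y * y <= x  # sound, but misses outputs that are too small

valid_pairs = [(0, 0), (4, 2), (10, 3)]
failing_pairs = [(4, 3), (10, 2)]  # outputs are deliberately wrong

print(post_correctness(strong_post, valid_pairs))     # 1.0
print(post_completeness(strong_post, failing_pairs))  # 1.0
print(post_completeness(weak_post, failing_pairs))    # 0.5
print(pre_correctness(pre, [0, 4, 10]))               # 1.0
print(pre_completeness(pre, [-1, -5]))                # 1.0
```

The weak postcondition still holds on every valid pair, so it is fully correct, but it lets the wrong output `(10, 2)` through. That is the gap the completeness metrics are meant to expose.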

Run

Note: responses.jsonl is the output file produced by running the baseline approaches

Note: --max-pairs is the maximum number of input/output test pairs to use in mutation
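As a rough illustration of what a cap on test pairs can mean for mutation, the sketch below keeps at most `max_pairs` pairs and perturbs each output to produce failing pairs. This README does not describe the harness's actual mutation operators, so `mutate_pairs` and the `+1`/`-1` perturbation are hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[int, int]  # (input, expected output)


def mutate_pairs(pairs: List[Pair], max_pairs: int) -> List[Pair]:
    """Keep at most max_pairs pairs and emit two wrong outputs for each (hypothetical operator)."""
    selected = pairs[:max_pairs]
    mutants = []
    for x, y in selected:
        for delta in (1, -1):
            mutants.append((x, y + delta))  # output shifted so it no longer matches
    return mutants


pairs = [(0, 0), (4, 2), (10, 3), (16, 4)]
mutants = mutate_pairs(pairs, max_pairs=2)
print(mutants)  # [(0, 1), (0, -1), (4, 3), (4, 1)]
```

A smaller cap means fewer mutated pairs to check per specification; a larger cap gives the completeness checks more failing pairs to work with.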

python -m spec_harness.eval_llm_response \
  --benchmark_path benchmarks/formalbench/fb.json \
  --llm_response_path path/to/responses.jsonl \
  --openjml openjml \
  --output spec_harness_results \
  --threads 8 \
  --max-pairs 5 \
  --verbose
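If you prefer to launch the evaluation from a script, the same invocation can be assembled as an argument list. This is a sketch: `build_command` is a hypothetical helper, it uses only the flags shown above, and the paths are the placeholders from the command.

```python
import shlex
import sys
from typing import List


def build_command(
    benchmark_path: str,
    response_path: str,
    output_dir: str,
    openjml: str = "openjml",
    threads: int = 8,
    max_pairs: int = 5,
    verbose: bool = True,
) -> List[str]:
    """Assemble the eval_llm_response invocation as an argv list (hypothetical helper)."""
    cmd = [
        sys.executable, "-m", "spec_harness.eval_llm_response",
        "--benchmark_path", benchmark_path,
        "--llm_response_path", response_path,
        "--openjml", openjml,
        "--output", output_dir,
        "--threads", str(threads),
        "--max-pairs", str(max_pairs),
    ]
    if verbose:
        cmd.append("--verbose")
    return cmd


cmd = build_command(
    benchmark_path="benchmarks/formalbench/fb.json",
    response_path="path/to/responses.jsonl",  # placeholder, as in the command above
    output_dir="spec_harness_results",
)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

Passing `cmd` to `subprocess.run(cmd, check=True)` from the repository root would then start the run. Building the list explicitly avoids shell line-continuation mistakes.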
