Commit c842da8

2 parents 4557b1e + 2c72ccb commit c842da8

1 file changed

Lines changed: 2 additions & 1 deletion

README.md

@@ -1,10 +1,11 @@
-![version](https://img.shields.io/badge/version-v0.0.2-orange)
 # xFail — Model Autopsy Benchmark
 
 ![xFail banner](assets/X.Fail.png)
 
 A focused evaluation harness built to expose the real failure modes of LLM code reasoning. This isn’t a pass/fail scoreboard; it’s a diagnostic layer for models that are pretending to understand requirements.
 
+![version](https://img.shields.io/badge/version-v0.0.5-orange)
+
 ## Why xFail?
 
 Benchmarks like HumanEval, MBPP, and SWE-Bench measure surface accuracy. xFail is designed to classify failure behavior and tie it to concrete model breakdowns.
