Add eval-harness to LLMOps section by hoainho · Pull Request #538 · tensorchord/Awesome-LLMOps

hoainho · 2026-06-01T13:12:21Z

Adding eval-harness

Project: https://github.com/nano-step/eval-harness
License: MIT
Language: Bash (+ jq, python3 stdlib)
Released: v0.4.2 on 2026-05-30

What it does

Behavior-regression testing for LLM agents. It detects when an agent's behavior drifts from a baseline, attributes the cause to one of 4 deterministic classes (SKILL_CHANGED / FIXTURE_STALE / MODEL_CHANGED / UNKNOWN_DRIFT), and reports failures in a 6-field FAIL schema that includes transcript_span and env_delta. It ships with a composite GitHub Action and a git pre-push hook.
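The PR does not spell out the decision tree behind the four classes. As a rough sketch only, assume each run records a hash of the agent's skill files, a hash of its fixture, and the model identifier. A deterministic SHA-comparison tree could then look like the following. The field names, the helper names, the check order, and the mapping of a changed fixture hash to FIXTURE_STALE are all hypothetical, not eval-harness's actual logic (the project itself is Bash plus jq and python3 stdlib).

```python
import hashlib
from dataclasses import dataclass


def sha256_hex(data: bytes) -> str:
    """Content hash used to fingerprint one input (hypothetical helper)."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class RunFingerprint:
    """Hypothetical record of the inputs that produced a run."""
    skill_sha: str    # hash of the agent's skill/prompt files
    fixture_sha: str  # hash of the test fixture
    model_id: str     # model identifier reported for the run


def attribute_drift(baseline: RunFingerprint, current: RunFingerprint) -> str:
    """Map an observed drift to one of the four classes named in the PR.

    Assumed precedence: the first input found to differ from the baseline wins.
    If every input matches, the drift cannot be attributed.
    """
    if current.skill_sha != baseline.skill_sha:
        return "SKILL_CHANGED"
    if current.fixture_sha != baseline.fixture_sha:
        return "FIXTURE_STALE"
    if current.model_id != baseline.model_id:
        return "MODEL_CHANGED"
    return "UNKNOWN_DRIFT"


baseline = RunFingerprint(sha256_hex(b"skill v1"), sha256_hex(b"fixture v1"), "model-a")
current = RunFingerprint(sha256_hex(b"skill v1"), sha256_hex(b"fixture v1"), "model-b")
print(attribute_drift(baseline, current))  # MODEL_CHANGED
```

Because every branch is a plain equality test on recorded values, the same pair of fingerprints always yields the same class. That is the property "deterministic" promises here.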

Why this fits the LLMOps section

LLMOps testing and observability has a known gap: existing tools tell you THAT a test failed, but not WHY. eval-harness fills the regression-detection and attribution slice. It is meant to compose with broader entries on this list (LangSmith, Arize-Phoenix, Langfuse, Helicone, etc.) rather than replace them. For an honest comparison with promptfoo, see docs/why-not-promptfoo.md.

Distinctive features

4-class failure attribution (deterministic SHA-comparison decision tree)
6-field FAIL schema including transcript_span + env_delta
3-sample byte-identical stability check, with first-class flake tagging instead of retry-until-pass
Hard dollar-cost ceiling with daily budget enforcement (default EVAL_BUDGET_USD=2.00)
Per-(case,trigger) flock lockfile for safe concurrent CI runs
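The following Python sketch illustrates the stability check and the per-(case,trigger) lock together. It assumes that a case is a callable returning its output bytes and that lockfiles sit in one directory, named after the case and trigger. `case_lock`, `stability_check`, and the STABLE/FLAKY tags are hypothetical names. `fcntl.flock` is POSIX-only. eval-harness itself is written in Bash and may do this differently.

```python
import fcntl
import itertools
import os
import tempfile
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def case_lock(lock_dir: str, case: str, trigger: str) -> Iterator[None]:
    """Hold an exclusive flock on a lockfile keyed by (case, trigger).

    Assumes case and trigger names are safe to use in a filename.
    """
    path = os.path.join(lock_dir, f"{case}.{trigger}.lock")
    with open(path, "w") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks while another run holds the lock
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def stability_check(run_case: Callable[[], bytes], samples: int = 3) -> str:
    """Run a case `samples` times; tag it FLAKY unless all outputs are byte-identical.

    No retries: a mismatch is reported as a flake instead of being rerun until it passes.
    """
    outputs = [run_case() for _ in range(samples)]
    return "STABLE" if all(out == outputs[0] for out in outputs) else "FLAKY"


lock_dir = tempfile.mkdtemp()
with case_lock(lock_dir, "refund_flow", "pre-push"):
    print(stability_check(lambda: b"same transcript"))  # STABLE

counter = itertools.count()
print(stability_check(lambda: str(next(counter)).encode()))  # FLAKY
```

Tagging the mismatch rather than rerunning keeps a nondeterministic case visible in the results, so a lucky retry cannot hide it. Keying the lock on (case, trigger) lets unrelated cases run concurrently while serializing runs of the same case.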

Project hygiene

v0.4.2 closes 8 BLOCKERs surfaced by an audit (sandboxed score_shell, fixture path-traversal blocking, GNU/BSD grep portability, etc.)
20/20 test suites green on main
CONTRIBUTING.md, CODE_OF_CONDUCT.md, SECURITY.md present
DCO sign-off on commit
Open issues labeled good first issue and help wanted for new contributors
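Fixture path-traversal blocking usually comes down to resolving the requested path and checking that it stays under the fixtures root. The following is a generic Python sketch of that check, not eval-harness's code. `resolve_fixture` and the directory layout are hypothetical, and `Path.is_relative_to` requires Python 3.9 or later.

```python
import tempfile
from pathlib import Path


def resolve_fixture(fixtures_root: str, relative: str) -> Path:
    """Resolve a fixture path, refusing anything that escapes fixtures_root.

    resolve() collapses '..' segments and follows symlinks, so the containment
    check runs on the real location. An absolute `relative` replaces the root
    entirely when joined and is rejected by the same check.
    """
    root = Path(fixtures_root).resolve()
    candidate = (root / relative).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"fixture path escapes {root}: {relative!r}")
    return candidate


root = tempfile.mkdtemp()
print(resolve_fixture(root, "cases/refund.json").name)  # refund.json
for bad in ("../secrets.env", "/etc/passwd"):
    try:
        resolve_fixture(root, bad)
    except ValueError as err:
        print("blocked:", err)
```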

Entry placement

Inserted between Deepchecks and Evidently (alphabetical, case-insensitive).

Thanks for maintaining this list.

Signed-off-by: Hoài Nhớ <nhoxtvt@gmail.com>

Commit 4726e86: Add eval-harness to LLMOps (Signed-off-by: Hoài Nhớ <nhoxtvt@gmail.com>)

hoainho mentioned this pull request on Jun 1, 2026 in nano-step/eval-harness#30, "campaign(2k-stars): community health + broader positioning + docs/ + GitHub Action" (open, 4 tasks).

hoainho wants to merge 1 commit into tensorchord:main from nano-step:add-eval-harness.
