ci failure debugging — Qwen3-Coder-Next-AWQ

Run name: p2_ci_coder_v1 (1 of 3; see the task family README for the full N=3 picture)
Wall: 62.7 s
Cost upper: $0.0014
Verdict: PASS

Notes

Clean, same. Median wall time is 1.2 min (faster), and the run is 2× cheaper than 27B.

Files

grade.json — verdict + per-dimension scores. Hand-graded subjective dimensions (where the grader has them) are filled in hand_rating_placeholders with _GRADER_: claude-opus-4.7-1m-context.
cost.json — wall, tokens, GPU, energy upper bound
label.json — failure-mode classification per tooling/FAILURE-TAXONOMY.md
summary.json — finish reason, iteration count, total tokens
receipt.json — vLLM args, harness git SHA, GPU snapshot
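
As a quick way to inspect these files, the following is a minimal sketch that loads whichever of the five JSON files are present in an entry directory and lists their top-level keys. The file names come from the list above. The helper names `load_entry` and `top_level_keys` are hypothetical, and no particular schema inside the files is assumed.

```python
import json
from pathlib import Path

# File names taken from the Files list; the contents' schema is NOT assumed.
ENTRY_FILES = ["grade.json", "cost.json", "label.json", "summary.json", "receipt.json"]


def load_entry(entry_dir):
    """Return {file name: parsed JSON} for the listed files found in entry_dir."""
    entry = {}
    for name in ENTRY_FILES:
        path = Path(entry_dir) / name
        if path.is_file():  # skip files that are missing from this entry
            with path.open(encoding="utf-8") as fh:
                entry[name] = json.load(fh)
    return entry


def top_level_keys(entry):
    """Map each file name to its sorted top-level keys ([] if not a JSON object)."""
    return {
        name: sorted(data) if isinstance(data, dict) else []
        for name, data in entry.items()
    }
```

Calling `top_level_keys(load_entry("."))` from inside the entry directory shows which fields each file carries before you rely on any of them.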

This is a lean entry. The transcript and deliverable artifacts for this specific run aren't mirrored in MMBT, which saves repo space. The task prompt, input starter, ground truth, and grader script for this task family are all in ../../../../tooling/, so with those and your own GPU you can rerun this task family at N=3 yourself. The original bench-side run name was p2_ci_coder_v1. The 3 highest-signal task families (adversarial-hallucination, market-research, doc-synthesis) have full per-model entries with transcripts and deliverables; this one doesn't. See ../README.md for the rationale.
