
Post-merge trust fixes: .gitattributes for SHA pinning, cells.jsonl drift, README/AUDIT consistency#19

Merged
Lightheartdevs merged 1 commit into main from submit/hardware-tests-q8-fleet-trust-fixes-2026-05-17 on May 17, 2026

@Lightheartdevs

Contributor

Why

A post-merge audit of #18 flagged five concrete trust hits. Each is a real reproducibility or consistency issue, and all five are fixed here.

The five fixes

1. .gitattributes added at repo root

Without explicit rules, Windows reproducers who check out this repo with core.autocrlf=true (the Git for Windows installer default) have JSONL line endings silently converted LF→CRLF on checkout, which breaks the published SHA on workloads/prompts.jsonl. The reviewer's downloaded file genuinely didn't match the published hash. Now pinned:

  • *.jsonl, power.csv, thermals.csv, *.log → binary (no transformation ever; SHAs match across platforms)
  • *.md, *.yaml, *.json, *.csv (other than power/thermals), *.sh, *.py, *.ts, *.sha256 → text eol=lf
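To see why a checkout-time CRLF conversion breaks the pin, here is a minimal Python sketch (not part of the harness) that hashes the same two JSONL records with LF and with CRLF line endings. The records are made up for illustration; only the line endings differ.

```python
import hashlib

# Two hypothetical JSONL records (illustrative only, not taken from prompts.jsonl).
lf_bytes = b'{"id": 1}\n{"id": 2}\n'
# Roughly what a core.autocrlf=true checkout writes to disk for a file Git treats as text.
crlf_bytes = lf_bytes.replace(b"\n", b"\r\n")


def sha256_hex(data: bytes) -> str:
    """Return the lowercase hex SHA-256 digest, the same form sha256sum prints."""
    return hashlib.sha256(data).hexdigest()


def has_crlf(data: bytes) -> bool:
    """True if any line ending in the payload is CRLF."""
    return b"\r\n" in data


lf_sha = sha256_hex(lf_bytes)
crlf_sha = sha256_hex(crlf_bytes)
print("LF:  ", lf_sha)
print("CRLF:", crlf_sha)
# One extra byte per line is enough to produce a completely different digest.
print("match:", lf_sha == crlf_sha)
```

Marking *.jsonl as binary unsets Git's text attribute for those paths, so Git never rewrites their bytes on checkout and the digest is the same on every platform.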

2. prompts.jsonl.sha256 in sha256sum --check format

Previously the file held just the bare hash (9a27e...), which sha256sum --check rejects. Now it uses the standard <hash>  <filename> form (two spaces between hash and filename), so sha256sum --check prompts.jsonl.sha256 works directly. Applied to all four sha256 files in the bundle + harness:

$ sha256sum --check prompts.jsonl.sha256
prompts.jsonl: OK
$ sha256sum --check smoke-prompts.jsonl.sha256
smoke-prompts.jsonl: OK
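What sha256sum --check does with these files can be sketched in Python. This is an illustrative reimplementation under simplifying assumptions (text-mode "<hash>  <filename>" lines only, no "*" binary marker, no BSD --tag format), not the harness's own tooling; write_checksum_file and check_checksum_file are hypothetical names.

```python
import hashlib
import os
import tempfile


def file_sha256(path: str) -> str:
    """Stream the file in 1 MiB chunks and return its hex SHA-256 digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def write_checksum_file(path: str) -> str:
    """Write '<hash>  <filename>' (two spaces, LF ending) next to path; return its path."""
    out = path + ".sha256"
    with open(out, "w", encoding="utf-8", newline="\n") as f:
        f.write(f"{file_sha256(path)}  {os.path.basename(path)}\n")
    return out


def check_checksum_file(sha_path: str) -> dict:
    """Map each listed filename to True (OK) or False (FAILED), like sha256sum --check."""
    base = os.path.dirname(sha_path)
    results = {}
    with open(sha_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            digest, sep, name = line.partition("  ")
            # A bare hash has no separator and no filename, so it cannot be checked.
            if not sep or len(digest) != 64:
                raise ValueError(f"improperly formatted checksum line: {line!r}")
            results[name] = file_sha256(os.path.join(base, name)) == digest.lower()
    return results


# Usage with a throwaway file in a temporary directory.
with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "prompts.jsonl")
    with open(target, "wb") as f:
        f.write(b'{"id": 1}\n')
    results = check_checksum_file(write_checksum_file(target))
    for name, ok in results.items():
        print(f"{name}: {'OK' if ok else 'FAILED'}")
```

Writing the .sha256 file with newline="\n" keeps it LF-only, which matches the text eol=lf rule for *.sha256 above.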

3. cells.jsonl drift fixed

The v1 aggregate/cells.jsonl had 35 rows for tower2/qwen3.6-27b/cuda while filesystem + manifest.json both said 36. Missing: ctx32768_gen2048_conc8 — the engine-bound timeout cell where only 1 of 8 slots completed per batch. The harness aggregator silently drops rows with per_slot_decode_tps_mean=null, which is exactly what that cell produced.
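The failure mode is easy to reproduce in miniature. The sketch below is hypothetical (the harness aggregator's actual code is not shown in this PR): a filter that requires per_slot_decode_tps_mean loses the timeout cell, while one that keeps the row and records a note preserves the row count. The "example_ok_cell" entry and its value are made up.

```python
# Hypothetical per-cell summaries; field names follow cells.jsonl, values are illustrative.
cells = [
    {"cell": "example_ok_cell", "per_slot_decode_tps_mean": 10.0},
    {"cell": "ctx32768_gen2048_conc8", "per_slot_decode_tps_mean": None},
]


def aggregate_dropping(rows):
    """The problematic pattern: rows whose per-slot mean is null silently disappear."""
    return [r for r in rows if r["per_slot_decode_tps_mean"] is not None]


def aggregate_keeping(rows):
    """Keep every cell; annotate null per-slot rows instead of dropping them."""
    out = []
    for r in rows:
        row = dict(r)  # copy so the input rows are not mutated
        if row["per_slot_decode_tps_mean"] is None:
            row["notes"] = "per_slot_* null: no per-slot decode stats for this cell"
        out.append(row)
    return out


print(len(aggregate_dropping(cells)), len(aggregate_keeping(cells)))
```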

Row backfilled with:

  • aggregate_decode_tps_mean=0.87 and batch_wall_s_mean=1200.10 from cell.json
  • cold_start_decode_tps=1.52, cold_start_wall_s=1200.22
  • power_w_silicon_mean=183.15, power_w_silicon_max=390.79 derived from this cell's power.csv (gpu0 filter, n=11419 samples)
  • temp_c_max=57.0 from this cell's thermals.csv (gpu0 sensor)
  • All per_slot_* fields explicitly null
  • A notes field explaining the null + the reconstruction method
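The power figures come from filtering the cell's power.csv to gpu0 and taking the mean and max. A minimal sketch of that derivation follows, assuming a hypothetical column layout (timestamp, device, power_w) since the real power.csv header is not shown in this PR. A plain sample mean also assumes roughly uniform sample spacing.

```python
import csv
import io
import statistics

# Hypothetical power.csv layout and values (the real column names are not shown in this PR).
SAMPLE_POWER_CSV = """timestamp,device,power_w
0.0,gpu0,180.0
0.0,cpu,40.0
0.5,gpu0,390.0
1.0,gpu0,150.0
"""


def silicon_power_stats(csv_text: str, device: str = "gpu0") -> dict:
    """Keep only one device's samples, then return mean/max watts and the sample count."""
    reader = csv.DictReader(io.StringIO(csv_text))
    watts = [float(row["power_w"]) for row in reader if row["device"] == device]
    if not watts:
        raise ValueError(f"no samples for device {device!r}")
    return {
        "power_w_silicon_mean": round(statistics.fmean(watts), 2),
        "power_w_silicon_max": round(max(watts), 2),
        "n": len(watts),
    }


print(silicon_power_stats(SAMPLE_POWER_CSV))
```

The same filter-then-reduce shape applies to temp_c_max from thermals.csv, with max() over the gpu0 sensor readings.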

After fix: cells.jsonl 36 rows ↔ filesystem 36 dirs ↔ manifest.json 36 cells, internally consistent.
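A three-way count check like this one can be scripted so drift is caught before merge. The sketch below is hypothetical: it assumes cell directories are listed as host/model/backend/cell paths relative to the bundle root, and it assumes a manifest shape of {"cells": [{"host", "model", "backend"}, ...]}, which may not match the real manifest.json.

```python
import json
from collections import Counter


def counts_from_jsonl(lines):
    """Rows per (host, model, backend) in an aggregate cells.jsonl."""
    c = Counter()
    for line in lines:
        if line.strip():
            r = json.loads(line)
            c[(r["host"], r["model"], r["backend"])] += 1
    return c


def counts_from_dirs(paths):
    """Cell directories laid out as host/model/backend/cell (assumed layout)."""
    return Counter(tuple(p.split("/")[:3]) for p in paths)


def counts_from_manifest(manifest):
    """Hypothetical manifest shape: {"cells": [{"host": ..., "model": ..., "backend": ...}]}."""
    return Counter((m["host"], m["model"], m["backend"]) for m in manifest["cells"])


def drift(*sources):
    """Return {group: (count per source, ...)} for every group where the sources disagree."""
    groups = set().union(*sources)
    return {
        g: tuple(s[g] for s in sources)  # a Counter returns 0 for a missing group
        for g in sorted(groups)
        if len({s[g] for s in sources}) > 1
    }


# Usage: a toy version of the pre-fix state (aggregate missing one row).
jsonl = ['{"host": "tower2", "model": "qwen3.6-27b", "backend": "cuda"}\n']
dirs = ["tower2/qwen3.6-27b/cuda/cell_a", "tower2/qwen3.6-27b/cuda/ctx32768_gen2048_conc8"]
manifest = {"cells": [{"host": "tower2", "model": "qwen3.6-27b", "backend": "cuda"}] * 2}
result = drift(counts_from_jsonl(jsonl), counts_from_dirs(dirs), counts_from_manifest(manifest))
print(result)
```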

4. README/AUDIT.md drift on llama-server-*.log publication resolved

README.md said the per-cell llama-server-*.log debug logs are not included (correct: they total 40 MB across 251 files and are excluded for bundle size). AUDIT.md said they are published in the reproducibility bundle and listed them as a per-cell artifact. AUDIT.md now follows README's position throughout, including a new explicit section:

NOT included in the bundle (regeneratable):

  • Per-cell llama-server-<port>.log — excluded to keep the bundle ~110 MB; regeneratable from the pinned SHA + per-cell cell.meta.json server invocation.
  • Per-host build-<backend>.configure.log and build-<backend>.build.log — excluded for size; the build invocations themselves are in harness/HARNESS-README.md.

5. Harness docs de-drifted

harness/README.md still described Strix Halo as "ROCm canonical" with the grid "running twice (ROCm and Vulkan)" — pre-bug-discovery framing that contradicted the bundle's findings.md (Vulkan is canonical, ROCm 6.4.4 segfaulted — see Finding 1). Updated the hosts table, workload-size estimate, and engines/ inventory comments to reflect Vulkan-canonical reality.

harness/AUDIT.md was a full duplicate copy of the upstream bench-fleet AUDIT, with several pre-curation internal references (task #16, #20, #21, #37; "the user"; targets.json.broken_rocm_finding) that were sanitized in the bundle-level ../AUDIT.md during PR #18 but not propagated to the harness copy. Replaced with a single-paragraph pointer to ../AUDIT.md as single source of truth, so future curation passes have one file to update instead of two that can drift.

What this PR is NOT

No new bench data. The next round (MMBT Phase B Q8 quality companion, full 30-min sustained-thermal tier, PTX-JIT SOFT_MAX retry on Tower2 35B-A3B native CUDA) ships as a separate PR.

Verification

$ cd hardware-tests/qwen3.6-q8-fleet-2026-05-17/workloads
$ sha256sum --check prompts.jsonl.sha256
prompts.jsonl: OK
$ cd ../../..   # back to the repo root; the next path is relative to it

$ python3 -c "
import json
counts = {}
with open('hardware-tests/qwen3.6-q8-fleet-2026-05-17/aggregate/cells.jsonl') as f:
    for line in f:
        r = json.loads(line)
        k = (r['host'], r['model'], r['backend'])
        counts[k] = counts.get(k, 0) + 1
print(counts[('tower2', 'qwen3.6-27b', 'cuda')])
"
36   # was 35, now matches manifest + filesystem

Test plan

  • On Windows, git clone and verify sha256sum --check prompts.jsonl.sha256 passes (no CRLF conversion)
  • Spot-check the backfilled ctx32768_gen2048_conc8 row in aggregate/cells.jsonl against tower2/qwen3.6-27b/cuda/ctx32768_gen2048_conc8/cell.json
  • Confirm AUDIT.md no longer claims llama-server logs are published
  • Confirm harness/README.md describes Vulkan as Strix canonical
  • Confirm harness/AUDIT.md is a pointer stub to ../AUDIT.md

🤖 Generated with Claude Code

Five concrete reviewer nits flagged on PR #18 post-merge. Each is a real
trust hit; all five are fixed here.

1. **.gitattributes added at repo root.** Without explicit rules, Windows
   reproducers checking out this repo with core.autocrlf=true (the Git for
   Windows installer default) silently converted JSONL line endings LF→CRLF
   on checkout, which broke
   the published SHA on workloads/prompts.jsonl ("the file the reviewer
   downloaded didn't match the SHA we published"). New .gitattributes pins
   *.jsonl, power.csv, thermals.csv, and *.log as `binary` (no
   transformation ever; SHAs match across platforms), forces LF on
   *.md/*.yaml/*.json/*.csv/*.sh/*.py and other text formats.

2. **prompts.jsonl.sha256 fixed to sha256sum --check format.** The file
   previously contained just the bare hash, which fails
   `sha256sum --check prompts.jsonl.sha256` with "no properly formatted
   checksum lines found". Updated to standard `<hash>  <filename>` form so
   reproducers can verify directly. Applied to all four sha256 files:
   workloads/prompts.jsonl.sha256, harness/workloads/prompts.jsonl.sha256,
   harness/workloads/smoke-prompts.jsonl.sha256, and the bundle-level
   workloads/prompts.jsonl.sha256.

3. **cells.jsonl drift fixed.** The v1 aggregate had 35 rows for
   tower2/qwen3.6-27b/cuda while the filesystem (and manifest.json) said
   36. The missing cell was ctx32768_gen2048_conc8 — the canonical
   engine-bound timeout cell where only 1 of 8 slots completed per batch.
   The harness aggregator silently drops rows with
   per_slot_decode_tps_mean=null, which is exactly what that cell
   produced. Backfilled the row with the available aggregate numbers from
   cell.json (aggregate_decode_tps_mean=0.87, batch_wall_s_mean=1200.10,
   cold_start_decode_tps=1.52), power/thermal stats derived from the cell's
   power.csv (gpu0 filter, n=11419) and thermals.csv (gpu0 sensor,
   n=10861), per_slot fields explicitly null, and a `notes` field
   explaining why per_slot is null and how the row was reconstructed.
   manifest.json's 36-cell count now matches the aggregate, and the
   filesystem (36 cell directories) is left unchanged.

4. **README/AUDIT.md drift on llama-server log publication fixed.**
   README.md said llama-server debug logs are NOT included (correct);
   AUDIT.md said they ARE published in the reproducibility bundle and
   listed them as a per-cell artifact. Resolved to README's position
   throughout AUDIT.md, with a new explicit "NOT included in the bundle
   (regeneratable)" section listing the per-cell llama-server log and the
   per-host build-*.log files, both regeneratable from the pinned source
   SHA in harness/VENDORED-FROM-SHA.txt.

5. **harness/README.md + harness/AUDIT.md de-drifted.** harness/README.md
   still described Strix Halo as "ROCm canonical" with the grid "running
   twice (ROCm and Vulkan)" — pre-bug-discovery framing that contradicted
   the bundle's findings.md (Vulkan is the canonical, working path; ROCm
   6.4.4 segfaulted, see Finding 1). Updated the harness/README hosts
   table, the workload-size estimate, and the engines/ inventory comments
   to reflect Vulkan-canonical reality.

   harness/AUDIT.md was a full duplicate copy of the upstream bench-fleet
   AUDIT, with several pre-curation internal references (task #16, #20,
   #21, #37; "the user"; targets.json.broken_rocm_finding) that were
   sanitized in the bundle-level ../AUDIT.md during the curation pass but
   not propagated. Replaced harness/AUDIT.md with a one-paragraph pointer
   to ../AUDIT.md as single source of truth, so future curation passes
   have one file to update instead of two that can drift.

Net effect: the bundle is consistent (manifest ↔ aggregate ↔ filesystem),
the SHA pin is platform-neutral (Linux + Windows + macOS reproducers all
get the same bytes), the README and the AUDIT agree on what's in the
bundle and what isn't, and the harness docs reflect the actually-running
configuration instead of the pre-bug plan.

This commit adds no new bench data. The next round (MMBT Phase B Q8
quality companion, full sustained-thermal tier, PTX-JIT SOFT_MAX retry)
ships as a separate PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lightheartdevs merged commit ebf69ee into main on May 17, 2026
1 check passed