docs: validity boundaries, ROADMAP, FIELD-REPORTS, quant-contribution path #15
Merged
…on path

Phase 4 of docs reorganization: translates the substantive criticism from the LocalLLaMA discussion thread into validity-boundary docs and a contribution-pathway scaffold. No changes to data or methodology; this is purely about making the study's scope explicit and giving external contributors a clear way to fill the gaps it doesn't cover.

- README.md + COMPARISON.md: hoisted "operating point" callout (4-bit AWQ Cyankiwi on 2x RTX PRO 6000 Blackwell; other quants/VRAM/hardware not characterized) to entry-point visibility. Stops the "what quant?" and "what about my hardware?" questions before they're asked.
- COMPARISON.md: new "What this benchmark doesn't characterize" section before the Drilling-deeper table. Five paragraphs covering: other quants of the same models, other VRAM tiers, other hardware classes (Mac M-series), other languages (Python-only Phase 1), single-rig hardware variance. Pre-empts the most common substantive criticisms by making them part of the doc instead of letting them surface in threads.
- KNOWN-LIMITATIONS.md: expanded "Quantization specificity" subsection with a new "Cyankiwi 4-bit AWQ field reports" block. Acknowledges multiple practitioner reports that these specific quants underperform official FP8 / Unsloth UD4 of the same models. Defends the within-quant comparison (still informative) without defending absolute capability claims (not characterized at higher precisions). Commits to FP8 re-run as the validation pass.
- microbench-phase-b/findings.md: extended Recommended follow-ups from 3 to 6 items; added FP8 re-run, M-series Mac sibling study, and language-mix expansion. Pointer to the new ROADMAP for the consolidated cross-doc view.
- New ROADMAP.md: consolidated open questions and contribution opportunities from across all findings docs. 10 prioritized active follow-ups with [contributor-welcome] flags on the 4 items external contributors can take end-to-end. Replaces the need to read 3 findings docs to know "where can I help?"
- tooling/ADDING-A-MODEL.md: added "Two contribution shapes" section at top; clarifies that "same model, different quant" (e.g. official FP8, Unsloth UD4) is a valid contribution path, not just adding a wholly new model. Currently the highest-priority external contribution per ROADMAP.
- New FIELD-REPORTS.md: template for collecting voluntary practitioner reports of model behavior on real workflows. Complements the structured benchmark data with anecdotal-but-specific evidence. Initially seeded with the format example; populates as reports come in.

What this PR is NOT: a response to specific commenters, a change to the benchmark data, or a folder restructure. The discourse is ephemeral; the docs are permanent. This translates the signal into validity-boundary docs and roadmap items.

Broken-link scan: 0/867 (was 0/831 in Phase 3; +36 valid links added).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Phase 4 of the docs reorganization. Translates the substantive criticism from the recent LocalLLaMA discussion thread into validity-boundary docs and a contribution-pathway scaffold. No changes to data or methodology — purely about making the study's scope explicit and giving external contributors a clear way to fill the gaps it doesn't cover.
What this PR is NOT: a response to specific commenters, a change to the benchmark data, or a folder restructure. The discourse is ephemeral; the docs are permanent. This translates the signal into validity-boundary docs and roadmap items.
Test plan
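A broken-link count like the 0/867 figure in the commit message can be checked with a relative-link scan over the Markdown files. The following is a minimal sketch, not the project's actual tooling: the function name `scan_links`, the regex, and the choice to check only local relative targets (skipping external URLs and in-page anchors) are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Matches inline Markdown links of the form [text](target).
# Image links match too, since they share the same bracket syntax.
# Reference-style links are not handled in this sketch.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)")

# Anything that starts with a URI scheme (http:, https:, mailto:, ...).
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def scan_links(root):
    """Return (broken, total) for relative links in *.md files under root.

    broken is a list of (file, target) pairs whose target path is missing;
    total is the number of relative links that were checked.
    """
    root = Path(root)
    broken, total = [], 0
    for md in sorted(root.rglob("*.md")):
        text = md.read_text(encoding="utf-8")
        for target in LINK_RE.findall(text):
            # Skip external URLs and pure in-page anchors (assumption:
            # only on-disk targets count toward the scan).
            if SCHEME_RE.match(target) or target.startswith("#"):
                continue
            total += 1
            # Drop any "#fragment" before checking the file exists.
            path_part = target.split("#", 1)[0]
            if not (md.parent / path_part).exists():
                broken.append((str(md.relative_to(root)), target))
    return broken, total
```

Calling `scan_links(".")` from the repository root and printing `len(broken)` against `total` gives a broken/total pair in the same shape as the figure reported in the commit message. The totals will only match if the real scan counts links the same way.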
🤖 Generated with Claude Code