Evidence - Production Mixed-Rate Allocation

Leak-Audit Status

The initial C2 evaluator reused a fixed prompt pool for both calibration and eval. With 256 eval prompts, the 12 calibration prompts were included in the eval set for seeds 6, 7, and 8.

Impact:

calibration-greedy selector claims are contaminated until rerun
weight-MSE selector claims are less directly exposed because the selector is weight-only, but the headline still needs a clean disjoint eval rerun
clean reruns now write a prompt_audit block with prompt hashes and overlap_count: 0
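
The overlap check behind a prompt_audit block can be sketched as follows. This is a minimal sketch, assuming prompts are hashed with SHA-256 over their UTF-8 text. The function names and every field other than overlap_count are hypothetical and are not taken from the evaluator.

```python
import hashlib


def prompt_hash(prompt: str) -> str:
    """Hash one prompt; SHA-256 over UTF-8 text is an assumption."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def build_prompt_audit(calibration_prompts, eval_prompts):
    """Return a prompt_audit-style dict (hash field names are hypothetical)."""
    calib = sorted({prompt_hash(p) for p in calibration_prompts})
    evals = sorted({prompt_hash(p) for p in eval_prompts})
    overlap = set(calib) & set(evals)
    return {
        "calibration_hashes": calib,
        "eval_hashes": evals,
        "overlap_count": len(overlap),
    }


# Toy pool standing in for the real prompts.
pool = [f"prompt {i}" for i in range(256)]

# Leaky split: calibration prompts are drawn from the same pool as eval.
leaky = build_prompt_audit(pool[:12], pool)
# Clean split: calibration and eval are disjoint slices.
clean = build_prompt_audit(pool[:12], pool[12:])

print(leaky["overlap_count"], clean["overlap_count"])
```

A clean rerun corresponds to the second call: any nonzero overlap_count marks the eval as contaminated.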

Current authority: public-calibrated runs with prompt-audit blocks. The current frontier is PMRA-021, the Gemma 4 E2B-it knapsack mix. PMRA-022 records the released Huihui Qwen3.5 4B abliterated PMRA mix. The Qwen3 public-calibrated suite in PMRA-018 through PMRA-020 remains the broader cross-corpus support set. PMRA-023 records the completed Ministral 3 8B Instruct mixes and held-out Wikitext test confirmation. PMRA-024 records the Granite 4.1 8B Heretic PMRA mix with explicit Heretic upstream credit and held-out Wikitext test confirmation.

Confirmed Findings

PMRA-001 - Three-Seed 64-Prompt Model-Forward Marker

Qwen3-1.7B mixed production allocation beat uniform Q3_K_M across seeds 6, 7, and 8.

mean NLL improvement vs Q3_K_M: 0.152334
mean tensor-payload saving vs Q3_K_M: 14,084,779 bytes
mean improvement vs same-budget random control: 0.201847 NLL
mean improvement vs same-budget weight-MSE control: 0.035187 NLL

Source:

codex_ladder/runs/run_007/stage5_c2_production_mixed_rate_robustness_result.md

PMRA-002 - Real Mixed GGUF Artifact

Seed 8 mixed allocation was written as a single GGUF and reloaded with zero tensor mismatches.

file size: 1,071,604,128 bytes
payload bytes: 1,065,652,224
metadata/alignment overhead: 5,951,904 bytes
file bpw: 4.219454
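
The overhead line is the difference between file size and payload bytes, and the file bpw implies a weight count. A minimal sketch of that arithmetic, using only the figures above. The variable names are illustrative, and the implied weight count is derived here rather than reported in the source.

```python
# Figures reported for the seed 8 mixed GGUF.
file_bytes = 1_071_604_128
payload_bytes = 1_065_652_224
file_bpw = 4.219454

# Metadata/alignment overhead is whatever the file holds beyond tensor payload.
overhead_bytes = file_bytes - payload_bytes

# bpw = bits / weights, so the reported file bpw implies a weight count.
# This count is derived here (an assumption), not quoted from the source.
implied_weights = file_bytes * 8 / file_bpw

print(f"overhead: {overhead_bytes:,} bytes")
print(f"implied weights: {implied_weights / 1e9:.3f} billion")
```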

Source:

codex_ladder/runs/run_007/stage5_c2_mixed_gguf_artifact_result.md

PMRA-003 - llama.cpp Runtime Load

llama-cli.exe loaded mixed_seed8.gguf and generated text with exit code 0.

This shows the artifact is not merely readable by Python GGUF tooling.

PMRA-004 - 256-Prompt Robustness Held

The production mixed-rate signal held on the larger three-seed, 256-prompt eval.

Calibration-greedy mean result:

mean NLL improvement vs Q3_K_M: 0.083145
mean NLL improvement vs IQ3_M: 0.287746
mean NLL improvement vs IQ4_XS: 0.099260
mean payload bytes vs Q3_K_M: -14,084,779

Weight-MSE same-budget control was stronger:

seed 6: 0.125789 NLL better than Q3_K_M
seed 7: 0.122095 NLL better than Q3_K_M
seed 8: 0.122303 NLL better than Q3_K_M
payload bytes vs Q3_K_M: -215,040
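
To compare the two selectors on the same footing, the per-seed weight-MSE figures can be averaged and set against the reported calibration-greedy mean. A small sketch using only the numbers above; the variable names are illustrative.

```python
from statistics import mean

# Per-seed weight-MSE improvements vs Q3_K_M (seeds 6, 7, 8), from the list above.
weight_mse = [0.125789, 0.122095, 0.122303]
# Reported calibration-greedy mean improvement vs Q3_K_M.
calib_greedy_mean = 0.083145

weight_mse_mean = mean(weight_mse)
gap = weight_mse_mean - calib_greedy_mean

print(f"weight-MSE mean: {weight_mse_mean:.6f}")
print(f"gap over calibration-greedy: {gap:.6f}")
```

Every individual weight-MSE seed sits above the calibration-greedy mean, so the ordering does not depend on one seed.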

Source:

codex_ladder/runs/run_007/stage5_c2_eval256_result.md

PMRA-005 - Weight-MSE Mixed Artifact

The stronger 256-prompt weight-MSE selector was materialized as a GGUF and loaded by llama.cpp.

file size: 1,073,027,488 bytes
file bytes vs Q3_K_M: -215,040
payload bpw: 4.201623
llama.cpp load smoke: passed

Caveat: CPU llama.cpp runtime is slower than uniform Q3_K_M.

Source:

codex_ladder/runs/run_007/stage5_c2_weight_mse_artifact_runtime_result.md

Killed Adjacent Branches

C1 Production-Base Residual Overlay

Killed. Side payloads over production baselines worsened both Q3_K_M and IQ3_XS at the tested operating point.

Source:

codex_ladder/runs/run_007/stage5_c1_production_base_residual_overlay_result.md

C5 Simple IQ4 Semantic Erasure

Killed. Simple IQ4 bitplane/subfield erasure caused severe quality collapse and missed the saving target.

Source:

codex_ladder/runs/run_007/stage5_c5_iq4_semantic_erasure_result.md

Open Evidence

PMRA-006 - IQ3_M-Budget Compression Gate

Passed on seeds 6, 7, and 8.

Mean weight-MSE selector result:

NLL improvement vs Q3_K_M: 0.117052
NLL improvement vs IQ3_M: 0.321653
NLL improvement vs random same-budget: 0.213092
payload bytes vs Q3_K_M: -43,958,272
payload bytes vs IQ3_M: -81,920

Source:

codex_ladder/runs/run_007/stage5_c2_iq3m_budget_result.md

PMRA-007 - IQ3_M-Budget Artifact And Runtime

The lower-budget C2 artifact materialized as a GGUF and loaded in llama.cpp.

file size: 1,029,284,256 bytes
file bytes vs Q3_K_M: -43,958,272
file bytes vs IQ3_M: -81,920
decode benchmark: 9.36 tok/s vs 9.59 tok/s for Q3_K_M
prompt benchmark: 36.70 tok/s vs 59.22 tok/s for Q3_K_M
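
The two benchmark lines are easier to read as ratios against uniform Q3_K_M. A minimal sketch using only the tok/s figures above; the dictionary layout is illustrative.

```python
# tok/s figures from the benchmark lines above: (PMRA artifact, uniform Q3_K_M).
benchmarks = {"decode": (9.36, 9.59), "prompt": (36.70, 59.22)}

# Ratio below 1.0 means the PMRA artifact is slower than Q3_K_M.
ratios = {name: pmra / q3km for name, (pmra, q3km) in benchmarks.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1%} of Q3_K_M throughput")
```

Decode is close to parity, while prompt processing is where the mixed artifact pays most of its runtime cost.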

Source:

codex_ladder/runs/run_007/stage5_c2_iq3m_budget_artifact_runtime_result.md

PMRA-009 - Clean 1.7B Leak-Audit Rerun

Passed against the original IQ3_M and Q3_K_M controls.

prompt overlap count: 0 on seeds 6, 7, and 8
mean NLL improvement vs IQ3_M: 0.332992
mean NLL improvement vs Q3_K_M: 0.119953
mean NLL improvement vs IQ4_XS: 0.145736
mean NLL improvement vs random same-budget: 0.229526
payload bytes vs Q3_K_M: -43,958,272

Source:

codex_ladder/runs/run_008/stage5_c2_17b_clean_iq3m_budget_result.md

PMRA-010 - Clean Artifact And Runtime

Clean seed 8 was materialized as a GGUF and loaded in llama.cpp.

file size: 1,029,284,256 bytes
payload bytes: 1,023,332,352
tensor reload mismatches: 0
prompt speed: 29.73 tok/s
decode speed: 7.89 tok/s

Caveat: Q3_K_S is a stronger production control than expected. It is smaller and faster than the clean PMRA artifact in local llama.cpp benchmarks.

Source:

codex_ladder/runs/run_008/stage5_c2_clean_artifact_runtime_result.md

PMRA-011 - Q3_K_S-Target Stronger-Control Gate

The tensor-level PMRA effect held against Q3_K_M, IQ4_XS, and random same-budget allocation, but it did not beat uniform Q3_K_S at the same Q3_K_S payload budget.

prompt overlap count: 0 on seeds 6, 7, and 8
mean NLL delta vs Q3_K_S: -0.190351 (negative: PMRA is worse than Q3_K_S)
mean NLL improvement vs Q3_K_M: 0.041247
mean NLL improvement vs IQ4_XS: 0.067031
mean NLL improvement vs random same-budget: 0.170860
payload bpw: 3.917842, equal to uniform Q3_K_S

The matching-HF guard using Qwen/Qwen3-1.7B sharded safetensors reproduced the same numbers, so the miss is not explained by the earlier HF-reference mismatch.

Decision: current tensor-level PMRA is real but blocked as a deployable headline by the stronger public Q3_K_S baseline unless sub-q3 stacking changes the frontier.

Source:

codex_ladder/runs/run_008/stage6_c2_q3ks_target_result.md

PMRA-012 - Sub-Q3 IQ2_M to IQ3_XS Gate

Sub-q3 calibration-greedy PMRA beat uniform IQ3_XS while staying below IQ3_XS payload bytes on all three clean seeds.

prompt overlap count: 0 on seeds 6, 7, and 8
mean NLL improvement vs IQ3_XS: 0.412315
mean NLL improvement vs Q3_K_S: 0.018464
mean NLL improvement vs random same-budget: 0.231937
mean payload bytes vs IQ3_XS: -11,782,827
mean payload bytes vs Q3_K_S: -44,812,971
mean payload bpw: 3.741390

Decision: promote seed 8 calibration-greedy PMRA to artifact/runtime validation. Caveat: seed 6 is GRAY because random same-budget allocation came too close to, or beat, calibration-greedy on that seed, so this is not publication-ready until artifact and broader validation pass.

Source:

codex_ladder/runs/run_008/stage7_c2_subq3_iq2m_to_iq3xs_result.md

PMRA-013 - Sub-Q3 Artifact And Runtime

Seed 8 calibration-greedy sub-q3 PMRA was materialized as a GGUF and loaded in llama.cpp.

file size: 959,521,184 bytes
payload bytes: 953,569,280
tensor reload mismatches: 0
seed 8 NLL improvement vs IQ3_XS: 0.439747
seed 8 NLL improvement vs Q3_K_S: 0.057056
payload bytes vs IQ3_XS: -8,404,992
payload bytes vs Q3_K_S: -41,435,136
prompt speed: 37.79 tok/s vs 19.45 tok/s for IQ3_XS
decode speed: 9.36 tok/s vs 7.68 tok/s for IQ3_XS

Decision: keep going. A larger 1024-prompt confirmation run is in progress before any public claim.

Source:

codex_ladder/runs/run_008/stage8_c2_subq3_artifact_runtime_result.md

PMRA-014 - Sub-Q3 1024-Prompt Confirmation

The sub-q3 signal survived a larger held-out eval.

eval prompts: 1024
calibration prompts: 12
prompt overlap count: 0 on seeds 6, 7, and 8
mean NLL improvement vs IQ3_XS: 0.420092
mean NLL improvement vs Q3_K_S: 0.022041
mean NLL improvement vs random same-budget: 0.236058
mean payload bytes vs IQ3_XS: -11,782,827
mean payload bytes vs Q3_K_S: -44,812,971

Decision: quality/size survived, selector did not fully clear. Seed 6 remains GRAY because random same-budget allocation beat calibration-greedy on that seed. Next step is selector hardening, starting with a larger calibration budget on the failure seed.

Source:

codex_ladder/runs/run_008/stage9_c2_subq3_eval1024_confirmation.md

PMRA-015 - Sub-Q3 Calib48 Production-Shaped Pass

The larger calibration selector fixed the seed 6 random-control failure and produced a stronger artifact candidate.

model: Qwen/Qwen3-1.7B
prompt overlap count: 0 on seeds 6, 7, and 8
mean NLL improvement vs IQ3_XS: 0.869488
mean NLL improvement vs Q3_K_S: 0.467529
mean NLL improvement vs random same-budget: 0.502104
mean payload bytes vs IQ3_XS: -64,160,427
mean payload bytes vs Q3_K_S: -97,190,571
mean payload bpw: 3.535153

Seed 7 was materialized as a GGUF:

file size: 896,601,344 bytes
payload bytes: 890,648,576
payload bpw: 3.506939
tensor reload mismatches: 0
PMRA metadata fields: 11
prompt speed: 33.58 tok/s
decode speed: 10.85 tok/s

Same local runtime comparison:

Q3_K_S: 26.97 prompt tok/s, 7.27 decode tok/s
IQ3_XS: 12.27 prompt tok/s, 7.99 decode tok/s

Source:

codex_ladder/runs/run_008/stage10_c2_subq3_calib48_result.md

PMRA-016 - Sub-Q3 Calib48 Second-Size Replication

The same sub-q3 PMRA gate passed on Qwen/Qwen3-0.6B-Base.

prompt overlap count: 0 on seeds 6, 7, and 8
mean NLL improvement vs IQ3_XS: 1.127286
mean NLL improvement vs Q3_K_S: 0.507815
mean NLL improvement vs random same-budget: 1.121200
mean payload bytes vs IQ3_XS: -13,607,936
mean payload bytes vs Q3_K_S: -23,929,856
mean payload bpw: 3.832145

Decision: replication supported promoting PMRA to a production-shaped method. It did not yet establish cross-family generality.

Source:

codex_ladder/runs/run_008/stage11_c2_subq3_qwen06_calib48_replication_result.md

PMRA-017 - Frozen Selector Public Wikitext Eval

The seed 7 sub-q3 PMRA selection was frozen and evaluated on Wikitext-2 raw test chunks without rerunning allocation on public data.

Qwen3-1.7B:

status: NO-GO
NLL improvement vs IQ3_XS: -0.019426
NLL improvement vs Q3_K_S: 0.049109
NLL improvement vs random same-budget: 0.094621
payload bytes vs IQ3_XS: -71,325,696
payload bytes vs Q3_K_S: -104,355,840

Qwen3-0.6B-Base:

status: NO-GO
NLL improvement vs IQ3_XS: -0.118486
NLL improvement vs Q3_K_S: -0.202373
NLL improvement vs random same-budget: 0.282116
payload bytes vs IQ3_XS: -8,474,624
payload bytes vs Q3_K_S: -18,796,544

Decision: do not publish the frozen project-local selector as a broad quality claim. The mechanism remains live because both public runs beat same-budget random allocation and the 1.7B result still beats Q3_K_S while materially smaller, but release now depends on a public-calibrated held-out selector.
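
The NO-GO statuses are consistent with a simple all-controls rule. The sketch below is an assumption about how such a gate could be expressed, not the project's actual gate code: the function name and the rule "GO only if every NLL improvement is positive" are hypothetical. The numbers are the ones reported above.

```python
def frozen_selector_gate(improvements):
    """Hypothetical rule: GO only if every NLL improvement is positive."""
    return "GO" if all(v > 0 for v in improvements.values()) else "NO-GO"


# NLL improvements reported above for the frozen seed 7 selector.
runs = {
    "Qwen3-1.7B": {"IQ3_XS": -0.019426, "Q3_K_S": 0.049109, "random": 0.094621},
    "Qwen3-0.6B-Base": {"IQ3_XS": -0.118486, "Q3_K_S": -0.202373, "random": 0.282116},
}

statuses = {}
for model, improvements in runs.items():
    statuses[model] = frozen_selector_gate(improvements)
    losing = [name for name, v in improvements.items() if v <= 0]
    print(model, statuses[model], "loses to:", losing)
```

Under this rule both runs fail on a production control while still beating same-budget random allocation, which matches the decision to keep the mechanism live but withhold the quality claim.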

Source:

results/run_008/stage12_c2_subq3_public_wikitext_eval.md

PMRA-018 - Public-Calibrated PMRA Release Candidate

Public calibration resolved the frozen-selector Wikitext transfer failure.

Qwen3-1.7B, c2_calib_greedy_mixed, Wikitext train -> validation -> frozen test:

validation NLL improvement vs IQ3_XS: 0.156058
validation NLL improvement vs Q3_K_S: 0.223327
validation NLL improvement vs same-budget random: 0.229276
test NLL improvement vs IQ3_XS: 0.145818
test NLL improvement vs Q3_K_S: 0.214353
test NLL improvement vs same-budget random: 0.217721
payload bytes vs IQ3_XS: -6,232,064
payload bytes vs Q3_K_S: -39,262,208

Qwen3-0.6B-Base, c2_calib_weight_blend_mixed:

validation NLL improvement vs IQ3_XS: 0.218488
validation NLL improvement vs Q3_K_S: 0.153261
frozen test NLL improvement vs IQ3_XS: 0.225181
frozen test NLL improvement vs Q3_K_S: 0.141294
payload bytes vs IQ3_XS: -68,608
payload bytes vs Q3_K_S: -10,390,528

The 1.7B selector was materialized as one GGUF:

file size: 961,694,976 bytes
payload bytes: 955,742,208
payload bpw: 3.763246
tensor mismatches: 0
local llama-bench: 37.6608 prompt tok/s, 10.5323 decode tok/s

Decision: PMRA is live as a production-shaped method. Cross-corpus TinyStories eval is recorded in PMRA-019.

Source:

results/run_008/stage13_c2_public_calibrated_pmra_result.md

PMRA-019 - Public-Calibrated Cross-Corpus TinyStories Eval

The public-calibrated PMRA selections survived a distribution shift to roneneldan/TinyStories validation.

Qwen3-1.7B, c2_calib_greedy_mixed:

NLL improvement vs IQ3_XS: 0.086969
NLL improvement vs Q3_K_S: 0.239027
NLL improvement vs same-budget random: 0.180680
payload bytes vs IQ3_XS: -6,232,064
payload bytes vs Q3_K_S: -39,262,208

Qwen3-0.6B-Base, c2_calib_weight_blend_mixed:

NLL improvement vs IQ3_XS: 0.140297
NLL improvement vs Q3_K_S: 0.080590
NLL improvement vs same-budget random: 0.359056
payload bytes vs IQ3_XS: -68,608
payload bytes vs Q3_K_S: -10,390,528

Decision: Gate 6C passes. PMRA can move into release packaging, with claims scoped to public-calibrated Qwen3 evidence and no cross-family benchmark claim.

Source:

results/run_008/stage14_c2_public_calibrated_cross_corpus_tinystories.md

PMRA-020 - Public-Calibrated LAMBADA Eval

The public-calibrated PMRA selections also passed on the English test split of EleutherAI/lambada_openai.

Qwen3-1.7B, c2_calib_greedy_mixed:

NLL improvement vs IQ3_XS: 0.110495
NLL improvement vs Q3_K_S: 0.136635
NLL improvement vs same-budget random: 0.183961
payload bytes vs IQ3_XS: -6,232,064
payload bytes vs Q3_K_S: -39,262,208

Qwen3-0.6B-Base, c2_calib_weight_blend_mixed:

NLL improvement vs IQ3_XS: 0.177553
NLL improvement vs Q3_K_S: 0.091113
NLL improvement vs same-budget random: 0.352031
payload bytes vs IQ3_XS: -68,608
payload bytes vs Q3_K_S: -10,390,528

Decision: the initial broader public benchmark mix passes. PMRA is ready for method/model release with scoped claims.

Source:

results/run_008/stage15_c2_public_calibrated_lambada_result.md

PMRA-021 - Gemma Knapsack Selector Artifact

The Gemma 4 E2B-it public-calibrated run was rerun with c2_calib_knapsack_mixed, a multiple-choice knapsack selector over the same candidate tensor/source promotions used by the greedy selector.
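
For readers unfamiliar with the term, a multiple-choice knapsack picks exactly one option per group while keeping total cost within a budget. The sketch below is a generic textbook dynamic program, not the c2_calib_knapsack_mixed implementation. The function name, the toy costs (small integer units rather than bytes), and the toy gains are all hypothetical.

```python
def solve_mckp(groups, budget):
    """Pick exactly one (cost, gain) option per group, maximizing total gain
    with total cost <= budget. Returns (gain, picks) or None if infeasible."""
    neg = float("-inf")
    # best[b]: max gain with total cost exactly b over the groups seen so far.
    best = [0.0] + [neg] * budget
    back = []  # back[g][b]: option index chosen for group g to reach cost b
    for options in groups:
        nxt = [neg] * (budget + 1)
        choice = [-1] * (budget + 1)
        for b, gain_so_far in enumerate(best):
            if gain_so_far == neg:
                continue
            for idx, (cost, gain) in enumerate(options):
                nb = b + cost
                if nb <= budget and gain_so_far + gain > nxt[nb]:
                    nxt[nb] = gain_so_far + gain
                    choice[nb] = idx
        best = nxt
        back.append(choice)
    end = max(range(budget + 1), key=lambda b: best[b])
    if best[end] == neg:
        return None
    # Walk the choices backwards to recover one option index per group.
    picks, b = [], end
    for options, choice in zip(reversed(groups), reversed(back)):
        idx = choice[b]
        picks.append(idx)
        b -= options[idx][0]
    picks.reverse()
    return best[end], picks


# Toy example: three tensors, each with a low source and two promotions.
# Each option is (cost in arbitrary units, estimated NLL gain); values are made up.
groups = [
    [(2, 0.0), (3, 0.30), (4, 0.35)],
    [(2, 0.0), (3, 0.10), (4, 0.40)],
    [(2, 0.0), (3, 0.05), (4, 0.06)],
]
total_gain, picks = solve_mckp(groups, budget=9)
print(total_gain, picks)
```

The table grows with the budget, so a real selector working at byte granularity would need cost scaling or a different solver; that design question is outside what the source describes.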

Wikitext-2 raw train -> validation, seed 7:

prompt audit overlap count: 0
knapsack PMRA NLL: 12.878809
greedy PMRA NLL: 13.281400
Q3_K_S target NLL: 17.993582
same-budget random NLL: 20.488594
NLL improvement vs Q3_K_S: 5.114774
NLL improvement vs same-budget random: 7.609785
NLL improvement vs greedy PMRA: 0.402591
selector-reported payload bytes: 3,094,396,044
selector-reported payload bytes vs Q3_K_S: 0
materialized artifact payload bytes: 3,094,397,068
materialized artifact payload bytes vs Q3_K_S: +1,024
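
The improvement and byte-delta lines can be recomputed from the raw figures in the same list. A minimal sketch; because the NLLs are already rounded to six decimals, a recomputed difference may differ from the reported one in the last digit, so the check allows that much drift. The tolerance is an assumption.

```python
# Validation NLLs reported above (Wikitext-2 raw validation, seed 7).
knapsack_nll = 12.878809
# name -> (baseline NLL, reported improvement of knapsack PMRA over it)
baselines = {
    "Q3_K_S": (17.993582, 5.114774),
    "same-budget random": (20.488594, 7.609785),
    "greedy PMRA": (13.281400, 0.402591),
}

drift = {}
for name, (nll, reported) in baselines.items():
    recomputed = nll - knapsack_nll
    drift[name] = abs(recomputed - reported)
    print(f"{name}: recomputed {recomputed:.6f}, reported {reported:.6f}")

# Payload bytes: selector-reported vs materialized artifact.
selector_payload = 3_094_396_044
artifact_payload = 3_094_397_068
print("artifact minus selector bytes:", artifact_payload - selector_payload)
```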

The knapsack selection was materialized as one GGUF:

file size: 3,110,215,968 bytes
payload bpw: 5.326615
file bpw: 5.353845
GGUF SHA-256: a5a80f2628e236a228f2016bcc3ac660a268f2c8757d21d901095c74b60e3d97
tensor reload mismatches: 0
llama.cpp smoke prompt/generation speed: 30.5 / 10.6 tok/s

Source:

results/gemma4_e2b_it/selector_result_knapsack.md
results/gemma4_e2b_it/artifact_report_knapsack.md
results/gemma4_e2b_it/llama_cli_smoke_knapsack.log

PMRA-022 - Huihui Qwen3.5 Abliterated Weight-Blend Release

The Huihui Qwen3.5 4B abliterated PMRA release used the qwen35 tensor profile and layer-family allocation over the hybrid Qwen3.5 text stack. The selected artifact was c2_calib_weight_blend_mixed, not the knapsack candidate, because weight blend had the best Wikitext validation NLL in the selector result while remaining slightly below the IQ3_XS target payload budget.

Wikitext-2 raw train -> validation, seed 7:

released artifact: huihui_qwen35_4b_abliterated_pmra_calib_weight_blend.gguf
HF release repo: https://huggingface.co/Asystemoffields/Huihui-Qwen3.5-4B-Abliterated-PMRA-GGUF
group mode: layer_family
selector: c2_calib_weight_blend_mixed
low source: IQ2_M
target/control: IQ3_XS
stronger sources: Q3_K_S, Q3_K_M, Q3_K_L, IQ4_XS, Q4_K_M
fp16 reference NLL: 3.171504
IQ2_M NLL: 14.179427
IQ3_XS target NLL: 14.073741
Q3_K_S NLL: 13.977966
greedy PMRA NLL: 13.475620
knapsack PMRA NLL: 13.530774
weight-blend PMRA NLL: 13.471562
same-budget random NLL: 13.995436
NLL improvement vs IQ3_XS: 0.602179
NLL improvement vs Q3_K_S: 0.506404
NLL improvement vs same-budget random: 0.523874
payload bytes: 1,999,682,304
payload bytes vs IQ3_XS: -83,200
payload bytes vs Q3_K_S: -59,229,440
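
The selection rationale, lowest validation NLL among the PMRA candidates while staying under the IQ3_XS payload, can be sketched from the figures above. The short candidate labels are illustrative. Only the selected artifact's byte delta is given in this record, so the budget check below covers that candidate alone.

```python
# Validation NLLs reported above for the three PMRA selector candidates.
candidates = {"greedy": 13.475620, "knapsack": 13.530774, "weight_blend": 13.471562}
# Controls reported in the same list.
controls = {"IQ3_XS": 14.073741, "Q3_K_S": 13.977966, "same_budget_random": 13.995436}

# Lower NLL is better, so the selected candidate is the minimum.
selected = min(candidates, key=candidates.get)
improvements = {name: nll - candidates[selected] for name, nll in controls.items()}

# Byte delta of the selected artifact vs the IQ3_XS target (negative: smaller).
selected_bytes_vs_iq3xs = -83_200
within_budget = selected_bytes_vs_iq3xs <= 0

print(selected, within_budget)
for name, value in improvements.items():
    print(f"improvement vs {name}: {value:.6f}")
```

The margin between weight blend and greedy is small (about 0.004 NLL on this validation set), so the choice is specific to this selector result rather than a general ranking of selectors.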

The weight-blend selection was materialized as one GGUF:

file size: 2,010,651,904 bytes
payload bpw: 3.803710
file bpw: 3.824576
GGUF SHA-256: 0d7fff15074b8146c37ce3d74adb7d377bb6c686b543840da468c1b683baeb03
tensor reload mismatches: 0

Source:

docs/QWEN35_ABLITERATED_PMRA.md
docs/HUIHUI_QWEN35_4B_ABLITERATED_HF_MODEL_CARD.md
https://huggingface.co/Asystemoffields/Huihui-Qwen3.5-4B-Abliterated-PMRA-GGUF

PMRA-023 - Ministral 3 8B Instruct Knapsack Release Candidate

The Ministral 3 8B Instruct run was completed with the mistral3 tensor profile and tensor-level c2_calib_knapsack_mixed allocation. The 48/512 selector shape was attempted but was projected to exceed the Modal job window, so this record uses the completed 12/128 scout selector and a separate 512-prompt held-out Wikitext test confirmation.

Wikitext-2 raw train -> validation, seed 7:

primary artifact: ministral3_8b_pmra_knapsack_iq3xs_budget.gguf
compact artifact: ministral3_8b_pmra_knapsack_3p2.gguf
group mode: tensor
low source: IQ2_M
target/control: IQ3_XS
stronger sources: Q2_K, Q2_K_L, Q3_K_S, Q3_K_M, IQ4_XS
prompt audit overlap count: 0
IQ2_M NLL: 4.874847
IQ3_XS target NLL: 4.649152
Q3_K_S NLL: 4.686507
PMRA knapsack NLL: 4.456880
PMRA 3.2 bpw NLL: 4.510145
same-budget random NLL: 4.825388
PMRA NLL improvement vs IQ3_XS: 0.192272
PMRA NLL improvement vs Q3_K_S: 0.229628
PMRA NLL improvement vs same-budget random: 0.368508
PMRA payload bytes vs IQ3_XS: -557,056
compact PMRA payload bytes vs IQ3_XS: -310,935,552

Held-out Wikitext-2 raw test, 512 prompts:

public eval decision: GO
IQ2_M NLL: 4.963936
IQ3_XS target NLL: 4.722369
Q3_K_S NLL: 4.757542
PMRA knapsack NLL: 4.537475
PMRA 3.2 bpw NLL: 4.600533
same-budget random NLL: 4.912780
NLL improvement vs IQ3_XS: 0.184894
NLL improvement vs Q3_K_S: 0.220067
NLL improvement vs same-budget random: 0.375305
payload bytes vs IQ3_XS: -557,056
payload bytes vs Q3_K_S: -152,666,112
compact NLL improvement vs IQ3_XS: 0.121836
compact payload bytes vs IQ3_XS: -310,935,552
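
The trade between the primary and compact artifacts can be read directly off the held-out figures. A minimal sketch using only the numbers above; the dictionary layout is illustrative.

```python
# Held-out Wikitext-2 test figures reported above.
iq3xs_nll = 4.722369
primary = {"nll": 4.537475, "bytes_vs_iq3xs": -557_056}
compact = {"nll": 4.600533, "bytes_vs_iq3xs": -310_935_552}

# Extra bytes the compact artifact saves beyond the primary one.
extra_bytes_saved = primary["bytes_vs_iq3xs"] - compact["bytes_vs_iq3xs"]
# NLL given up by choosing compact over primary.
nll_cost = compact["nll"] - primary["nll"]
# Compact still has to beat the IQ3_XS target to be worth shipping.
compact_improvement = iq3xs_nll - compact["nll"]

print(f"compact saves {extra_bytes_saved:,} more bytes for {nll_cost:.6f} NLL")
print(f"compact improvement vs IQ3_XS: {compact_improvement:.6f}")
```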

The primary and compact selections were materialized as loadable GGUFs:

primary file size: 3,713,801,312 bytes
primary payload bpw: 3.492210
primary file bpw: 3.499643
primary GGUF SHA-256: 7f88294593cf419a5b39b4da2c7df356fee9528de947d6547b9d11d60a84ac5d
compact file size: 3,403,422,816 bytes
compact payload bpw: 3.199730
compact file bpw: 3.207163
compact GGUF SHA-256: ff95384e68f211b238767e1783d20ce0b4a8be8a56ac8b906756c481831421a3
tensor reload mismatches: 0 for both artifacts

Source:

docs/MINISTRAL3_8B_INSTRUCT_PMRA.md
docs/MINISTRAL3_8B_INSTRUCT_HF_MODEL_CARD.md
tmp/ministral3_8b_release/public_eval_wikitext_test_result.md
tmp/ministral3_8b_release/artifact_iq3xs_budget_report.md
tmp/ministral3_8b_release/artifact_3p2_report.md

PMRA-024 - Granite 4.1 8B Heretic Knapsack Release

The Granite 4.1 8B Heretic run was completed with a new granite tensor profile and layer-family c2_calib_knapsack_mixed allocation. The release docs and HF model card credit heretic-org as the upstream Heretic checkpoint provider.

Wikitext-2 raw train -> validation, seed 7:

artifact: granite4_1_8b_heretic_pmra_layer_family_iq3xs_budget.gguf
HF release repo: https://huggingface.co/Asystemoffields/IBM-granite-4.1-8b-heretic-PMRA-GGUF
base model: heretic-org/IBM-granite-4.1-8b-heretic
GGUF source repo: mradermacher/IBM-granite-4.1-8b-heretic-i1-GGUF
group mode: layer_family
low source: IQ2_M
target/control: IQ3_XS
stronger sources: Q2_K_S, Q2_K, Q3_K_S, Q3_K_M, IQ4_XS
prompt audit overlap count: 0
IQ2_M NLL: 5.028465
IQ3_XS target NLL: 4.845994
Q2_K NLL: 4.707305
Q3_K_S NLL: 4.823249
IQ4_XS NLL: 4.579525
PMRA knapsack NLL: 4.469497
same-budget random NLL: 4.840297
PMRA NLL improvement vs IQ3_XS: 0.376498
PMRA NLL improvement vs Q3_K_S: 0.353752
PMRA NLL improvement vs same-budget random: 0.370800
PMRA payload bytes vs IQ3_XS: -1,392,640
PMRA payload bytes vs Q3_K_S: -165,888,000

Held-out Wikitext-2 raw test, 512 prompts:

public eval decision: GO
IQ2_M NLL: 5.150425
IQ3_XS target NLL: 4.960251
Q2_K NLL: 4.754195
Q3_K_S NLL: 4.933018
IQ4_XS NLL: 4.672932
PMRA knapsack NLL: 4.539084
same-budget random NLL: 4.939853
NLL improvement vs IQ3_XS: 0.421167
NLL improvement vs Q3_K_S: 0.393934
NLL improvement vs same-budget random: 0.400769
payload bytes vs IQ3_XS: -1,392,640
payload bytes vs Q3_K_S: -165,888,000

The selected tensor payloads were materialized as one loadable GGUF:

file size: 3,600,448,224 bytes
payload bytes: 3,596,877,824
payload bpw: 3.433548
file bpw: 3.436956
GGUF SHA-256: 29d3d2b33583127789ee26b0b5e1d7204cb5330af2c265bef6b42c7a4a4a291a
tensor reload mismatches: 0
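
The payload and file bpw figures share one weight count, so they should scale with the byte counts. A minimal cross-check sketch using the figures above; the tolerance is an assumption that allows for six-decimal rounding in both reported bpw values.

```python
# Figures reported for the Granite 4.1 8B Heretic GGUF.
file_bytes = 3_600_448_224
payload_bytes = 3_596_877_824
payload_bpw = 3.433548
reported_file_bpw = 3.436956

# Non-payload bytes in the file (metadata and alignment).
overhead_bytes = file_bytes - payload_bytes

# Same weight count in both ratios, so file bpw = payload bpw * file / payload.
expected_file_bpw = payload_bpw * file_bytes / payload_bytes
consistent = abs(expected_file_bpw - reported_file_bpw) < 2e-6

print(f"overhead: {overhead_bytes:,} bytes")
print(f"expected file bpw: {expected_file_bpw:.6f}, consistent: {consistent}")
```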

Source:

docs/GRANITE4_1_8B_HERETIC_PMRA.md
docs/GRANITE4_1_8B_HERETIC_HF_MODEL_CARD.md
tmp/granite4_1_8b_heretic/release/public_eval_wikitext_test_result.md
tmp/granite4_1_8b_heretic/release/artifact_report.md

PMRA-008 - Replication

Partial clean replication on Qwen3-0.6B.

With prompt_audit.overlap_count = 0 on seeds 6, 7, and 8, the weight-MSE mixed selector beat IQ3_M, IQ4_XS, and random same-budget allocation while staying slightly below IQ3_M payload bytes.

mean NLL improvement vs IQ3_M: 0.249980
mean NLL improvement vs IQ4_XS: 0.057421
mean NLL improvement vs random same-budget: 0.244507
mean NLL delta vs Q3_K_M: -0.127205 (negative: the selector is worse than Q3_K_M)
payload bytes vs IQ3_M: -65,536

Source:

codex_ladder/runs/run_008/stage5_c2_qwen06_replication_clean_result.md
