[quantization] Add a gemma wrapper for TextModel #791
Note for myself
What
This commit adds a full PTQ wrapper (
Why
The Gemma4 model family requires a text-model-level wrapper to enable end-to-end PTQ quantization and conversion to Circle format. Previously, only the sub-component wrappers (attention, MLP, decoder layer) were registered, meaning the full text model could not be prepared as a single quantizable unit. This commit fills that gap by providing the top-level
Key Design Decisions
Changes
Tests
The changes are tested through two complementary mechanisms:
    matches = (inputs_embeds[:, :, None, :] == weight[None, None, :, :]).all(
        dim=-1
    )
I suspect that the exact equality comparison inputs_embeds[...] == weight[...] may fail:
- In QUANT mode, inputs_embeds is fake-quantized before the reverse lookup is called, so it no longer exactly matches any row in the raw weight table.
- With dtype casting, floating-point rounding breaks the comparison: (weight * scale).to(fp16) != weight.to(fp16) * scale.to(fp16), because the multiplication happens in different precision contexts on each side.
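Both failure modes can be reproduced in isolation. The following is a minimal sketch in plain Python rather than PyTorch: `to_fp16`, `fake_quantize`, and `reverse_lookup` are hypothetical helpers written for this illustration, the table values are made up, and the fake quantizer is a simplified round-to-grid without clamping. It is not the wrapper's actual code.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fake_quantize(row, scale):
    """Snap each element to the nearest multiple of `scale` (no clamping)."""
    return [round(v / scale) * scale for v in row]


def reverse_lookup(embedding, table):
    """Return the index of the table row that exactly equals `embedding`, else None."""
    for idx, row in enumerate(table):
        if row == embedding:
            return idx
    return None


# Toy embedding table (illustrative values only).
table = [[0.11, -0.52, 0.97], [0.30, 0.41, -0.77], [-0.05, 0.63, 0.18]]

# 1) A raw embedding row is found; its fake-quantized version is not.
raw = list(table[1])
found_raw = reverse_lookup(raw, table)
found_quantized = reverse_lookup(fake_quantize(raw, scale=0.05), table)

# 2) Casting after the multiply and casting before it round differently.
weight = 1 + 2**-11 - 2**-13  # not exactly representable in fp16
scale = 3.0                   # exactly representable in fp16
cast_after = to_fp16(weight * scale)
cast_before = to_fp16(to_fp16(weight) * to_fp16(scale))

print(found_raw, found_quantized, cast_after, cast_before)
```

In the second case the weight rounds down to 1.0 when it is cast first, while the product computed in higher precision rounds up to the next fp16 value, so the two sides of the comparison disagree even though no quantization is involved.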
Good catch. The reverse lookup is inherently a floating-point exact-match operation against the raw embedding table as you said, so it does not fit well with the quantization path.
I changed the QUANT path so that, when PLE is enabled and inputs_embeds is provided, callers must also provide explicit per_layer_inputs. This prevents the reverse lookup from running on fake-quantized embeddings.
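The rule described above could be expressed as a small guard. This is only a sketch of the idea: `validate_ple_inputs`, the `"QUANT"` mode string, and the argument names are assumptions made for illustration, not the wrapper's actual interface.

```python
def validate_ple_inputs(mode, ple_enabled, inputs_embeds, per_layer_inputs):
    """Raise if the call would need a reverse lookup on fake-quantized embeddings.

    All names here are hypothetical; only the rule itself comes from the PR.
    """
    needs_reverse_lookup = (
        ple_enabled and inputs_embeds is not None and per_layer_inputs is None
    )
    if mode == "QUANT" and needs_reverse_lookup:
        raise ValueError(
            "In QUANT mode with PLE enabled, pass per_layer_inputs explicitly "
            "when inputs_embeds is provided."
        )


# Allowed: explicit per-layer inputs accompany the embeddings.
validate_ple_inputs("QUANT", True, inputs_embeds=[[0.1]], per_layer_inputs=[[0.2]])

# Rejected: the reverse lookup would run on fake-quantized embeddings.
try:
    validate_ple_inputs("QUANT", True, inputs_embeds=[[0.1]], per_layer_inputs=None)
    rejected = False
except ValueError:
    rejected = True
```

Failing at the call boundary keeps the exact-match lookup confined to paths where the embeddings are still raw table rows.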
🤔 Gemma4TextModelCase
This commit adds a wrapper for gemma text model.
TICO-DCO-1.0-Signed-off-by: seongwoo <mhs4670go@naver.com>
The export failure is expected because To make this limitation clear, I explicitly marked Circle export as unsupported for this case so that an export request fails early with a descriptive message, rather than surfacing as an unexpected conversion error.
python -m tico.quantization.examples.inspect \
    --config tico/quantization/examples/configs/wrapper_smoke.yaml \
    --mode wrapper-smoke \
    --case gemma4_text_model \
    --export circle \
    --output-dir ./out/wrapper_smoke
┌───────────── Wrapper Smoke Summary ─────────────
│ Case : gemma4_text_model
│ Status : FAIL
│ Mean |diff| : 0.079053
│ Max |diff| : 0.483883
│ PEIR : 0.083751
│ Shape match : True
│ Quant finite : True
└─────────────────────────────────────────────────
Messages:
- This case validates PTQ numerical parity only. Full Gemma4TextModel Circle export requires a dedicated static adapter.
[ASCII scatter plot from the tool output; both axes range from about -2.7 to 3.7.]
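For readers unfamiliar with the fields in the summary, the sketch below shows one way such parity metrics could be computed over flattened outputs. It assumes PEIR means peak error divided by the value range of the reference output; that definition, the function name `compare_outputs`, and the sample values are assumptions for illustration and are not taken from the tool's source.

```python
import math


def compare_outputs(reference, quantized):
    """Compute simple parity metrics between two flat lists of floats.

    PEIR is taken here as max |diff| / (max(reference) - min(reference)),
    which is an assumed definition.
    """
    if len(reference) != len(quantized):
        return {"shape_match": False}
    diffs = [abs(r - q) for r, q in zip(reference, quantized)]
    interval = max(reference) - min(reference)
    return {
        "shape_match": True,
        "quant_finite": all(math.isfinite(q) for q in quantized),
        "mean_abs_diff": sum(diffs) / len(diffs),
        "max_abs_diff": max(diffs),
        "peir": max(diffs) / interval if interval > 0 else float("inf"),
    }


# Made-up values, chosen to be exactly representable in binary floating point.
metrics = compare_outputs([-2.0, 0.0, 2.0], [-1.5, 0.25, 2.0])
print(metrics)
```

Normalizing the peak error by the reference range makes the number comparable across layers whose activations have very different magnitudes.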