Commit 6cf92ae

Fix GatherBlockQuantized node to support symmetric quantized LM_HEAD (#1951)
Today, models created with python -m onnxruntime_genai.models.builder -p int4 -e webgpu --extra_options shared_embeddings=true int4_algo_config=rtn_last int4_is_symmetric=true have invalid GatherBlockQuantized nodes, because the node's zero-point input refers to a non-existent tensor, lm_head.MatMul.weight_zp. This change fixes the builder (src/python/py/models/builders/base.py) so that the zero-point input is added to the GatherBlockQuantized node only when the int4 quantization is not symmetric.
1 parent 9c524ed commit 6cf92ae

1 file changed: +4 -1 lines changed

src/python/py/models/builders/base.py

Lines changed: 4 additions & 1 deletion
@@ -1324,9 +1324,12 @@ def make_embedding(self, embedding):
             self.make_reshape(
                 weight_reshape_name, weight_reshape_inputs, dtype=ir.DataType.UINT8, shape=[self.vocab_size, flat_dim]
             )
+            input_names = [weight_reshape_output, "input_ids", "lm_head.MatMul.weight_scale"];
+            if not self.quant_attrs["int4"]["is_symmetric"]:
+                input_names.append("lm_head.MatMul.weight_zp")
             self.make_node(
                 "GatherBlockQuantized",
-                inputs=[weight_reshape_output, "input_ids", "lm_head.MatMul.weight_scale", "lm_head.MatMul.weight_zp"],
+                inputs=input_names,
                 outputs=[gather_output],
                 name=gather_name,
                 domain="com.microsoft",
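
For illustration only, the input selection in this hunk can be isolated as a small standalone function. This is a minimal sketch, not code from the commit: the function name gather_block_quantized_inputs and its parameters are hypothetical, and only the tensor names are taken from the diff.

```python
def gather_block_quantized_inputs(weight_name, is_symmetric):
    """Return the input names for a GatherBlockQuantized node.

    Hypothetical helper (not part of the commit) that mirrors the
    conditional added in the diff: the zero-point tensor name is
    included only when quantization is not symmetric.
    """
    # Inputs that are always present: quantized weight, indices, scales.
    input_names = [weight_name, "input_ids", "lm_head.MatMul.weight_scale"]
    if not is_symmetric:
        # Per the commit message, the zero-point tensor does not exist
        # in the symmetric case, so it is referenced only here.
        input_names.append("lm_head.MatMul.weight_zp")
    return input_names


# "reshaped_weight" is a placeholder name used only for this example.
symmetric_inputs = gather_block_quantized_inputs("reshaped_weight", is_symmetric=True)
asymmetric_inputs = gather_block_quantized_inputs("reshaped_weight", is_symmetric=False)
```

With is_symmetric=True the node receives three inputs and never references lm_head.MatMul.weight_zp; with is_symmetric=False it receives all four, matching the behavior before this change.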
