
feat: Enable benchmark-mode module inventory/export across all CausalLM architectures #906

Open
vbaddi wants to merge 5 commits into quic:main from vbaddi:feat/enable_micro_benchmark

Conversation

@vbaddi (Contributor) commented Apr 3, 2026

WIP: This PR extends enable_benchmark=True support in QEFFAutoModelForCausalLM to all CausalLM models.

What changed

  • Added architecture coverage for CausalLM families (llama, gpt_oss, gpt2, codegen, falcon,
    gptj, mistral, mixtral, mpt, phi, phi3, qwen2, starcoder2, granite, olmo2).
  • Added mixtral MoE module benchmark support (attention, decoder, moe).
  • Added seq_len passthrough to get_benchmark_module_specs(...).
  • Added tiny-model benchmark inventory test matrix to validate module dump behavior across all listed tiny
    CausalLM models.
  • Kept benchmark behavior gated by enable_benchmark=True; non-benchmark flow remains backward compatible.
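The per-architecture coverage and MoE-specific module list described above might be organized along these lines. This is an illustrative sketch only; the names `MODULE_INVENTORY` and `modules_for` are hypothetical and not the PR's actual API:

```python
# Hypothetical sketch of a per-architecture benchmark module inventory.
# Dense decoder families expose attention and decoder benchmarks; MoE
# families (e.g. mixtral) additionally expose an expert ("moe") benchmark.

MODULE_INVENTORY = {
    "llama": ["attention", "decoder"],
    "mistral": ["attention", "decoder"],
    "gpt2": ["attention", "decoder"],
    "mixtral": ["attention", "decoder", "moe"],
}

def modules_for(model_type: str) -> list[str]:
    """Return the benchmarkable modules for a model_type, empty if unsupported."""
    return MODULE_INVENTORY.get(model_type, [])
```

A lookup like this keeps the non-benchmark flow untouched: an empty list simply means no benchmark modules are dumped for that architecture.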

Example benchmark output (Llama)

| Mode | Module | Type | Prefill (ms) | Decode (ms) |
| --- | --- | --- | --- | --- |
| Prefill/Decode | Attention | Attention | 0.7051 | 0.5613 |
| Prefill/Decode | Decoder | Decoder | 0.7975 | 0.6924 |

Input/Output shape section in report

  - attention inputs: `{"attention_mask": [1,1,32,128], "hidden_states": [1,32,16], "past_key.0": [1,4,128,4], "past_value.0": [1,4,128,4], "position_ids": [1,32]}`
  - attention outputs: `{"attention_output": [1,32,16], "past_key_RetainedState": [1,4,128,4], "past_value_RetainedState": [1,4,128,4]}`
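The reported attention input shapes follow directly from the tiny-model dimensions. As a minimal sketch, assuming a config with hidden_size=16, num_kv_heads=4, head_dim=4, ctx_len=128, and seq_len=32 (the helper name `attention_input_shapes` is hypothetical):

```python
# Sketch: derive the prefill-side attention input shapes from tiny-config
# dimensions. The mapping is: attention_mask is [batch, 1, seq_len, ctx_len],
# hidden_states is [batch, seq_len, hidden_size], past KV caches are
# [batch, num_kv_heads, ctx_len, head_dim], position_ids is [batch, seq_len].

def attention_input_shapes(batch, seq_len, ctx_len, hidden, kv_heads, head_dim):
    return {
        "attention_mask": [batch, 1, seq_len, ctx_len],
        "hidden_states": [batch, seq_len, hidden],
        "past_key.0": [batch, kv_heads, ctx_len, head_dim],
        "past_value.0": [batch, kv_heads, ctx_len, head_dim],
        "position_ids": [batch, seq_len],
    }

# Tiny-llama-like dimensions assumed from the report above.
shapes = attention_input_shapes(1, 32, 128, 16, 4, 4)
```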

Validation

   python -m pytest -q tests/unit_test/benchmarking/test_causal_lm_microbenchmark.py (25 passed)
   python -m pytest -q tests/unit_test/models/test_model_quickcheck.py -n auto (62 passed)

vbaddi added 2 commits April 3, 2026 09:41
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
@vbaddi vbaddi added the enhancement New feature or request label Apr 3, 2026
@vbaddi vbaddi marked this pull request as draft April 3, 2026 10:17
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
@vbaddi vbaddi changed the title (feat): Enable benchmark-mode module inventory/export across all CausalLM architectures feat: Enable benchmark-mode module inventory/export across all CausalLM architectures Apr 3, 2026
@vbaddi vbaddi marked this pull request as ready for review April 4, 2026 14:38
@anujgupt-github (Contributor) commented:

@vbaddi - can we restructure this as below?
We really only need benchmarks for Attention and FFN (including expert interactions for MoE models).

We could create Attention and MoE/FFN benchmarks and use ONNX symbols to set the fields? Some fields can come from the model card's config.json, like dm/dh?

Maybe I didn't fully understand the table you gave above.

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
@vbaddi (Contributor, Author) commented Apr 9, 2026

> @vbaddi - can we restructure this as below? We really only need benchmarks for Attention and FFN (including expert interactions for MoE models).
>
> We could create Attention and MoE/FFN benchmarks and use ONNX symbols to set the fields? Some fields can come from the model card's config.json, like dm/dh?
>
> Maybe I didn't fully understand the table you gave above.

Thanks @anujgupt-github. These are all configurable from the config or model card that is passed in; whatever needs to be edited can either be changed in the config or passed as arguments to .from_pretrained().

The table is simply dummy inputs run on QAic, reporting the numbers for those modules (via sess.run()).
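The override path described in the reply above could be sketched as a simple config merge: fields default to the model card's config.json and any explicitly passed argument wins. This is illustrative only; `apply_config_overrides` is a hypothetical helper, not the actual `.from_pretrained()` implementation:

```python
# Sketch of config-driven benchmark fields with kwarg overrides, mimicking
# the described behavior: values come from the model config unless the
# caller explicitly passes a replacement.

def apply_config_overrides(config: dict, **overrides) -> dict:
    """Return a copy of config with any explicitly passed fields replaced."""
    merged = dict(config)
    merged.update(overrides)
    return merged

# Assumed tiny-model defaults; ctx_len overridden by the caller.
base = {"hidden_size": 16, "num_key_value_heads": 4, "ctx_len": 128}
tuned = apply_config_overrides(base, ctx_len=256)
```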

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>