
feat: Enable benchmark-mode module inventory/export across all CausalLM architectures #906

Open
vbaddi wants to merge 5 commits into quic:main from vbaddi:feat/enable_micro_benchmark

Conversation

@vbaddi (Contributor) commented Apr 3, 2026

WIP: This PR extends enable_benchmark=True support in QEFFAutoModelForCausalLM to all CausalLM models.

What changed

  • Added architecture coverage for CausalLM families (llama, gpt_oss, gpt2, codegen, falcon,
    gptj, mistral, mixtral, mpt, phi, phi3, qwen2, starcoder2, granite, olmo2).
  • Added mixtral MoE module benchmark support (attention, decoder, moe).
  • Added seq_len passthrough to get_benchmark_module_specs(...).
  • Added tiny-model benchmark inventory test matrix to validate module dump behavior across all listed tiny
    CausalLM models.
  • Kept benchmark behavior gated by enable_benchmark=True; non-benchmark flow remains backward compatible.
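The per-architecture coverage and MoE-specific module list described above might be organized along these lines. This is an illustrative sketch only; the names `MODULE_INVENTORY` and `modules_for` are hypothetical and not the PR's actual API:

```python
# Hypothetical sketch of a per-architecture benchmark module inventory.
# Dense decoder families expose attention and decoder benchmarks; MoE
# families (e.g. mixtral) additionally expose an expert ("moe") benchmark.

MODULE_INVENTORY = {
    "llama": ["attention", "decoder"],
    "mistral": ["attention", "decoder"],
    "gpt2": ["attention", "decoder"],
    "mixtral": ["attention", "decoder", "moe"],
}

def modules_for(model_type: str) -> list[str]:
    """Return the benchmarkable modules for a model_type, empty if unsupported."""
    return MODULE_INVENTORY.get(model_type, [])
```

A lookup like this keeps the non-benchmark flow untouched: an empty list simply means no benchmark modules are dumped for that architecture.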

Example benchmark output (Llama)

| Mode | Module | Type | Prefill (ms) | Decode (ms) |
| --- | --- | --- | --- | --- |
| Prefill/Decode | Attention | Attention | 0.7051 | 0.5613 |
| Prefill/Decode | Decoder | Decoder | 0.7975 | 0.6924 |

Input/Output shape section in report

  - attention inputs: `{"attention_mask": [1,1,32,128], "hidden_states": [1,32,16], "past_key.0": [1,4,128,4], "past_value.0": [1,4,128,4], "position_ids": [1,32]}`
  - attention outputs: `{"attention_output": [1,32,16], "past_key_RetainedState": [1,4,128,4], "past_value_RetainedState": [1,4,128,4]}`
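The reported attention input shapes follow directly from the tiny-model dimensions. As a minimal sketch, assuming a config with hidden_size=16, num_kv_heads=4, head_dim=4, ctx_len=128, and seq_len=32 (the helper name `attention_input_shapes` is hypothetical):

```python
# Sketch: derive the prefill-side attention input shapes from tiny-config
# dimensions. The mapping is: attention_mask is [batch, 1, seq_len, ctx_len],
# hidden_states is [batch, seq_len, hidden_size], past KV caches are
# [batch, num_kv_heads, ctx_len, head_dim], position_ids is [batch, seq_len].

def attention_input_shapes(batch, seq_len, ctx_len, hidden, kv_heads, head_dim):
    return {
        "attention_mask": [batch, 1, seq_len, ctx_len],
        "hidden_states": [batch, seq_len, hidden],
        "past_key.0": [batch, kv_heads, ctx_len, head_dim],
        "past_value.0": [batch, kv_heads, ctx_len, head_dim],
        "position_ids": [batch, seq_len],
    }

# Tiny-llama-like dimensions assumed from the report above.
shapes = attention_input_shapes(1, 32, 128, 16, 4, 4)
```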

Validation

   python -m pytest -q tests/unit_test/benchmarking/test_causal_lm_microbenchmark.py (25 passed)
   python -m pytest -q tests/unit_test/models/test_model_quickcheck.py -n auto (62 passed)

vbaddi added 2 commits April 3, 2026 09:41
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
@vbaddi vbaddi added the enhancement New feature or request label Apr 3, 2026
@vbaddi vbaddi marked this pull request as draft April 3, 2026 10:17
Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
@vbaddi vbaddi changed the title (feat): Enable benchmark-mode module inventory/export across all CausalLM architectures feat: Enable benchmark-mode module inventory/export across all CausalLM architectures Apr 3, 2026
@vbaddi vbaddi marked this pull request as ready for review April 4, 2026 14:38
@anujgupt-github (Contributor) commented:

@vbaddi - can we restructure this as below?
We really only need benchmarks for Attention and FFN (including expert interactions for MoE models).

We could create Attention and MoE/FFN benchmarks and use ONNX symbols to set the fields? Some fields can come from the model card's config.json, like dm/dh?

Maybe I didn't fully understand the table you gave above.

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>
@vbaddi (Contributor, Author) commented Apr 9, 2026

> @vbaddi - can we restructure this as below? We really only need benchmarks for Attention and FFN (including expert interactions for MoE models).
>
> We could create Attention and MoE/FFN benchmarks and use ONNX symbols to set the fields? Some fields can come from the model card's config.json, like dm/dh?
>
> Maybe I didn't fully understand the table you gave above.

Thanks @anujgupt-github. These are all configurable from the config or model card that is passed in; whatever needs to be edited can either be changed in the config or passed as arguments to .from_pretrained().

The table is simply dummy inputs run on QAic, reporting the numbers for those modules (via sess.run()).
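The override path described in the reply above could be sketched as a simple config merge: fields default to the model card's config.json and any explicitly passed argument wins. This is illustrative only; `apply_config_overrides` is a hypothetical helper, not the actual `.from_pretrained()` implementation:

```python
# Sketch of config-driven benchmark fields with kwarg overrides, mimicking
# the described behavior: values come from the model config unless the
# caller explicitly passes a replacement.

def apply_config_overrides(config: dict, **overrides) -> dict:
    """Return a copy of config with any explicitly passed fields replaced."""
    merged = dict(config)
    merged.update(overrides)
    return merged

# Assumed tiny-model defaults; ctx_len overridden by the caller.
base = {"hidden_size": 16, "num_key_value_heads": 4, "ctx_len": 128}
tuned = apply_config_overrides(base, ctx_len=256)
```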

Signed-off-by: vbaddi <vbaddi@qti.qualcomm.com>