
Added performance analysis as a feature with AutoModelForCausalLM #888

Draft
ochougul wants to merge 1 commit into main from get_perf

@ochougul
Contributor

Summary

  • Added evaluate_performance(...) to QEFFAutoModelForCausalLM for end-to-end performance analysis. It chains three steps: compile, qaic-runner, and qaic-opstats.
  • The compile perf flags aic_perf_metrics=True and aic_perf_warning=True are always enabled. For raw_device_stats, stats_level=70, ddr_stats=True, and aic_pmu_recipe="KernelUtil" are forced as well.
  • Added a prefill_only argument to evaluate_performance(...), which is now forwarded to compile(...).
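As a rough illustration of the flag handling described in the summary, the sketch below collects the perf-related compile flags into a dict. The helper name build_perf_compile_flags and the boolean raw_device_stats argument are hypothetical and not taken from the PR; only the flag names and values come from the summary.

```python
from typing import Any, Dict


def build_perf_compile_flags(raw_device_stats: bool = False) -> Dict[str, Any]:
    """Hypothetical helper (not from the PR): gather perf-related compile flags."""
    # These two flags are always enabled, per the PR summary.
    flags: Dict[str, Any] = {
        "aic_perf_metrics": True,
        "aic_perf_warning": True,
    }
    if raw_device_stats:
        # Extra flags forced only when raw device stats are requested.
        flags.update(
            {
                "stats_level": 70,
                "ddr_stats": True,
                "aic_pmu_recipe": "KernelUtil",
            }
        )
    return flags
```

A caller could then pass the result on as keyword arguments, assuming compile(...) accepts these names as keywords.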

Key Behavior Changes

Stage selection is now:

  • prefill_only=True -> prefill-only
  • prefill_seq_len==1 -> decode-only
  • otherwise -> both prefill + decode
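The selection rules above can be sketched as a small function. The name select_stages is hypothetical, and the sketch assumes the rules are checked in the order listed, so prefill_only=True wins even when prefill_seq_len is 1; the PR text does not spell out that precedence.

```python
from typing import List


def select_stages(prefill_only: bool, prefill_seq_len: int) -> List[str]:
    """Hypothetical sketch of the stage-selection rules from the PR description."""
    if prefill_only:
        # Assumption: this check takes precedence over the sequence-length check.
        return ["prefill"]
    if prefill_seq_len == 1:
        # A prefill sequence length of 1 means only the decode stage runs.
        return ["decode"]
    # Default: run both stages.
    return ["prefill", "decode"]
```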

Artifacts and Paths

  • Standardized output layout: compile/, io/, performance_analysis/.
  • Added per-stage subdirs for prefill/decode under io, profiling, runner_outputs, and opstats.
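One possible reading of this layout is sketched below. The nesting is an assumption: the description does not say where profiling, runner_outputs, and opstats live, so the sketch places them under performance_analysis/ and keeps io/ at the top level. The function name stage_dirs is hypothetical, and the sketch only computes paths without creating them.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable


def stage_dirs(root: str, stages: Iterable[str]) -> Dict[str, Dict[str, PurePosixPath]]:
    """Hypothetical sketch: compute per-stage artifact directories under root."""
    base = PurePosixPath(root)
    # Assumption: the three analysis outputs are nested under performance_analysis/.
    analysis = base / "performance_analysis"
    layout: Dict[str, Dict[str, PurePosixPath]] = {}
    for stage in stages:
        layout[stage] = {
            "io": base / "io" / stage,
            "profiling": analysis / "profiling" / stage,
            "runner_outputs": analysis / "runner_outputs" / stage,
            "opstats": analysis / "opstats" / stage,
        }
    return layout
```

The compile/ directory is left out because the description gives it no per-stage subdirectories.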

Validation

  • Expanded tests in tests/unit_test/utils/test_auto_model_api.py; status: 70 passed.
  • Hardware smoke test verified both cases:
      • --prefill-only -> only prefill artifacts
      • --prompt-len 1 (without --prefill-only) -> only decode artifacts

@ochougul ochougul self-assigned this Mar 26, 2026
@vbaddi vbaddi marked this pull request as draft April 2, 2026 06:03
…ausalLM class

Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>

removed redundancies

Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>

added prefill-only flag and prefill/decode run by default

Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>

ran linter formatter

Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>

added license

Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>