
Add --profile flag for LLM benchmarks #848

Open
odjuricicTT wants to merge 1 commit into main from odjuricic/device-profile

@odjuricicTT
Collaborator

Problem description

When profiling LLM benchmarks with tools like Tracy, running the full benchmark with all layers and many iterations is slow and generates excessive data. A streamlined profiling mode is needed for faster iteration during performance analysis.

What's changed

Added a --profile pytest flag to enable profiling mode for LLM benchmarks:

  • conftest.py: Added the --profile command-line option and a matching profile fixture
  • test_llms.py: All test functions now accept the profile fixture and pass it on. When profiling is enabled, models are loaded with num_layers=1
  • llm_benchmark.py:
    • Added Tracy signposts for the token generation and warmup phases
    • When --profile is passed, limits max_tokens_to_generate to 2 (affects both warmup and benchmark)
    • Skips the 10-iteration validation in profile mode
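
The profile-mode behavior listed above can be sketched roughly as follows. This is an illustration only, not the code from the PR: `RunSettings` and `resolve_run_settings` are hypothetical names, and treating the limit of 2 as a cap (rather than an unconditional override) is an assumption. Only the values `num_layers=1`, the limit of 2 tokens, and the skipped validation come from the description.

```python
from dataclasses import dataclass
from typing import Optional

# Values taken from the PR description.
PROFILE_NUM_LAYERS = 1
PROFILE_MAX_TOKENS = 2


@dataclass(frozen=True)
class RunSettings:
    """Hypothetical container for the knobs that profile mode changes."""

    num_layers: Optional[int]  # None means "load the full model"
    max_tokens_to_generate: int
    run_validation: bool


def resolve_run_settings(profile: bool, max_tokens_to_generate: int) -> RunSettings:
    """Return reduced settings when profiling, unchanged settings otherwise."""
    if profile:
        return RunSettings(
            num_layers=PROFILE_NUM_LAYERS,
            # Assumption: the limit acts as a cap, so smaller requests are kept.
            max_tokens_to_generate=min(max_tokens_to_generate, PROFILE_MAX_TOKENS),
            run_validation=False,  # the 10-iteration validation is skipped
        )
    return RunSettings(
        num_layers=None,
        max_tokens_to_generate=max_tokens_to_generate,
        run_validation=True,
    )
```

Keeping the decision in one place like this would make it easy to see exactly what a profiling run trims, though the PR itself may spread these checks across test_llms.py and llm_benchmark.py.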

Usage:

pytest -svv benchmark/tt-xla/test_llms.py::test_llama_3_2_1b --profile
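
For pytest to accept the flag in the command above, the option has to be registered in conftest.py. A minimal sketch of how that is commonly done with pytest's `pytest_addoption` hook and `request.config.getoption` follows; the help text and the exact fixture body are assumptions, not the PR's actual code.

```python
import pytest


def pytest_addoption(parser):
    # Register --profile as a boolean switch; when absent it defaults to False.
    parser.addoption(
        "--profile",
        action="store_true",
        default=False,
        help="Run LLM benchmarks in a reduced profiling mode.",  # assumed wording
    )


@pytest.fixture
def profile(request):
    # Expose the flag's value to any test that declares a `profile` argument.
    return request.config.getoption("--profile")
```

With this in place, a test such as `test_llama_3_2_1b(profile)` would receive `True` when `--profile` is passed and `False` otherwise.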

Checklist

  • New/Existing tests provide coverage for changes

Note: This PR depends on the num_layers PR in tt_forge_models and should be merged after that PR lands.

@odjuricicTT force-pushed the odjuricic/device-profile branch 4 times, most recently from c6e3a41 to b4f9196, on February 6, 2026 12:32
@odjuricicTT force-pushed the odjuricic/device-profile branch from b4f9196 to 41033c9 on February 6, 2026 13:23
@odjuricicTT
Collaborator Author

Performance benchmark run (llama_3_2_1b only): https://github.com/tenstorrent/tt-forge/actions/runs/21752281045

@odjuricicTT
Collaborator Author

Tracy is not on XLA main yet, will not merge this

@vkovacevicTT
Contributor

vkovacevicTT commented Feb 6, 2026

> Tracy is not on XLA main yet, will not merge this

Let's do this in tt-xla then when we migrate the benchmark: tenstorrent/tt-xla#2971
