Add --profile flag for LLM benchmarks by odjuricicTT · Pull Request #848 · tenstorrent/tt-forge

odjuricicTT · 2026-02-03T16:19:14Z

Problem description

When profiling LLM benchmarks with tools like Tracy, running the full benchmark with all layers and many iterations is slow and generates excessive data. A streamlined profiling mode is needed for faster iteration during performance analysis.

What's changed

Added a --profile pytest flag to enable profiling mode for LLM benchmarks:

- conftest.py: Added --profile command-line option and fixture
- test_llms.py: All test functions now accept and pass the profile fixture. When enabled, models are loaded with num_layers=1
- llm_benchmark.py:
  - Added Tracy signposts for token generation and warmup phases
  - When --profile is passed, limits max_tokens_to_generate to 2 (affects both warmup and benchmark)
  - Skips the 10-iteration validation in profile mode
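The profile-mode behaviour listed above can be sketched in plain Python. This is an illustration only, not the code from this PR: `RunSettings` and `resolve_settings` are hypothetical names, and treating the limit as a cap (via `min`) is an assumption. Only the values 2 (tokens) and 1 (layer) and the skipped validation come from the description.

```python
from dataclasses import dataclass
from typing import Optional

# Values stated in the PR description.
PROFILE_MAX_TOKENS = 2
PROFILE_NUM_LAYERS = 1


@dataclass(frozen=True)
class RunSettings:
    """Hypothetical container for the knobs that profile mode changes."""

    max_tokens_to_generate: int
    num_layers: Optional[int]  # None means "load the full model"
    run_validation: bool


def resolve_settings(profile: bool, max_tokens_to_generate: int) -> RunSettings:
    """Return the effective settings for a normal or a profiling run."""
    if profile:
        return RunSettings(
            # Assumption: the limit acts as a cap on the requested token count.
            max_tokens_to_generate=min(max_tokens_to_generate, PROFILE_MAX_TOKENS),
            num_layers=PROFILE_NUM_LAYERS,
            run_validation=False,  # the 10-iteration validation is skipped
        )
    return RunSettings(
        max_tokens_to_generate=max_tokens_to_generate,
        num_layers=None,
        run_validation=True,
    )
```

Keeping the decision in one function means the warmup and benchmark phases read the same reduced token limit, which matches the note that the limit affects both.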

Usage:

pytest -svv benchmark/tt-xla/test_llms.py::test_llama_3_2_1b --profile
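For the flag in the command above to be accepted, pytest has to be told about it in conftest.py. A minimal sketch of how such an option and fixture are commonly wired up with pytest's `pytest_addoption` hook follows; the help text and exact layout are assumptions, not the contents of this PR.

```python
import pytest


def pytest_addoption(parser):
    # Register a boolean flag; it is False unless --profile is given.
    parser.addoption(
        "--profile",
        action="store_true",
        default=False,
        help="Run LLM benchmarks in a reduced profiling mode.",
    )


@pytest.fixture
def profile(request):
    # Expose the flag's value to any test that requests `profile`.
    return request.config.getoption("--profile")
```

A test such as test_llama_3_2_1b would then take `profile` as an argument and forward it to the benchmark.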

Checklist

New/Existing tests provide coverage for changes

Note: This PR depends on the num_layers PR in tt_forge_models and should be merged after that PR lands.

odjuricicTT · 2026-02-06T13:30:03Z

Performance benchmark run (llama_3_2_1b only): https://github.com/tenstorrent/tt-forge/actions/runs/21752281045

odjuricicTT · 2026-02-06T13:45:14Z

Tracy is not on XLA main yet, will not merge this

vkovacevicTT · 2026-02-06T15:45:17Z

> Tracy is not on XLA main yet, will not merge this

Let's do this in tt-xla then when we migrate the benchmark: tenstorrent/tt-xla#2971

odjuricicTT requested review from rpavlovicTT, tt-mpantic and vkovacevicTT as code owners on February 3, 2026 16:19

github-code-quality bot found potential problems Feb 3, 2026

benchmark/tt-xla/llm_benchmark.py (fixed)

odjuricicTT force-pushed the odjuricic/device-profile branch 4 times, most recently from c6e3a41 to b4f9196, on February 6, 2026 12:32

vkovacevicTT approved these changes Feb 6, 2026

Commit 41033c9: Add profile option to benchmarks

odjuricicTT force-pushed the odjuricic/device-profile branch from b4f9196 to 41033c9 on February 6, 2026 13:23

odjuricicTT wants to merge 1 commit into main from odjuricic/device-profile

2 participants
