Add automated performance bisect tools for tt-xla and tt-mlir #2303
base: main
Some non-blocking comments:
    BINARY_DIR ${TTMLIR_BUILD_DIR}
    BUILD_COMMAND env ${WITH_METAL_RUNTIME_ROOT_SET} ${CMAKE_COMMAND} --build <BINARY_DIR>
    - COMMAND ${CMAKE_COMMAND} --build <BINARY_DIR> --target tt-alchemist
@svuckovicTT Can this patch just be part of the changes?
Synced offline: it's okay. I'll remove it soon, as there will be no need for it.
Disclaimer: the script was born out of necessity, to automate time-consuming manual work. If other teammates start using it, we can improve it and integrate it into Claude directly (via plugins).
Btw, thanks for sharing your repo, I wasn't aware of it. If you think some of those scripts are ready for other developers who work on uplifting, please share!
@jameszianxuTT and @brataTT, FYI since you already had some bisect scripts.
0e40410 to a715f90
brataTT
left a comment
Great stuff. Learned some things from it.
Wasn't aware we could use git bisect view that way.
In uplift scripts, we use it to get a status/summary of long running bisects instead.
Separating logs by commits instead of commands is also a good idea.
I'll check if I can steal these ideas to improve metal uplift scripts too.
Concerns
- I don't see any clean-build step.
  That'll make running this really fast 👍.
  But if there are unreliable results at any point, it may be useful to add that step.
  That said, I have no specific examples of when this can happen, or whether this is a valid concern.
- Patches: If mlir introduces breaking changes that require changes in the tt-xla uplift PR, there will be a larger number of untestable commits.
  In metal uplift, we solve this by keeping separate downstream fix patches per breaking upstream commit.
  Then, we apply those patches starting from the breaking change onwards.
  The unrolling @jameszianxuTT mentioned simplifies this.
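The per-breaking-commit patch idea above could be sketched roughly as follows. This is only an illustration, not the metal uplift implementation: the function name `patches_needed` and the `<patch_dir>/<breaking-upstream-sha>.patch` naming scheme are assumptions made for the example. It lists the patches whose breaking commit is the candidate commit itself or one of its ancestors, using `git merge-base --is-ancestor`.

```shell
#!/usr/bin/env bash
# Sketch: list the downstream fix patches that a given upstream commit needs.
# Assumed (hypothetical) layout: <patch_dir>/<breaking-upstream-sha>.patch
# A patch is needed when its breaking commit is the candidate itself or one
# of the candidate's ancestors.
patches_needed() {
    local upstream_repo="$1" candidate="$2" patch_dir="$3"
    local patch sha
    for patch in "$patch_dir"/*.patch; do
        [ -e "$patch" ] || continue   # no patches: the glob stayed literal
        sha=$(basename "$patch" .patch)
        # --is-ancestor exits 0 when $sha is reachable from $candidate
        # (a commit counts as its own ancestor here).
        if git -C "$upstream_repo" merge-base --is-ancestor "$sha" "$candidate" 2>/dev/null; then
            printf '%s\n' "$patch"
        fi
    done
}
```

Each printed path could then be passed to `git apply` in the downstream checkout before building. The sketch does not order the patches by commit history, so patches that depend on each other would need extra handling.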
    # Reset bisect before checking files
    git bisect reset

    # Check if this is a tt-mlir uplift commit
There might be some AI slop from Claude in this part.
The only reliable source of truth for detecting an uplift commit here should be the second case (checking TT_MLIR_VERSION), and even then, unrelated changes around that line will be falsely detected as an uplift commit, since git diff is used.
A better approach would be to extract the mlir version from BAD and BAD^ individually, then compare those strings for equality up to the length of the shorter string (so that a short SHA can be compared against a long SHA).
Case 1 assumes people use the templated uplift PR (not always true). Case 3 reflects how we use the tt-forge-models repo, not tt-mlir, so it detects a tt-forge-models uplift as a tt-mlir uplift. Neither is a huge deal, thanks to the later check you added for good_mlir = bad_mlir.
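A minimal sketch of the suggested comparison, under stated assumptions: the helper names (`same_sha`, `extract_mlir_version`, `mlir_version_at`), the default file path `third_party/CMakeLists.txt`, and the `set(TT_MLIR_VERSION "<sha>")` line format are all hypothetical and would need to match wherever the version is actually pinned.

```shell
#!/usr/bin/env bash
# Sketch: decide whether two SHAs refer to the same commit when one of them
# may be abbreviated. Only the first N characters are compared, where N is
# the length of the shorter string.
same_sha() {
    local a="$1" b="$2"
    # An empty SHA never matches anything.
    [ -n "$a" ] && [ -n "$b" ] || return 1
    local n=${#a}
    [ "${#b}" -lt "$n" ] && n=${#b}
    [ "${a:0:n}" = "${b:0:n}" ]
}

# Read CMake text on stdin and print the first pinned SHA.
# ASSUMPTION: the pin looks like  set(TT_MLIR_VERSION "<sha>")
extract_mlir_version() {
    sed -n 's/.*set(TT_MLIR_VERSION[[:space:]]*"\{0,1\}\([0-9a-fA-F]\{7,40\}\)"\{0,1\}.*/\1/p' |
        head -n 1
}

# Print the pinned tt-mlir SHA as of a given tt-xla commit.
# ASSUMPTION: the default file path is a guess; pass the real one as $2.
mlir_version_at() {
    local commit="$1" file="${2:-third_party/CMakeLists.txt}"
    git show "$commit:$file" 2>/dev/null | extract_mlir_version
}
```

With these helpers, the uplift check reduces to reading the version at `BAD` and at `BAD^` with `mlir_version_at` and treating the commit as an uplift only when `same_sha` fails. Because the whole file is read at each commit rather than diffed, edits near the version line do not trigger a false positive.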
You are right. I'll rewrite this check to explicitly compare tt-mlir SHAs.
About your concerns --
Anyway, as I said, this is a demo version I made quickly with Claude just to help us out.
a715f90 to 2060e97
Agreed, this should not block your PR; it's more of an improvement.
This commit introduces three new scripts for automating performance regression investigation through git bisect:

1. **bisect_perf.sh** - Standalone bisect script for tt-xla commits
   - Tests individual tt-xla commits for performance regressions
   - Supports custom benchmark commands, thresholds, and metric patterns
   - Handles build failures and benchmark crashes gracefully (exit 125)
   - Can be used directly with: git bisect run ./scripts/bisect_perf.sh
2. **bisect_ttmlir_perf.sh** - Standalone bisect script for tt-mlir commits
   - Tests individual tt-mlir commits within the tt-mlir submodule
   - Dynamically modifies CMakeLists.txt to test specific tt-mlir versions
   - Supports incremental builds for faster iteration
   - Logs all tests to /tmp/bisect_ttmlir_*.log for later analysis
3. **bisect_perf_auto.sh** - Master orchestrator (RECOMMENDED)
   - Fully automated two-phase bisect: tt-xla first, then tt-mlir if needed
   - Automatically detects tt-mlir uplift commits
   - Drills down into tt-mlir submodule to find exact regression commit
   - Comprehensive logging and final summary report
   - Single command operation: ./scripts/bisect_perf_auto.sh -g <good> -b <bad>
4. **cmake_fix.patch** - CMakeLists.txt patch for incremental builds
   - Removes --target tt-alchemist to enable faster incremental builds
   - Applied automatically by bisect scripts

All scripts support:
- Custom benchmark commands (-c, --command)
- Custom performance thresholds (-t, --threshold)
- Custom metric extraction patterns (-p, --pattern)
- Comprehensive help documentation (-h, --help)
- No hardcoded paths - works from any tt-xla location

Default configuration:
- Benchmark: resnet model with batch size 8, bfloat16, 128 loops
- Threshold: 680 samples/second
- Pattern: "Sample per second:\s*\K[0-9.]+"

Example usage:
    ./scripts/bisect_perf_auto.sh -g 051ebb2 -b HEAD

Successfully tested and used to identify tt-mlir commit a868fa2a9 ([TTIRFusing] Improve ScaledSumToMeanPattern) as the root cause of a performance regression in resnet (from ~700 to ~574 samples/sec).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
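For readers unfamiliar with the `git bisect run` exit-code convention mentioned above (0 marks a commit good, 125 skips it, and any other code from 1 to 127 marks it bad), here is a minimal sketch of how a benchmark log could be mapped to an exit code. It is not the code from the PR's scripts: the function name `classify_perf` is hypothetical, and only the default threshold (680) and the metric pattern are taken from the commit message.

```shell
#!/usr/bin/env bash
# Sketch: map one benchmark log to a `git bisect run` exit code.
#   0   -> good (metric at or above the threshold)
#   1   -> bad  (metric below the threshold)
#   125 -> skip (metric missing, e.g. the build failed or the run crashed)
classify_perf() {
    local log="$1" threshold="${2:-680}"
    local pattern='Sample per second:\s*\K[0-9.]+'
    local value
    # grep -oP needs GNU grep with PCRE support (\K drops the label part).
    value=$(printf '%s\n' "$log" | grep -oP "$pattern" | tail -n 1)
    if [ -z "$value" ]; then
        return 125
    fi
    # awk does the floating-point comparison; bash arithmetic is integer-only.
    if awk -v v="$value" -v t="$threshold" 'BEGIN { exit !(v + 0 >= t + 0) }'; then
        return 0
    fi
    return 1
}
```

A wrapper script would run the build and the benchmark, pass the captured output to `classify_perf`, and exit with its return code so that `git bisect run` can skip commits that cannot be tested instead of marking them bad.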
2060e97 to 52c0fa2
Ping @AleksKnezevic @mrakitaTT @nvukobratTT as code owners to take a look