Commit f84ac1c
fix: Fix memory bandwidth calculation in MLA benchmarks (#2479)
## 📌 Description
Summary
* Fixed the memory bandwidth calculation in `testBatchMLAPagedAttentionWrapper`, which counted the full tensor allocations rather than the bytes actually accessed for the given sequence lengths.
* Updated `bench_trtllm_gen_mla.py` to use the unified `bench_gpu_time()` utility with CUPTI, so its timing is consistent with the rest of the benchmark framework.
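The first fix can be illustrated with a minimal sketch. It assumes the absorbed-MLA decode layout that `BatchMLAPagedAttentionWrapper` works with: each query head is split into a `head_dim_ckv` part and a `head_dim_kpe` part, each cached token stores one compressed latent plus one rope vector shared by all heads, and the output is `head_dim_ckv` wide per head. The 512/64 defaults, the 2-byte dtype, and the helper names `mla_bytes_accessed` and `mla_bytes_allocated` are illustrative assumptions, not the benchmark's actual code. The point is that KV traffic should scale with the sum of the real sequence lengths, not with the size of the allocated cache.

```python
from typing import Sequence


def mla_bytes_accessed(
    q_lens: Sequence[int],
    kv_lens: Sequence[int],
    num_heads: int,
    head_dim_ckv: int = 512,  # assumed compressed-latent width
    head_dim_kpe: int = 64,   # assumed rope width
    elem_bytes: int = 2,      # assumed fp16/bf16 storage
) -> int:
    """Bytes one MLA paged-attention call actually reads and writes (hypothetical helper)."""
    num_q_tokens = sum(q_lens)
    num_kv_tokens = sum(kv_lens)  # only the cached tokens each request really attends to
    # Query: q_nope (head_dim_ckv) and q_pe (head_dim_kpe) for every head.
    q_bytes = num_q_tokens * num_heads * (head_dim_ckv + head_dim_kpe) * elem_bytes
    # KV cache: one compressed latent plus one rope vector per token, shared by all heads.
    kv_bytes = num_kv_tokens * (head_dim_ckv + head_dim_kpe) * elem_bytes
    # Output: head_dim_ckv values per head per query token.
    o_bytes = num_q_tokens * num_heads * head_dim_ckv * elem_bytes
    return q_bytes + kv_bytes + o_bytes


def mla_bytes_allocated(
    kv_capacity_tokens: int,
    num_q_tokens: int,
    num_heads: int,
    head_dim_ckv: int = 512,
    head_dim_kpe: int = 64,
    elem_bytes: int = 2,
) -> int:
    """The over-count: sizing KV traffic by the whole cache (num_pages * page_size tokens)."""
    kv_bytes = kv_capacity_tokens * (head_dim_ckv + head_dim_kpe) * elem_bytes
    q_bytes = num_q_tokens * num_heads * (head_dim_ckv + head_dim_kpe) * elem_bytes
    o_bytes = num_q_tokens * num_heads * head_dim_ckv * elem_bytes
    return q_bytes + kv_bytes + o_bytes


if __name__ == "__main__":
    q_lens = [1, 1, 1, 1]           # one decode token per request
    kv_lens = [100, 2000, 300, 50]  # cached tokens each request actually uses
    capacity = len(kv_lens) * 4096  # cache sized for an assumed max_seq_len of 4096
    used = mla_bytes_accessed(q_lens, kv_lens, num_heads=128)
    alloc = mla_bytes_allocated(capacity, num_q_tokens=sum(q_lens), num_heads=128)
    print(f"accessed={used} B, allocated={alloc} B, over-count={alloc / used:.2f}x")
```

With four decode requests whose KV lengths sum to 2,450 tokens but whose cache was sized for 4,096 tokens each, the allocation-based count is roughly five times the bytes actually touched, so a bandwidth figure derived from it would be inflated by the same factor.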
cc @hypdeb
## 🔍 Related Issues
## 🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.
### ✅ Pre-commit Checks
- [x] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.
> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).
## 🧪 Tests
- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).
## Reviewer Notes
## Summary by CodeRabbit
* **Chores**
  * Improved benchmarking: switched to CUDA/CUPTI-based timing with finer iteration controls (dry-run and repeat counts set by iteration) and optional CUDA graph support.
  * Updated performance reporting to use explicit memory accounting based on actual token usage (query, KV, output), and adjusted the bandwidth and FLOPs printouts for clearer, more accurate throughput metrics.
1 parent 6ae5bfe commit f84ac1c
2 files changed
Lines changed: 43 additions & 24 deletions