Commit 5e1318c
fix(gdn): use physical SM count for SM100 persistent prefill kernel (#3155)
## 📌 Description
Fixes the `num_sm` issue that CodeRabbit flagged on #3001; the suggested
fix was not applied before merge:
#3001 (comment)
The raw `HardwareInfo().get_max_active_clusters(1)` call returns 0 or
stale values in spawned subprocesses (e.g. vLLM's EngineCore workers)
where the CUDA driver API context has not yet been made current. The
persistent tile scheduler then leaves some CTAs without any work, and the
kernel deadlocks on the first call. This change switches to
`get_num_sm(q.device)`, matching the SM120 MoE dispatch.
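
To make the failure mode above concrete, here is a minimal toy model in pure Python of a static persistent schedule, where each CTA walks the tile list with a stride equal to the grid size. It is only a sketch under assumptions: `assign_tiles`, `idle_ctas`, and `covered_tiles` are hypothetical helpers, the tile and CTA counts are made up, and the real SM100 kernel's scheduler may work differently. It shows how a grid size of 0 covers no tiles and how a grid size that does not match the workload leaves CTAs with nothing to do.

```python
def assign_tiles(num_tiles: int, num_ctas: int) -> list[list[int]]:
    """Toy persistent schedule: CTA i takes tiles i, i + num_ctas, i + 2*num_ctas, ..."""
    return [list(range(cta, num_tiles, num_ctas)) for cta in range(num_ctas)]


def idle_ctas(schedule: list[list[int]]) -> int:
    """Count CTAs that received no tiles at all."""
    return sum(1 for tiles in schedule if not tiles)


def covered_tiles(schedule: list[list[int]]) -> list[int]:
    """Return every tile that some CTA will process, in sorted order."""
    return sorted(tile for tiles in schedule for tile in tiles)


if __name__ == "__main__":
    num_tiles = 8  # made-up workload size

    # Grid sized from a sane SM count: every CTA has work, every tile is covered.
    good = assign_tiles(num_tiles, num_ctas=4)
    print(good, idle_ctas(good))

    # Grid sized from a stale (too large) value: some CTAs have no work.
    stale = assign_tiles(num_tiles, num_ctas=12)
    print(idle_ctas(stale))

    # Grid sized from 0: nothing is launched and no tile is covered.
    zero = assign_tiles(num_tiles, num_ctas=0)
    print(covered_tiles(zero))
```

In this toy model an idle CTA is harmless, but in a kernel where CTAs synchronize with each other, a CTA that never receives work can leave its peers waiting, which is consistent with the deadlock described above.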
## 🔍 Related Issues
## 🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.
### ✅ Pre-commit Checks
- [x] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.
> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).
## 🧪 Tests
- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).
## Reviewer Notes
## Summary by CodeRabbit
* **Refactor**
  * Kernel compilation now derives device-specific SM and cluster counts
    at runtime, improving GPU resource allocation and leading to more
    consistent performance across different CUDA devices.
---------
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

1 parent 24c4aee commit 5e1318c
1 file changed
Lines changed: 4 additions & 4 deletions
| Hunk | Original lines shown | Removed (original line numbers) | Added (new line numbers) |
|---|---|---|---|
| 1 | 33-41 | 36 | 38-39 |
| 2 | 157-165 | 160-162 | 161-162 |

(Diff line contents are not preserved in this copy; only the line numbers survive.)