<!-- .github/pull_request_template.md -->
## 📌 Description
* Added support for the Relu2 non-gated activation in BF16 Fused MoE by
adding an `activation_type` parameter to the external API:
  * `trtllm_bf16_moe`
  * `trtllm_bf16_routed_moe`
  * `Bf16MoeLauncher::init`
* Updated trtllm-gen batched GEMM kernels
* Updated
`tests/moe/test_trtllm_gen_fused_moe.py::test_deepseekv3_routing` to
include BF16 with the Nemotron config, and fixed the Nemotron config's
`intermediate_size` test parameter to match Nemotron 3 Super.
* Fixed import issues found by `pre-commit run --all-files`
* Required by the trtllm-gen batched GEMM update: changed
`options.mNumStages == 4` to `options.mNumStagesA == 4 &&
options.mNumStagesB == 4` in the `prioritizePredefinedConfigs` function in
`csrc/trtllm_batched_gemm_runner.cu`.
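
For reviewers unfamiliar with the distinction, here is a minimal pure-Python sketch of what a non-gated Relu2 (squared ReLU) activation computes compared with a gated one. It is illustrative only and is not the kernel code: the `ActivationType` enum, its member names, `apply_activation`, and the gate/up split order are all assumptions made for this example and may not match FlashInfer's actual definitions.

```python
import math
from enum import Enum
from typing import List


class ActivationType(Enum):
    """Hypothetical stand-in for the runtime activation selector."""
    SWIGLU = "swiglu"  # gated: FC1 output holds a gate half and an up half
    RELU2 = "relu2"    # non-gated: activation is applied elementwise


def relu2(x: float) -> float:
    """Squared ReLU: max(x, 0) ** 2."""
    r = max(x, 0.0)
    return r * r


def silu(x: float) -> float:
    """SiLU (swish): x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def apply_activation(fc1_out: List[float], activation_type: ActivationType) -> List[float]:
    """Apply the selected activation to one token's FC1 output."""
    if activation_type is ActivationType.RELU2:
        # Non-gated: output width equals input width.
        return [relu2(v) for v in fc1_out]
    if activation_type is ActivationType.SWIGLU:
        # Gated: output width is half the input width.
        # Assumed layout for this sketch: first half is the gate, second half is up.
        if len(fc1_out) % 2 != 0:
            raise ValueError("gated activation needs an even-width FC1 output")
        half = len(fc1_out) // 2
        gate, up = fc1_out[:half], fc1_out[half:]
        return [silu(g) * u for g, u in zip(gate, up)]
    raise ValueError(f"unsupported activation_type: {activation_type}")
```

The practical consequence the sketch highlights is the shape difference: a gated activation halves the FC1 output width, while a non-gated one such as Relu2 preserves it.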
## 🔍 Related Issues
<!-- Link any related issues here -->
## 🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.
### ✅ Pre-commit Checks
- [x] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.
> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).
## 🧪 Tests
- [x] Tests have been added or updated as needed.
- [ ] All tests are passing (`unittest`, etc.).
## Reviewer Notes
<!-- Optional: anything you'd like reviewers to focus on, concerns, etc.
-->
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
  * MoE APIs now accept a validated runtime `activation_type`, enabling
selectable activation functions for BF16 and FP8 inference.
* **Tests**
  * Expanded DeepSeekV3 routing tests and added BF16 to non-gated
activation coverage.
  * Updated test parameters to reflect new compatibility.
* **Bug Fixes**
  * Adjusted kernel configuration prioritization for a specific
corner-case path.
* **Refactor**
  * Internal enum imports reorganized to a shared enums module.
* **Chores**
  * Updated batched GEMM artifact path and checksum.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>