
fix: guard batchWarpReduceSum with ENABLE_FP8 to fix compilation without FP8 #2328

Merged

yzh119 merged 2 commits into main from claude/issue-2271-20260111-0705 on Jan 13, 2026

Conversation

yzh119 (Collaborator) commented Jan 11, 2026

Fixes #2271

The batchWarpReduceSum function in reduceKernelUtils.cuh depends on the PackType template, which is only defined when ENABLE_FP8 is set. This causes compilation errors when including norm.cuh without ENABLE_FP8.

Since batchWarpReduceSum is unused (dead code), guard it with #ifdef ENABLE_FP8 to prevent compilation errors.

Changes

  • Added #ifdef ENABLE_FP8 guards around batchWarpReduceSum in include/flashinfer/trtllm/common/reduceKernelUtils.cuh
  • Added #ifdef ENABLE_FP8 guards around batchWarpReduceSum in csrc/nv_internal/tensorrt_llm/common/reduceKernelUtils.cuh
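The guard pattern itself is simple. Below is a minimal host-compilable sketch of the change, with PackType reduced to a trivial stand-in (the real template and its FP8 specializations live in cudaFp8Utils.h, and the real function is a __device__ warp reduction):

```cpp
// Sketch of the guard added in this PR. PackType here is a simplified
// stand-in; the real header only gains the required specializations when
// ENABLE_FP8 is defined, which is exactly why the guard is needed.
#ifdef ENABLE_FP8
template <typename T, int SZ>
struct PackType {
  using type = T;  // stand-in; real versions pack SZ elements of T
};

// Guarded: referencing PackType<T, SZ>::type is only legal under ENABLE_FP8.
template <typename T, int SZ>
inline typename PackType<T, SZ>::type batchWarpReduceSum(
    typename PackType<T, SZ>::type val) {
  return val;  // the real __device__ version does a warp-wide shuffle sum
}

constexpr bool kFp8GuardActive = true;
#else
// Without ENABLE_FP8 the whole block is skipped, so norm.cuh can include
// this header without ever instantiating PackType.
constexpr bool kFp8GuardActive = false;
#endif  // ENABLE_FP8
```

Compiling this translation unit without -DENABLE_FP8 simply drops the function, which mirrors how the fixed headers now behave.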

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Refactor

    • Added conditional compilation support to kernel utilities to better isolate optional FP8 functionality.
  • Bug Fixes

    • Prevents build-time failures when FP8 support is not enabled by gating the new kernel path.
  • Tests

    • Added a compilation test to verify the module builds correctly without FP8 enabled.


fix: guard batchWarpReduceSum with ENABLE_FP8 to fix compilation without FP8

The batchWarpReduceSum function in reduceKernelUtils.cuh depends on the
PackType template which is only defined when ENABLE_FP8 is set. This
causes compilation errors when including norm.cuh without ENABLE_FP8.

Since batchWarpReduceSum is unused (dead code), guard it with
#ifdef ENABLE_FP8 to prevent compilation errors.

Fixes #2271

Co-authored-by: Zihao Ye <yzh119@users.noreply.github.com>
coderabbitai (bot, Contributor) commented Jan 11, 2026

📝 Walkthrough

Wraps the existing template function batchWarpReduceSum<T, SZ> in #ifdef ENABLE_FP8 guards in two header files and adds a test ensuring compilation without ENABLE_FP8. The function performs a warp-wide sum reduction on PackType<T, SZ>::type and is now only available when FP8 support is enabled.
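For intuition, the warp-wide sum referred to here is typically a butterfly (XOR-shuffle) reduction across 32 lanes. The host-side model below emulates that pattern with an explicit array in place of __shfl_xor_sync; it is an illustration of the technique, not the actual kernel code:

```cpp
#include <array>

constexpr int kWarpSize = 32;

// Emulates a butterfly warp reduction: after log2(32) = 5 exchange rounds,
// every lane holds the sum of all 32 inputs.
std::array<float, kWarpSize> warpButterflySum(std::array<float, kWarpSize> lanes) {
  for (int mask = kWarpSize / 2; mask > 0; mask >>= 1) {
    std::array<float, kWarpSize> next = lanes;
    for (int lane = 0; lane < kWarpSize; ++lane) {
      // On the GPU this exchange is __shfl_xor_sync(0xffffffff, val, mask).
      next[lane] = lanes[lane] + lanes[lane ^ mask];
    }
    lanes = next;
  }
  return lanes;
}

// Helper for checking the invariant: the sum of 0..31 (= 496) lands in every lane.
float butterflySumOfIota(int lane) {
  std::array<float, kWarpSize> lanes{};
  for (int i = 0; i < kWarpSize; ++i) lanes[i] = static_cast<float>(i);
  return warpButterflySum(lanes)[lane];
}
```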

Changes

Cohort / File(s) — Summary

  • FP8-guarded warp reduction function (csrc/nv_internal/tensorrt_llm/common/reduceKernelUtils.cuh, include/flashinfer/trtllm/common/reduceKernelUtils.cuh): template <typename T, int SZ> __inline__ __device__ typename PackType<T, SZ>::type batchWarpReduceSum(...) is wrapped in #ifdef ENABLE_FP8. It performs a per-element warp-wide sum on PackType<T, SZ>::type and is conditionally compiled only when ENABLE_FP8 is defined.
  • Compilation test (tests/utils/test_norm.py): Added imports and test_norm_compilation_without_fp8(), which generates a JIT spec excluding ENABLE_FP8, builds/loads the module, and asserts a successful module load to verify the headers compile without FP8 enabled.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • nvmbreughe
  • aleozlx
  • djmmoss
  • kahyunnam
  • jiahanc

Poem

🐰 A tiny hop in header land,
FP8 gates the warp-sum band,
PackType waits where flags align,
tests compile cleanly — what a sign! ✨

🚥 Pre-merge checks (4 passed, 1 failed)

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4)
  • Title check — The title clearly and accurately summarizes the main fix: guarding batchWarpReduceSum with ENABLE_FP8 to resolve compilation failures without FP8 support.
  • Description check — The description clearly explains the problem, root cause, and solution, though it omits the pre-commit checks and test validation sections from the template.
  • Linked Issues check — The PR fully addresses issue #2271 by guarding batchWarpReduceSum with #ifdef ENABLE_FP8 in both files, preventing PackType references when FP8 is disabled, and adding a test to verify the fix.
  • Out of Scope Changes check — All changes are directly in scope: adding #ifdef ENABLE_FP8 guards around batchWarpReduceSum in two files and adding a test_norm_compilation_without_fp8 test to verify the fix.



📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dfc1567 and f694301.

📒 Files selected for processing (1)
  • tests/utils/test_norm.py
🧰 Additional context used
📓 Path-based instructions (1)
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test implementations should use flashinfer.utils functions (get_compute_capability, is_sm90a_supported, is_sm100a_supported, etc.) to skip tests on unsupported GPU architectures
For testing with mpirun on multi-GPU systems, use the pattern: mpirun -np <num_gpus> pytest tests/path/to/test.py::test_function
Avoid OOM (out-of-memory) errors in tests by using appropriate problem sizes - tests/conftest.py provides auto-skipping for OOM tests as a safety net but should not be relied upon

Files:

  • tests/utils/test_norm.py
🧠 Learnings (9)

📚 Learnt from: CR, Repo: flashinfer-ai/flashinfer PR: 0, File: CLAUDE.md:0-0, Timestamp: 2025-12-30T09:34:39.900Z — all applied to tests/utils/test_norm.py:

  • Applies to flashinfer/jit/**/*.py: Use `gen_jit_spec()` function to return a properly configured JitSpec from module generators with appropriate `sources` and `extra_cuda_cflags`
  • Applies to flashinfer/jit/**/*.py: JIT module generators in `flashinfer/jit/` must follow the pattern: compute URI → create directory → (optional) render Jinja template → copy sources → return JitSpec
  • Use `FLASHINFER_CUDA_ARCH_LIST` environment variable to specify target GPU architectures (e.g., '8.0 9.0a') and `FLASHINFER_NVCC_THREADS` to control parallel compilation threads
  • Applies to include/**/*.cuh: Kernel code in `include/flashinfer/` is automatically picked up by JIT compilation on changes - no pip reinstall needed
  • Applies to flashinfer/**/*.py: Use `flashinfer_api` decorator for debugging API calls, enable via `FLASHINFER_LOGLEVEL` environment variable (0=off, 1=basic, 3=detailed, 5=with stats)
  • Applies to flashinfer/jit/**/*.py: Specify `supported_major_versions` in JitSpec to restrict kernel compilation to supported GPU architectures (e.g., SM versions 9, 10, 11, 12 for Hopper/newer)
  • Applies to flashinfer/aot.py: Register new operations in `flashinfer/aot.py` by calling the `gen_*_module()` function for AOT (Ahead-Of-Time) pre-compilation support
  • Applies to flashinfer/__init__.py: Export new operations in `flashinfer/__init__.py` to make them available as public API
  • Applies to tests/**/*.py: Test implementations should use `flashinfer.utils` functions (`get_compute_capability`, `is_sm90a_supported`, `is_sm100a_supported`, etc.) to skip tests on unsupported GPU architectures
🧬 Code graph analysis (1)
tests/utils/test_norm.py (1)
flashinfer/jit/core.py (1)
  • gen_jit_spec (400-466)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Deploy Docs
  • GitHub Check: claude-review
🔇 Additional comments (2)
tests/utils/test_norm.py (2)

22-23: LGTM!

The new imports are appropriate for the compilation test added below.


342-369: No changes needed. The test is correct and effectively verifies compilation without FP8 support.

The codebase intentionally uses two separate macro naming conventions:

  • ENABLE_FP8 (TensorRT-LLM code convention, used in guards like #ifdef ENABLE_FP8)
  • FLASHINFER_ENABLE_FP8_E4M3, FLASHINFER_ENABLE_FP8_E5M2 (FlashInfer convention, added by gen_jit_spec)

Since the test does not define ENABLE_FP8 and batchWarpReduceSum is guarded with #ifdef ENABLE_FP8, the preprocessor will correctly exclude this function during compilation. The FLASHINFER_ENABLE_FP8_* flags added by gen_jit_spec do not interfere with the TensorRT-LLM code's guards.
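The independence of the two macro families can be demonstrated with a small preprocessor sketch (the macro names are the real ones from the two conventions; the boolean constants are illustrative only):

```cpp
// Simulate a build where a FlashInfer FP8 flag is set but the
// TensorRT-LLM-style guard macro is not (as in the new test's JIT spec).
#define FLASHINFER_ENABLE_FP8_E4M3 1

#ifdef ENABLE_FP8
constexpr bool kTrtllmGuardActive = true;
#else
constexpr bool kTrtllmGuardActive = false;  // batchWarpReduceSum excluded
#endif

#ifdef FLASHINFER_ENABLE_FP8_E4M3
constexpr bool kFlashinferFlagActive = true;  // FlashInfer FP8 paths still on
#else
constexpr bool kFlashinferFlagActive = false;
#endif
```

Because the preprocessor treats the two names as unrelated symbols, defining one has no effect on code guarded by the other.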



gemini-code-assist (bot, Contributor) commented

Summary of Changes

Hello @yzh119, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a compilation issue where the batchWarpReduceSum function caused errors when the ENABLE_FP8 preprocessor flag was not active. The function's dependency on PackType, which is only defined under ENABLE_FP8, led to build failures in certain configurations. By wrapping the batchWarpReduceSum function with #ifdef ENABLE_FP8 directives, the change ensures that this currently unused code is only compiled when its dependencies are met, thereby fixing the compilation errors without altering runtime behavior.

Highlights

  • Compilation Fix: Resolves compilation errors that occurred when the ENABLE_FP8 flag was not defined, specifically impacting the batchWarpReduceSum function.
  • Conditional Compilation: The batchWarpReduceSum function, which depends on PackType (only available with ENABLE_FP8), is now guarded by #ifdef ENABLE_FP8 directives.
  • Code Redundancy: The batchWarpReduceSum function was identified as unused (dead code), making its conditional compilation a safe and effective solution.




gemini-code-assist (bot) left a comment


Code Review

This pull request correctly fixes a compilation error that occurs when ENABLE_FP8 is not defined. The batchWarpReduceSum function, which depends on the PackType template, is now conditionally compiled using #ifdef ENABLE_FP8. This change is correctly applied to both csrc/nv_internal/tensorrt_llm/common/reduceKernelUtils.cuh and include/flashinfer/trtllm/common/reduceKernelUtils.cuh, resolving the build failure. The approach is sound and the implementation is correct.

claude (bot) commented Jan 11, 2026

Code Review

Summary

This PR fixes a compilation error when including norm.cuh without the ENABLE_FP8 flag by adding conditional compilation guards around the unused batchWarpReduceSum function. The fix is correct, safe, and follows established patterns in the codebase.

✅ Strengths

  1. Correct Root Cause Analysis: The PR correctly identifies that PackType is only defined when ENABLE_FP8 is set (in cudaFp8Utils.h:19), and batchWarpReduceSum depends on it.

  2. Minimal, Safe Change: Guards dead code rather than making architectural changes. Since batchWarpReduceSum has zero call sites in the codebase (verified via grep), this is a zero-risk fix.

  3. Consistent with Codebase Patterns: The #ifdef ENABLE_FP8 pattern is used 79 times across 29 files in the codebase, making this change consistent with established conventions.

  4. Addresses Both Copies: Correctly updates both:

    • include/flashinfer/trtllm/common/reduceKernelUtils.cuh (framework-agnostic headers)
    • csrc/nv_internal/tensorrt_llm/common/reduceKernelUtils.cuh (internal copy)
  5. Good Documentation: Includes inline comment explaining why the guard is needed.

📝 Observations

  1. Dead Code: batchWarpReduceSum is completely unused (only 2 definition occurrences, zero call sites). This could be removed entirely in a future cleanup, but guarding it is the safer immediate fix.

  2. Guard Style: Uses #endif // ENABLE_FP8 which matches the style in cudaFp8Utils.h.

  3. Testing: The fix resolves the reported compilation error. The minimal reproduction case from issue #2271 (norm.cuh fails to compile without ENABLE_FP8) should now compile successfully.

🔍 Code Quality

  • Correctness: ✅ Solves the reported issue
  • Safety: ✅ Zero risk (guards unused code)
  • Style: ✅ Matches codebase conventions
  • Documentation: ✅ Clear inline comment
  • Test Coverage: ⚠️ No explicit test added, but the fix is trivial and the compilation itself serves as verification

💡 Suggestions (Optional, Non-Blocking)

  1. Consider Dead Code Removal: In a future PR, consider removing batchWarpReduceSum entirely since it's unused. However, if this is TensorRT-LLM vendor code that might be updated from upstream, keeping it guarded is fine.

  2. Add Compilation Test: Consider adding a simple compilation test (e.g., in CI) that builds without ENABLE_FP8 to prevent regression. However, this is probably already covered by your existing CI matrix.

✅ Recommendation

LGTM - Ready to merge. This is a well-executed, minimal fix that correctly addresses the compilation issue following established codebase patterns.

coderabbitai (bot) left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
include/flashinfer/trtllm/common/reduceKernelUtils.cuh (1)

161-184: Consider removing batchWarpReduceSum in a follow-up cleanup.

The #ifdef ENABLE_FP8 guard correctly prevents compilation errors when FP8 support is disabled since PackType is only available under that configuration. The comment clearly documents this dependency.

However, verification confirms that batchWarpReduceSum is never called anywhere in the codebase—it appears only 2 times (as identical definitions in two files) with zero function calls. Rather than keeping guarded dead code, remove this unused function entirely unless it's part of a public API contract or planned for future use.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2062dec and dfc1567.

📒 Files selected for processing (2)
  • csrc/nv_internal/tensorrt_llm/common/reduceKernelUtils.cuh
  • include/flashinfer/trtllm/common/reduceKernelUtils.cuh
🧰 Additional context used
📓 Path-based instructions (1)
include/**/*.cuh

📄 CodeRabbit inference engine (CLAUDE.md)

include/**/*.cuh: Torch headers MUST NOT be included in files within the include/ directory - keep framework-agnostic CUDA kernels that accept raw pointers
Kernel code in include/flashinfer/ is automatically picked up by JIT compilation on changes - no pip reinstall needed
For performance-critical hot paths, leave comments explaining special algorithmic choices and potential alternatives for future reviewers
Consult the PTX ISA documentation (https://docs.nvidia.com/cuda/parallel-thread-execution/) for low-level instruction details and new GPU architecture features when writing inline PTX assembly

Files:

  • include/flashinfer/trtllm/common/reduceKernelUtils.cuh
🧠 Learnings (1)
📚 Learning: 2025-12-30T09:34:39.900Z
Learnt from: CR
Repo: flashinfer-ai/flashinfer PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-30T09:34:39.900Z
Learning: Applies to include/**/*.cuh : Consult the PTX ISA documentation (https://docs.nvidia.com/cuda/parallel-thread-execution/) for low-level instruction details and new GPU architecture features when writing inline PTX assembly

Applied to files:

  • include/flashinfer/trtllm/common/reduceKernelUtils.cuh
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Deploy Docs
  • GitHub Check: claude-review
🔇 Additional comments (1)
csrc/nv_internal/tensorrt_llm/common/reduceKernelUtils.cuh (1)

161-184: Consistent fix applied correctly.

The #ifdef ENABLE_FP8 guard is correctly applied here, maintaining consistency with the parallel header file in include/flashinfer/trtllm/common/reduceKernelUtils.cuh. The identical implementation ensures both code paths remain synchronized.

    return val;
    }

    #ifdef ENABLE_FP8
yzh119 (Collaborator, Author) commented on this hunk:


@claude write a simple python unittest to verify norm module compiles when ENABLE_FP8 is disabled.

claude (bot) replied Jan 11, 2026


Claude finished @yzh119's task in 2m 3s.


Task: Write unittest to verify norm compilation without FP8 ✅

Completed Tasks:

  • Read CLAUDE.md to understand testing patterns
  • Examine existing norm tests to understand the pattern
  • Understand how ENABLE_FP8 is controlled
  • Create a test that verifies norm module compiles without FP8
  • Add test to appropriate test file
  • Commit and push changes

Test Created:
Added test_norm_compilation_without_fp8() to tests/utils/test_norm.py. This test:

  • Creates a JIT spec for the norm module without the -DENABLE_FP8 flag
  • Attempts to build and load the module via JIT compilation
  • Verifies compilation succeeds (which would fail before the fix due to PackType being undefined)

How it verifies the fix:
The test directly addresses issue #2271 by compiling the norm module without FP8 support. Before the fix, this would fail with:

    error: incomplete type is not allowed

because batchWarpReduceSum in reduceKernelUtils.cuh depends on PackType, which is only defined when ENABLE_FP8 is set.

With the fix (guarding batchWarpReduceSum with #ifdef ENABLE_FP8), the compilation succeeds because the problematic code is excluded when FP8 is disabled.

Commit: f694301

---

Add test_norm_compilation_without_fp8() to verify that the norm module
can be successfully compiled without the ENABLE_FP8 flag. This test
ensures that the fix for issue #2271 (guarding batchWarpReduceSum with
#ifdef ENABLE_FP8) prevents compilation errors when FP8 support is
disabled.

The test creates a JIT spec without -DENABLE_FP8 and verifies that
compilation succeeds. Without the fix, this would fail with 'incomplete
type' error for PackType.

Co-authored-by: Zihao Ye <yzh119@users.noreply.github.com>
claude (bot) commented Jan 11, 2026

Code Review

Thank you for this fix! The PR correctly addresses issue #2271 by guarding the unused batchWarpReduceSum function with ENABLE_FP8 preprocessor directives. Here's my detailed review:

✅ Strengths

  1. Correct root cause identification: The fix properly identifies that batchWarpReduceSum depends on PackType, which has FP8-specific specializations only available when ENABLE_FP8 is defined.

  2. Minimal, targeted fix: Guarding dead code with #ifdef ENABLE_FP8 is the right approach here - simple and effective.

  3. Excellent test coverage: The addition of test_norm_compilation_without_fp8() is great! It:

    • Directly tests the reported issue
    • Provides clear documentation of what would fail without the fix
    • Uses the actual JIT compilation infrastructure
  4. Consistency: The fix is applied identically to both locations (include/flashinfer/ and csrc/nv_internal/), maintaining consistency.

  5. Good documentation: The inline comment explains why the guard is needed.

🔍 Minor Observations

  1. PackType availability: While the PR description states "PackType is only defined when ENABLE_FP8 is set", I notice that PackType actually has many specializations defined unconditionally (for float, half, etc.) at cudaFp8Utils.h:119-177. Only the FP8-specific specializations (__nv_fp8_e4m3) are guarded by #ifdef ENABLE_FP8. This doesn't affect the correctness of the fix, but the comment could be slightly more precise:

    Suggested refinement:

    // This function requires FP8-specific PackType specializations
  2. Test robustness: The test test_norm_compilation_without_fp8() could potentially verify that the module is functional (e.g., call a simple norm operation), not just that it compiles. However, this might be overkill for a compilation test.

  3. Dead code: Since batchWarpReduceSum is confirmed to be unused (verified via grep), an alternative approach would be to delete it entirely. However, if this code comes from TensorRT-LLM (trtllm/common/ namespace suggests vendor code), keeping it with guards is the right choice for future compatibility.
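The specialization layering mentioned in the first observation can be sketched as follows (an assumed shape, simplified from cudaFp8Utils.h; fp8_e4m3_stub is a hypothetical stand-in for __nv_fp8_e4m3, which is not available in host-only code):

```cpp
#include <type_traits>

template <typename T, int SZ>
struct PackType;  // primary template: declared, never defined

// Unconditional specializations exist for common types regardless of FP8.
template <> struct PackType<float, 1> { using type = float; };
template <> struct PackType<float, 2> { struct type { float x, y; }; };

#ifdef ENABLE_FP8
struct fp8_e4m3_stub {};  // stand-in for __nv_fp8_e4m3
// Only the FP8-specific specializations are guarded.
template <> struct PackType<fp8_e4m3_stub, 2> { using type = unsigned short; };
#endif  // ENABLE_FP8

// Compile-time check that the unconditional path is always usable.
constexpr bool kFloatPackOk =
    std::is_same<PackType<float, 1>::type, float>::value;
```

Under this layering, instantiating batchWarpReduceSum for an FP8 type is what pulls in the guarded specializations, which is why guarding the function itself suffices.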

📝 Test Verification Suggestion

To ensure the test actually catches the regression, you could verify that removing the #ifdef guard causes the test to fail. This would confirm the test is effective.

✅ Verdict

LGTM - This is a clean, well-tested fix that solves the reported issue without introducing any regressions. The test ensures the problem won't resurface.

Style Compliance

The changes follow FlashInfer conventions from CLAUDE.md:

  • ✅ Framework-agnostic headers (no PyTorch deps in include/)
  • ✅ Proper use of JIT infrastructure in tests
  • ✅ Clear documentation of the fix purpose

Great work! 🚀

yzh119 (Collaborator, Author) commented Jan 11, 2026

/bot run

flashinfer-bot (Collaborator) commented
GitLab MR !234 has been created, and the CI pipeline #41498105 is currently running. I'll report back once the pipeline job completes.

flashinfer-bot (Collaborator) commented
[SUCCESS] Pipeline #41498105: 17/20 passed

@yzh119 yzh119 merged commit 09d0c7f into main Jan 13, 2026
6 checks passed
@yzh119 yzh119 deleted the claude/issue-2271-20260111-0705 branch January 13, 2026 07:03


Development

Successfully merging this pull request may close these issues.

norm.cuh fails to compile without ENABLE_FP8

3 participants