Commit 20b3e76

aleozlx and yzh119 authored and committed
Revert "fix(jit): GEMM kernels produce NaN under concurrency — missing GDC flags cause PDL synchronization barriers to compile as no-ops" (flashinfer-ai#2737)
Proposing to revert flashinfer-ai#2716 in order to unblock the 0.6.6 release.

flashinfer-ai#2716 seems to have broken AOT packages:
https://github.com/flashinfer-ai/flashinfer/actions/runs/22870567870/job/66353637447?pr=2730

## Summary by CodeRabbit

* **Bug Fixes**
  * Removed legacy GPU compilation flags related to GDC enablement for certain GPU tiers during JIT GEMM generation, reducing extra compile flags and build noise; GDC-related flags for the latest GPU tier remain enabled where still applicable.

---------

Co-authored-by: yzh119 <zihaoy@nvidia.com>
Signed-off-by: Amey Naik <212485788+ameynaik-hub@users.noreply.github.com>
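The flag removal described in this message can be sketched as follows. This is only an illustration of how the CUDA flag list changes, assuming `nvcc_flags` is a plain list of strings; the helper `compose_cuda_cflags` and the placeholder base flag `-O3` are hypothetical and not part of flashinfer.

```python
from typing import List

# The two defines this revert removes (as listed in this commit's diff).
GDC_DEFINES = [
    "-DCUTLASS_ENABLE_GDC_FOR_SM100=1",
    "-DCUTLASS_ENABLE_GDC_FOR_SM90=1",
]


def compose_cuda_cflags(
    base: List[str], features: List[str], with_gdc: bool
) -> List[str]:
    """Hypothetical helper: concatenate flag lists, optionally adding GDC defines."""
    flags = list(base) + list(features)
    if with_gdc:
        flags += GDC_DEFINES
    return flags


# Placeholder for nvcc_flags; its real contents are not shown in this commit.
nvcc_flags = ["-O3"]  # assumption, for illustration only

features = ["-DENABLE_BF16", "-DENABLE_FP4"]
before = compose_cuda_cflags(nvcc_flags, features, with_gdc=True)   # state before the revert
after = compose_cuda_cflags(nvcc_flags, features, with_gdc=False)   # state after the revert

# The difference between the two lists is exactly the pair of GDC defines.
removed = [flag for flag in before if flag not in after]
print(removed)
```

Keeping the two defines in one named list makes it easy to see that the revert only drops those entries and leaves `-DENABLE_BF16` and `-DENABLE_FP4` untouched.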
1 parent 3d07465 commit 20b3e76

1 file changed

Lines changed: 1 addition & 15 deletions

flashinfer/jit/gemm/core.py

@@ -91,8 +91,6 @@ def gen_gemm_sm100_module_cutlass_fp4() -> JitSpec:
         + [
             "-DENABLE_BF16",
             "-DENABLE_FP4",
-            "-DCUTLASS_ENABLE_GDC_FOR_SM100=1",
-            "-DCUTLASS_ENABLE_GDC_FOR_SM90=1",
         ],
         extra_cflags=[
             "-DFAST_BUILD",
@@ -160,8 +158,6 @@ def gen_gemm_sm103_module_cutlass_fp4() -> JitSpec:
         + [
             "-DENABLE_BF16",
             "-DENABLE_FP4",
-            "-DCUTLASS_ENABLE_GDC_FOR_SM100=1",
-            "-DCUTLASS_ENABLE_GDC_FOR_SM90=1",
         ],
         extra_cflags=[
             "-DFAST_BUILD",
@@ -210,8 +206,6 @@ def gen_gemm_sm120_module_cutlass_fp4() -> JitSpec:
         + [
             "-DENABLE_BF16",
             "-DENABLE_FP4",
-            "-DCUTLASS_ENABLE_GDC_FOR_SM100=1",
-            "-DCUTLASS_ENABLE_GDC_FOR_SM90=1",
         ],
         extra_cflags=[
             "-DFAST_BUILD",
@@ -262,8 +256,6 @@ def gen_gemm_sm100_module_cutlass_fp8() -> JitSpec:
         extra_cuda_cflags=nvcc_flags
         + [
             "-DENABLE_BF16",
-            "-DCUTLASS_ENABLE_GDC_FOR_SM100=1",
-            "-DCUTLASS_ENABLE_GDC_FOR_SM90=1",
         ],
         extra_cflags=[
             "-DFAST_BUILD",
@@ -357,8 +349,6 @@ def gen_gemm_sm100_module_cutlass_mxfp8() -> JitSpec:
         extra_cuda_cflags=nvcc_flags
         + [
             "-DENABLE_BF16",
-            "-DCUTLASS_ENABLE_GDC_FOR_SM100=1",
-            "-DCUTLASS_ENABLE_GDC_FOR_SM90=1",
         ],
         extra_cflags=[
             "-DFAST_BUILD",
@@ -526,11 +516,7 @@ def gen_gemm_sm120_module() -> JitSpec:
     return gen_jit_spec(
         "gemm_sm120",
         source_paths,
-        extra_cuda_cflags=nvcc_flags
-        + [
-            "-DCUTLASS_ENABLE_GDC_FOR_SM100=1",
-            "-DCUTLASS_ENABLE_GDC_FOR_SM90=1",
-        ],
+        extra_cuda_cflags=nvcc_flags,
     )
 
 