 # The hard limit of TRITON_MAX_TENSOR_NUMEL is 1048576 https://github.com/triton-lang/triton/blob/ba42a5c68fd0505f8c42f4202d53be0f8d9a5fe0/python/triton/language/core.py#L19
 # However, setting limit as 65536 as in LayerNorm tutorial is faster because of less register spilling
 # The optimal maximum block size depends on your hardware, your kernel, and your dtype
-from liger_kernel.utils import infer_device
-MAX_FUSED_SIZE = 4096 if infer_device() == 'xpu' else 65536 // 2  # the best size we found by manually tuning
+MAX_FUSED_SIZE = 4096 if infer_device() == "xpu" else 65536 // 2  # the best size we found by manually tuning
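
For context, a cap such as `MAX_FUSED_SIZE` is commonly used to bound a power-of-two block size derived from the row width. The sketch below illustrates that pattern in plain Python. It is an assumption about how the constant is used, not code taken from this commit: `next_power_of_2`, `pick_max_fused_size`, and `pick_block_size` are hypothetical helper names, and the device string is passed in explicitly instead of calling `infer_device()`.

```python
def next_power_of_2(n: int) -> int:
    """Return the smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def pick_max_fused_size(device: str) -> int:
    # Same values as the constant in the diff: 4096 on "xpu", 65536 // 2 elsewhere.
    return 4096 if device == "xpu" else 65536 // 2


def pick_block_size(n_cols: int, device: str) -> int:
    # Hypothetical helper: round the row width up to a power of two,
    # then clamp it to the device-specific cap.
    return min(pick_max_fused_size(device), next_power_of_2(n_cols))


if __name__ == "__main__":
    for device in ("cuda", "xpu"):
        for n_cols in (1000, 32000, 128256):
            print(device, n_cols, pick_block_size(n_cols, device))
```

With these values, any row wider than the cap is processed in several blocks of the capped size, which is the trade-off the comments describe: smaller blocks tend to reduce register spilling at the cost of more loop iterations.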