Commit 0189f41

[DLLM] Remove cuda graph batch size limitation (#17458)

btw616 authored and Kangyan-Zhou committed
1 parent: b6e4893

1 file changed: +0 -5 lines

python/sglang/srt/server_args.py

Lines changed: 0 additions & 5 deletions
@@ -2530,11 +2530,6 @@ def _handle_dllm_inference(self):
             )
             self.attention_backend = "triton"
         elif not self.disable_cuda_graph:
-            if self.cuda_graph_bs != [1]:
-                logger.warning(
-                    "Cuda graph bs is set to [1] because of using diffusion LLM inference"
-                )
-                self.cuda_graph_bs = [1]
             if self.attention_backend != "flashinfer":
                 logger.warning(
                     "Attention backend is set to flashinfer because of enabling cuda graph in diffusion LLM inference"
