[Bugfix] Normalize KunlunGraph splitting_ops for piecewise cudagraph by Lidang-Jiang · Pull Request #329 · baidu/vLLM-Kunlun

Lidang-Jiang · 2026-04-20T07:35:04Z

PR Description

Fixes #311

Checklist (Required)

All code changes pass the pre-commit checks.
Commits are signed off using git commit -s.
The PR title is properly classified.

Summary

Normalize legacy vllm.xxx splitting op names to the vllm::xxx format expected by vLLM piecewise cudagraphs.
When KunlunGraph runs with piecewise cudagraphs and the user supplies legacy or partial attention splitting ops, automatically append vllm::unified_attention_with_output_kunlun and the full CompilationConfig._attention_ops set. User-provided custom splitting ops are preserved, and the merged list is deduplicated in order.
Add a regression test for the legacy config path, and update the docs so they no longer recommend setting compilation_config.splitting_ops manually in normal usage.
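A minimal sketch of the normalization steps listed above, in plain Python. The function names and the contents of ATTENTION_OPS are hypothetical placeholders (in vLLM the real set is CompilationConfig._attention_ops, which may differ between versions), and the sketch ignores the "only when running piecewise cudagraphs" condition. It only illustrates the rename, append, and order-preserving dedup; it is not the patch itself.

```python
from typing import Iterable, List

# Hypothetical values for illustration; the real attention op set comes from
# vLLM's CompilationConfig._attention_ops and may differ by version.
KUNLUN_ATTENTION_OP = "vllm::unified_attention_with_output_kunlun"
ATTENTION_OPS = [
    "vllm::unified_attention",
    "vllm::unified_attention_with_output",
]


def normalize_op_name(name: str) -> str:
    """Rewrite a legacy 'vllm.xxx' op name to the 'vllm::xxx' form."""
    if name.startswith("vllm.") and "::" not in name:
        return "vllm::" + name[len("vllm."):]
    return name  # already qualified, or a custom op from another namespace


def normalize_splitting_ops(
    user_ops: Iterable[str], attention_ops: Iterable[str] = ATTENTION_OPS
) -> List[str]:
    """Normalize user ops, append the required attention ops, dedup in order."""
    candidates = [normalize_op_name(op) for op in user_ops]
    candidates.append(KUNLUN_ATTENTION_OP)
    candidates.extend(attention_ops)
    # dict.fromkeys keeps the first occurrence of each key (Python 3.7+).
    return list(dict.fromkeys(candidates))
```

For example, normalize_splitting_ops(["vllm.unified_attention_with_output_kunlun", "vllm::my_custom_op"]) (where vllm::my_custom_op is a made-up custom op) returns ["vllm::unified_attention_with_output_kunlun", "vllm::my_custom_op", "vllm::unified_attention", "vllm::unified_attention_with_output"]: the custom op keeps its position and the appended Kunlun op does not appear twice.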

Before

Command:

PYTHONPATH=/ssd1/jianglidang/workspace/vLLM-Kunlun-issue-311-before \
VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 \
/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/bin/python -m vllm.entrypoints.openai.api_server \
  --host 0.0.0.0 \
  --port 8567 \
  --model /ssd1/models/Qwen2.5-72B-Instruct \
  --served-model-name Qwen2.5-72B-Instruct \
  --gpu-memory-utilization 0.9 \
  --trust-remote-code \
  --max-model-len 132096 \
  --tensor-parallel-size 8 \
  --dtype float16 \
  --max_num_seqs 4 \
  --max_num_batched_tokens 132096 \
  --block-size 128 \
  --no-enable-prefix-caching \
  --no-enable-chunked-prefill \
  --distributed-executor-backend mp \
  --compilation-config '{"splitting_ops":["vllm.unified_attention_with_output_kunlun"]}'

ERROR 04-20 15:31:21 [multiproc_executor.py:772] WorkerProc failed to start.
ERROR 04-20 15:31:21 [multiproc_executor.py:772] Traceback (most recent call last):
ERROR 04-20 15:31:21 [multiproc_executor.py:772]   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 743, in worker_main
ERROR 04-20 15:31:21 [multiproc_executor.py:772]     worker = WorkerProc(*args, **kwargs)
ERROR 04-20 15:31:21 [multiproc_executor.py:772]   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 569, in __init__
ERROR 04-20 15:31:21 [multiproc_executor.py:772]     self.worker.init_device()
ERROR 04-20 15:31:21 [multiproc_executor.py:772]   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/vllm/v1/worker/worker_base.py", line 326, in init_device
ERROR 04-20 15:31:21 [multiproc_executor.py:772]     self.worker.init_device()  # type: ignore
ERROR 04-20 15:31:21 [multiproc_executor.py:772]   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 262, in init_device
ERROR 04-20 15:31:21 [multiproc_executor.py:772]     self.model_runner = GPUModelRunnerV1(self.vllm_config, self.device)
ERROR 04-20 15:31:21 [multiproc_executor.py:772]   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 644, in __init__
ERROR 04-20 15:31:21 [multiproc_executor.py:772]     self.cudagraph_dispatcher = CudagraphDispatcher(self.vllm_config)
ERROR 04-20 15:31:21 [multiproc_executor.py:772]   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/vllm/v1/cudagraph_dispatcher.py", line 46, in __init__
ERROR 04-20 15:31:21 [multiproc_executor.py:772]     assert (
ERROR 04-20 15:31:21 [multiproc_executor.py:772] AssertionError: Compilation mode should be CompilationMode.VLLM_COMPILE when cudagraph_mode piecewise cudagraphs is used, and attention should be in splitting_ops or inductor splitting should be used. cudagraph_mode=FULL_AND_PIECEWISE, compilation_mode=3, splitting_ops=['vllm.unified_attention_with_output_kunlun']
WARNING 04-20 15:31:21 [multiproc_executor.py:786] WorkerProc was terminated
WARNING 04-20 15:31:21 [multiproc_executor.py:786] WorkerProc was terminated
[rank5]:[W420 15:31:21.627045585 TCPStore.cpp:141] [c10d] recvValue failed on SocketImpl(fd=60, addr=[::ffff:127.0.0.1]:59184, remote=[::ffff:127.0.0.1]:25199): Connection reset by peer
Exception raised from recvBytes at ../torch/csrc/distributed/c10d/Utils.hpp:667 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f91b186d446 in /ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x5fed856 (0x7f91f0b4c856 in /ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #2: c10d::TCPStore::check(std::vector<std::string, std::allocator<std::string> > const&) + 0x354 (0x7f91f0b48ac4 in /ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0x2b36ce4 (0x7f90150c7ce4 in /ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/torch_xmlir/_XMLIRC.cpython-310-x86_64-linux-gnu.so)
frame #4: <unknown function> + 0xd6df4 (0x7f920363fdf4 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x8609 (0x7f9205078609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #6: clone + 0x43 (0x7f9204e43353 in /lib/x86_64-linux-gnu/libc.so.6)

WARNING 04-20 15:31:21 [multiproc_executor.py:786] WorkerProc was terminated
WARNING 04-20 15:31:21 [multiproc_executor.py:786] WorkerProc was terminated
[rank1]:[W420 15:31:21.780770131 TCPStore.cpp:141] [c10d] recvValue failed on SocketImpl(fd=63, addr=[::ffff:127.0.0.1]:59188, remote=[::ffff:127.0.0.1]:25199): failed to recv, got 0 bytes
Exception raised from recvBytes at ../torch/csrc/distributed/c10d/Utils.hpp:670 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f88eea6e446 in /ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x5fed788 (0x7f892dd4d788 in /ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #2: c10d::TCPStore::check(std::vector<std::string, std::allocator<std::string> > const&) + 0x354 (0x7f892dd49ac4 in /ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0x2b36ce4 (0x7f87522c8ce4 in /ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/torch_xmlir/_XMLIRC.cpython-310-x86_64-linux-gnu.so)
frame #4: <unknown function> + 0xd6df4 (0x7f8940840df4 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x8609 (0x7f8942279609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #6: clone + 0x43 (0x7f8942044353 in /lib/x86_64-linux-gnu/libc.so.6)

[rank6]:[W420 15:31:21.824669393 TCPStore.cpp:141] [c10d] recvValue failed on SocketImpl(fd=60, addr=[::ffff:127.0.0.1]:59190, remote=[::ffff:127.0.0.1]:25199): failed to recv, got 0 bytes
Exception raised from recvBytes at ../torch/csrc/distributed/c10d/Utils.hpp:670 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f527ef29446 in /ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x5fed788 (0x7f52be208788 in /ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #2: c10d::TCPStore::check(std::vector<std::string, std::allocator<std::string> > const&) + 0x354 (0x7f52be204ac4 in /ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0x2b36ce4 (0x7f50e2783ce4 in /ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/torch_xmlir/_XMLIRC.cpython-310-x86_64-linux-gnu.so)
frame #4: <unknown function> + 0xd6df4 (0x7f52d0cfbdf4 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #5: <unknown function> + 0x8609 (0x7f52d2734609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #6: clone + 0x43 (0x7f52d24ff353 in /lib/x86_64-linux-gnu/libc.so.6)

(EngineCore_DP0 pid=14005) ERROR 04-20 15:31:23 [core.py:946] EngineCore failed to start.
(EngineCore_DP0 pid=14005) ERROR 04-20 15:31:23 [core.py:946] Traceback (most recent call last):
(EngineCore_DP0 pid=14005) ERROR 04-20 15:31:23 [core.py:946]   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 937, in run_engine_core
(EngineCore_DP0 pid=14005) ERROR 04-20 15:31:23 [core.py:946]     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore_DP0 pid=14005) ERROR 04-20 15:31:23 [core.py:946]   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 691, in __init__
(EngineCore_DP0 pid=14005) ERROR 04-20 15:31:23 [core.py:946]     super().__init__(
(EngineCore_DP0 pid=14005) ERROR 04-20 15:31:23 [core.py:946]   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 105, in __init__
(EngineCore_DP0 pid=14005) ERROR 04-20 15:31:23 [core.py:946]     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=14005) ERROR 04-20 15:31:23 [core.py:946]   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 97, in __init__
(EngineCore_DP0 pid=14005) ERROR 04-20 15:31:23 [core.py:946]     super().__init__(vllm_config)
(EngineCore_DP0 pid=14005) ERROR 04-20 15:31:23 [core.py:946]   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/vllm/v1/executor/abstract.py", line 101, in __init__
(EngineCore_DP0 pid=14005) ERROR 04-20 15:31:23 [core.py:946]     self._init_executor()
(EngineCore_DP0 pid=14005) ERROR 04-20 15:31:23 [core.py:946]   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 165, in _init_executor
(EngineCore_DP0 pid=14005) ERROR 04-20 15:31:23 [core.py:946]     self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore_DP0 pid=14005) ERROR 04-20 15:31:23 [core.py:946]   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 678, in wait_for_ready
(EngineCore_DP0 pid=14005) ERROR 04-20 15:31:23 [core.py:946]     raise e from None
(EngineCore_DP0 pid=14005) ERROR 04-20 15:31:23 [core.py:946] Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(EngineCore_DP0 pid=14005) Process EngineCore_DP0:
(EngineCore_DP0 pid=14005) Traceback (most recent call last):
(EngineCore_DP0 pid=14005)   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=14005)     self.run()
(EngineCore_DP0 pid=14005)   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=14005)     self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=14005)   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 950, in run_engine_core
(EngineCore_DP0 pid=14005)     raise e
(EngineCore_DP0 pid=14005)   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 937, in run_engine_core
(EngineCore_DP0 pid=14005)     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore_DP0 pid=14005)   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 691, in __init__
(EngineCore_DP0 pid=14005)     super().__init__(
(EngineCore_DP0 pid=14005)   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 105, in __init__
(EngineCore_DP0 pid=14005)     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=14005)   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 97, in __init__
(EngineCore_DP0 pid=14005)     super().__init__(vllm_config)
(EngineCore_DP0 pid=14005)   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/vllm/v1/executor/abstract.py", line 101, in __init__
(EngineCore_DP0 pid=14005)     self._init_executor()
(EngineCore_DP0 pid=14005)   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 165, in _init_executor
(EngineCore_DP0 pid=14005)     self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore_DP0 pid=14005)   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 678, in wait_for_ready
(EngineCore_DP0 pid=14005)     raise e from None
(EngineCore_DP0 pid=14005) Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(APIServer pid=13516) Traceback (most recent call last):
(APIServer pid=13516)   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/runpy.py", line 196, in _run_module_as_main
(APIServer pid=13516)     return _run_code(code, main_globals, None,
(APIServer pid=13516)   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/runpy.py", line 86, in _run_code
(APIServer pid=13516)     exec(code, run_globals)
(APIServer pid=13516)   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 991, in <module>
(APIServer pid=13516)     uvloop.run(run_server(args))
(APIServer pid=13516)   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/uvloop/__init__.py", line 82, in run
(APIServer pid=13516)     return loop.run_until_complete(wrapper())
(APIServer pid=13516)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=13516)   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/uvloop/__init__.py", line 61, in wrapper
(APIServer pid=13516)     return await main
(APIServer pid=13516)   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 919, in run_server
(APIServer pid=13516)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=13516)   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 938, in run_server_worker
(APIServer pid=13516)     async with build_async_engine_client(
(APIServer pid=13516)   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=13516)     return await anext(self.gen)
(APIServer pid=13516)   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 147, in build_async_engine_client
(APIServer pid=13516)     async with build_async_engine_client_from_engine_args(
(APIServer pid=13516)   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=13516)     return await anext(self.gen)
(APIServer pid=13516)   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 188, in build_async_engine_client_from_engine_args
(APIServer pid=13516)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=13516)   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 228, in from_vllm_config
(APIServer pid=13516)     return cls(
(APIServer pid=13516)   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 155, in __init__
(APIServer pid=13516)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=13516)   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 122, in make_async_mp_client
(APIServer pid=13516)     return AsyncMPClient(*client_args)
(APIServer pid=13516)   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 819, in __init__
(APIServer pid=13516)     super().__init__(
(APIServer pid=13516)   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 479, in __init__
(APIServer pid=13516)     with launch_core_engines(vllm_config, executor_class, log_stats) as (
(APIServer pid=13516)   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/contextlib.py", line 142, in __exit__
(APIServer pid=13516)     next(self.gen)
(APIServer pid=13516)   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 933, in launch_core_engines
(APIServer pid=13516)     wait_for_engine_startup(
(APIServer pid=13516)   File "/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 992, in wait_for_engine_startup
(APIServer pid=13516)     raise RuntimeError(
(APIServer pid=13516) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
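The failing assertion reports splitting_ops=['vllm.unified_attention_with_output_kunlun'], a dotted name that is not in the vllm::xxx form the check expects. A minimal sketch of both failure modes the fix targets (legacy spelling and a partial op list), assuming for illustration that the check requires every attention op to appear verbatim. REQUIRED_ATTENTION_OPS and has_all_attention_ops are hypothetical stand-ins, not vLLM's code in cudagraph_dispatcher.py.

```python
from typing import Sequence

# Hypothetical stand-in for the attention op names the real check looks for.
REQUIRED_ATTENTION_OPS = (
    "vllm::unified_attention_with_output_kunlun",
    "vllm::unified_attention_with_output",
)


def has_all_attention_ops(splitting_ops: Sequence[str]) -> bool:
    """Simplified check: every required op must appear as an exact string."""
    return all(op in splitting_ops for op in REQUIRED_ATTENTION_OPS)


legacy = ["vllm.unified_attention_with_output_kunlun"]           # config from the failing run
normalized_only = ["vllm::unified_attention_with_output_kunlun"]  # renamed, but still partial
full = list(REQUIRED_ATTENTION_OPS)                               # what normalization produces

print(has_all_attention_ops(legacy))           # False: '.' never matches '::'
print(has_all_attention_ops(normalized_only))  # False: an attention op is missing
print(has_all_attention_ops(full))             # True
```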

After

Regression test:

/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/xpytorch_import_hook.py:6: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources
...                                                                      [100%]
3 passed in 4.05s

Config normalization check:

/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/xpytorch_import_hook.py:6: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources
XCCL /ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/lib/python3.10/site-packages/torch_xmlir/libbkcl.so loaded
SYMBOL_REWRITE torch success
INFO 04-20 15:30:53 [__init__.py:43] Available plugins for group vllm.platform_plugins:
INFO 04-20 15:30:53 [__init__.py:45] - kunlun -> vllm_kunlun:register
INFO 04-20 15:30:53 [__init__.py:48] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 04-20 15:30:53 [__init__.py:64] [KunlunPlugin] register() pid=13514
INFO 04-20 15:30:53 [__init__.py:70] [KunlunPlugin] _kunlun native extension loaded
INFO 04-20 15:30:53 [__init__.py:79] [KunlunPlugin] vllm_utils_wrapper loaded and patched
INFO 04-20 15:30:53 [__init__.py:104] [KunlunPlugin] import_hook() ok
INFO 04-20 15:30:54 [__init__.py:123] [KunlunPlugin] registered Qwen3ReasoningParser override (lazy)
INFO 04-20 15:30:54 [__init__.py:128] [KunlunPlugin] register() done
INFO 04-20 15:30:54 [__init__.py:64] [KunlunPlugin] register() pid=13514
INFO 04-20 15:30:54 [__init__.py:70] [KunlunPlugin] _kunlun native extension loaded
INFO 04-20 15:30:54 [__init__.py:79] [KunlunPlugin] vllm_utils_wrapper loaded and patched
INFO 04-20 15:30:54 [__init__.py:104] [KunlunPlugin] import_hook() ok
INFO 04-20 15:30:54 [__init__.py:123] [KunlunPlugin] registered Qwen3ReasoningParser override (lazy)
INFO 04-20 15:30:54 [__init__.py:128] [KunlunPlugin] register() done
INFO 04-20 15:30:54 [__init__.py:217] Platform plugin kunlun is activated
backend eager
contains_kunlun True
compiled_piecewise True
attention_ops_missing []

Service startup and readiness:

PYTHONPATH=/ssd1/jianglidang/workspace/vLLM-Kunlun-issue-311 \
VLLM_KUNLUN_PYTHON=/ssd1/jianglidang/workspace/python310_torch25_cuda_main0151/bin/python \
VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 \
bash /ssd1/jianglidang/workspace/Qwen2.5-72B-Instruct/start_service_p800.sh

curl -sS http://127.0.0.1:8566/v1/models
curl -sS http://127.0.0.1:8566/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"Qwen2.5-72B-Instruct","messages":[{"role":"user","content":"请原样回复：验证正常"}],"max_tokens":16,"temperature":0}'
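For scripted checks, the same two requests can be issued from Python using only the standard library. This is a sketch that assumes the OpenAI-compatible response shapes vLLM serves (a data list of objects with an id field from /v1/models, and choices[0].message.content from /v1/chat/completions). build_chat_request and check_service are illustrative helpers, not part of this repository.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8566"  # port used by the curl commands above


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build the same deterministic chat request as the curl call."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 16,
        "temperature": 0,
    }
    return urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def check_service(model: str, prompt: str, timeout: float = 60.0) -> str:
    """Confirm the model is listed, then return one chat completion's text."""
    # 1) Readiness: the served model name must appear in /v1/models.
    with urllib.request.urlopen(BASE_URL + "/v1/models", timeout=timeout) as resp:
        listing = json.load(resp)
    served = [m["id"] for m in listing.get("data", [])]
    if model not in served:
        raise RuntimeError(f"{model!r} is not served; got {served}")
    # 2) One chat completion through the piecewise/full cudagraph path.
    with urllib.request.urlopen(build_chat_request(model, prompt), timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

For example, check_service("Qwen2.5-72B-Instruct", "请原样回复：验证正常") sends the same prompt as the curl call. The Chinese text means "Please reply verbatim: verification OK", so the prompt asks the model to answer 验证正常.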

(Worker_TP0 pid=16271) Loading safetensors checkpoint shards: 100% Completed | 37/37 [00:21<00:00,  1.73it/s]
(Worker_TP0 pid=16271) INFO 04-20 15:32:34 [default_loader.py:291] Loading weights took 21.42 seconds
(Worker_TP0 pid=16271) INFO 04-20 15:32:35 [gpu_model_runner.py:4130] Model loading took 17.0 GiB memory and 21.873389 seconds
(Worker_TP5 pid=16276) WARNING 04-20 15:32:36 [decorators.py:555] Detected eager backend, disabling AOT compile.
(Worker_TP1 pid=16272) WARNING 04-20 15:32:36 [decorators.py:555] Detected eager backend, disabling AOT compile.
(Worker_TP6 pid=16277) WARNING 04-20 15:32:36 [decorators.py:555] Detected eager backend, disabling AOT compile.
(Worker_TP7 pid=16278) WARNING 04-20 15:32:36 [decorators.py:555] Detected eager backend, disabling AOT compile.
(Worker_TP3 pid=16274) WARNING 04-20 15:32:36 [decorators.py:555] Detected eager backend, disabling AOT compile.
(Worker_TP2 pid=16273) WARNING 04-20 15:32:36 [decorators.py:555] Detected eager backend, disabling AOT compile.
(Worker_TP4 pid=16275) WARNING 04-20 15:32:36 [decorators.py:555] Detected eager backend, disabling AOT compile.
(Worker_TP0 pid=16271) WARNING 04-20 15:32:36 [decorators.py:555] Detected eager backend, disabling AOT compile.
(Worker_TP0 pid=16271) INFO 04-20 15:32:51 [backends.py:812] Using cache directory: /home/devuser/.cache/vllm/torch_compile_cache/d8d8d41f49/rank_0_0/backbone for vLLM's torch.compile
(Worker_TP0 pid=16271) INFO 04-20 15:32:51 [backends.py:872] Dynamo bytecode transform time: 15.12 s
(Worker_TP0 pid=16271) INFO 04-20 15:33:16 [backends.py:319] Compiling a graph for compile range (1, 132096) takes 14.64 s
(Worker_TP0 pid=16271) INFO 04-20 15:33:16 [monitor.py:34] torch.compile takes 29.76 s in total
(Worker_TP0 pid=16271) INFO 04-20 15:33:18 [gpu_worker.py:356] Available KV cache memory: 43.03 GiB
(EngineCore_DP0 pid=15861) INFO 04-20 15:33:18 [kv_cache_utils.py:1307] GPU KV cache size: 1,128,064 tokens
(EngineCore_DP0 pid=15861) INFO 04-20 15:33:18 [kv_cache_utils.py:1312] Maximum concurrency for 132,096 tokens per request: 8.54x
[rank4]:[W420 15:33:19.925424531 CUDAGraph.cpp:137] Warning: Waiting for pending NCCL work to finish before starting graph capture. (function operator())
[rank3]:[W420 15:33:19.925432774 CUDAGraph.cpp:137] Warning: Waiting for pending NCCL work to finish before starting graph capture. (function operator())
[rank2]:[W420 15:33:19.925424445 CUDAGraph.cpp:137] Warning: Waiting for pending NCCL work to finish before starting graph capture. (function operator())
[rank1]:[W420 15:33:19.925424579 CUDAGraph.cpp:137] Warning: Waiting for pending NCCL work to finish before starting graph capture. (function operator())
[rank5]:[W420 15:33:19.925463722 CUDAGraph.cpp:137] Warning: Waiting for pending NCCL work to finish before starting graph capture. (function operator())
[rank6]:[W420 15:33:19.925463540 CUDAGraph.cpp:137] Warning: Waiting for pending NCCL work to finish before starting graph capture. (function operator())
[rank0]:[W420 15:33:19.925473900 CUDAGraph.cpp:137] Warning: Waiting for pending NCCL work to finish before starting graph capture. (function operator())
[rank7]:[W420 15:33:19.926488200 CUDAGraph.cpp:137] Warning: Waiting for pending NCCL work to finish before starting graph capture. (function operator())
(Worker_TP0 pid=16271) Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 100%|██████████| 4/4 [00:01<00:00,  3.17it/s]
(Worker_TP0 pid=16271) Capturing CUDA graphs (decode, FULL): 100%|██████████| 3/3 [00:00<00:00,  3.37it/s]
(Worker_TP0 pid=16271) INFO 04-20 15:33:21 [gpu_model_runner.py:5063] Graph capturing finished in 3 secs, took 0.15 GiB
(EngineCore_DP0 pid=15861) INFO 04-20 15:33:21 [core.py:272] init engine (profile, create kv cache, warmup model) took 45.40 seconds
(EngineCore_DP0 pid=15861) WARNING 04-20 15:33:22 [interface.py:222] Failed to import from vllm._C: ImportError('libcudart.so.12: cannot open shared object file: No such file or directory')
(EngineCore_DP0 pid=15861) ERROR 04-20 15:33:22 [config.py:33] Failed to import Triton kernels. Please make sure your triton version is compatible. Error: No module named 'triton.language.target_info'
(EngineCore_DP0 pid=15861) INFO 04-20 15:33:22 [vllm.py:624] Asynchronous scheduling is enabled.
(EngineCore_DP0 pid=15861) WARNING 04-20 15:33:22 [vllm.py:669] Inductor compilation was disabled by user settings, optimizations settings that are only active during inductor compilation will be ignored.
(APIServer pid=15548) INFO 04-20 15:33:22 [api_server.py:665] Supported tasks: ['generate']
(APIServer pid=15548) WARNING 04-20 15:33:22 [model.py:1371] Default vLLM sampling parameters have been overridden by the model's `generation_config.json`: `{'repetition_penalty': 1.05, 'temperature': 0.7, 'top_k': 20, 'top_p': 0.8}`. If this is not intended, please relaunch vLLM instance with `--generation-config vllm`.
(APIServer pid=15548) INFO 04-20 15:33:22 [serving.py:177] Warming up chat template processing...
(APIServer pid=15548) INFO 04-20 15:33:23 [hf.py:310] Detected the chat template content format to be 'string'. You can set `--chat-template-content-format` to override this.
(APIServer pid=15548) INFO 04-20 15:33:23 [serving.py:212] Chat template warmup completed in 492.1ms
(APIServer pid=15548) INFO 04-20 15:33:23 [api_server.py:946] Starting vLLM API server 0 on http://0.0.0.0:8566
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:38] Available routes are:
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /openapi.json, Methods: HEAD, GET
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /docs, Methods: HEAD, GET
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /docs/oauth2-redirect, Methods: HEAD, GET
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /redoc, Methods: HEAD, GET
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /scale_elastic_ep, Methods: POST
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /is_scaling_elastic_ep, Methods: POST
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /tokenize, Methods: POST
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /detokenize, Methods: POST
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /inference/v1/generate, Methods: POST
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /pause, Methods: POST
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /resume, Methods: POST
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /is_paused, Methods: GET
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /metrics, Methods: GET
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /health, Methods: GET
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /v1/chat/completions, Methods: POST
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /v1/chat/completions/render, Methods: POST
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /v1/responses, Methods: POST
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /v1/responses/{response_id}, Methods: GET
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /v1/responses/{response_id}/cancel, Methods: POST
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /v1/audio/transcriptions, Methods: POST
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /v1/audio/translations, Methods: POST
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /v1/completions, Methods: POST
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /v1/completions/render, Methods: POST
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /v1/messages, Methods: POST
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /v1/models, Methods: GET
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /load, Methods: GET
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /version, Methods: GET
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /ping, Methods: GET
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /ping, Methods: POST
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /invocations, Methods: POST
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /classify, Methods: POST
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /v1/embeddings, Methods: POST
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /score, Methods: POST
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /v1/score, Methods: POST
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /rerank, Methods: POST
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /v1/rerank, Methods: POST
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /v2/rerank, Methods: POST
(APIServer pid=15548) INFO 04-20 15:33:23 [launcher.py:46] Route: /pooling, Methods: POST
(APIServer pid=15548) INFO:     Started server process [15548]
(APIServer pid=15548) INFO:     Waiting for application startup.
(APIServer pid=15548) INFO:     Application startup complete.
(APIServer pid=15548) INFO:     127.0.0.1:23188 - "GET /v1/models HTTP/1.1" 200 OK
(APIServer pid=15548) INFO:     127.0.0.1:23190 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=15548) INFO:     127.0.0.1:23208 - "GET /v1/models HTTP/1.1" 200 OK

/v1/models response:

{"object":"list","data":[{"id":"Qwen2.5-72B-Instruct","object":"model","created":1776670403,"owned_by":"vllm","root":"/ssd1/models/Qwen2.5-72B-Instruct","parent":null,"max_model_len":132096,"permission":[{"id":"modelperm-a8472fbff8b26932","object":"model_permission","created":1776670403,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":false,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}

/v1/chat/completions response:

{"id":"chatcmpl-8da091a43392bcfc","object":"chat.completion","created":1776670403,"model":"Qwen2.5-72B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"验证正常","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning":null,"reasoning_content":null},"logprobs":null,"finish_reason":"stop","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":36,"total_tokens":39,"completion_tokens":3,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}

Test plan

python -m pytest tests/ut/test_kunlun_platform.py -q
validate that KunlunPlatform.check_and_update_config() normalizes legacy splitting_ops and fills in missing attention split ops
start the Qwen2.5-72B-Instruct OpenAI-compatible server and verify that both /v1/models and /v1/chat/completions respond successfully
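The second check can be scripted. Below is a minimal standard-library-only smoke test sketch. It is not part of this PR: the helper names (`build_chat_request`, `served_model_ids`, `chat_reply`, `smoke_check`), the `"ping"` prompt and the `max_tokens` value are illustrative assumptions. Only the two endpoints and the served model name come from the test plan.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 16,  # illustrative value; a smoke test needs only a short reply
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def served_model_ids(models_body: dict) -> list:
    """Extract the model ids from a /v1/models response body."""
    return [m["id"] for m in models_body.get("data", [])]


def chat_reply(chat_body: dict) -> str:
    """Return the assistant text of the first choice, failing if there is none."""
    choices = chat_body.get("choices") or []
    if not choices:
        raise ValueError("chat completion response contains no choices")
    return choices[0]["message"]["content"]


def smoke_check(base_url: str, model: str, timeout: float = 60.0) -> str:
    """Hit /v1/models, confirm the model is served, then send one chat request."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
        ids = served_model_ids(json.load(resp))
    if model not in ids:
        raise RuntimeError(f"{model!r} is not served; server reports {ids}")
    req = build_chat_request(base_url, model, "ping")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return chat_reply(json.load(resp))
```

Against a running server this would be called as `smoke_check("http://127.0.0.1:8566", "Qwen2.5-72B-Instruct")`. 8566 is the port reported in the startup log above, so substitute whatever `--port` the server was actually launched with.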

- normalize legacy vllm splitting_ops to vllm:: format for piecewise cudagraphs
- append missing attention split ops for Kunlun graph configs
- add regression coverage and update docs

Signed-off-by: Lidang Jiang <lidangjiang@gmail.com>

Copilot

Pull request overview

This PR fixes KunlunGraph piecewise CUDA graph startup failures when users provide legacy vllm.xxx splitting op names by normalizing them to the vllm::xxx format and auto-completing required attention split ops, aligning KunlunGraph behavior with vLLM’s piecewise cudagraph expectations.

Changes:

Add splitting-op normalization + ordered de-duplication and auto-completion of required attention split ops for Kunlun piecewise cudagraph mode.
Add unit tests covering legacy splitting-op normalization, preservation/deduplication of custom ops, and non-piecewise behavior.
Update docs to discourage manually setting compilation_config.splitting_ops in normal usage and fix the CLI flag spelling for enforce eager mode.
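The normalization and auto-completion described above can be sketched as a pure function. This is an illustrative reconstruction, not the code in vllm_kunlun/platforms/kunlun.py. The function names are hypothetical, `attention_ops` stands in for `CompilationConfig._attention_ops` (whose exact contents are not shown here), and passing non-piecewise or empty configs through unchanged is an assumption of the sketch.

```python
from typing import Iterable, List, Optional, Sequence

# Kunlun attention op named in the PR summary.
KUNLUN_ATTENTION_OP = "vllm::unified_attention_with_output_kunlun"


def normalize_op_name(op: str) -> str:
    """Rewrite a legacy 'vllm.xxx' name to the 'vllm::xxx' form; leave others alone."""
    legacy_prefix = "vllm."
    if op.startswith(legacy_prefix):
        return "vllm::" + op[len(legacy_prefix):]
    return op


def dedupe_in_order(ops: Iterable[str]) -> List[str]:
    """Drop repeated op names while keeping the first occurrence's position."""
    seen = set()
    result = []
    for op in ops:
        if op not in seen:
            seen.add(op)
            result.append(op)
    return result


def normalize_splitting_ops(
    splitting_ops: Optional[Sequence[str]],
    attention_ops: Sequence[str],
    piecewise: bool,
) -> Optional[List[str]]:
    """Hypothetical helper: normalize user split ops for piecewise cudagraphs."""
    if splitting_ops is None:
        return None  # assumption: let vLLM choose its own defaults
    if not piecewise or not splitting_ops:
        return list(splitting_ops)  # assumption: pass through unchanged
    ops = [normalize_op_name(op) for op in splitting_ops]
    # Append the Kunlun attention op and the full attention set after any
    # user-provided (possibly custom) ops, then dedupe so order is preserved.
    ops.append(KUNLUN_ATTENTION_OP)
    ops.extend(attention_ops)
    return dedupe_in_order(ops)
```

For example, with `["vllm.unified_attention_with_output_kunlun", "my_ns::custom_split"]` (the second name is a made-up custom op) and example attention ops `["vllm::unified_attention", "vllm::unified_attention_with_output"]`, the sketch returns `["vllm::unified_attention_with_output_kunlun", "my_ns::custom_split", "vllm::unified_attention", "vllm::unified_attention_with_output"]`: the legacy name is rewritten, the custom op keeps its place, and the duplicate Kunlun entry is dropped.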

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| vllm_kunlun/platforms/kunlun.py | Normalizes legacy splitting op names and completes required attention split ops for piecewise cudagraphs. |
| tests/ut/test_kunlun_platform.py | Adds regression tests for the legacy/partial splitting-op config path. |
| docs/source/user_guide/feature_guide/graph_mode.md | Documents auto-selection of split ops and corrects `--enforce-eager` flag spelling. |
| docs/source/quick_start.md | Removes manual splitting-ops configuration from quickstart and documents that it’s not needed normally. |


fix: normalize kunlun graph splitting ops (8ef277f)


Lidang-Jiang force-pushed the fix/issue-311-kunlun-graph-splitting-ops branch from c3bbb9f to 8ef277f on April 20, 2026 07:40

xyDong0223 requested a review from Copilot April 21, 2026 05:59

Copilot started reviewing on behalf of xyDong0223 April 21, 2026 05:59

Copilot AI reviewed Apr 21, 2026


Lidang-Jiang wants to merge 1 commit into baidu:main from Lidang-Jiang:fix/issue-311-kunlun-graph-splitting-ops
2 participants