bug: Failed to import Triton kernels. Please make sure your triton version is compatible. Error: No module named 'triton.language.target_info'

### Bug Description

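After installing the dependencies by following the documentation and changing the launch arguments as instructed, the model fails to start and reports a missing Triton module. The startup log is shown below.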
ERROR 04-09 19:09:35 [config.py:33] Failed to import Triton kernels. Please make sure your triton version is compatible. Error: No module named 'triton.language.target_info'
INFO 04-09 19:09:35 [parallel_state.py:1212] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://127.0.0.1:55251 backend=nccl
INFO 04-09 19:09:35 [parallel_state.py:1423] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank N/A
ERROR 04-09 19:09:36 [gpt_oss_triton_kernels_moe.py:34] Failed to import Triton kernels. Please make sure your triton version is compatible. Error: No module named 'triton.language.target_info'
INFO 04-09 19:09:36 [topk_topp_sampler.py:26] Using FlashInfer for top-p & top-k sampling.
ERROR 04-09 19:09:36 [multiproc_executor.py:772] WorkerProc failed to start.
ERROR 04-09 19:09:36 [multiproc_executor.py:772] Traceback (most recent call last):
ERROR 04-09 19:09:36 [multiproc_executor.py:772]   File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 743, in worker_main
ERROR 04-09 19:09:36 [multiproc_executor.py:772]     worker = WorkerProc(*args, **kwargs)
ERROR 04-09 19:09:36 [multiproc_executor.py:772]   File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 569, in __init__
ERROR 04-09 19:09:36 [multiproc_executor.py:772]     self.worker.init_device()
ERROR 04-09 19:09:36 [multiproc_executor.py:772]   File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/worker/worker_base.py", line 326, in init_device
ERROR 04-09 19:09:36 [multiproc_executor.py:772]     self.worker.init_device()  # type: ignore
ERROR 04-09 19:09:36 [multiproc_executor.py:772]   File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 262, in init_device
ERROR 04-09 19:09:36 [multiproc_executor.py:772]     self.model_runner = GPUModelRunnerV1(self.vllm_config, self.device)
ERROR 04-09 19:09:36 [multiproc_executor.py:772]   File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 644, in __init__
ERROR 04-09 19:09:36 [multiproc_executor.py:772]     self.cudagraph_dispatcher = CudagraphDispatcher(self.vllm_config)
ERROR 04-09 19:09:36 [multiproc_executor.py:772]   File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/cudagraph_dispatcher.py", line 46, in __init__
ERROR 04-09 19:09:36 [multiproc_executor.py:772]     assert (
ERROR 04-09 19:09:36 [multiproc_executor.py:772] AssertionError: Compilation mode should be CompilationMode.VLLM_COMPILE when cudagraph_mode piecewise cudagraphs is used, and attention should be in splitting_ops or inductor splitting should be used. cudagraph_mode=FULL_AND_PIECEWISE, compilation_mode=3, splitting_ops=['vllm.unified_attention_with_output_kunlun']
INFO 04-09 19:09:36 [multiproc_executor.py:730] Parent process exited, terminating worker
[rank0]:[W409 19:09:37.415115442 ProcessGroupXCCL.cpp:1163] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946] EngineCore failed to start.
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946] Traceback (most recent call last):
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946]   File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 937, in run_engine_core
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946]     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946]   File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 691, in __init__
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946]     super().__init__(
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946]   File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 105, in __init__
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946]     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946]   File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 97, in __init__
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946]     super().__init__(vllm_config)
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946]   File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/executor/abstract.py", line 101, in __init__
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946]     self._init_executor()
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946]   File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 165, in _init_executor
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946]     self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946]   File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 678, in wait_for_ready
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946]     raise e from None
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946] Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(EngineCore_DP0 pid=2380410) Process EngineCore_DP0:
(EngineCore_DP0 pid=2380410) Traceback (most recent call last):
(EngineCore_DP0 pid=2380410)   File "/root/.local/share/uv/python/cpython-3.10.19-linux-x86_64-gnu/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=2380410)     self.run()
(EngineCore_DP0 pid=2380410)   File "/root/.local/share/uv/python/cpython-3.10.19-linux-x86_64-gnu/lib/python3.10/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=2380410)     self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=2380410)   File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 950, in run_engine_core
(EngineCore_DP0 pid=2380410)     raise e
(EngineCore_DP0 pid=2380410)   File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 937, in run_engine_core
(EngineCore_DP0 pid=2380410)     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore_DP0 pid=2380410)   File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 691, in __init__
(EngineCore_DP0 pid=2380410)     super().__init__(
(EngineCore_DP0 pid=2380410)   File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 105, in __init__
(EngineCore_DP0 pid=2380410)     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=2380410)   File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 97, in __init__
(EngineCore_DP0 pid=2380410)     super().__init__(vllm_config)
(EngineCore_DP0 pid=2380410)   File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/executor/abstract.py", line 101, in __init__
(EngineCore_DP0 pid=2380410)     self._init_executor()
(EngineCore_DP0 pid=2380410)   File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 165, in _init_executor
(EngineCore_DP0 pid=2380410)     self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore_DP0 pid=2380410)   File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 678, in wait_for_ready
(EngineCore_DP0 pid=2380410)     raise e from None
(EngineCore_DP0 pid=2380410) Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(APIServer pid=2380226) Traceback (most recent call last):
(APIServer pid=2380226)   File "/root/.local/share/uv/python/cpython-3.10.19-linux-x86_64-gnu/lib/python3.10/runpy.py", line 196, in _run_module_as_main
(APIServer pid=2380226)     return _run_code(code, main_globals, None,
(APIServer pid=2380226)   File "/root/.local/share/uv/python/cpython-3.10.19-linux-x86_64-gnu/lib/python3.10/runpy.py", line 86, in _run_code
(APIServer pid=2380226)     exec(code, run_globals)
(APIServer pid=2380226)   File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 991, in <module>
(APIServer pid=2380226)     uvloop.run(run_server(args))
(APIServer pid=2380226)   File "/opt/vllm_kunlun/lib/python3.10/site-packages/uvloop/__init__.py", line 82, in run
(APIServer pid=2380226)     return loop.run_until_complete(wrapper())
(APIServer pid=2380226)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=2380226)   File "/opt/vllm_kunlun/lib/python3.10/site-packages/uvloop/__init__.py", line 61, in wrapper
(APIServer pid=2380226)     return await main
(APIServer pid=2380226)   File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 919, in run_server
(APIServer pid=2380226)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=2380226)   File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 938, in run_server_worker
(APIServer pid=2380226)     async with build_async_engine_client(
(APIServer pid=2380226)   File "/root/.local/share/uv/python/cpython-3.10.19-linux-x86_64-gnu/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=2380226)     return await anext(self.gen)
(APIServer pid=2380226)   File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 147, in build_async_engine_client
(APIServer pid=2380226)     async with build_async_engine_client_from_engine_args(
(APIServer pid=2380226)   File "/root/.local/share/uv/python/cpython-3.10.19-linux-x86_64-gnu/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=2380226)     return await anext(self.gen)
(APIServer pid=2380226)   File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 188, in build_async_engine_client_from_engine_args
(APIServer pid=2380226)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=2380226)   File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 228, in from_vllm_config
(APIServer pid=2380226)     return cls(
(APIServer pid=2380226)   File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 155, in __init__
(APIServer pid=2380226)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=2380226)   File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 122, in make_async_mp_client
(APIServer pid=2380226)     return AsyncMPClient(*client_args)
(APIServer pid=2380226)   File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 819, in __init__
(APIServer pid=2380226)     super().__init__(
(APIServer pid=2380226)   File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 479, in __init__
(APIServer pid=2380226)     with launch_core_engines(vllm_config, executor_class, log_stats) as (
(APIServer pid=2380226)   File "/root/.local/share/uv/python/cpython-3.10.19-linux-x86_64-gnu/lib/python3.10/contextlib.py", line 142, in __exit__
(APIServer pid=2380226)     next(self.gen)
(APIServer pid=2380226)   File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 933, in launch_core_engines
(APIServer pid=2380226)     wait_for_engine_startup(
(APIServer pid=2380226)   File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 992, in wait_for_engine_startup
(APIServer pid=2380226)     raise RuntimeError(
(APIServer pid=2380226) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
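
In the log above, the worker traceback ends in an `AssertionError` raised from `cudagraph_dispatcher.py`, while the Triton import failures are separate `ERROR` messages logged before it. Whether the two are related is not established here. As a rough aid for triage, the sketch below pulls the exception lines out of such a log so the failure chain is easier to see. The helper `exception_lines` and its regular expression are hypothetical and not part of vLLM; the sample text is copied from the log above.

```python
import re
from typing import List, Tuple

# An exception name that follows the log prefix ("] " or ") ") or starts the line.
# Hypothetical pattern written for the log format shown in this issue.
EXC_RE = re.compile(r"(?:^|[\])] )(\w*(?:Error|Exception)): (.*)$")


def exception_lines(log_text: str) -> List[Tuple[str, str]]:
    """Return (exception type, message) pairs in order of first appearance."""
    seen = set()
    found = []
    for line in log_text.splitlines():
        match = EXC_RE.search(line)
        if match is None:
            continue
        item = (match.group(1), match.group(2).strip())
        if item not in seen:  # the same exception is re-logged by parent processes
            seen.add(item)
            found.append(item)
    return found


SAMPLE = r"""
ERROR 04-09 19:09:35 [config.py:33] Failed to import Triton kernels. Please make sure your triton version is compatible. Error: No module named 'triton.language.target_info'
ERROR 04-09 19:09:36 [multiproc_executor.py:772] AssertionError: Compilation mode should be CompilationMode.VLLM_COMPILE when cudagraph_mode piecewise cudagraphs is used, and attention should be in splitting_ops or inductor splitting should be used. cudagraph_mode=FULL_AND_PIECEWISE, compilation_mode=3, splitting_ops=['vllm.unified_attention_with_output_kunlun']
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946] Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(EngineCore_DP0 pid=2380410) Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(APIServer pid=2380226) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
"""

if __name__ == "__main__":
    for exc_type, message in exception_lines(SAMPLE):
        print(f"{exc_type}: {message[:80]}")
```

The Triton message is deliberately not matched, because it is an ordinary log message rather than a raised exception in the traceback.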


### Steps to Reproduce

Environment versions:
```
uv pip list | grep -E "vllm|triton"
Using Python 3.10.19 environment at: /opt/vllm_kunlun
triton                            3.1.0
vllm                              0.15.1
vllm-kunlun                       0.15.1.dev0
```
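
The version list shows triton 3.1.0, and the log reports that `triton.language.target_info` cannot be imported. A minimal sketch for confirming this from inside the same environment, using only the standard library; the helper names `module_available` and `installed_version` are hypothetical and not part of vLLM or Triton:

```python
import importlib.util
from importlib import metadata
from typing import Optional


def module_available(name: str) -> bool:
    """Return True if the module `name` can be located.

    Note: for a dotted name, find_spec imports the parent packages.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # A missing parent package raises ModuleNotFoundError (an ImportError).
        return False


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of distribution `dist`, or None if absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for dist in ("triton", "vllm", "vllm-kunlun"):
        print(f"{dist}: {installed_version(dist)}")
    # The module named in the error message of this issue.
    print("triton.language.target_info importable:",
          module_available("triton.language.target_info"))
```

Running it with the interpreter under `/opt/vllm_kunlun` would show whether the installed Triton build ships the module that the error message names.
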
Launch command:
```
python -m vllm.entrypoints.openai.api_server \
      --host localhost \
      --port 8807 \
      --model /data/models/Qwen3-0.6B \
      --served-model-name Qwen3-0.6B \
      --gpu-memory-utilization 0.95 \
      --trust-remote-code \
      --max-model-len 32768 \
      --tensor-parallel-size 1 \
      --dtype float16 \
      --max_num_seqs 128 \
      --max_num_batched_tokens 32768 \
      --block-size 128 \
      --no-enable-prefix-caching \
      --no-enable-chunked-prefill \
      --distributed-executor-backend mp \
      --compilation-config '{"splitting_ops": ["vllm.unified_attention_with_output_kunlun"]}'
```
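
The `--compilation-config` value in the command above is a JSON string embedded in shell quotes, which is easy to get wrong when editing by hand. The sketch below builds the same argument programmatically; it is only an illustration, assumes nothing beyond the flags already shown in the command, and the function name `build_command` is hypothetical. It does not change the configuration or address the reported error.

```python
import json
import shlex
from typing import List


def build_command(model_path: str, splitting_ops: List[str]) -> List[str]:
    """Assemble an argv list with the compilation config serialized as JSON."""
    compilation_config = {"splitting_ops": splitting_ops}
    return [
        "python", "-m", "vllm.entrypoints.openai.api_server",
        "--model", model_path,
        "--compilation-config", json.dumps(compilation_config),
    ]


if __name__ == "__main__":
    cmd = build_command(
        "/data/models/Qwen3-0.6B",
        ["vllm.unified_attention_with_output_kunlun"],
    )
    # shlex.join quotes the JSON so it can be pasted into a POSIX shell.
    print(shlex.join(cmd))
```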

### Expected Behavior

_No response_

### Additional Context

_No response_
