bug: Failed to import Triton kernels. Please make sure your triton version is compatible. Error: No module named 'triton.language.target_info' #311

Description

@dongpeiyu

Bug Description

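After installing the dependencies as described in the documentation and changing the launch parameters as instructed, the model fails to start. The log reports a missing Triton module; the startup log is shown below.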
ERROR 04-09 19:09:35 [config.py:33] Failed to import Triton kernels. Please make sure your triton version is compatible. Error: No module named 'triton.language.target_info'
INFO 04-09 19:09:35 [parallel_state.py:1212] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://127.0.0.1:55251 backend=nccl
INFO 04-09 19:09:35 [parallel_state.py:1423] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank N/A
ERROR 04-09 19:09:36 [gpt_oss_triton_kernels_moe.py:34] Failed to import Triton kernels. Please make sure your triton version is compatible. Error: No module named 'triton.language.target_info'
INFO 04-09 19:09:36 [topk_topp_sampler.py:26] Using FlashInfer for top-p & top-k sampling.
ERROR 04-09 19:09:36 [multiproc_executor.py:772] WorkerProc failed to start.
ERROR 04-09 19:09:36 [multiproc_executor.py:772] Traceback (most recent call last):
ERROR 04-09 19:09:36 [multiproc_executor.py:772] File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 743, in worker_main
ERROR 04-09 19:09:36 [multiproc_executor.py:772] worker = WorkerProc(*args, **kwargs)
ERROR 04-09 19:09:36 [multiproc_executor.py:772] File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 569, in __init__
ERROR 04-09 19:09:36 [multiproc_executor.py:772] self.worker.init_device()
ERROR 04-09 19:09:36 [multiproc_executor.py:772] File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/worker/worker_base.py", line 326, in init_device
ERROR 04-09 19:09:36 [multiproc_executor.py:772] self.worker.init_device() # type: ignore
ERROR 04-09 19:09:36 [multiproc_executor.py:772] File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 262, in init_device
ERROR 04-09 19:09:36 [multiproc_executor.py:772] self.model_runner = GPUModelRunnerV1(self.vllm_config, self.device)
ERROR 04-09 19:09:36 [multiproc_executor.py:772] File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 644, in __init__
ERROR 04-09 19:09:36 [multiproc_executor.py:772] self.cudagraph_dispatcher = CudagraphDispatcher(self.vllm_config)
ERROR 04-09 19:09:36 [multiproc_executor.py:772] File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/cudagraph_dispatcher.py", line 46, in __init__
ERROR 04-09 19:09:36 [multiproc_executor.py:772] assert (
ERROR 04-09 19:09:36 [multiproc_executor.py:772] AssertionError: Compilation mode should be CompilationMode.VLLM_COMPILE when cudagraph_mode piecewise cudagraphs is used, and attention should be in splitting_ops or inductor splitting should be used. cudagraph_mode=FULL_AND_PIECEWISE, compilation_mode=3, splitting_ops=['vllm.unified_attention_with_output_kunlun']
INFO 04-09 19:09:36 [multiproc_executor.py:730] Parent process exited, terminating worker
[rank0]:[W409 19:09:37.415115442 ProcessGroupXCCL.cpp:1163] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946] EngineCore failed to start.
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946] Traceback (most recent call last):
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946] File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 937, in run_engine_core
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946] File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 691, in __init__
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946] super().__init__(
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946] File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 105, in __init__
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946] self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946] File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 97, in __init__
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946] super().__init__(vllm_config)
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946] File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/executor/abstract.py", line 101, in __init__
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946] self._init_executor()
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946] File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 165, in _init_executor
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946] self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946] File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 678, in wait_for_ready
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946] raise e from None
(EngineCore_DP0 pid=2380410) ERROR 04-09 19:09:37 [core.py:946] Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(EngineCore_DP0 pid=2380410) Process EngineCore_DP0:
(EngineCore_DP0 pid=2380410) Traceback (most recent call last):
(EngineCore_DP0 pid=2380410) File "/root/.local/share/uv/python/cpython-3.10.19-linux-x86_64-gnu/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=2380410) self.run()
(EngineCore_DP0 pid=2380410) File "/root/.local/share/uv/python/cpython-3.10.19-linux-x86_64-gnu/lib/python3.10/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=2380410) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=2380410) File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 950, in run_engine_core
(EngineCore_DP0 pid=2380410) raise e
(EngineCore_DP0 pid=2380410) File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 937, in run_engine_core
(EngineCore_DP0 pid=2380410) engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore_DP0 pid=2380410) File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 691, in __init__
(EngineCore_DP0 pid=2380410) super().__init__(
(EngineCore_DP0 pid=2380410) File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 105, in __init__
(EngineCore_DP0 pid=2380410) self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=2380410) File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 97, in __init__
(EngineCore_DP0 pid=2380410) super().__init__(vllm_config)
(EngineCore_DP0 pid=2380410) File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/executor/abstract.py", line 101, in __init__
(EngineCore_DP0 pid=2380410) self._init_executor()
(EngineCore_DP0 pid=2380410) File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 165, in _init_executor
(EngineCore_DP0 pid=2380410) self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore_DP0 pid=2380410) File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 678, in wait_for_ready
(EngineCore_DP0 pid=2380410) raise e from None
(EngineCore_DP0 pid=2380410) Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(APIServer pid=2380226) Traceback (most recent call last):
(APIServer pid=2380226) File "/root/.local/share/uv/python/cpython-3.10.19-linux-x86_64-gnu/lib/python3.10/runpy.py", line 196, in _run_module_as_main
(APIServer pid=2380226) return _run_code(code, main_globals, None,
(APIServer pid=2380226) File "/root/.local/share/uv/python/cpython-3.10.19-linux-x86_64-gnu/lib/python3.10/runpy.py", line 86, in _run_code
(APIServer pid=2380226) exec(code, run_globals)
(APIServer pid=2380226) File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 991, in <module>
(APIServer pid=2380226) uvloop.run(run_server(args))
(APIServer pid=2380226) File "/opt/vllm_kunlun/lib/python3.10/site-packages/uvloop/__init__.py", line 82, in run
(APIServer pid=2380226) return loop.run_until_complete(wrapper())
(APIServer pid=2380226) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=2380226) File "/opt/vllm_kunlun/lib/python3.10/site-packages/uvloop/__init__.py", line 61, in wrapper
(APIServer pid=2380226) return await main
(APIServer pid=2380226) File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 919, in run_server
(APIServer pid=2380226) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=2380226) File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 938, in run_server_worker
(APIServer pid=2380226) async with build_async_engine_client(
(APIServer pid=2380226) File "/root/.local/share/uv/python/cpython-3.10.19-linux-x86_64-gnu/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=2380226) return await anext(self.gen)
(APIServer pid=2380226) File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 147, in build_async_engine_client
(APIServer pid=2380226) async with build_async_engine_client_from_engine_args(
(APIServer pid=2380226) File "/root/.local/share/uv/python/cpython-3.10.19-linux-x86_64-gnu/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=2380226) return await anext(self.gen)
(APIServer pid=2380226) File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 188, in build_async_engine_client_from_engine_args
(APIServer pid=2380226) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=2380226) File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 228, in from_vllm_config
(APIServer pid=2380226) return cls(
(APIServer pid=2380226) File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 155, in __init__
(APIServer pid=2380226) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=2380226) File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 122, in make_async_mp_client
(APIServer pid=2380226) return AsyncMPClient(*client_args)
(APIServer pid=2380226) File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 819, in __init__
(APIServer pid=2380226) super().__init__(
(APIServer pid=2380226) File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 479, in __init__
(APIServer pid=2380226) with launch_core_engines(vllm_config, executor_class, log_stats) as (
(APIServer pid=2380226) File "/root/.local/share/uv/python/cpython-3.10.19-linux-x86_64-gnu/lib/python3.10/contextlib.py", line 142, in __exit__
(APIServer pid=2380226) next(self.gen)
(APIServer pid=2380226) File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 933, in launch_core_engines
(APIServer pid=2380226) wait_for_engine_startup(
(APIServer pid=2380226) File "/opt/vllm_kunlun/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 992, in wait_for_engine_startup
(APIServer pid=2380226) raise RuntimeError(
(APIServer pid=2380226) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
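
The log above mixes two kinds of ERROR lines: the "Failed to import Triton kernels" messages, after which startup visibly continues, and the tracebacks that actually stop the worker. To help tell them apart, here is a rough triage sketch (not part of vLLM; the helper name `exception_lines` and both regular expressions are assumptions made for illustration) that pulls out only the logger-prefixed lines that look like a Python exception summary:

```python
import re

# Tail of a logger-prefixed line: "[file.py:123] message".
_LINE = re.compile(r"\[(?P<src>[\w.]+:\d+)\]\s+(?P<msg>.*)$")
# An exception summary such as "AssertionError: ..." or "Exception: ...".
_EXC = re.compile(
    r"^(?P<type>(?:[A-Za-z_][\w.]*)?(?:Error|Exception)): (?P<text>.*)$"
)


def exception_lines(log_text):
    """Return (source, exception type, message) for each exception summary line.

    Lines without a "[file.py:NN]" prefix (for example the APIServer
    traceback) are skipped by this sketch.
    """
    found = []
    for raw in log_text.splitlines():
        line = _LINE.search(raw)
        if line is None:
            continue
        exc = _EXC.match(line.group("msg").strip())
        if exc is not None:
            found.append((line.group("src"), exc.group("type"), exc.group("text")))
    return found


if __name__ == "__main__":
    import sys

    # Usage: python triage.py < server.log
    for src, exc_type, text in exception_lines(sys.stdin.read()):
        print(f"{src}  {exc_type}: {text[:100]}")
```

Run against this log, the first entry would be the `AssertionError` from `cudagraph_dispatcher.py` (logged at `multiproc_executor.py:772`), which suggests that the compilation/cudagraph configuration is worth checking alongside the Triton version. Whether the Triton import failure contributes to that assertion is not something the log alone settles.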

Steps to Reproduce

Environment versions:

uv pip list | grep -E "vllm|triton"
Using Python 3.10.19 environment at: /opt/vllm_kunlun
triton                            3.1.0
vllm                              0.15.1
vllm-kunlun                       0.15.1.dev0
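
A quick way to confirm the mismatch from inside the same environment is to probe the import directly. The following is a minimal diagnostic sketch using only the Python standard library; the helper names `probe_module` and `installed_version` are made up for this example and are not part of vLLM or Triton:

```python
import importlib
from importlib import metadata
from typing import Optional, Tuple


def probe_module(name: str) -> Tuple[bool, str]:
    """Import `name` and report whether it succeeded, with the error text if not."""
    try:
        importlib.import_module(name)
    except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
        return False, str(exc)
    return True, "ok"


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names taken from the `uv pip list` output above.
    for dist in ("triton", "vllm", "vllm-kunlun"):
        print(dist, installed_version(dist))
    # The module named in the error message.
    print(probe_module("triton.language.target_info"))
```

If the last line prints `False` together with the same "No module named" text, the installed Triton build does not ship that submodule, which matches the ERROR lines in the log.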

Launch command:

python -m vllm.entrypoints.openai.api_server \
      --host localhost \
      --port 8807 \
      --model /data/models/Qwen3-0.6B \
      --served-model-name Qwen3-0.6B \
      --gpu-memory-utilization 0.95 \
      --trust-remote-code \
      --max-model-len 32768 \
      --tensor-parallel-size 1 \
      --dtype float16 \
      --max_num_seqs 128 \
      --max_num_batched_tokens 32768 \
      --block-size 128 \
      --no-enable-prefix-caching \
      --no-enable-chunked-prefill \
      --distributed-executor-backend mp \
      --compilation-config '{"splitting_ops": ["vllm.unified_attention_with_output_kunlun"]}'
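
When experimenting with different `--compilation-config` values, hand-quoting the JSON in the shell is easy to get wrong. One option, sketched here under the assumption that the server is launched from a small Python wrapper (the wrapper itself is hypothetical; only flags already present in the command above are used), is to build the argument with `json.dumps` and let `shlex` handle the quoting:

```python
import json
import shlex

# Same value as in the launch command above.
compilation_config = {
    "splitting_ops": ["vllm.unified_attention_with_output_kunlun"],
}

# Only a subset of the flags from the report is repeated here for brevity.
args = [
    "python", "-m", "vllm.entrypoints.openai.api_server",
    "--model", "/data/models/Qwen3-0.6B",
    "--tensor-parallel-size", "1",
    "--compilation-config", json.dumps(compilation_config),
]

# A shell-safe string that can be pasted into a terminal;
# pass `args` itself to subprocess.run to launch directly.
command_line = shlex.join(args)

if __name__ == "__main__":
    print(command_line)
```

This keeps the JSON valid by construction, so a startup failure can be attributed to the configuration values rather than to quoting.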

Expected Behavior

No response

Additional Context

No response
