I tried to enable Flashinfer TRTLLM MoE for DeepSeek-V2-Lite-Instruct-FP8 in vLLM and ran into the following error:
```
[...]
  File "/code/vllm/model_executor/layers/fused_moe/flashinfer_trtllm_moe.py", line 308, in fi_trtllm_fp8_per_tensor_moe
    return flashinfer_trtllm_fp8_per_tensor_scale_moe(
  File "/code/vllm/utils/flashinfer.py", line 102, in wrapper
    return impl(*args, **kwargs)
  File "/code/.venv/lib/python3.10/site-packages/flashinfer/fused_moe/core.py", line 2258, in trtllm_fp8_per_tensor_scale_moe
    return get_trtllm_moe_sm100_module().trtllm_fp8_per_tensor_scale_moe(
  File "/code/.venv/lib/python3.10/site-packages/flashinfer/fused_moe/core.py", line 1485, in trtllm_fp8_per_tensor_scale_moe_op
    activation_type=activation_type.value,
AttributeError: 'int' object has no attribute 'value'
```
Going down the stack trace entry by entry:

- In vLLM, `flashinfer_trtllm_fp8_per_tensor_scale_moe` is called without `activation_type`.
- This corresponds to `flashinfer.fused_moe.core.trtllm_fp8_per_tensor_scale_moe`, which has an argument `activation_type: int = ActivationType.Swiglu.value` (int type).
- That gets passed to `get_trtllm_moe_sm100_module().trtllm_fp8_per_tensor_scale_moe()`, which however has an input argument `activation_type: ActivationType = ActivationType.Swiglu` (enum type).
- In line 1485, `activation_type.value` is used and raises the exception, since the value is an int when it should have been an enum (see the sketch after this list).
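A minimal, self-contained sketch of the mismatch (the function names and enum values here are hypothetical stand-ins for the real `flashinfer.fused_moe.core` definitions):

```python
from enum import IntEnum

# Hypothetical stand-in for flashinfer's ActivationType (values assumed).
class ActivationType(IntEnum):
    Swiglu = 0

# Mirrors the public trtllm_fp8_per_tensor_scale_moe: the default is the raw int.
def public_api(activation_type: int = ActivationType.Swiglu.value) -> None:
    module_op(activation_type)  # forwards the plain int unchanged

# Mirrors the sm100 module op, which expects the enum member itself.
def module_op(activation_type: ActivationType = ActivationType.Swiglu) -> None:
    print(activation_type.value)  # a plain int has no .value -> AttributeError

public_api()  # AttributeError: 'int' object has no attribute 'value'
# Coercing at the boundary, e.g. module_op(ActivationType(activation_type)),
# would avoid the error.
```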
The respective test in `tests/moe/test_trtllm_gen_fused_moe.py` calls `trtllm_fp8_per_tensor_scale_moe` with the enum rather than the int, so this wasn't caught:
```python
# tests/moe/test_trtllm_gen_fused_moe.py:2855
@pytest.mark.parametrize(
    "activation_type",
    [
        pytest.param(ActivationType.Swiglu, id="Swiglu"),
        pytest.param(ActivationType.Geglu, id="Geglu"),
        pytest.param(ActivationType.Relu2, id="Relu2"),
    ],
)
```
It's odd that mypy didn't catch this: all the involved functions are annotated, and the test even calls the function with a type that doesn't match its annotation.
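One plausible explanation, assuming `ActivationType` is an `IntEnum` (which the int/enum interchange above suggests): `IntEnum` subclasses `int`, so passing the enum member where `int` is annotated is valid for mypy, and the test call type-checks cleanly. A sketch, again with hypothetical stand-in definitions:

```python
from enum import IntEnum

class ActivationType(IntEnum):  # stand-in; values assumed
    Swiglu = 0

def trtllm_fp8_per_tensor_scale_moe(
    activation_type: int = ActivationType.Swiglu.value,
) -> None: ...

# IntEnum subclasses int, so the enum member satisfies the `int` annotation
# and mypy has nothing to complain about in the test:
trtllm_fp8_per_tensor_scale_moe(ActivationType.Swiglu)
```

And since `get_trtllm_moe_sm100_module()` builds and returns the op at runtime, its signature is presumably opaque to mypy, so the int-for-enum forwarding inside the wrapper would go unchecked as well.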
This seems to originate in the recent #2462.