[Bug] MTP speculative decoding fails for Qwen3.5-35B-A3B-NVFP4 with jetson latest vllm docker #362

@sc-hua

Description

Enable MTP speculative decoding:

`speculative-config: '{"method":"qwen3_next_mtp","num_speculative_tokens":2}'`
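As a quick sanity check that the value above is well-formed JSON with the two expected fields, it can be parsed with Python's standard library. This is only a sketch: it does not touch vLLM, and the variable names are illustrative.

```python
import json

# The speculative-config value from this report, copied verbatim.
raw = '{"method":"qwen3_next_mtp","num_speculative_tokens":2}'

# json.loads raises json.JSONDecodeError if the string is malformed.
spec = json.loads(raw)
print(spec["method"], spec["num_speculative_tokens"])
```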

When enabling MTP (Multi-Token Prediction) speculative decoding for Qwen3_5MoeForConditionalGeneration, the engine fails during drafter model weight loading with:

File ".../vllm/model_executor/models/qwen3_5_mtp.py", line 439, in load_weights
    return loader.load_weights(remap_weight_names(weights))

File ".../vllm/model_executor/models/utils.py", line 328, in _load_module
    raise ValueError(msg)

ValueError: There is no module or parameter named 'language_model' in Qwen3_5MoeMTP.
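
The error text suggests that at least one weight name reaching the loader still has `language_model` as a path component, which `Qwen3_5MoeMTP` does not define. Below is a toy sketch of that failure mode in plain Python. It is not vLLM's loader: `find_unresolved`, the module tree and the checkpoint key names are all hypothetical and chosen only to illustrate a prefix mismatch.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical drafter module tree; nested dicts stand in for submodules.
DRAFTER_TREE: Dict = {"mtp": {"fc": {"weight": None}}, "lm_head": {"weight": None}}


def find_unresolved(tree: Dict, names: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (weight_name, first_missing_component) for names that do not resolve."""
    unresolved = []
    for name in names:
        node = tree
        for part in name.split("."):
            # Stop at the first path component the current node does not contain.
            if not isinstance(node, dict) or part not in node:
                unresolved.append((name, part))
                break
            node = node[part]
    return unresolved


# Hypothetical checkpoint keys: the second one carries an extra prefix.
checkpoint_names = ["mtp.fc.weight", "language_model.lm_head.weight"]
problems = find_unresolved(DRAFTER_TREE, checkpoint_names)
print(problems)
```

If the real checkpoint keys show a similar leftover prefix, the remapping done by `remap_weight_names` is one place to look, though that is a guess from the traceback rather than a confirmed cause.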
