Summary
Users can crash the vLLM engine serving multimodal models by passing multimodal embedding inputs with the correct ndim but an incorrect shape (e.g. the hidden dimension is wrong), regardless of whether the model is intended to support such inputs (as defined in the Supported Models page).
The issue has existed ever since we added support for image embedding inputs in #6613 (released in v0.5.5).
Details
Using image embeddings as an example:
For models that support image embedding inputs, the engine crashes when scattering the embeddings to inputs_embeds (mismatched shape).
For models that don't support image embedding inputs, the engine crashes when validating the inputs inside get_input_embeddings (validation fails).
This happens because the input processor (via MultiModalDataParser) validates only the ndim of the tensor, not its full shape.
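To illustrate the gap between an ndim-only check and a full shape check, here is a minimal sketch. It is not vLLM's actual code or its fix: the helper name validate_embedding_shape, the accepted ndim values, and the expected_hidden_size parameter are all assumptions made for illustration. The helper accepts any object exposing a .shape attribute (as a torch.Tensor does), and a SimpleNamespace stands in for a tensor so the example runs without extra dependencies.

```python
from types import SimpleNamespace


def validate_embedding_shape(embeds, expected_hidden_size: int) -> None:
    """Hypothetical helper (not part of vLLM): reject badly shaped embeddings.

    `embeds` is any object with a `.shape` attribute, e.g. a torch.Tensor.
    """
    shape = tuple(embeds.shape)

    # ndim-only check, similar in spirit to what the advisory describes.
    # The accepted values (2 or 3) are an assumption for this sketch.
    if len(shape) not in (2, 3):
        raise ValueError(f"expected 2 or 3 dimensions, got shape {shape}")

    # Full-shape check: the last (hidden) dimension must also match,
    # otherwise the request is rejected before it reaches the model.
    if shape[-1] != expected_hidden_size:
        raise ValueError(
            f"expected hidden size {expected_hidden_size}, got shape {shape}"
        )


# Stand-ins for tensors: same ndim, different hidden dimension.
good = SimpleNamespace(shape=(16, 4096))
bad = SimpleNamespace(shape=(16, 1024))

validate_embedding_shape(good, expected_hidden_size=4096)  # passes silently

try:
    validate_embedding_shape(bad, expected_hidden_size=4096)
except ValueError as exc:
    # The malformed input is rejected as a per-request error.
    print(f"rejected: {exc}")
```

The point of the sketch is that a check like this raises an ordinary error for the offending request at input-processing time, rather than letting the mismatched tensor reach the model and crash the engine.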
Impact
Denial of service by crashing the engine.
The resolution documented on that CVE didn't fix the root cause; it only added a flag to enable or disable prompt embeds. Because the prompt embeds feature is disabled by default in vLLM, DoS attacks through the embeddings are blocked in the default configuration. However, this doesn't address the problem when the flag is enabled, so DoS attacks are still possible in that case.
Fixes