vLLM: Denial of Service via Unbounded Frame Count in video/jpeg Base64 Processing

Summary

The VideoMediaIO.load_base64() method at vllm/multimodal/media/video.py:51-62 splits video/jpeg data URLs on commas to extract individual JPEG frames, but does not enforce a frame count limit. The num_frames parameter (default: 32), which the load_bytes() code path enforces at lines 47-48, is bypassed entirely in the video/jpeg base64 path. An attacker can send a single API request containing thousands of comma-separated base64-encoded JPEG frames, forcing the server to decode every frame into memory until it runs out of memory (OOM) and crashes.

Details

Vulnerable code

# video.py:51-62
def load_base64(self, media_type: str, data: str) -> tuple[npt.NDArray, dict[str, Any]]:
    if media_type.lower() == "video/jpeg":
        load_frame = partial(self.image_io.load_base64, "image/jpeg")
        return np.stack(
            [np.asarray(load_frame(frame_data)) for frame_data in data.split(",")]
            #                                                       ^^^^^^^^^^
            # Unbounded split — no frame count limit
        ), {}
    return self.load_bytes(base64.b64decode(data))

The load_bytes() path (lines 47-48) delegates to a video loader that respects self.num_frames (default 32). The load_base64("video/jpeg", ...) path bypasses this limit entirely: data.split(",") produces a list of unbounded length, and every element is decoded into a numpy array.
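One way to close the gap is to check the frame count before any frame is decoded. The following is a minimal sketch under stated assumptions, not the actual vLLM patch: split_frames_bounded and its max_frames argument are hypothetical names, and rejecting oversized payloads (rather than sampling down to the limit) is an illustrative design choice.

```python
def split_frames_bounded(data: str, max_frames: int = 32) -> list[str]:
    """Split comma-separated base64 frames, rejecting oversized payloads.

    Hypothetical helper for illustration only; it is not part of vLLM.
    The default of 32 mirrors the num_frames default cited in the advisory.
    """
    if max_frames < 1:
        raise ValueError("max_frames must be positive")
    # Count separators first so an oversized payload is rejected
    # before a list of frames is built or any JPEG is decoded.
    frame_count = data.count(",") + 1
    if frame_count > max_frames:
        raise ValueError(
            f"video/jpeg payload has {frame_count} frames; limit is {max_frames}"
        )
    return data.split(",")
```

Counting commas with str.count() keeps the check cheap: the cost is a single pass over the string, with no per-frame allocation until the payload is known to be within bounds.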

video/jpeg is part of vLLM's public API

video/jpeg is a vLLM-specific MIME type and is not IANA-registered. However, it is part of the public API surface:

encode_video_url() at vllm/multimodal/utils.py:96-108 generates data:video/jpeg;base64,... URLs
Official test suites at tests/entrypoints/openai/test_video.py:62 and tests/entrypoints/test_chat_utils.py:153 both use this format
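For orientation, the layout of such a URL can be sketched as follows. This is an illustration of the comma-joined format described in this advisory, not a copy of encode_video_url(); the function name join_frames_as_data_url is hypothetical, and the inputs in the usage line are placeholder bytes rather than real JPEG data.

```python
import base64


def join_frames_as_data_url(jpeg_frames: list[bytes]) -> str:
    """Build a data:video/jpeg;base64 URL from already-encoded JPEG frames.

    Hypothetical helper for illustration only; it is not part of vLLM.
    """
    # The standard base64 alphabet contains no comma, so a comma can
    # separate frames without any escaping.
    encoded = [base64.b64encode(frame).decode("ascii") for frame in jpeg_frames]
    return "data:video/jpeg;base64," + ",".join(encoded)


# Placeholder bytes, used only to show the resulting layout.
example_url = join_frames_as_data_url([b"ab", b"cd"])
print(example_url)  # data:video/jpeg;base64,YWI=,Y2Q=
```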

Memory amplification

Each JPEG frame decodes to a full numpy array. For a 640x480 RGB image, each decoded frame is ~921 KB (640 × 480 × 3 bytes), so 5000 frames occupy ~4.6 GB. np.stack() then allocates an additional copy of the combined data. The compressed JPEG payload is small (~100 KB for 5000 frames) but decompresses to gigabytes.
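The figures above follow from simple arithmetic, sketched here under the assumption that each decoded pixel stores one byte (uint8) per channel. The helper name decoded_size_bytes is hypothetical.

```python
def decoded_size_bytes(width: int, height: int, frames: int, channels: int = 3) -> int:
    """Size of the decoded frames, assuming one uint8 per channel per pixel."""
    return width * height * channels * frames


per_frame = decoded_size_bytes(640, 480, frames=1)
total = decoded_size_bytes(640, 480, frames=5000)

print(per_frame)  # 921600 bytes, about 921 KB
print(total)      # 4608000000 bytes, about 4.6 GB
```

Because the list of per-frame arrays and the stacked array both exist while np.stack() runs, peak usage during stacking is roughly twice the total shown here.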

Data flow

POST /v1/chat/completions
  → chat_utils.py:1434   video_url type → mm_parser.parse_video()
  → chat_utils.py:872    parse_video() → self._connector.fetch_video()
  → connector.py:295     fetch_video() → load_from_url(url, self.video_io)
  → connector.py:91      _load_data_url(): url_spec.path.split(",", 1)
                          → media_type = "video/jpeg"
                          → data = "<frame1>,<frame2>,...,<frame10000>"
  → connector.py:100     media_io.load_base64("video/jpeg", data)
  → video.py:54          data.split(",")  ← UNBOUNDED
  → video.py:55-57       all frames decoded into numpy arrays
  → video.py:56          np.stack([...])  ← massive combined array → OOM

connector.py:91 uses split(",", 1), which splits on the first comma only. All remaining commas stay in data and are later split by video.py:54.
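The two-stage split can be reproduced with plain string operations. This is a simplified sketch modeled on the flow described above, not the connector code itself; the variable names are illustrative, and the frames are placeholder strings rather than real base64-encoded JPEGs.

```python
# Path portion of a data URL, i.e. everything after "data:".
url_path = "video/jpeg;base64,AAAA,BBBB,CCCC"

# Stage 1: split on the first comma only, separating header from payload.
header, data = url_path.split(",", 1)
media_type = header.split(";", 1)[0]

# Stage 2: every remaining comma becomes a frame boundary.
frames = data.split(",")

print(media_type)   # video/jpeg
print(data)         # AAAA,BBBB,CCCC
print(len(frames))  # 3
```

The number of frames is therefore controlled entirely by how many commas the client places in the payload.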

Comparison with existing protections

| Code Path | Frame Limit | File |
| --- | --- | --- |
| `load_bytes()` (binary video) | Yes: `num_frames` (default 32) | video.py:46-49 |
| `load_base64("video/jpeg", ...)` | No: unlimited `data.split(",")` | video.py:51-62 |

References

russellb published to vllm-project/vllm Apr 3, 2026

Published to the GitHub Advisory Database Apr 3, 2026

Reviewed Apr 3, 2026

Published by the National Vulnerability Database Apr 6, 2026

Last updated Jun 8, 2026

Weaknesses

Allocation of Resources Without Limits or Throttling
