Description
I'm using vllm-mlx on macOS (MBP M5 Max, 128GB, Tahoe 26.3.1 (a)) to host mlx-community/Qwen3-VL-30B-A3B-Instruct-bf16. Overall it works great, but I'm running into one annoying issue.
If I make sequential calls via the OpenAI API to describe a series of images one at a time, and several of the images are similar (but not identical), I get back word-for-word identical text descriptions for multiple images in a row. The images differ enough that the repeated description is wrong in at least some respect for the second and subsequent similar images, yet it matches the description of the first similar image in the sequence exactly.
Is there some caching going on for similar-but-not-identical images, which causes a previously generated description to be re-used for later queries (or something along those lines)? If so, is there a way to opt out of it?
For more context, I'm sending requests to http://localhost:8000/v1/chat/completions, with a query that includes a text message and a single image message containing a base64-encoded JPEG (max 2048px in each dimension).
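To make the request shape concrete, here is a minimal sketch of the payload I'm sending for each image. The structure follows the standard OpenAI chat-completions multimodal format (one text part plus one base64 data-URL image part); the model name matches my setup, but the helper function and the fake JPEG bytes are just illustrative.

```python
import base64
import json

def build_payload(image_bytes: bytes, prompt: str) -> dict:
    """Build a /v1/chat/completions request body with one text part
    and one base64-encoded JPEG image part (helper name is illustrative)."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "mlx-community/Qwen3-VL-30B-A3B-Instruct-bf16",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
                    },
                ],
            }
        ],
    }

# Fake JPEG bytes stand in for a real <=2048px image.
payload = build_payload(b"\xff\xd8\xff\xe0fake-jpeg-bytes", "Describe this image.")
print(json.dumps(payload, indent=2)[:80])
```

Each image gets its own independent request like this (via POST to http://localhost:8000/v1/chat/completions), so I would not expect any state to carry over between calls.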