Skip to content

Identical responses returned for similar-but-different input images #197

@BelieveDiffusion

Description

@BelieveDiffusion

I'm using vllm-mlx on macOS (MBP M5 Max, 128GB, Tahoe 26.3.1 (a)) to host mlx-community/Qwen3-VL-30B-A3B-Instruct-bf16. Overall it works great, but I'm running into one annoying issue.

If I make sequential calls via the OpenAI API to describe a series of images one at a time, and several of the images are similar (but not the same), I find I get back identical text descriptions for multiple images in a row. The images are sufficiently different that the repeated returned description is wrong in at least some regard for the second and subsequent similar images, but they are word-for-word the same description as the first similar one in the sequence.

Is there some caching going on for similar-but-not-the-same images, which is causing a previously-generated similar-image input to be re-used for later queries (or something)? If so, is there a way to opt out of it?

For more context, I'm sending requests to http://localhost:8000/v1/chat/completions, with a query that includes a text message and a single image message containing a base64-encoded JPEG (max 2048px in each dimension).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions