[REQUEST] Deepspeed Inference Supports VL (vision language) model #6917

Open
@ethen8181

Description

Is your feature request related to a problem? Please describe.
We have been successfully using the deepspeed.init_inference API to speed up inference for text-only models (e.g. Mistral, the Qwen 2.5 series). We were hoping support could be extended to vision language models as well, e.g. Qwen2-VL, which are currently not supported.
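
For context, the text-only usage we are referring to looks roughly like the sketch below. This is a minimal illustration rather than our exact setup: the helper names `build_inference_kwargs` and `load_text_only_engine` are hypothetical, and the fp16 dtype, `tp_size=1`, and kernel injection settings are assumed defaults, not something DeepSpeed requires.

```python
def build_inference_kwargs(tp_size=1, kernel_inject=True):
    """Collect keyword arguments for deepspeed.init_inference.

    Hypothetical helper; the values are assumptions for illustration.
    """
    if tp_size < 1:
        raise ValueError("tp_size must be >= 1")
    return {
        # Number of GPUs to shard the model across (tensor parallelism).
        "tensor_parallel": {"tp_size": tp_size},
        # Ask DeepSpeed to swap in its fused inference kernels where supported.
        "replace_with_kernel_inject": kernel_inject,
    }


def load_text_only_engine(model_name, **overrides):
    """Wrap a Hugging Face causal LM with DeepSpeed inference (sketch)."""
    # Heavy imports are deferred so the helper above works without a GPU stack.
    import torch
    import deepspeed
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.float16
    )
    kwargs = build_inference_kwargs(**overrides)
    # Returns an inference engine; the wrapped model is available as .module.
    return deepspeed.init_inference(model, dtype=torch.float16, **kwargs)
```

Calling `load_text_only_engine(...)` with a text-only checkpoint name is the path that works for us today; the request is for an equivalent path for vision language checkpoints.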

Describe the solution you'd like

  • Make deepspeed.init_inference work for vision language models, for both the embedding use case and the generation use case.
  • Make the tutorial on extending DeepSpeed inference with our own models clearer/cleaner.

Describe alternatives you've considered
N/A

Additional context
N/A

Labels: enhancement (New feature or request)
