Is your feature request related to a problem? Please describe.
We have been using the deepspeed.init_inference API to speed up inference for text-only models (e.g., Mistral, the Qwen 2.5 series) with success. We were hoping support could be extended to vision-language models as well (e.g., Qwen2-VL), which are currently not supported.
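For reference, a minimal sketch of how we wrap a text-only model today (model name, dtype, and the helper name are illustrative, not our exact production code):

```python
def build_engine(model_name: str = "Qwen/Qwen2.5-7B-Instruct"):
    """Wrap a Hugging Face causal LM with DeepSpeed's inference engine.

    This pattern works well for text-only models like Mistral / Qwen 2.5.
    """
    import torch
    import deepspeed
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.float16
    )
    # Kernel injection gives the main speedup for supported architectures.
    engine = deepspeed.init_inference(
        model,
        dtype=torch.float16,
        replace_with_kernel_inject=True,
    )
    return engine
```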
Describe the solution you'd like
We would like deepspeed.init_inference to work for vision-language models (for both the embedding use case and the generation use case), and we would also like the tutorial on extending DeepSpeed with our own models to be made clearer and cleaner.
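Concretely, a sketch of the call we would like to work, using Qwen2-VL as an example (the helper name is hypothetical; today this path is not supported for VLM architectures):

```python
def build_vlm_engine(model_name: str = "Qwen/Qwen2-VL-7B-Instruct"):
    """Desired behavior: the same init_inference call, but on a VLM."""
    import torch
    import deepspeed
    from transformers import Qwen2VLForConditionalGeneration

    model = Qwen2VLForConditionalGeneration.from_pretrained(
        model_name, torch_dtype=torch.float16
    )
    # This is where VLM support would be needed: the vision tower and the
    # multimodal projector currently have no injection policy in DeepSpeed.
    engine = deepspeed.init_inference(
        model,
        dtype=torch.float16,
        replace_with_kernel_inject=True,
    )
    return engine
```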
Describe alternatives you've considered
N/A
Additional context
N/A