[Feature] Support Wan2.2 T2V and I2V Online Serving with OpenAI /v1/videos API #1073
SamitHuang wants to merge 15 commits into vllm-project:main from
Conversation
Introduce a video generation API with an extensible request schema and shared diffusion routing so Wan2.2 and future video models can be served consistently. Signed-off-by: samithuang <285365963@qq.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b170ef4fc5
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4a5b024bd5
    if video_tensor.is_floating_point():
        video_tensor = video_tensor.clamp(-1, 1) * 0.5 + 0.5
    video_array = video_tensor.float().numpy()
    return _normalize_single_video_array(video_array)
Normalize uint8 tensors before float cast
If a model returns video frames as a uint8 torch tensor (0–255), _normalize_video_tensor casts to float before calling _normalize_single_video_array. That skips the integer-scaling path and instead clamps values to [-1, 1], turning most pixels into 1.0 (washed‑out/white frames). Handle integer tensors before the float cast (e.g., scale by 255 or preserve dtype) so post‑processed uint8 outputs encode correctly.
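The fix the review suggests could look like the following numpy sketch (numpy stands in for torch here, and while the helper names mirror the ones quoted above, the bodies are assumptions, not the PR's actual code):

```python
import numpy as np

def _normalize_single_video_array(video_array):
    # Stand-in for the helper quoted above: assume it expects floats in [0, 1].
    return np.clip(video_array, 0.0, 1.0)

def normalize_video_frames(frames):
    # Handle integer frames BEFORE any float cast, so uint8 output (0-255)
    # is scaled by 255 rather than clamped into [-1, 1].
    if np.issubdtype(frames.dtype, np.integer):
        return _normalize_single_video_array(frames.astype(np.float32) / 255.0)
    # Float frames are assumed to lie in [-1, 1]; map them to [0, 1].
    frames = np.clip(frames, -1.0, 1.0) * 0.5 + 0.5
    return _normalize_single_video_array(frames.astype(np.float32))
```

With this ordering, a uint8 frame value of 128 maps to roughly 0.5 instead of saturating at 1.0.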
Does this PR support text-to-video and use the same endpoint?
it supports other T2V models with the same generation endpoint
Sorry, does this PR support image-to-video?
not currently
should also support /v1/videos? https://platform.openai.com/docs/api-reference/videos/create , which is multipart/form-data |
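For reference, a multipart/form-data body for such an endpoint can be built with the Python stdlib alone. This is a generic sketch, not code from this PR; the `prompt` and `input_reference` field names are assumptions based on the thread:

```python
import uuid

def build_multipart_body(fields, files):
    # Minimal multipart/form-data encoder: text fields plus binary file parts.
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (f'--{boundary}\r\nContent-Disposition: form-data; '
             f'name="{name}"\r\n\r\n{value}\r\n').encode()
        )
    for name, payload in files.items():
        parts.append(
            (f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"; '
             f'filename="{name}"\r\nContent-Type: application/octet-stream'
             f'\r\n\r\n').encode() + payload + b"\r\n"
        )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"

body, content_type = build_multipart_body(
    {"prompt": "a corgi surfing a wave"},
    {"input_reference": b"<png bytes>"},
)
```

The returned `content_type` string (with its boundary) must be sent as the request's Content-Type header.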
I think we should follow this openai api endpoint @SamitHuang WDYT
agree, i have updated accordingly
Purpose

New APIs

POST /v1/videos: OpenAI-style video generation endpoint.
Main Logic
The handler maps request fields to OmniDiffusionSamplingParams, routes to the correct execution backend, extracts the video output, and encodes the MP4 to base64.
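As a rough illustration of that mapping (a sketch only; the helper name, dict shapes, and default values are assumptions, not the PR's actual code):

```python
import base64
import io

def build_generation_inputs(request):
    # Hypothetical helper illustrating the routing described above.
    inputs = {"prompt": request["prompt"]}
    ref = request.get("input_reference")
    if ref is not None:
        # i2v: decode the base64 reference image and attach it as
        # multi_modal_data.image for the diffusion backend.
        inputs["multi_modal_data"] = {"image": io.BytesIO(base64.b64decode(ref))}
    # Map remaining request fields onto sampling parameters
    # (the defaults here are made up for the sketch).
    sampling_params = {
        "num_frames": request.get("seconds", 5) * request.get("fps", 16),
        "height": request.get("height", 480),
        "width": request.get("width", 832),
    }
    return inputs, sampling_params
```

A t2v request simply omits input_reference, so the same entry point serves both modes.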
When a t2v or i2v request arrives:
- Decode input_reference (the reference image, if provided) and attach it to multi_modal_data.image.
- Map request fields to OmniDiffusionSamplingParams.
- Route to AsyncOmni or AsyncOmniDiffusion depending on server configuration.
- Build a sampling_params_list aligned with stage types.
- Extract the video output from OmniRequestOutput.
- Encode frames to MP4 via diffusers.utils.export_to_video.
- Return a VideoGenerationResponse with b64_json.
Main Changes
- vllm_omni/entrypoints/openai/protocol/videos.py
- vllm_omni/entrypoints/openai/video_api_utils.py
- vllm_omni/entrypoints/openai/serving_video.py
- vllm_omni/entrypoints/openai/api_server.py
- vllm_omni/entrypoints/async_omni.py
- examples/online_serving/text_to_video/run_curl_text_to_video.sh
- examples/online_serving/image_to_video/run_curl_image_to_video.sh

Test Plan
T2V
Launch the server
Send request via curl
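The curl step can equivalently be sketched in Python with the stdlib. The endpoint follows the OpenAI videos API shape, but the model id and payload field names here are assumptions:

```python
import json
import urllib.request

payload = {
    "model": "Wan-AI/Wan2.2-T2V-A14B",  # assumed model id
    "prompt": "a corgi surfing a wave at sunset",
    "seconds": 5,
}
req = urllib.request.Request(
    "http://localhost:8000/v1/videos",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req) would return a VideoGenerationResponse whose
# b64_json field carries the base64-encoded MP4.
```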
I2V
Launch the server
Send request via curl (multipart)
Test Result
T2V
wan22_output.mp4
I2V
wan22_i2v_output.mp4
Future consideration
Essential Elements of an Effective PR Description Checklist
- Update supported_models.md and examples for a new model.