
Fix/t2v routing video support #34

Open
chinsengi wants to merge 11 commits into zhaochenyang20:main from chinsengi:fix/t2v-routing-video-support

Conversation

@chinsengi

@chinsengi chinsengi commented Feb 24, 2026

Motivation

When an image generation request is routed to a T2V worker running Wan-AI/Wan2.1-T2V-1.3B-Diffusers, the worker goes through the video diffusion process and eventually errors out.

Modifications

  1. Route image generation only to T2I workers.
  2. Update the README and make sure all examples run.
  3. Bump the pinned sglang version so that the upstream update_weight_from_disk API is enabled.

Benchmarking and Profiling

Expected outputs are updated in the README.

Checklist

  • Format your code with pre-commit run --all-files.
  • Add or update unit tests if applicable.
  • Update documentation if applicable.
  • Provide accuracy and performance benchmark results if applicable.
  • Ensure all CI checks pass.

Review Process

  1. Get approvals from maintainers and other reviewers.
  2. Ensure all CI tests pass.
  3. After green CI and required approvals, ask maintainers to merge.

@gemini-code-assist
Contributor

Summary of Changes

Hello @chinsengi, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refines the routing mechanism for diffusion models by intelligently directing image generation requests only to workers capable of handling them, thereby avoiding errors with video-specific workers. It also updates documentation and dependencies to support these changes and improve development practices.

Highlights

  • Image Generation Routing Logic: Implemented logic to ensure image generation requests are only routed to workers that do not support video generation, preventing errors when T2V workers receive image tasks.
  • README Updates: Updated the README to reflect the new T2V worker configuration and included an example for video generation requests, alongside clarifications for API responses.
  • Dependency Updates: Bumped the pinned sglang version to enable the update_weight_from_disk API and added pytest and pre-commit to project dependencies.
  • Worker Health Endpoint Enhancement: The /health_workers endpoint now includes a video_support field for each worker, indicating its capability.
  • New Unit Tests: Added comprehensive unit tests to validate the new image generation routing behavior, including scenarios with mixed worker types and when only video workers are available.
Changelog
  • README.md
    • Updated the model path for worker 2 to Wan-AI/Wan2.1-T2V-1.3B-Diffusers.
    • Added comments to example API responses for clarity.
    • Removed the local image decoding and saving steps from the image generation example.
    • Introduced a new example for video generation requests.
  • pyproject.toml
    • Added pytest>=9.0.2 to the project dependencies.
    • Added pre-commit>=4.5.1 to the project dependencies.
  • sglang
    • Updated the sglang subproject commit hash from 45095bac70ef1382425cb86f4b7af66dc6e7641c to 2274bfebb194f83ac1f2b69e100a27c4a27d94d2.
  • src/sglang_diffusion_routing/router/diffusion_router.py
    • Modified the generate method to filter workers, routing image generation requests only to those without video support.
    • Added a check in generate to return a 400 error if no image-capable workers are available.
    • Included the video_support status in the response of the health_workers endpoint.
  • tests/unit/test_router_endpoints.py
    • Added test_generate_routes_only_to_image_workers to verify correct routing to image-only workers.
    • Added test_generate_includes_unknown_workers to ensure workers with unknown video support are considered image-capable.
    • Added test_generate_returns_400_when_only_video_workers to test error handling when no image workers are present.
Activity
  • The author has formatted the code with pre-commit run --all-files.
  • The author has added or updated unit tests.
  • The author has updated documentation.
  • The author has provided expected outputs in the readme for benchmarking.
  • The pull request is awaiting approvals from maintainers and other reviewers.
  • All CI checks need to pass before merging.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces significant improvements to the diffusion router, enabling proper routing of image generation requests to image-only workers and adding support for video generation. The changes include updating the README.md with new examples for video generation and clarifying image generation behavior, updating dependencies in pyproject.toml, and bumping the sglang submodule. Crucially, the diffusion_router.py now intelligently filters workers based on their video support for image generation tasks, and new unit tests ensure the correctness of this routing logic. The addition of video_support to worker health information also enhances observability.

candidate_workers = [
worker_url
for worker_url, support in self.worker_video_support.items()
if support is not None and not support
Contributor

This is not necessary, if support is enough

Author

Because support cannot be None? What's the logic here?

Contributor

My bad, I was looking at another function. The code here should work.

But I do feel like the logic here is confusing. If the intent is to find image-capable workers, going through worker_video_support and checking not support is an implicit approach.

If we later support some other task type, it would cause issues.
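To make the tri-state distinction being debated concrete, here is a minimal, illustrative sketch. The worker URLs are made up; worker_video_support mirrors the dict in the quoted snippet, mapping each worker to True (video), False (image-only), or None (probe not yet completed):

```python
# Illustrative tri-state map: True = video worker, False = image-only,
# None = probe has not completed. URLs are made up for the example.
worker_video_support = {
    "http://worker-a:30000": False,
    "http://worker-b:30000": True,
    "http://worker-c:30000": None,
}

# The quoted check keeps only workers *known* to be image-only;
# an unprobed worker (None) is skipped.
strict = [
    url
    for url, support in worker_video_support.items()
    if support is not None and not support
]

# Dropping the None guard would also admit unprobed workers,
# because `not None` evaluates to True.
loose = [url for url, support in worker_video_support.items() if not support]

print(strict)  # ['http://worker-a:30000']
print(loose)   # ['http://worker-a:30000', 'http://worker-c:30000']
```

So the two comprehensions differ exactly in how they treat workers whose probe has not finished, which is the crux of the question above.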

Author

If you look at how video support is determined, it is determined by checking that the task type is not in _IMAGE_TASK_TYPES. See the method _probe_worker_video_support(). So it is unclear to me what the plan is down the road, and how we should differentiate between video and image diffusion models.

Another thing is that sglang exposes /v1/image/generation and /v1/video/generate regardless of the model type, and in the particular case of WAN1.3B, I believe it can generate images by setting the number of frames to 1. Is this something we want to support?

Contributor

> Another thing is that sglang exposes /v1/image/generation and /v1/video/generate regardless of the model type, and in the particular case of WAN1.3B, I believe it can generate images by setting the number of frames to 1. Is this something we want to support?

I don't think that's something we want to support; we should differentiate this by the API the model supports. If the model doesn't accept the image generation payload, then we shouldn't treat it as an image-generation-capable model.

Contributor

> If you look at how video support is determined, it is determined by checking that the task type is not in _IMAGE_TASK_TYPES. See the method _probe_worker_video_support(). So it is unclear to me what the plan is down the road, and how we should differentiate between video and image diffusion models.

I think we want to refactor this; the indirect check makes the code hard to understand.

Author

I have refactored to use task_types instead of video_support

@alphabetc1 alphabetc1 self-assigned this Feb 25, 2026
Comment thread README.md
"model": "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
"prompt": "a flowing river",
})
# {'id': '8286716d-7ef9-43ce-a3af-ce443543d221', 'object': 'video', 'model': 'Wan-AI/Wan2.1-T2V-1.3B-Diffusers', 'status': 'queued', 'progress': 0, 'created_at': 1771877888, 'size': '832x480', 'seconds': '6', 'quality': 'standard', 'url': None, 'remixed_from_video_id': None, 'completed_at': None, 'expires_at': None, 'error': None, 'file_path': './sglang-diffusion-routing/outputs/8286716d-7ef9-43ce-a3af-ce443543d221.mp4', 'peak_memory_mb': None, 'inference_time_s': None}
Contributor

Can we clean up those commented test payloads if they are not necessary?

# True: supports, False: image-only, None: unknown/unprobed
self.worker_video_support: dict[str, bool | None] = {}
# URL -> task_type string (e.g. "T2I", "T2V"), or None if unprobed
self.worker_task_type: dict[str, str | None] = {}
Contributor

Is it possible for a single worker to support multiple task types?

Comment on lines +587 to +593
worker_url
for worker_url, task_type in self.worker_task_type.items()
if task_type is not None
and task_type.upper() in _VIDEO_TASK_TYPES
and worker_url in self.worker_request_counts
and worker_url not in self.dead_workers
and worker_url not in self.sleeping_workers
Contributor

Can we simplify this logic? It seems a bit over-defensive.
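If the goal is mainly readability, one possible simplification is to factor the liveness conditions into a helper so each selection only composes two small predicates. This is a hypothetical sketch, not the PR's code; the attribute names are borrowed from the quoted diff:

```python
# Hypothetical refactor: separate "is the worker schedulable?" from
# "does the worker serve this task type?". Attribute names mirror the
# quoted diff; the helper itself is illustrative only.
_VIDEO_TASK_TYPES = {"T2V"}

class RouterState:
    def __init__(self):
        self.worker_task_type = {}       # url -> "T2I"/"T2V"/None
        self.worker_request_counts = {}  # url -> in-flight count
        self.dead_workers = set()
        self.sleeping_workers = set()

    def _schedulable(self, url: str) -> bool:
        # Liveness checks collected in one place instead of every comprehension.
        return (
            url in self.worker_request_counts
            and url not in self.dead_workers
            and url not in self.sleeping_workers
        )

    def video_workers(self) -> list[str]:
        return [
            url
            for url, task in self.worker_task_type.items()
            if task is not None
            and task.upper() in _VIDEO_TASK_TYPES
            and self._schedulable(url)
        ]

state = RouterState()
state.worker_task_type = {"http://w1": "T2V", "http://w2": "T2I", "http://w3": "T2V"}
state.worker_request_counts = {"http://w1": 0, "http://w2": 0, "http://w3": 0}
state.sleeping_workers.add("http://w3")
print(state.video_workers())  # ['http://w1']
```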

Comment on lines 20 to +21
_IMAGE_TASK_TYPES = {"T2I", "I2I", "TI2I"}
_VIDEO_TASK_TYPES = {"T2V"}
Contributor

There are different task types here; can we clarify the mapping relation?

Is it 1 model -> 1 task type, or 1 model -> multiple task types?

Should we unify this by task type rather than image/video?

@zhaochenyang20
Owner

Hey @dreamyang-liu @alphabetc1 ,

Shirui and I discussed the routing design and we think the router should not be responsible for distinguishing between different workload types (e.g., text-to-image vs. image-to-image). A better design is:

Each router group should only serve one type of model. Different model types (e.g., a diffusion model for rollout, a VLM for reward) should each have their own independent router. This way, work type becomes a router-level attribute, rather than something the router needs to figure out internally.

Specifically:

  • If someone tries to add a worker of a mismatched type to a router, it should raise an error directly.
  • Scheduling across different model types should be handled by deploying multiple routers.

This keeps the router logic much simpler and aligns well with SGLang's existing DP/DPA scheduling approach.

Let me know if you have any questions!
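The fail-fast registration described in the bullets above could be sketched roughly as follows. The class and method names are invented for illustration and are not the actual sglang-diffusion-routing API:

```python
# Illustrative only: a router bound to one task type that rejects
# mismatched workers at registration time, per the proposal above.
class SingleTypeRouter:
    def __init__(self, task_type: str) -> None:
        self.task_type = task_type.upper()
        self.workers: list[str] = []

    def add_worker(self, url: str, worker_task_type: str) -> None:
        # Mismatched types fail fast instead of being filtered per request.
        if worker_task_type.upper() != self.task_type:
            raise ValueError(
                f"worker {url} serves {worker_task_type}, "
                f"but this router only accepts {self.task_type}"
            )
        self.workers.append(url)

t2i_router = SingleTypeRouter("T2I")
t2i_router.add_worker("http://w1:30000", "T2I")  # accepted
try:
    t2i_router.add_worker("http://w2:30000", "T2V")
except ValueError as exc:
    print(exc)  # worker http://w2:30000 serves T2V, but this router only accepts T2I
```

Under this design the per-request filtering in generate() disappears entirely, since every registered worker is type-correct by construction.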

@dreamyang-liu
Contributor

> (quoting @zhaochenyang20's routing-design proposal above)

@zhaochenyang20
In this case, it feels more like a load balancer. If we decide this is the path we want, we need to document it clearly in the README and enforce it strictly when launching the router. Also, I'm curious: with multiple workers serving different models of the same workload type, what would be the expected behavior, or do we want to ban this scenario? I think there's still some ambiguity that requires clarification.

@alphabetc1
Collaborator

> (quoting @zhaochenyang20's routing-design proposal above)

The single-type-per-router approach is cleaner internally, but it shifts complexity to users: they'd need to manage multiple router addresses and figure out which one to hit. For just image + video, the current unified router design is pragmatic and works.

For now, I'd recommend keeping the current design and hardening the probe logic (retries, manual type override as a fallback) rather than doing a full restructure.

If we ever need to split later, we can introduce a thin gateway in front of the single-type routers to keep a single-endpoint UX.

@chinsengi
Author

@zhaochenyang20 @dreamyang-liu @alphabetc1 So what is the final verdict here?
