
Fix/t2v routing video support #34

Open
chinsengi wants to merge 11 commits into zhaochenyang20:main from chinsengi:fix/t2v-routing-video-support

Conversation

@chinsengi

@chinsengi chinsengi commented Feb 24, 2026

Motivation

When an image generation request is routed to a T2V worker running Wan-AI/Wan2.1-T2V-1.3B-Diffusers, the worker goes through the video diffusion process and eventually errors out.

Modifications

  1. Route image generation only to T2I workers.
  2. Update the README and make sure all examples run.
  3. Bump the pinned sglang version so that the upstream update_weight_from_disk API is enabled.

Benchmarking and Profiling

Expected outputs are updated in the README.

Checklist

  • Format your code with pre-commit run --all-files.
  • Add or update unit tests if applicable.
  • Update documentation if applicable.
  • Provide accuracy and performance benchmark results if applicable.
  • Ensure all CI checks pass.

Review Process

  1. Get approvals from maintainers and other reviewers.
  2. Ensure all CI tests pass.
  3. After green CI and required approvals, ask maintainers to merge.

@gemini-code-assist
Contributor

Summary of Changes

Hello @chinsengi, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refines the routing mechanism for diffusion models by intelligently directing image generation requests only to workers capable of handling them, thereby avoiding errors with video-specific workers. It also updates documentation and dependencies to support these changes and improve development practices.

Highlights

  • Image Generation Routing Logic: Implemented logic to ensure image generation requests are only routed to workers that do not support video generation, preventing errors when T2V workers receive image tasks.
  • README Updates: Updated the README to reflect the new T2V worker configuration and included an example for video generation requests, alongside clarifications for API responses.
  • Dependency Updates: Bumped the pinned sglang version to enable the update_weight_from_disk API and added pytest and pre-commit to project dependencies.
  • Worker Health Endpoint Enhancement: The /health_workers endpoint now includes a video_support field for each worker, indicating its capability.
  • New Unit Tests: Added comprehensive unit tests to validate the new image generation routing behavior, including scenarios with mixed worker types and when only video workers are available.
Changelog
  • README.md
    • Updated the model path for worker 2 to Wan-AI/Wan2.1-T2V-1.3B-Diffusers.
    • Added comments to example API responses for clarity.
    • Removed the local image decoding and saving steps from the image generation example.
    • Introduced a new example for video generation requests.
  • pyproject.toml
    • Added pytest>=9.0.2 to the project dependencies.
    • Added pre-commit>=4.5.1 to the project dependencies.
  • sglang
    • Updated the sglang subproject commit hash from 45095bac70ef1382425cb86f4b7af66dc6e7641c to 2274bfebb194f83ac1f2b69e100a27c4a27d94d2.
  • src/sglang_diffusion_routing/router/diffusion_router.py
    • Modified the generate method to filter workers, routing image generation requests only to those without video support.
    • Added a check in generate to return a 400 error if no image-capable workers are available.
    • Included the video_support status in the response of the health_workers endpoint.
  • tests/unit/test_router_endpoints.py
    • Added test_generate_routes_only_to_image_workers to verify correct routing to image-only workers.
    • Added test_generate_includes_unknown_workers to ensure workers with unknown video support are considered image-capable.
    • Added test_generate_returns_400_when_only_video_workers to test error handling when no image workers are present.
Activity
  • The author has formatted the code with pre-commit run --all-files.
  • The author has added or updated unit tests.
  • The author has updated documentation.
  • The author has provided expected outputs in the readme for benchmarking.
  • The pull request is awaiting approvals from maintainers and other reviewers.
  • All CI checks need to pass before merging.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces significant improvements to the diffusion router, enabling proper routing of image generation requests to image-only workers and adding support for video generation. The changes include updating the README.md with new examples for video generation and clarifying image generation behavior, updating dependencies in pyproject.toml, and bumping the sglang submodule. Crucially, the diffusion_router.py now intelligently filters workers based on their video support for image generation tasks, and new unit tests ensure the correctness of this routing logic. The addition of video_support to worker health information also enhances observability.

candidate_workers = [
worker_url
for worker_url, support in self.worker_video_support.items()
if support is not None and not support
Contributor

This is not necessary, if support is enough

Author

Because support cannot be None? What's the logic here?

Contributor

My bad, I was looking at another function. The code here should work.

But I do feel like the logic here is confusing. If the intent is to find image-capable workers, going through worker_video_support and checking not support is an implicit approach.

If we later support some other task type, it would cause issues.
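To make the tri-state distinction being debated concrete, here is a minimal, illustrative sketch. The worker URLs are made up; worker_video_support mirrors the dict in the quoted snippet, mapping each worker to True (video), False (image-only), or None (probe not yet completed):

```python
# Illustrative tri-state map: True = video worker, False = image-only,
# None = probe has not completed. URLs are made up for the example.
worker_video_support = {
    "http://worker-a:30000": False,
    "http://worker-b:30000": True,
    "http://worker-c:30000": None,
}

# The quoted check keeps only workers *known* to be image-only;
# an unprobed worker (None) is skipped.
strict = [
    url
    for url, support in worker_video_support.items()
    if support is not None and not support
]

# Dropping the None guard would also admit unprobed workers,
# because `not None` evaluates to True.
loose = [url for url, support in worker_video_support.items() if not support]

print(strict)  # ['http://worker-a:30000']
print(loose)   # ['http://worker-a:30000', 'http://worker-c:30000']
```

So the two comprehensions differ exactly in how they treat workers whose probe has not finished, which is the crux of the question above.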

Author

If you look at how video support is determined, it is determined by checking that the task type is not in _IMAGE_TASK_TYPES. See the method _probe_worker_video_support(). So it is unclear to me what the plan is down the road, and how we should differentiate between video and image diffusion models.

Another thing is that sglang exposes /v1/image/generation and /v1/video/generate regardless of the model type, and in the particular case of WAN1.3B, I believe it can generate images by setting the number of frames to 1. Is this something we want to support?

Contributor

> Another thing is that sglang exposes /v1/image/generation and /v1/video/generate regardless of the model type, and in the particular case of WAN1.3B, I believe it can generate images by setting the number of frames to 1. Is this something we want to support?

I don't think that's something we want to support; we should differentiate this by the API the model supports. If the model doesn't accept the image generation payload, then we shouldn't treat it as an image-generation-capable model.

Contributor

> If you look at how video support is determined, it is determined by checking that the task type is not in _IMAGE_TASK_TYPES. See the method _probe_worker_video_support(). So it is unclear to me what the plan is down the road, and how we should differentiate between video and image diffusion models.

I think we want to refactor this; the indirect check makes the code hard to understand.

Author

I have refactored to use task_types instead of video_support

@alphabetc1 alphabetc1 self-assigned this Feb 25, 2026
Comment thread README.md
"model": "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
"prompt": "a flowing river",
})
# {'id': '8286716d-7ef9-43ce-a3af-ce443543d221', 'object': 'video', 'model': 'Wan-AI/Wan2.1-T2V-1.3B-Diffusers', 'status': 'queued', 'progress': 0, 'created_at': 1771877888, 'size': '832x480', 'seconds': '6', 'quality': 'standard', 'url': None, 'remixed_from_video_id': None, 'completed_at': None, 'expires_at': None, 'error': None, 'file_path': './sglang-diffusion-routing/outputs/8286716d-7ef9-43ce-a3af-ce443543d221.mp4', 'peak_memory_mb': None, 'inference_time_s': None}
Contributor

Can we clean up those commented test payloads if they are not necessary?

# True: supports, False: image-only, None: unknown/unprobed
self.worker_video_support: dict[str, bool | None] = {}
# URL -> task_type string (e.g. "T2I", "T2V"), or None if unprobed
self.worker_task_type: dict[str, str | None] = {}
Contributor

Is it possible for a single worker to support multiple task types?

Comment on lines +587 to +593
worker_url
for worker_url, task_type in self.worker_task_type.items()
if task_type is not None
and task_type.upper() in _VIDEO_TASK_TYPES
and worker_url in self.worker_request_counts
and worker_url not in self.dead_workers
and worker_url not in self.sleeping_workers
Contributor

Can we simplify this logic? It seems a bit over-defensive.
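If the goal is mainly readability, one possible simplification is to factor the liveness conditions into a helper so each selection only composes two small predicates. This is a hypothetical sketch, not the PR's code; the attribute names are borrowed from the quoted diff:

```python
# Hypothetical refactor: separate "is the worker schedulable?" from
# "does the worker serve this task type?". Attribute names mirror the
# quoted diff; the helper itself is illustrative only.
_VIDEO_TASK_TYPES = {"T2V"}

class RouterState:
    def __init__(self):
        self.worker_task_type = {}       # url -> "T2I"/"T2V"/None
        self.worker_request_counts = {}  # url -> in-flight count
        self.dead_workers = set()
        self.sleeping_workers = set()

    def _schedulable(self, url: str) -> bool:
        # Liveness checks collected in one place instead of every comprehension.
        return (
            url in self.worker_request_counts
            and url not in self.dead_workers
            and url not in self.sleeping_workers
        )

    def video_workers(self) -> list[str]:
        return [
            url
            for url, task in self.worker_task_type.items()
            if task is not None
            and task.upper() in _VIDEO_TASK_TYPES
            and self._schedulable(url)
        ]

state = RouterState()
state.worker_task_type = {"http://w1": "T2V", "http://w2": "T2I", "http://w3": "T2V"}
state.worker_request_counts = {"http://w1": 0, "http://w2": 0, "http://w3": 0}
state.sleeping_workers.add("http://w3")
print(state.video_workers())  # ['http://w1']
```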

Comment on lines 20 to +21
_IMAGE_TASK_TYPES = {"T2I", "I2I", "TI2I"}
_VIDEO_TASK_TYPES = {"T2V"}
Contributor

There are different task types here; can we clarify the mapping relation?

Is it 1 model -> 1 task type, or 1 model -> multiple task types?

Should we unify this by task type rather than image/video?

@zhaochenyang20
Owner

Hey @dreamyang-liu @alphabetc1 ,

Shirui and I discussed the routing design and we think the router should not be responsible for distinguishing between different workload types (e.g., text-to-image vs. image-to-image). A better design is:

Each router group should only serve one type of model. Different model types (e.g., a diffusion model for rollout, a VLM for reward) should each have their own independent router. This way, work type becomes a router-level attribute, rather than something the router needs to figure out internally.

Specifically:

  • If someone tries to add a worker of a mismatched type to a router, it should raise an error directly.
  • Scheduling across different model types should be handled by deploying multiple routers.

This keeps the router logic much simpler and aligns well with SGLang's existing DP/DPA scheduling approach.

Let me know if you have any questions!
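The fail-fast registration described in the bullets above could be sketched roughly as follows. The class and method names are invented for illustration and are not the actual sglang-diffusion-routing API:

```python
# Illustrative only: a router bound to one task type that rejects
# mismatched workers at registration time, per the proposal above.
class SingleTypeRouter:
    def __init__(self, task_type: str) -> None:
        self.task_type = task_type.upper()
        self.workers: list[str] = []

    def add_worker(self, url: str, worker_task_type: str) -> None:
        # Mismatched types fail fast instead of being filtered per request.
        if worker_task_type.upper() != self.task_type:
            raise ValueError(
                f"worker {url} serves {worker_task_type}, "
                f"but this router only accepts {self.task_type}"
            )
        self.workers.append(url)

t2i_router = SingleTypeRouter("T2I")
t2i_router.add_worker("http://w1:30000", "T2I")  # accepted
try:
    t2i_router.add_worker("http://w2:30000", "T2V")
except ValueError as exc:
    print(exc)  # worker http://w2:30000 serves T2V, but this router only accepts T2I
```

Under this design the per-request filtering in generate() disappears entirely, since every registered worker is type-correct by construction.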

@dreamyang-liu
Contributor

> (quoting @zhaochenyang20's routing-design proposal above)

@zhaochenyang20
In this case, it feels more like a load balancer. If we decide this is the path we want, we need to document it clearly in the README and enforce it strictly when launching the router. Also, I'm curious: with multiple workers serving different models of the same workload type, what would be the expected behavior, or do we want to ban this scenario? I think there's still some ambiguity that requires clarification.

@alphabetc1
Collaborator

> (quoting @zhaochenyang20's routing-design proposal above)

The single-type-per-router approach is cleaner internally, but it shifts complexity to users: they'd need to manage multiple router addresses and figure out which one to hit. For just image + video, the current unified router design is pragmatic and works.

For now, I'd recommend keeping the current design and hardening the probe logic (retries, manual type override as a fallback) rather than doing a full restructure.

If we ever need to split later, we can introduce a thin gateway in front of the single-type routers to keep a single-endpoint UX.

@chinsengi
Author

@zhaochenyang20 @dreamyang-liu @alphabetc1 So what is the final verdict here?
