Conversation
- Bump sglang version to enable update_weights_from_disk API
Summary of Changes

Hello @chinsengi, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request refines the routing mechanism for diffusion models by intelligently directing image generation requests only to workers capable of handling them, thereby avoiding errors with video-specific workers. It also updates documentation and dependencies to support these changes and improve development practices.

Highlights
Changelog
Activity
Code Review
This pull request introduces significant improvements to the diffusion router, enabling proper routing of image generation requests to image-only workers and adding support for video generation. The changes include updating the README.md with new examples for video generation and clarifying image generation behavior, updating dependencies in pyproject.toml, and bumping the sglang submodule. Crucially, the diffusion_router.py now intelligently filters workers based on their video support for image generation tasks, and new unit tests ensure the correctness of this routing logic. The addition of video_support to worker health information also enhances observability.
```python
candidate_workers = [
    worker_url
    for worker_url, support in self.worker_video_support.items()
    if support is not None and not support
]
```
This is not necessary; `if support` is enough.
Because `support` cannot be None? What's the logic here?
My bad, I was looking at another function. The code here should work.

But I do feel the logic here is confusing. If the intent is to find image-capable workers, using `worker_video_support` and checking `not support` is an implicit approach.

If we later support some other task type, it would cause issues.
If you look at how video support is determined, it is determined by checking that the task type is not in `_IMAGE_TASK_TYPES`. See method `_probe_worker_video_support()`. So it is unclear to me what the plan is down the road, and how we should differentiate between video and image diffusion models.

Another thing is that sglang exposes /v1/image/generation and /v1/video/generate regardless of the model type, and in the particular case of WAN1.3B, I believe it can generate images by setting the number of frames to 1. Is this something we want to support?
> Another thing is that sglang exposes /v1/image/generation and /v1/video/generate regardless of the model type, and in the particular case of WAN1.3B, I believe it can generate images by setting the number of frames to 1. Is this something we want to support?

I don't think that's something we want to support; we should differentiate this by the API supported by the model. If the model doesn't accept the image generation payload, then we shouldn't treat it as an image-generation-capable model.
> If you look at how video support is determined, it is determined by checking that the task type is not in `_IMAGE_TASK_TYPES`. See method `_probe_worker_video_support()`. So it is unclear to me what the plan is down the road, and how we should differentiate between video and image diffusion models.

I think we want to refactor this; the indirect check may make it hard to understand the code.
I have refactored to use task_types instead of video_support.
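For concreteness, the task-type-based filter could look roughly like this. This is a hypothetical sketch, not the actual router code: the helper name and the standalone set are made up to mirror the identifiers in the diff.

```python
# Illustrative sketch: pick image-capable workers by their probed task type
# instead of the implicit `not video_support` check.
_IMAGE_TASK_TYPES = {"T2I", "I2I", "TI2I"}

def image_capable_workers(worker_task_type):
    """Return workers whose probed task type is an image task."""
    return [
        url
        for url, task_type in worker_task_type.items()
        if task_type is not None and task_type.upper() in _IMAGE_TASK_TYPES
    ]

# Unprobed (None) and video-only workers are excluded.
workers = {"http://a:3000": "T2I", "http://b:3000": "T2V", "http://c:3000": None}
print(image_capable_workers(workers))  # ['http://a:3000']
```

The explicit membership test in `_IMAGE_TASK_TYPES` also leaves room for future task types without changing the routing logic.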
| "model": "Wan-AI/Wan2.1-T2V-1.3B-Diffusers", | ||
| "prompt": "a flowing river", | ||
| }) | ||
| # {'id': '8286716d-7ef9-43ce-a3af-ce443543d221', 'object': 'video', 'model': 'Wan-AI/Wan2.1-T2V-1.3B-Diffusers', 'status': 'queued', 'progress': 0, 'created_at': 1771877888, 'size': '832x480', 'seconds': '6', 'quality': 'standard', 'url': None, 'remixed_from_video_id': None, 'completed_at': None, 'expires_at': None, 'error': None, 'file_path': './sglang-diffusion-routing/outputs/8286716d-7ef9-43ce-a3af-ce443543d221.mp4', 'peak_memory_mb': None, 'inference_time_s': None} |
Can we clean up these commented test payloads if they are not necessary?
```python
# True: supports, False: image-only, None: unknown/unprobed
self.worker_video_support: dict[str, bool | None] = {}
# URL -> task_type string (e.g. "T2I", "T2V"), or None if unprobed
self.worker_task_type: dict[str, str | None] = {}
```
Is it possible for a single worker to support multiple task types?
```python
worker_url
for worker_url, task_type in self.worker_task_type.items()
if task_type is not None
and task_type.upper() in _VIDEO_TASK_TYPES
and worker_url in self.worker_request_counts
and worker_url not in self.dead_workers
and worker_url not in self.sleeping_workers
```
Can we simplify this logic? It seems a bit over-defensive.
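One possible simplification is to factor the liveness checks out of the comprehension. This is only a sketch: the helper name `_is_routable` and the free-standing parameters are invented here to mirror the router attributes in the diff above.

```python
_VIDEO_TASK_TYPES = {"T2V"}

def _is_routable(url, request_counts, dead, sleeping):
    # Liveness/registration checks pulled out of the comprehension.
    return url in request_counts and url not in dead and url not in sleeping

def video_workers(task_types, request_counts, dead, sleeping):
    """Select live workers whose probed task type is a video task."""
    return [
        url
        for url, task_type in task_types.items()
        if task_type is not None
        and task_type.upper() in _VIDEO_TASK_TYPES
        and _is_routable(url, request_counts, dead, sleeping)
    ]

print(video_workers({"w1": "T2V", "w2": "T2I"}, {"w1": 0, "w2": 0}, set(), set()))
# ['w1']
```

The comprehension then only expresses the task-type filter, and the same helper can be reused by the image-routing path.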
```python
_IMAGE_TASK_TYPES = {"T2I", "I2I", "TI2I"}
_VIDEO_TASK_TYPES = {"T2V"}
```
There are different task types here; can we clarify the mapping relation? Is it 1 model -> 1 task type, or 1 model -> multiple task types?

Should we unify this by task type rather than image/video?
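If one model can serve several task types (e.g., a WAN model doing T2V plus single-frame T2I), the probe could record a set of task types per worker instead of a single string. The names below are illustrative, not the actual router API.

```python
# Hypothetical multi-task mapping: worker URL -> set of supported task types.
worker_task_types = {
    "http://wan:3000": {"T2V", "T2I"},   # hypothetical multi-task worker
    "http://flux:3000": {"T2I"},
}

def workers_for(task):
    """Route by the requested task type rather than an image/video flag."""
    return sorted(url for url, tasks in worker_task_types.items() if task in tasks)

print(workers_for("T2V"))  # ['http://wan:3000']
print(workers_for("T2I"))  # both workers
```

With this shape, "image vs. video" stops being a special case: each endpoint just asks for the workers that support its task type.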
Hey @dreamyang-liu @alphabetc1, Shirui and I discussed the routing design, and we think the router should not be responsible for distinguishing between different workload types (e.g., text-to-image vs. image-to-image). A better design is: each router group should only serve one type of model. Different model types (e.g., a diffusion model for rollout, a VLM for reward) should each have their own independent router. This way, workload type becomes a router-level attribute rather than something the router needs to figure out internally.

This keeps the router logic much simpler and aligns well with SGLang's existing DP/DPA scheduling approach. Let me know if you have any questions!
@zhaochenyang20
The single-type-per-router approach is cleaner internally but shifts complexity to users: they'd need to manage multiple router addresses and figure out which one to hit. For just image + video, the current unified router design is pragmatic and works. If we ever need to split later, we can introduce a thin gateway in front of the single-type routers to keep a single-endpoint UX.
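A minimal sketch of the "thin gateway" idea: one public endpoint that maps each request path to its single-type router. The router addresses are placeholders, and the path prefixes follow the sglang endpoints mentioned earlier in this thread.

```python
# Placeholder addresses for hypothetical single-type routers.
ROUTERS = {
    "image": "http://image-router:3000",
    "video": "http://video-router:3000",
}

def backend_for(path):
    """Pick the backing single-type router for a request path (sketch)."""
    if path.startswith("/v1/image/"):
        return ROUTERS["image"]
    if path.startswith("/v1/video/"):
        return ROUTERS["video"]
    raise ValueError(f"no router configured for {path}")

print(backend_for("/v1/image/generation"))  # http://image-router:3000
print(backend_for("/v1/video/generate"))    # http://video-router:3000
```

The gateway stays a few lines of path dispatch, while each router behind it keeps the simple one-model-type logic.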
@zhaochenyang20 @dreamyang-liu @alphabetc1 So what is the final verdict here?
Motivation

When routing image generation to a T2V worker with Wan-AI/Wan2.1-T2V-1.3B-Diffusers, the worker will go through the video diffusion process and eventually error.

Modifications
Benchmarking and Profiling

Expected outputs are updated in the README.
Checklist
pre-commit run --all-files.

Review Process