[router] Validate video generation before slection; Support selctive worker choice #14

Merged
zhaochenyang20 merged 10 commits into main from validate_video_generation
Feb 23, 2026

@zhaochenyang20
Owner

This PR fixes a routing mismatch where /generate_video could be sent to image-only workers (e.g., Qwen/Qwen-Image), causing confusing “video request but image output” behavior.

  1. Added a per-worker capability cache, worker_video_support (True/False/None). On worker registration (add_worker and the initial CLI --worker-urls), the router probes /v1/models once and stores whether the worker is video-capable.
  2. Updated /generate_video to route only to workers whose cached support is True; if none exist, the router returns 400 with a clear error.
  3. Generalized routing path: Renamed _use_url to _select_worker_by_routing. Extended _forward_to_worker(..., worker_urls=None) so routing can be restricted to a worker subset.
  4. Updated docs: Removed misleading generate_video example using Qwen/Qwen-Image. Clarified /generate_video behavior for image-only workers.

These changes also keep a clean extension point for future scenarios where one router manages heterogeneous models and requests can target selected worker groups.
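
The routing flow described in this PR can be sketched roughly as follows. This is a minimal illustration, not the router's actual code: the names video_capable_workers, select_worker, and NoEligibleWorkerError, and the least-in-flight selection policy, are assumptions made for the example. Only the worker_video_support cache, its True/False/None values, and the optional worker_urls subset come from the description above.

```python
from typing import Dict, List, Optional, Sequence


class NoEligibleWorkerError(Exception):
    """No worker can serve the request (an HTTP layer could map this to 400)."""


def video_capable_workers(worker_video_support: Dict[str, Optional[bool]]) -> List[str]:
    # Only an explicit True counts; False (image-only) and None (unknown) are skipped.
    return [url for url, support in worker_video_support.items() if support is True]


def select_worker(
    in_flight: Dict[str, int],
    worker_urls: Optional[Sequence[str]] = None,
) -> str:
    # worker_urls=None means "all registered workers"; an empty list means
    # "the caller filtered and nothing qualified", which must fail loudly.
    if worker_urls is None:
        candidates = list(in_flight)
    else:
        candidates = [url for url in worker_urls if url in in_flight]
    if not candidates:
        raise NoEligibleWorkerError("no eligible worker for this request")
    # Hypothetical policy for the sketch: pick the worker with the fewest in-flight requests.
    return min(candidates, key=lambda url: in_flight[url])
```

Under these assumptions, a /generate_video handler would call select_worker(in_flight, video_capable_workers(worker_video_support)), while endpoints without a capability requirement would omit the second argument.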

@gemini-code-assist
Contributor

Summary of Changes

Hello @zhaochenyang20, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the router's intelligence by preventing video generation requests from being misdirected to image-only workers, thereby improving user experience and system reliability. It introduces a mechanism to identify and cache worker capabilities, enabling more precise routing decisions. Additionally, the changes generalize the routing infrastructure to support more flexible worker selection based on specific request requirements.

Highlights

  • Worker Capability Caching: Implemented a per-worker capability cache (worker_video_support) to track if a worker supports video generation, populated during registration and initial CLI setup.
  • Video Request Routing Enforcement: Ensured that /generate_video requests are routed exclusively to workers marked as video-capable, returning a 400 error if no such workers are available.
  • Generalized Routing Path: Refactored the worker selection logic by renaming _use_url to _select_worker_by_routing and extending _forward_to_worker to allow routing to a specified subset of workers.
  • Documentation Updates: Updated documentation by removing a misleading generate_video example that used an image-only model and clarified the behavior of /generate_video for image-only workers.
Changelog
  • README.md
    • Removed an outdated curl example for video generation that incorrectly used an image-only model.
    • Updated the description for /generate_video endpoint to mention rejection of image-only workers.
    • Removed the SGLANG_USE_MODELSCOPE=TRUE environment variable from benchmark commands.
  • sglang
    • Added sglang as a Git submodule.
  • src/sglang_diffusion_routing/cli/main.py
    • Imported the asyncio module.
    • Added a call to router._refresh_worker_video_support when workers are initially registered via CLI arguments.
  • src/sglang_diffusion_routing/router/diffusion_router.py
    • Defined _IMAGE_TASK_TYPES constant for identifying image-only task types.
    • Introduced worker_video_support dictionary to store video capability for each worker.
    • Renamed _use_url to _select_worker_by_routing and added an optional worker_urls parameter to filter candidate workers.
    • Modified _forward_to_worker to accept an optional worker_urls parameter for targeted routing.
    • Implemented _probe_worker_video_support to asynchronously query worker capabilities via /v1/models.
    • Implemented _refresh_worker_video_support to update the cached video capability for a worker.
    • Updated the generate_video endpoint to filter for video-capable workers and return a 400 error if none are found.
    • Initialized worker_video_support to None when a new worker is registered in register_worker.
    • Triggered _refresh_worker_video_support after a worker is added via the add_worker API endpoint.
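
To make the three-valued cache concrete, here is a hedged sketch of how a probe result might be classified. The payload shape (a "data" list whose entries carry a "task_type" string) and the task-type values are placeholders invented for this example; they are not taken from the PR diff or from sglang's documented /v1/models schema, and the real contents of _IMAGE_TASK_TYPES are not shown on this page.

```python
from typing import Any, Dict, Optional

# Placeholder values for illustration only.
IMAGE_TASK_TYPES = frozenset({"T2I", "I2I"})


def classify_video_support(payload: Optional[Dict[str, Any]]) -> Optional[bool]:
    """Return True (video-capable), False (image-only), or None (unknown)."""
    if not payload:
        return None  # probe failed or returned nothing
    models = payload.get("data")
    if not isinstance(models, list) or not models:
        return None
    # Assumed field name: "task_type" on each model entry.
    task_types = [
        entry["task_type"]
        for entry in models
        if isinstance(entry, dict) and isinstance(entry.get("task_type"), str)
    ]
    if not task_types:
        return None  # nothing to classify, so stay "unknown" rather than guess
    # A worker counts as video-capable if any served model is not image-only.
    return any(t.upper() not in IMAGE_TASK_TYPES for t in task_types)
```

Keeping None distinct from False lets the router tell "confirmed image-only" apart from "probe has not succeeded yet", which matters for the re-probing discussion later in this thread.
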
Activity
  • No specific activity (comments, reviews, progress updates) has been recorded for this pull request yet.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This PR introduces a valuable feature for routing video generation requests to capable workers, which is a great improvement. However, a critical security audit identified a significant Server-Side Request Forgery (SSRF) vulnerability in the worker registration process. The validation logic in _normalize_worker_url is incomplete, allowing registration of internal network addresses. This, combined with the lack of authentication on management endpoints like /add_worker, could allow an attacker to use the router as a proxy to attack internal services. Additionally, there is one critical issue and a couple of suggestions for improvement regarding performance and robustness, along with minor typos in the pull request title ('slection' -> 'selection', 'selctive' -> 'selective').
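
One way to tighten worker URL validation against the SSRF concern is sketched below. It is an illustration under stated assumptions, not the project's _normalize_worker_url: the function name is_allowed_worker_url and the allowed_hosts parameter are hypothetical. Because diffusion workers often run on loopback or private addresses, the sketch rejects non-global IPs by default and relies on an explicit operator allowlist for trusted internal hosts. It does not resolve hostnames, so it does not by itself address DNS rebinding or redirects.

```python
import ipaddress
from typing import Collection
from urllib.parse import urlsplit


def is_allowed_worker_url(url: str, allowed_hosts: Collection[str] = ()) -> bool:
    """Accept http(s) URLs whose host is allowlisted or a globally routable IP."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL, e.g. an unterminated IPv6 literal
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    host = parts.hostname
    if host in allowed_hosts:
        return True  # operator explicitly trusts this host
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name that is not allowlisted: rejected here because the sketch
        # does not resolve names and cannot tell where it points.
        return False
    # Rejects loopback, private, link-local, and other non-global ranges.
    return address.is_global
```

A check like this would complement, not replace, authentication on management endpoints such as /add_worker.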

zhaochenyang20 and others added 4 commits February 22, 2026 10:53
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Comment on lines +34 to +38
if refresh_tasks:

    async def _refresh_all_worker_video_support() -> None:
        await asyncio.gather(*refresh_tasks)

Contributor

Should we make this refresh task a periodic check, similar to a health check? One scenario I have in mind: someone shuts down a worker and launches a new one at the same worker URL, which would leave the router's cached data stale.

Collaborator

Another reason to support updating type via /health: during the startup phase, _probe_worker_video_support may fail, which would prevent the worker from receiving any subsequent requests. However, if we continuously probe via /health, the worker can automatically rejoin the routing pool and handle requests once it recovers.
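
The periodic re-probe suggested in this thread could look roughly like the sketch below. The names refresh_once and refresh_forever, the probe callback, and the 30-second default interval are assumptions for illustration; the PR as merged probes once at registration. A failed probe resets the entry to None (unknown), so a worker that was still starting up, or was replaced at the same URL, is reclassified on a later pass.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional

# Hypothetical probe signature: takes a worker URL, returns True/False/None.
ProbeFn = Callable[[str], Awaitable[Optional[bool]]]


async def refresh_once(cache: Dict[str, Optional[bool]], probe: ProbeFn) -> None:
    """Re-probe every cached worker concurrently and update the cache in place."""
    urls = list(cache)
    results = await asyncio.gather(
        *(probe(url) for url in urls), return_exceptions=True
    )
    for url, result in zip(urls, results):
        if url not in cache:
            continue  # worker was removed while probes were in flight
        # A probe error means "unknown", not "image-only".
        cache[url] = None if isinstance(result, BaseException) else result


async def refresh_forever(
    cache: Dict[str, Optional[bool]], probe: ProbeFn, interval_s: float = 30.0
) -> None:
    """Background loop; intended to run as a task alongside the router."""
    while True:
        await refresh_once(cache, probe)
        await asyncio.sleep(interval_s)
```

Under these assumptions, the router would start refresh_forever with asyncio.create_task at startup and cancel the task on shutdown.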

@alphabetc1
Collaborator

alphabetc1 commented Feb 22, 2026

Noticed the sglang submodule update. Should we also add git submodule update --init --recursive to the README.md Installation section?

@alphabetc1
Collaborator

alphabetc1 commented Feb 22, 2026

Maybe we should move the worker type probing logic into register_worker, so we don't need to probe separately for both initialization and add_worker scenarios. Also, cli/main.py is primarily for CLI handling, so placing router-internal logic here feels a bit odd.
Additionally, we could make type a basic worker attribute and return it in /list_workers. @zhaochenyang20
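
A rough sketch of that suggestion follows: one registration path that also probes, with the capability stored as a worker attribute and exposed when listing workers. WorkerInfo, WorkerRegistry, and the synchronous probe callback are hypothetical names chosen for brevity; the actual router is asynchronous and its register_worker signature is not shown on this page.

```python
from dataclasses import asdict, dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class WorkerInfo:
    url: str
    video_support: Optional[bool] = None  # True / False / None (unknown)


class WorkerRegistry:
    def __init__(self, probe: Callable[[str], Optional[bool]]) -> None:
        self._probe = probe  # hypothetical capability probe
        self._workers: Dict[str, WorkerInfo] = {}

    def register_worker(self, url: str) -> WorkerInfo:
        """Single entry point used by both CLI startup and add_worker."""
        info = self._workers.setdefault(url, WorkerInfo(url))
        try:
            info.video_support = self._probe(url)
        except Exception:
            info.video_support = None  # keep the worker, mark capability unknown
        return info

    def list_workers(self) -> List[Dict[str, Any]]:
        # What a /list_workers response body could contain in this sketch.
        return [asdict(worker) for worker in self._workers.values()]
```

With probing inside register_worker, cli/main.py would only need to call the registration path, keeping router-internal logic out of the CLI module.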

@zhaochenyang20 zhaochenyang20 merged commit 9b46df9 into main Feb 23, 2026
1 check passed
@zhaochenyang20 zhaochenyang20 deleted the validate_video_generation branch February 23, 2026 19:20

3 participants