fix: prevent /generate 502 caused by event loop mismatch + add e2e tests #33

Merged
zhaochenyang20 merged 6 commits into zhaochenyang20:main from alphabetc1:ci/add_more_testcase
Mar 4, 2026
alphabetc1 (Collaborator) commented on Feb 24, 2026

Motivation

1) Fix /generate returning 502 (introduced by #14)

A regression introduced by #14 can cause /generate to return 502.

Root cause (in plain English):

  • asyncio.run() spins up a temporary event loop, and closes it once the coroutine finishes.
  • httpx.AsyncClient was created/used inside that temporary loop.
  • Later when Uvicorn starts serving requests, it runs on a different event loop.
  • So when the router forwards requests, it ends up using an async client tied to the old (already closed) loop, which fails with "Event loop is closed".
  • The router catches that exception and returns 502.
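
The failure mode described above can be reproduced without httpx or Uvicorn. The following is a minimal sketch under stated assumptions: LoopBoundClient, serve_once, buggy_startup and fixed_startup are hypothetical names for illustration, not code from this PR, and the client only imitates a loop-bound async client.

```python
import asyncio


class LoopBoundClient:
    """Hypothetical stand-in for an async client bound to the loop it was created on."""

    def __init__(self):
        # Remember the loop that was running at construction time.
        self.loop = asyncio.get_running_loop()

    async def request(self):
        # Imitate the error a loop-bound client hits once its loop is gone.
        if self.loop.is_closed():
            raise RuntimeError("Event loop is closed")
        return "ok"


async def build_client():
    return LoopBoundClient()


async def serve_once(client):
    """Hypothetical request handler: map client failures to 502, like the router does."""
    try:
        return 200, await client.request()
    except RuntimeError as exc:
        return 502, str(exc)


def buggy_startup():
    # First asyncio.run() creates a temporary loop and closes it on return.
    client = asyncio.run(build_client())
    # Second asyncio.run() plays the role of the server's own, different loop.
    return asyncio.run(serve_once(client))


async def _serve_with_fresh_client():
    # Create the client on the same loop that serves the request.
    client = LoopBoundClient()
    return await serve_once(client)


def fixed_startup():
    return asyncio.run(_serve_with_fresh_client())
```

In this sketch buggy_startup() returns (502, "Event loop is closed") and fixed_startup() returns (200, "ok"). The general remedy it illustrates is to create the client on the loop that will serve requests, rather than inside a separate asyncio.run() call.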

2) Add E2E tests

Added:

  • test_e2e_sglang.py: real E2E tests. They are not wired into CI for now because they are heavier and depend on an external runtime.
  • test_fake_e2e.py: lightweight fake E2E tests that run in CI to cover the basic end-to-end flow.

See development.md for usage / how to run them locally.

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

  • Format your code with pre-commit run --all-files.
  • Add or update unit tests if applicable.
  • Update documentation if applicable.
  • Provide accuracy and performance benchmark results if applicable.
  • Ensure all CI checks pass.

Review Process

  1. Get approvals from maintainers and other reviewers.
  2. Ensure all CI tests pass.
  3. After green CI and required approvals, ask maintainers to merge.

gemini-code-assist (Contributor) commented

Summary of Changes

Hello @alphabetc1, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a bug in the video support refresh mechanism by moving it to an asynchronous background task, preventing router startup delays. It significantly enhances testing capabilities by introducing a new end-to-end test suite for real SGLang workers and a fake worker for CPU-only integration tests, ensuring the router's stability and correct behavior across various scenarios.

Highlights

  • Video Support Refresh Bugfix: The video support refresh logic, previously blocking router startup in main.py, has been refactored. It is now asynchronously integrated into the router's background health check loop in diffusion_router.py, ensuring non-blocking and continuous probing of worker video capabilities.
  • New End-to-End (E2E) Test Suite: A comprehensive E2E test suite (tests/e2e/test_e2e_sglang.py) has been introduced to validate the router's functionality with real SGLang diffusion workers. These tests cover image and video generation, load balancing, and proxying, with automatic skipping if SGLang or GPU resources are not available.
  • Fake Worker for Unit E2E Testing: A fake_worker.py module was added to simulate SGLang diffusion workers. This enables robust, CPU-only end-to-end tests (tests/unit/test_fake_e2e.py) for the router without requiring actual GPU hardware or SGLang installations, improving test coverage and development efficiency.
  • Router Robustness for Video Generation: The generate_video endpoint in the router now explicitly checks if any workers are registered. If no workers are available, it returns a 503 Service Unavailable error, preventing potential issues and providing clearer feedback.
  • Updated Test Infrastructure and Documentation: The development.md file has been updated with instructions for running the new E2E tests, and pyproject.toml has been configured to include the new E2E test paths for pytest discovery.
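
As an illustration of the "no workers registered" guard mentioned in the highlights, here is a minimal sketch. RouterSketch and its fields are hypothetical names; the real generate_video in diffusion_router.py is an HTTP handler and is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class RouterSketch:
    """Hypothetical router state: just a list of registered worker URLs."""

    workers: list = field(default_factory=list)

    def generate_video(self, payload: dict):
        # Fail fast with 503 instead of trying to pick from an empty pool.
        if not self.workers:
            return 503, {"error": "No workers registered in the pool"}
        # Forwarding to a worker is elided; this sketch only echoes the choice.
        return 200, {"worker": self.workers[0], "request": payload}
```

Returning 503 here tells the caller that the service is temporarily unable to handle the request, which is easier to act on than an error raised later while selecting a worker.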

Changelog

  • development.md
    • Updated documentation to include instructions for running real E2E tests, specifying GPU requirements and explicit command.
  • pyproject.toml
    • Configured pytest to discover and run tests from the new tests/e2e directory in addition to tests/unit.
  • src/sglang_diffusion_routing/cli/main.py
    • Removed the synchronous refresh_worker_video_support calls during router initialization to prevent blocking startup.
  • src/sglang_diffusion_routing/router/diffusion_router.py
    • Refactored the video support capability probing for pre-registered workers into the asynchronous _start_background_health_check loop.
    • Added a check in generate_video to return a 503 status code if no workers are registered in the pool.
  • tests/conftest.py
    • Added a pytest configuration file to ensure the local src directory is prioritized in sys.path for test imports.
  • tests/e2e/test_e2e_sglang.py
    • Added a new end-to-end test suite for the SGLang diffusion router, covering image generation, video generation, load balancing, and worker health checks.
    • Implemented helper functions for managing SGLang worker processes, checking health, and handling environment variables.
    • Included skip conditions for tests if SGLang is not installed or no GPU is available.
  • tests/unit/fake_worker.py
    • Added a FastAPI application that simulates an SGLang diffusion worker, providing mock responses for image and video generation, health checks, and weight updates.
    • Included configurable parameters for simulated failure rates and latencies to test router robustness.
  • tests/unit/test_fake_e2e.py
    • Implemented CPU-only end-to-end tests using the fake worker, validating router functionality such as health, worker registration, image/video generation, weight updates, load balancing (round-robin and least-request), proxying, and worker failure scenarios.
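
For readers unfamiliar with the two load-balancing strategies the tests cover, the sketch below shows one common way to implement them. The class names and the in_flight bookkeeping are assumptions for illustration, not the router's actual implementation.

```python
import itertools


class RoundRobin:
    """Hand out workers in a fixed rotating order."""

    def __init__(self, workers):
        self._cycle = itertools.cycle(list(workers))

    def pick(self):
        return next(self._cycle)


class LeastRequest:
    """Pick the worker with the fewest in-flight requests."""

    def __init__(self, workers):
        self.in_flight = {worker: 0 for worker in workers}

    def pick(self):
        # Ties resolve to the earliest-registered worker (dict insertion order).
        worker = min(self.in_flight, key=self.in_flight.get)
        self.in_flight[worker] += 1
        return worker

    def release(self, worker):
        # Call when the forwarded request finishes, successfully or not.
        self.in_flight[worker] -= 1
```

A test against either strategy can then assert on the sequence of workers chosen, which is what makes a fake worker (or a stubbed client) sufficient for this kind of coverage.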

gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request introduces a bugfix for refreshing video support in workers and adds an extensive suite of end-to-end tests. While the changes improve the robustness of the worker probing logic, a Server-Side Request Forgery (SSRF) vulnerability was identified in the new background health check due to insufficient validation of worker URLs. Additionally, there is a minor suggestion regarding import style in the new test files.

Comment on lines +77 to +80
await asyncio.gather(
*(self.refresh_worker_video_support(url) for url in unknown_workers),
return_exceptions=True,
)

Severity: high (security)

The _start_background_health_check function initiates network requests to worker URLs that are not sufficiently validated. The validation logic in normalize_worker_url (used when workers are registered) does not block private IP addresses (RFC 1918) and can be bypassed using a trailing dot in the hostname (e.g., 169.254.169.254.). This allows an attacker to probe internal network services or cloud metadata endpoints by registering malicious worker URLs via the /add_worker endpoint.

To remediate this, ensure that normalize_worker_url strictly validates that the hostname is not a private or loopback IP address and correctly handles trailing dots in hostnames.
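
A minimal sketch of the kind of check this remediation asks for, using only the standard library. is_safe_worker_url is a hypothetical helper, not the repository's normalize_worker_url, and it only inspects IP literals; hostnames that resolve to private addresses would need a separate resolution-time check.

```python
import ipaddress
from urllib.parse import urlsplit


def is_safe_worker_url(url: str) -> bool:
    """Reject URLs whose host is a private, loopback, or link-local IP literal."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    # Strip trailing dots so "169.254.169.254." is treated like "169.254.169.254".
    host = parts.hostname.rstrip(".")
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. DNS-based checks are out of scope for this sketch.
        return True
    return not (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_unspecified
    )
```

Note that blocking loopback addresses would also reject workers running on the same host as the router, so a real deployment would likely need an explicit allowlist or a configuration switch for local development.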

Comment thread tests/unit/test_fake_e2e.py Outdated
Comment on lines +17 to +24
import base64
import os
import signal
import socket
import subprocess
import sys
import time
from pathlib import Path

Severity: medium

Per PEP 8, imports should be at the top of the file. concurrent.futures is currently imported inside test methods (lines 544, 664, 686). It should be imported only once at the top level with other standard library modules. Please also remove the local imports from the test methods.

Suggested change
  import base64
+ import concurrent.futures
  import os
  import signal
  import socket
  import subprocess
  import sys
  import time
  from pathlib import Path
References
  1. Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.

alphabetc1 changed the title from "fix: refresh video support bugfix & add e2e test" to "fix: prevent /generate 502 caused by event loop mismatch + add e2e tests" on Feb 24, 2026
await asyncio.gather(
*(self.refresh_worker_video_support(url) for url in unknown_workers),
return_exceptions=True,
)
Contributor

I still feel this shouldn't be a one-off check; we might want a dedicated loop task that checks it periodically. Also, putting this refresh here changes the semantics of the _start_background_health_check function.
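
A minimal sketch of what such a dedicated periodic task could look like. periodic_refresh, its parameters, and the demo are hypothetical; the probe callable stands in for refresh_worker_video_support, and the stop event is an assumption about how shutdown would be signalled.

```python
import asyncio


async def periodic_refresh(probe, get_unknown_workers, interval_s, stop):
    """Probe workers with unknown video support every interval_s seconds until stopped."""
    while not stop.is_set():
        urls = list(get_unknown_workers())
        if urls:
            # One failing probe must not cancel the others.
            await asyncio.gather(*(probe(url) for url in urls), return_exceptions=True)
        try:
            # Sleep for the interval, but wake up early on shutdown.
            await asyncio.wait_for(stop.wait(), timeout=interval_s)
        except asyncio.TimeoutError:
            pass


async def demo():
    # None means "video support not probed yet".
    support = {"http://w1": None, "http://w2": None}
    calls = []

    async def probe(url):
        calls.append(url)
        support[url] = True

    def unknown():
        return [url for url, known in support.items() if known is None]

    stop = asyncio.Event()
    task = asyncio.create_task(periodic_refresh(probe, unknown, 0.01, stop))
    await asyncio.sleep(0.05)
    stop.set()
    await task
    return support, calls
```

Keeping this in its own task leaves the health check loop with a single responsibility, and workers registered later are picked up on the next tick rather than only at startup.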

Collaborator Author

Agreed. Maybe we can open a separate PR to implement this, and keep this PR focused on decoupling the video support logic from cli/main.py.

@@ -0,0 +1,489 @@
"""
End-to-end tests with real sglang diffusion workers.
Contributor

This line is not necessary. Can we describe the test logic here instead?

Collaborator Author

done

_WEBP_RIFF_SIGNATURE = b"RIFF"
_WEBP_FORMAT_MARKER = b"WEBP"


Contributor

I feel like a lot of these functions and classes could reuse the ones in https://github.com/zhaochenyang20/sglang-diffusion-routing/blob/main/src/sglang_diffusion_routing/launcher/utils.py

Can we try to reduce the duplicated code?

Collaborator Author

done

Comment thread tests/unit/fake_worker.py Outdated
@@ -0,0 +1,167 @@
#!/usr/bin/env python3
"""
Fake sglang diffusion worker for e2e testing.
Contributor

The comment is a bit confusing. This file lives under unit, but it is really an e2e CPU test, so it should probably be moved to e2e.

It might also be worth discussing whether this fake worker is really necessary. Given that self.client in DiffusionRouter is the only interface between the router and the workers, can we hijack the client to mock different worker behaviors rather than implementing this worker?
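
To make the "hijack the client" idea concrete, here is a minimal sketch. StubClient, StubResponse and RouterSketch are hypothetical names; the sketch assumes the router only needs an async post() on its client, which may not match DiffusionRouter exactly.

```python
import asyncio


class StubResponse:
    """Hypothetical response object exposing the two members the sketch router reads."""

    def __init__(self, status_code, body):
        self.status_code = status_code
        self._body = body

    def json(self):
        return self._body


class StubClient:
    """Hypothetical replacement for the router's client: canned reply or canned failure."""

    def __init__(self, status_code=200, body=None, error=None):
        self.status_code = status_code
        self.body = body if body is not None else {}
        self.error = error
        self.calls = []

    async def post(self, url, json=None):
        # Record the call so tests can assert on what the router forwarded.
        self.calls.append((url, json))
        if self.error is not None:
            raise self.error
        return StubResponse(self.status_code, self.body)


class RouterSketch:
    """Hypothetical router: forwards to one worker and maps failures to 502."""

    def __init__(self, worker_url, client):
        self.worker_url = worker_url
        self.client = client

    async def generate(self, payload):
        try:
            resp = await self.client.post(f"{self.worker_url}/generate", json=payload)
        except Exception as exc:
            return 502, {"error": str(exc)}
        return resp.status_code, resp.json()
```

The trade-off is that a stubbed client runs in-process and is fast, but it does not exercise real sockets, process startup, or HTTP serialization the way a separate fake worker process does. If the router's client is an httpx.AsyncClient, httpx's MockTransport is another way to get similar behavior without a hand-written stub.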

Collaborator Author

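Split the tests into tests/unit, tests/integration, and tests/e2e.

| Suite       | Directory         | Scope                           | Hardware  | Runs in CI |
|-------------|-------------------|---------------------------------|-----------|------------|
| Unit        | tests/unit        | Function-level tests            | CPU       | Yes        |
| Integration | tests/integration | End-to-end flow/link tests      | CPU       | Yes        |
| E2E         | tests/e2e         | Real end-to-end tests           | CPU + GPU | No         |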

alphabetc1 requested a review from dreamyang-liu on March 3, 2026, 06:40
zhaochenyang20 merged commit 634aba8 into zhaochenyang20:main on Mar 4, 2026 (2 checks passed)
alphabetc1 deleted the ci/add_more_testcase branch on March 4, 2026, 05:03