fix: prevent /generate 502 caused by event loop mismatch + add e2e tests#33
Conversation
Summary of ChangesHello @alphabetc1, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a bug in the video support refresh mechanism by moving it to an asynchronous background task, preventing router startup delays. It significantly enhances testing capabilities by introducing a new end-to-end test suite for real SGLang workers and a fake worker for CPU-only integration tests, ensuring the router's stability and correct behavior across various scenarios. Highlights
Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a bugfix for refreshing video support in workers and adds an extensive suite of end-to-end tests. While the changes improve the robustness of the worker probing logic, a Server-Side Request Forgery (SSRF) vulnerability was identified in the new background health check due to insufficient validation of worker URLs. Additionally, there is a minor suggestion regarding import style in the new test files.
| await asyncio.gather( | ||
| *(self.refresh_worker_video_support(url) for url in unknown_workers), | ||
| return_exceptions=True, | ||
| ) |
There was a problem hiding this comment.
The _start_background_health_check function initiates network requests to worker URLs that are not sufficiently validated. The validation logic in normalize_worker_url (used when workers are registered) does not block private IP addresses (RFC 1918) and can be bypassed using a trailing dot in the hostname (e.g., 169.254.169.254.). This allows an attacker to probe internal network services or cloud metadata endpoints by registering malicious worker URLs via the /add_worker endpoint.
To remediate this, ensure that normalize_worker_url strictly validates that the hostname is not a private or loopback IP address and correctly handles trailing dots in hostnames.
| import base64 | ||
| import os | ||
| import signal | ||
| import socket | ||
| import subprocess | ||
| import sys | ||
| import time | ||
| from pathlib import Path |
There was a problem hiding this comment.
Per PEP 8, imports should be at the top of the file. concurrent.futures is currently imported inside test methods (lines 544, 664, 686). It should be imported only once at the top level with other standard library modules. Please also remove the local imports from the test methods.
| import base64 | |
| import os | |
| import signal | |
| import socket | |
| import subprocess | |
| import sys | |
| import time | |
| from pathlib import Path | |
| import base64 | |
| import concurrent.futures | |
| import os | |
| import signal | |
| import socket | |
| import subprocess | |
| import sys | |
| import time | |
| from pathlib import Path |
References
- Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants. (link)
| await asyncio.gather( | ||
| *(self.refresh_worker_video_support(url) for url in unknown_workers), | ||
| return_exceptions=True, | ||
| ) |
There was a problem hiding this comment.
I still feel like it shouldn't be an one-off check, we might want to have one specific loop task to check it periodically. Also putting this refresh here changes the semnaitc of _start_background_health_check function.
There was a problem hiding this comment.
agree. Maybe we can open a separate PR to implement this, and keep this PR focused on decoupling the video support logic from cli/main.py.
| @@ -0,0 +1,489 @@ | |||
| """ | |||
| End-to-end tests with real sglang diffusion workers. | |||
There was a problem hiding this comment.
This line is not necessary, instead can we describe the test logistic here ?
| _WEBP_RIFF_SIGNATURE = b"RIFF" | ||
| _WEBP_FORMAT_MARKER = b"WEBP" | ||
|
|
||
|
|
There was a problem hiding this comment.
I feel like a lot function and class can reuse the one in https://github.com/zhaochenyang20/sglang-diffusion-routing/blob/main/src/sglang_diffusion_routing/launcher/utils.py
Can we try to reduce the duplicate code ?
| @@ -0,0 +1,167 @@ | |||
| #!/usr/bin/env python3 | |||
| """ | |||
| Fake sglang diffusion worker for e2e testing. | |||
There was a problem hiding this comment.
The comment is a bit confusing. This file is under unit, probably e2e cpu test and move it to e2e.
Also might worth discussing is it really necessary to have this fake worker. Given self.client in DiffusionRouter is the only interface between router and worker, can we hijack the client to mock different behaviors from worker rather than implement this worker ?
There was a problem hiding this comment.
Split the tests into tests/unit, tests/integration , and tests/e2e
| Suite | Directory | Scope | Hardware | Runs in CI |
|---|---|---|---|---|
| Unit | tests/unit | Function-level tests | CPU | Yes |
| Integration | tests/integration | End-to-end flow/link tests | CPU | Yes |
| E2E | tests/e2e | Real end-to-end tests | CPU + GPU | No |
99b0e38 to
3904852
Compare
Motivation
1) Fix
/generatereturning 502 (introduced by #14)There’s a bug caused by #14 where
/generatemay return 502.Root cause (in plain English):
asyncio.run()spins up a temporary event loop, and closes it once the coroutine finishes.httpx.AsyncClientwas created/used inside that temporary loop.Event loop is closed.2) Add E2E tests
Added:
test_e2e_sglang.py: real E2E, but won’t be wired into CI for now (it’s heavier / depends on external runtime).test_fake_e2e.py: lightweight fake E2E, and will run in CI to cover the basic end-to-end flow.See development.md for usage / how to run them locally.
Modifications
Accuracy Tests
Benchmarking and Profiling
Checklist
pre-commit run --all-files.Review Process