Add launcher to allow launching worker from router #18

zhaochenyang20 merged 9 commits into zhaochenyang20:main
Conversation
Summary of Changes

Hello @dreamyang-liu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the SGLang diffusion routing system by integrating a worker launcher. It allows the router to not only manage incoming requests but also to dynamically spawn, monitor, and gracefully shut down its associated worker processes. This new capability streamlines the deployment and scaling of SGLang workers, particularly in local development environments, while laying the groundwork for more complex multi-node or containerized deployments through its extensible backend architecture.
BTW, could you provide an example in the README?
overall, LGTM |
Force-pushed from 9fb8a56 to d815ee7
```python
log_prefix = "[sglang-d-router]"
backend = None

try:
    router = DiffusionRouter(args, verbose=args.verbose)

    if args.launcher_config is not None:
        launcher_cfg = _lcfg.load_launcher_config(args.launcher_config)
        wait_timeout = launcher_cfg.wait_timeout
        backend = _lcfg.create_backend(launcher_cfg)
        backend.launch()
        threading.Thread(
            target=backend.wait_ready_and_register,
            kwargs=dict(
                register_fn=router.register_worker,
                timeout=wait_timeout,
                log_prefix=log_prefix,
            ),
            daemon=True,
        ).start()

    _run_router_server(args, router=router, log_prefix=log_prefix)
    return 0
finally:
    try:
        asyncio.run(router.client.aclose())
    except Exception:
        pass
    if backend is not None:
        print(f"{log_prefix} shutting down managed workers...", flush=True)
        backend.shutdown()
        print(f"{log_prefix} all managed workers terminated.", flush=True)
```
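As a rough sketch of what `wait_ready_and_register` might do on the backend side (the actual implementation lives in the launcher module; the `/health` endpoint, the polling loop, and the injectable `probe` parameter are assumptions for illustration):

```python
import time
import urllib.request


def wait_ready_and_register(worker_urls, register_fn, timeout, log_prefix,
                            probe=None, poll_interval=0.5):
    """Poll each worker until it answers, then hand it to the router.

    A sketch only: the real backend tracks the processes it spawned;
    the /health endpoint and polling strategy here are assumptions.
    """
    if probe is None:
        def probe(url):
            try:
                with urllib.request.urlopen(f"{url}/health", timeout=2):
                    return True
            except OSError:
                return False

    deadline = time.monotonic() + timeout
    for url in worker_urls:
        while time.monotonic() < deadline:
            if probe(url):
                # Worker answered: register it with the router.
                register_fn(url)
                print(f"{log_prefix} registered worker {url}", flush=True)
                break
            time.sleep(poll_interval)
        else:
            print(f"{log_prefix} worker {url} not ready within {timeout}s",
                  flush=True)
```

Running this in a daemon thread, as the router code above does, keeps the HTTP server responsive while workers are still starting up.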
I left a TODO here to refactor, but we can leave it for now.
```python
master_port_base = 30005
scheduler_port_base = 5555
internal_port_stride = 1000
```
These parameters are concerning. Should they be fixed constants, or could they be passed in?
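If the intent behind these bases and the stride is to give each worker a disjoint port window, one plausible scheme (an assumption; the PR does not show the exact formula) is `base + worker_index * stride`:

```python
MASTER_PORT_BASE = 30005
SCHEDULER_PORT_BASE = 5555
INTERNAL_PORT_STRIDE = 1000


def worker_ports(index: int) -> dict:
    """Hypothetical port assignment: offset the shared bases by a
    per-worker stride so locally launched workers never collide."""
    return {
        "master_port": MASTER_PORT_BASE + index * INTERNAL_PORT_STRIDE,
        "scheduler_port": SCHEDULER_PORT_BASE + index * INTERNAL_PORT_STRIDE,
    }
```

Making the bases and stride configurable (e.g. via the launcher config file) would address the reviewer's concern about hard-coded values.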
This PR adds a launcher module that spawns and manages sglang worker processes directly from the router CLI via `--launcher-config`. Closes #9.
We introduce a LauncherBackend abstraction to support different deployment strategies. Currently only the local backend is implemented (workers as local subprocesses), but the interface is designed to extend to multi-node setups or Kubernetes clusters.
All backends follow the same three-phase lifecycle: `launch()` to spawn workers, `wait_ready_and_register()` to wait for readiness and register workers with the router, and `shutdown()` to terminate them.
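The backend contract implied by the router code in the diff can be sketched as an abstract base class. This is a sketch, not the PR's actual definition; the `DummyBackend` and any names beyond `launch`, `wait_ready_and_register`, and `shutdown` are illustrative assumptions:

```python
from abc import ABC, abstractmethod
from typing import Callable


class LauncherBackend(ABC):
    """Sketch of the three-phase backend contract used by the router."""

    @abstractmethod
    def launch(self) -> None:
        """Phase 1: spawn the worker processes."""

    @abstractmethod
    def wait_ready_and_register(self, register_fn: Callable[[str], None],
                                timeout: float, log_prefix: str) -> None:
        """Phase 2: wait for workers, then register them with the router."""

    @abstractmethod
    def shutdown(self) -> None:
        """Phase 3: terminate all managed workers."""


class DummyBackend(LauncherBackend):
    """Toy backend for illustration: 'workers' are just URLs."""

    def __init__(self, urls):
        self.urls = list(urls)
        self.running = False

    def launch(self):
        self.running = True

    def wait_ready_and_register(self, register_fn, timeout, log_prefix):
        for url in self.urls:
            register_fn(url)

    def shutdown(self):
        self.running = False
```

A multi-node or Kubernetes backend would implement the same three methods, which is what lets the router code stay unchanged across deployment strategies.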
Example Log