
Add launcher to allow launching worker from router #18

Merged
zhaochenyang20 merged 9 commits into zhaochenyang20:main from dreamyang-liu:feat/launcher
Feb 23, 2026

Conversation

@dreamyang-liu
Contributor

This PR adds a launcher module that spawns and manages sglang worker processes directly from the router CLI via --launcher-config. Closes #9.

We introduce a LauncherBackend abstraction to support different deployment strategies. Currently only the local backend is implemented (workers as local subprocesses), but the interface is designed to extend to multi-node or Kubernetes clusters.

All backends follow the same three-phase lifecycle:

  1. launch — start worker processes and return their URLs
  2. wait_ready_and_register — health-check workers concurrently; register each to the router as soon as it's ready (non-blocking)
  3. shutdown — on router exit (Ctrl+C), send SIGINT for graceful cleanup, then SIGKILL as fallback
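The three-phase lifecycle above can be sketched as an abstract base class. This is a hedged illustration inferred from the PR description, not the actual code in src/sglang_diffusion_routing/launcher/backend.py; the names LaunchedWorker and LauncherBackend come from the changelog below, but the method signatures here are assumptions:

```python
import abc
import signal
import subprocess
from dataclasses import dataclass


@dataclass
class LaunchedWorker:
    """One managed worker process and the URL it serves on."""
    url: str
    process: subprocess.Popen


class LauncherBackend(abc.ABC):
    """Three-phase lifecycle shared by all launcher backends."""

    def __init__(self) -> None:
        self.workers: list[LaunchedWorker] = []

    @abc.abstractmethod
    def launch(self) -> list[str]:
        """Phase 1: start worker processes and return their URLs."""

    @abc.abstractmethod
    def wait_ready_and_register(self, register_fn, timeout, log_prefix) -> None:
        """Phase 2: health-check workers concurrently; call register_fn(url)
        for each worker as soon as it responds."""

    def shutdown(self) -> None:
        """Phase 3: SIGINT for graceful cleanup, then SIGKILL as a fallback."""
        for worker in self.workers:
            worker.process.send_signal(signal.SIGINT)
        for worker in self.workers:
            try:
                worker.process.wait(timeout=10)
            except subprocess.TimeoutExpired:
                worker.process.kill()
```

A multi-node or Kubernetes backend would subclass this and override `launch` and `shutdown` while keeping the same registration contract.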

Example Log

 sglang-diffusion-routing git:(feat/launcher) sglang-d-router --launcher-config examples/local_launcher.yaml
[local-launcher] launching worker 0: sglang serve --model-path Qwen/Qwen-Image --num-gpus 1 --host 127.0.0.1 --port 10090 --master-port 30005 --scheduler-port 5556 --dit-cpu-offload false --text-encoder-cpu-offload false
[local-launcher] launching worker 1: sglang serve --model-path Qwen/Qwen-Image --num-gpus 1 --host 127.0.0.1 --port 10092 --master-port 31005 --scheduler-port 6555 --dit-cpu-offload false --text-encoder-cpu-offload false
[sglang-d-router] starting router on 0.0.0.0:30080
[sglang-d-router] workers: (none - add via POST /add_worker)
INFO:     Started server process [2190056]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:30080 (Press CTRL+C to quit)
[02-22 21:56:56] Disabling some offloading (except dit, text_encoder) for image generation model
[02-22 21:56:56] server_args: {"model_path": "Qwen/Qwen-Image", "backend": "auto", "attention_backend": null, "attention_backend_config": {}, "cache_dit_config": null, "nccl_port": null, "trust_remote_code": false, "revision": null, "num_gpus": 1, "tp_size": 1, "sp_degree": 1, "ulysses_degree": 1, "ring_degree": 1, "dp_size": 1, "dp_degree": 1, "enable_cfg_parallel": false, "hsdp_replicate_dim": 1, "hsdp_shard_dim": 1, "dist_timeout": 3600, "pipeline_class_name": null, "lora_path": null, "lora_nickname": "default", "lora_scale": 1.0, "vae_path": null, "lora_target_modules": null, "dit_cpu_offload": false, "dit_layerwise_offload": null, "dit_offload_prefetch_size": 0.0, "text_encoder_cpu_offload": false, "image_encoder_cpu_offload": false, "vae_cpu_offload": false, "use_fsdp_inference": false, "pin_cpu_memory": true, "comfyui_mode": false, "enable_torch_compile": false, "warmup": false, "warmup_resolutions": null, "disable_autocast": true, "master_port": 31005, "host": "127.0.0.1", "port": 10092, "webui": false, "webui_port": 12312, "scheduler_port": 6555, "output_path": "outputs/", "prompt_file_path": null, "model_paths": {}, "model_loaded": {"transformer": true, "vae": true, "video_vae": true, "audio_vae": true, "video_dit": true, "audio_dit": true, "dual_tower_bridge": true}, "boundary_ratio": null, "log_level": "info"}
[02-22 21:56:56] Starting server...
[02-22 21:56:56] Disabling some offloading (except dit, text_encoder) for image generation model
[02-22 21:56:56] server_args: {"model_path": "Qwen/Qwen-Image", "backend": "auto", "attention_backend": null, "attention_backend_config": {}, "cache_dit_config": null, "nccl_port": null, "trust_remote_code": false, "revision": null, "num_gpus": 1, "tp_size": 1, "sp_degree": 1, "ulysses_degree": 1, "ring_degree": 1, "dp_size": 1, "dp_degree": 1, "enable_cfg_parallel": false, "hsdp_replicate_dim": 1, "hsdp_shard_dim": 1, "dist_timeout": 3600, "pipeline_class_name": null, "lora_path": null, "lora_nickname": "default", "lora_scale": 1.0, "vae_path": null, "lora_target_modules": null, "dit_cpu_offload": false, "dit_layerwise_offload": null, "dit_offload_prefetch_size": 0.0, "text_encoder_cpu_offload": false, "image_encoder_cpu_offload": false, "vae_cpu_offload": false, "use_fsdp_inference": false, "pin_cpu_memory": true, "comfyui_mode": false, "enable_torch_compile": false, "warmup": false, "warmup_resolutions": null, "disable_autocast": true, "master_port": 30005, "host": "127.0.0.1", "port": 10090, "webui": false, "webui_port": 12312, "scheduler_port": 5556, "output_path": "outputs/", "prompt_file_path": null, "model_paths": {}, "model_loaded": {"transformer": true, "vae": true, "video_vae": true, "audio_vae": true, "video_dit": true, "audio_dit": true, "dual_tower_bridge": true}, "boundary_ratio": null, "log_level": "info"}
[02-22 21:56:56] Starting server...
[02-22 21:57:02] Scheduler bind at endpoint: tcp://127.0.0.1:6555
[02-22 21:57:02] Scheduler bind at endpoint: tcp://127.0.0.1:5556
[02-22 21:57:02] Initializing distributed environment with world_size=1, device=cuda:0, timeout=3600
[02-22 21:57:02] Setting distributed timeout to 3600 seconds
[02-22 21:57:02] Initializing distributed environment with world_size=1, device=cuda:0, timeout=3600
[02-22 21:57:02] Setting distributed timeout to 3600 seconds
[02-22 21:57:02] No pipeline_class_name specified, using model_index.json
[02-22 21:57:02] No pipeline_class_name specified, using model_index.json
[02-22 21:57:02] Downloaded model_index.json for Qwen/Qwen-Image, pipeline: QwenImagePipeline
[02-22 21:57:02] Using native sglang backend for model 'Qwen/Qwen-Image'
[02-22 21:57:02] Found model info: ModelInfo(pipeline_cls=<class 'sglang.multimodal_gen.runtime.pipelines.qwen_image.QwenImagePipeline'>, sampling_param_cls=<class 'sglang.multimodal_gen.configs.sample.qwenimage.QwenImageSamplingParams'>, pipeline_config_cls=<class 'sglang.multimodal_gen.configs.pipeline_configs.qwen_image.QwenImagePipelineConfig'>)
[02-22 21:57:02] Using pipeline from model_index.json: QwenImagePipeline
[02-22 21:57:02] Loading pipeline modules...
[02-22 21:57:02] Checking for cached model in HF Hub cache for Qwen/Qwen-Image...
[02-22 21:57:02] Found complete model in cache at /root/.cache/huggingface/hub/models--Qwen--Qwen-Image/snapshots/75e0b4be04f60ec59a75f475837eced720f823b6
[02-22 21:57:02] Model path: /root/.cache/huggingface/hub/models--Qwen--Qwen-Image/snapshots/75e0b4be04f60ec59a75f475837eced720f823b6
[02-22 21:57:02] Diffusers version: 0.34.0.dev0
[02-22 21:57:02] Loading pipeline modules from config: {'_class_name': 'QwenImagePipeline', '_diffusers_version': '0.34.0.dev0', 'scheduler': ['diffusers', 'FlowMatchEulerDiscreteScheduler'], 'text_encoder': ['transformers', 'Qwen2_5_VLForConditionalGeneration'], 'tokenizer': ['transformers', 'Qwen2Tokenizer'], 'transformer': ['diffusers', 'QwenImageTransformer2DModel'], 'vae': ['diffusers', 'AutoencoderKLQwenImage']}
[02-22 21:57:02] Loading required components: ['text_encoder', 'tokenizer', 'vae', 'transformer', 'scheduler']
Loading required modules:   0%|                                                                                                                                        | 0/5 [00:00<?, ?it/s][02-22 21:57:02] Loading text_encoder from /root/.cache/huggingface/hub/models--Qwen--Qwen-Image/snapshots/75e0b4be04f60ec59a75f475837eced720f823b6/text_encoder. avail mem: 89.63 GB
[02-22 21:57:03] Downloaded model_index.json for Qwen/Qwen-Image, pipeline: QwenImagePipeline
[02-22 21:57:03] Using native sglang backend for model 'Qwen/Qwen-Image'
[02-22 21:57:03] Found model info: ModelInfo(pipeline_cls=<class 'sglang.multimodal_gen.runtime.pipelines.qwen_image.QwenImagePipeline'>, sampling_param_cls=<class 'sglang.multimodal_gen.configs.sample.qwenimage.QwenImageSamplingParams'>, pipeline_config_cls=<class 'sglang.multimodal_gen.configs.pipeline_configs.qwen_image.QwenImagePipelineConfig'>)
[02-22 21:57:03] Using pipeline from model_index.json: QwenImagePipeline
[02-22 21:57:03] Loading pipeline modules...
[02-22 21:57:03] Checking for cached model in HF Hub cache for Qwen/Qwen-Image...
[02-22 21:57:03] Found complete model in cache at /root/.cache/huggingface/hub/models--Qwen--Qwen-Image/snapshots/75e0b4be04f60ec59a75f475837eced720f823b6
[02-22 21:57:03] Model path: /root/.cache/huggingface/hub/models--Qwen--Qwen-Image/snapshots/75e0b4be04f60ec59a75f475837eced720f823b6
[02-22 21:57:03] Diffusers version: 0.34.0.dev0
[02-22 21:57:03] Loading pipeline modules from config: {'_class_name': 'QwenImagePipeline', '_diffusers_version': '0.34.0.dev0', 'scheduler': ['diffusers', 'FlowMatchEulerDiscreteScheduler'], 'text_encoder': ['transformers', 'Qwen2_5_VLForConditionalGeneration'], 'tokenizer': ['transformers', 'Qwen2Tokenizer'], 'transformer': ['diffusers', 'QwenImageTransformer2DModel'], 'vae': ['diffusers', 'AutoencoderKLQwenImage']}
[02-22 21:57:03] Loading required components: ['text_encoder', 'tokenizer', 'vae', 'transformer', 'scheduler']
Loading required modules:   0%|                                                                                                                                        | 0/5 [00:00<?, ?it/s][02-22 21:57:03] Loading text_encoder from /root/.cache/huggingface/hub/models--Qwen--Qwen-Image/snapshots/75e0b4be04f60ec59a75f475837eced720f823b6/text_encoder. avail mem: 89.63 GB
[02-22 21:57:03] Defaulting to Torch SDPA backend on SM12.x
[02-22 21:57:03] Using Torch SDPA backend
[02-22 21:57:03] Defaulting to Torch SDPA backend on SM12.x
[02-22 21:57:03] Using Torch SDPA backend
[02-22 21:57:05] [RunAI Streamer] Overall time to stream 15.4 GiB of all files to cpu: 2.25s, 6.9 GiB/s
[02-22 21:57:05] Loaded text_encoder: Qwen2_5_VLForConditionalGeneration (sgl-diffusion version). model size: 14.19 GB, avail mem: 75.28 GB
Loading required modules:  20%|█████████████████████████▌                                                                                                      | 1/5 [00:02<00:09,  2.45s/it][02-22 21:57:05] Loading tokenizer from /root/.cache/huggingface/hub/models--Qwen--Qwen-Image/snapshots/75e0b4be04f60ec59a75f475837eced720f823b6/tokenizer. avail mem: 75.28 GB
[02-22 21:57:05] [RunAI Streamer] Overall time to stream 15.4 GiB of all files to cpu: 2.34s, 6.6 GiB/s
[02-22 21:57:05] Loaded text_encoder: Qwen2_5_VLForConditionalGeneration (sgl-diffusion version). model size: 14.19 GB, avail mem: 75.28 GB
Loading required modules:  20%|█████████████████████████▌                                                                                                      | 1/5 [00:02<00:10,  2.54s/it][02-22 21:57:05] Loading tokenizer from /root/.cache/huggingface/hub/models--Qwen--Qwen-Image/snapshots/75e0b4be04f60ec59a75f475837eced720f823b6/tokenizer. avail mem: 75.28 GB
[02-22 21:57:05] Loaded tokenizer: Qwen2TokenizerFast (sgl-diffusion version). model size: 0.00 GB, avail mem: 75.28 GB
Loading required modules:  40%|███████████████████████████████████████████████████▏                                                                            | 2/5 [00:02<00:03,  1.19s/it][02-22 21:57:05] Loading vae from /root/.cache/huggingface/hub/models--Qwen--Qwen-Image/snapshots/75e0b4be04f60ec59a75f475837eced720f823b6/vae. avail mem: 75.28 GB
[02-22 21:57:05] Loaded tokenizer: Qwen2TokenizerFast (sgl-diffusion version). model size: 0.00 GB, avail mem: 75.28 GB
Loading required modules:  40%|███████████████████████████████████████████████████▏                                                                            | 2/5 [00:02<00:03,  1.22s/it][02-22 21:57:05] Loading vae from /root/.cache/huggingface/hub/models--Qwen--Qwen-Image/snapshots/75e0b4be04f60ec59a75f475837eced720f823b6/vae. avail mem: 75.28 GB
[02-22 21:57:05] Loaded vae: AutoencoderKLQwenImage (sgl-diffusion version). model size: 0.47 GB, avail mem: 74.80 GB
Loading required modules:  60%|████████████████████████████████████████████████████████████████████████████▊                                                   | 3/5 [00:02<00:01,  1.36it/s][02-22 21:57:05] Loading transformer from /root/.cache/huggingface/hub/models--Qwen--Qwen-Image/snapshots/75e0b4be04f60ec59a75f475837eced720f823b6/transformer. avail mem: 74.80 GB
[02-22 21:57:05] Loading QwenImageTransformer2DModel from 9 safetensors files , param_dtype: torch.bfloat16
[02-22 21:57:05] Defaulting to Torch SDPA backend on SM12.x
[02-22 21:57:05] Using Torch SDPA backend
[02-22 21:57:06] Loaded vae: AutoencoderKLQwenImage (sgl-diffusion version). model size: 0.47 GB, avail mem: 74.80 GB
Loading required modules:  60%|████████████████████████████████████████████████████████████████████████████▊                                                   | 3/5 [00:03<00:01,  1.33it/s][02-22 21:57:06] Loading transformer from /root/.cache/huggingface/hub/models--Qwen--Qwen-Image/snapshots/75e0b4be04f60ec59a75f475837eced720f823b6/transformer. avail mem: 74.80 GB
[02-22 21:57:06] Loading QwenImageTransformer2DModel from 9 safetensors files , param_dtype: torch.bfloat16
[02-22 21:57:06] Defaulting to Torch SDPA backend on SM12.x
[02-22 21:57:06] Using Torch SDPA backend
[02-22 21:57:10] [RunAI Streamer] Overall time to stream 38.1 GiB of all files to cpu: 4.79s, 7.9 GiB/s
[02-22 21:57:10] [RunAI Streamer] Overall time to stream 38.1 GiB of all files to cpu: 4.77s, 8.0 GiB/s
[02-22 21:57:17] Loaded model with 20.43B parameters
[02-22 21:57:17] Loaded transformer: QwenImageTransformer2DModel (sgl-diffusion version). model size: 38.05 GB, avail mem: 36.64 GB
Loading required modules:  80%|██████████████████████████████████████████████████████████████████████████████████████████████████████▍                         | 4/5 [00:14<00:04,  4.85s/it][02-22 21:57:17] Loading scheduler from /root/.cache/huggingface/hub/models--Qwen--Qwen-Image/snapshots/75e0b4be04f60ec59a75f475837eced720f823b6/scheduler. avail mem: 36.64 GB
[02-22 21:57:17] Loaded scheduler: FlowMatchEulerDiscreteScheduler (sgl-diffusion version). model size: 0.00 GB, avail mem: 36.64 GB
Loading required modules: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:14<00:00,  2.82s/it]
[02-22 21:57:17] Creating pipeline stages...
[02-22 21:57:17] Defaulting to Torch SDPA backend on SM12.x
[02-22 21:57:17] Using Torch SDPA backend
[02-22 21:57:17] Pipeline instantiated
[02-22 21:57:17] Worker 0: Initialized device, model, and distributed environment.
[02-22 21:57:17] Worker 0: Scheduler loop started.
[02-22 21:57:17] Starting FastAPI server.
[2026-02-22 21:57:17] INFO:     Started server process [2190153]
[2026-02-22 21:57:17] INFO:     Waiting for application startup.
[02-22 21:57:17] ZMQ Broker is listening for offline jobs on tcp://*:10093
[2026-02-22 21:57:17] INFO:     Application startup complete.
[2026-02-22 21:57:17] INFO:     Uvicorn running on http://127.0.0.1:10092 (Press CTRL+C to quit)
[02-22 21:57:17] Loaded model with 20.43B parameters
[02-22 21:57:17] Loaded transformer: QwenImageTransformer2DModel (sgl-diffusion version). model size: 38.05 GB, avail mem: 36.64 GB
Loading required modules:  80%|██████████████████████████████████████████████████████████████████████████████████████████████████████▍                         | 4/5 [00:14<00:04,  4.86s/it][02-22 21:57:17] Loading scheduler from /root/.cache/huggingface/hub/models--Qwen--Qwen-Image/snapshots/75e0b4be04f60ec59a75f475837eced720f823b6/scheduler. avail mem: 36.64 GB
[02-22 21:57:17] Loaded scheduler: FlowMatchEulerDiscreteScheduler (sgl-diffusion version). model size: 0.00 GB, avail mem: 36.64 GB
Loading required modules: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:14<00:00,  2.84s/it]
[02-22 21:57:17] Creating pipeline stages...
[02-22 21:57:17] Defaulting to Torch SDPA backend on SM12.x
[02-22 21:57:17] Using Torch SDPA backend
[02-22 21:57:17] Pipeline instantiated
[02-22 21:57:17] Worker 0: Initialized device, model, and distributed environment.
[02-22 21:57:17] Worker 0: Scheduler loop started.
[02-22 21:57:17] Starting FastAPI server.
[2026-02-22 21:57:17] INFO:     Started server process [2190152]
[2026-02-22 21:57:17] INFO:     Waiting for application startup.
[02-22 21:57:17] ZMQ Broker is listening for offline jobs on tcp://*:10091
[2026-02-22 21:57:17] INFO:     Application startup complete.
[2026-02-22 21:57:17] INFO:     Uvicorn running on http://127.0.0.1:10090 (Press CTRL+C to quit)
[2026-02-22 21:57:17] INFO:     127.0.0.1:52256 - "GET /health HTTP/1.1" 200 OK
  [sglang-d-router] worker http://127.0.0.1:10092 is healthy (28s)
[sglang-d-router] registered http://127.0.0.1:10092
[2026-02-22 21:57:17] INFO:     127.0.0.1:41704 - "GET /health HTTP/1.1" 200 OK
  [sglang-d-router] worker http://127.0.0.1:10090 is healthy (28s)
[sglang-d-router] registered http://127.0.0.1:10090
[2026-02-22 21:57:19] INFO:     127.0.0.1:52160 - "GET /health HTTP/1.1" 200 OK
[2026-02-22 21:57:19] INFO:     127.0.0.1:42348 - "GET /health HTTP/1.1" 200 OK

@gemini-code-assist
Contributor

Summary of Changes

Hello @dreamyang-liu, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the SGLang diffusion routing system by integrating a robust worker launcher. It allows the router to not only manage incoming requests but also to dynamically spawn, monitor, and gracefully shut down its associated worker processes. This new capability streamlines the deployment and scaling of SGLang workers, particularly in local development environments, while laying the groundwork for more complex multi-node or containerized deployments through its extensible backend architecture.

Highlights

  • Integrated Worker Launcher: A new launcher module has been added, enabling the router CLI to directly spawn and manage SGLang worker processes using a YAML configuration file.
  • Extensible Backend Architecture: Introduced a LauncherBackend abstraction, currently with a LocalLauncher implementation for local subprocesses, designed to be extended for multi-node or Kubernetes deployments.
  • Managed Worker Lifecycle: The launcher manages workers through a three-phase lifecycle: launching processes, concurrently waiting for them to become healthy and registering them with the router, and gracefully shutting them down on router exit.
  • Configuration Management: Utilizes omegaconf for structured configuration loading and validation of launcher settings from YAML files, enhancing flexibility and maintainability.
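A launcher config in this shape might look like the following. These keys are illustrative only; the actual schema is defined in src/sglang_diffusion_routing/launcher/config.py and exercised by examples/local_launcher.yaml in the diff:

```yaml
# Hypothetical launcher config sketch (keys inferred from the PR description
# and the example log above, not copied from the real examples/local_launcher.yaml).
backend: local
model_path: Qwen/Qwen-Image
num_workers: 2
host: 127.0.0.1
port_base: 10090
wait_timeout: 300
extra_args:
  dit-cpu-offload: "false"
  text-encoder-cpu-offload: "false"
```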
Changelog
  • examples/local_launcher.yaml
    • Added a sample YAML configuration file for the local launcher, specifying model, worker counts, ports, and extra arguments.
  • pyproject.toml
    • Added omegaconf as a new dependency for structured configuration management.
  • src/sglang_diffusion_routing/cli/main.py
    • Modified the router's command-line interface to accept a --launcher-config argument.
    • Integrated the launcher backend for worker management.
    • Updated worker registration logic to use the new launcher mechanism.
  • src/sglang_diffusion_routing/launcher/__init__.py
    • Created the launcher package, exposing core classes and functions for backend implementation and configuration.
  • src/sglang_diffusion_routing/launcher/backend.py
    • Defined the LauncherBackend abstract base class, LaunchedWorker, and WorkerLaunchResult data structures, outlining the worker lifecycle.
  • src/sglang_diffusion_routing/launcher/config.py
    • Implemented functions to load and validate launcher configurations from YAML files using omegaconf.
    • Added functionality to instantiate the appropriate LauncherBackend based on configuration.
  • src/sglang_diffusion_routing/launcher/local.py
    • Provided the concrete implementation of LocalLauncher, handling the spawning of sglang serve processes.
    • Managed port allocation and GPU assignment for local workers.
    • Implemented process termination logic for local workers.
  • src/sglang_diffusion_routing/launcher/utils.py
    • Introduced utility functions for network port availability checks.
    • Added GPU detection and resolution logic.
    • Implemented health check polling for worker processes.
    • Provided graceful and forceful process termination utilities.
  • tests/unit/test_cli.py
    • Updated the CLI unit tests to reflect the new router initialization and worker registration flow, specifically checking the router object directly.
  • tests/unit/test_launcher.py
    • Added comprehensive unit tests for the new launcher subsystem.
    • Covered configuration loading, schema validation, backend creation, and local launcher functionality.
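The health-check polling mentioned under launcher/utils.py could be sketched with only the standard library. `wait_until_healthy` is a hypothetical name, not the PR's actual helper; it polls the `GET /health` endpoint seen in the example log until the worker answers 200 or the timeout expires:

```python
import time
import urllib.error
import urllib.request


def wait_until_healthy(url: str, timeout: float = 300.0, interval: float = 1.0) -> bool:
    """Poll GET {url}/health until it returns 200 or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{url}/health", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # worker still starting up; retry after a short sleep
        time.sleep(interval)
    return False
```

Run once per worker in its own thread, this gives the non-blocking "register each as soon as it's ready" behavior the PR describes.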
Activity
  • The pull request introduces a new feature to launch and manage SGLang worker processes directly from the router.
  • An example log demonstrates the router launching two workers, monitoring their health, and registering them upon readiness.
  • The log shows detailed output from worker startup, including model loading, device initialization, and FastAPI server activation.
  • Workers are health-checked and registered concurrently, with the router logging their status.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature              Command              Description
Code Review          /gemini review       Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary      Provides a summary of the current pull request in its current state.
Comment              @gemini-code-assist  Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                 /gemini help         Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@alphabetc1
Collaborator

BTW, could you provide an example in the README?

@alphabetc1
Collaborator

overall, LGTM

Comment on lines +109 to +140
log_prefix = "[sglang-d-router]"
backend = None

try:
    router = DiffusionRouter(args, verbose=args.verbose)

    if args.launcher_config is not None:
        launcher_cfg = _lcfg.load_launcher_config(args.launcher_config)
        wait_timeout = launcher_cfg.wait_timeout
        backend = _lcfg.create_backend(launcher_cfg)
        backend.launch()
        threading.Thread(
            target=backend.wait_ready_and_register,
            kwargs=dict(
                register_fn=router.register_worker,
                timeout=wait_timeout,
                log_prefix=log_prefix,
            ),
            daemon=True,
        ).start()

    _run_router_server(args, router=router, log_prefix=log_prefix)
    return 0
finally:
    try:
        asyncio.run(router.client.aclose())
    except Exception:
        pass
    if backend is not None:
        print(f"{log_prefix} shutting down managed workers...", flush=True)
        backend.shutdown()
        print(f"{log_prefix} all managed workers terminated.", flush=True)
Owner

I left a TODO here to refactor, but we can leave it for now.

Comment on lines +162 to +164
master_port_base = 30005
scheduler_port_base = 5555
internal_port_stride = 1000
Owner

These parameters are concerning. Should they be fixed here, or could they be passed in?
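For context on what making these bases configurable could look like: worker i would probe for a free port starting from base + i * stride (consistent with the log above, where worker 1 gets master_port 31005 and scheduler_port 6555). `find_free_port` and `worker_ports` are hypothetical names for this sketch, not functions in the PR:

```python
import socket


def find_free_port(start: int, max_tries: int = 100) -> int:
    """Return the first TCP port at or above `start` that can be bound."""
    for port in range(start, start + max_tries):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("127.0.0.1", port))
                return port
            except OSError:
                continue  # port in use; try the next one
    raise RuntimeError(f"no free port in [{start}, {start + max_tries})")


def worker_ports(index: int, master_base: int = 30005,
                 scheduler_base: int = 5555, stride: int = 1000) -> tuple[int, int]:
    """Per-worker (master_port, scheduler_port); bases and stride are overridable."""
    return (find_free_port(master_base + index * stride),
            find_free_port(scheduler_base + index * stride))
```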

zhaochenyang20 merged commit 426cd1e into zhaochenyang20:main on Feb 23, 2026
1 check passed
dreamyang-liu mentioned this pull request on Feb 23, 2026

Development

Successfully merging this pull request may close these issues.

[Feature] Let router launching the server
