
Add launcher to allow launching worker from router #18

Merged
zhaochenyang20 merged 9 commits into zhaochenyang20:main from dreamyang-liu:feat/launcher
Feb 23, 2026

Conversation

@dreamyang-liu
Contributor

This PR adds a launcher module that spawns and manages sglang worker processes directly from the router CLI via --launcher-config. Closes #9.

We introduce a LauncherBackend abstraction to support different deployment strategies. Currently only the local backend is implemented (workers as local subprocesses), but the interface is designed to extend to multi-node or Kubernetes clusters.

All backends follow the same three-phase lifecycle:

  1. launch — start worker processes and return their URLs
  2. wait_ready_and_register — health-check workers concurrently; register each to the router as soon as it's ready (non-blocking)
  3. shutdown — on router exit (Ctrl+C), send SIGINT for graceful cleanup, then SIGKILL as fallback
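The three-phase lifecycle above can be sketched as an abstract base class. This is a hedged illustration inferred from the PR description, not the actual code in src/sglang_diffusion_routing/launcher/backend.py; the names LaunchedWorker and LauncherBackend come from the changelog below, but the method signatures here are assumptions:

```python
import abc
import signal
import subprocess
from dataclasses import dataclass


@dataclass
class LaunchedWorker:
    """One managed worker process and the URL it serves on."""
    url: str
    process: subprocess.Popen


class LauncherBackend(abc.ABC):
    """Three-phase lifecycle shared by all launcher backends."""

    def __init__(self) -> None:
        self.workers: list[LaunchedWorker] = []

    @abc.abstractmethod
    def launch(self) -> list[str]:
        """Phase 1: start worker processes and return their URLs."""

    @abc.abstractmethod
    def wait_ready_and_register(self, register_fn, timeout, log_prefix) -> None:
        """Phase 2: health-check workers concurrently; call register_fn(url)
        for each worker as soon as it responds."""

    def shutdown(self) -> None:
        """Phase 3: SIGINT for graceful cleanup, then SIGKILL as a fallback."""
        for worker in self.workers:
            worker.process.send_signal(signal.SIGINT)
        for worker in self.workers:
            try:
                worker.process.wait(timeout=10)
            except subprocess.TimeoutExpired:
                worker.process.kill()
```

A multi-node or Kubernetes backend would subclass this and override `launch` and `shutdown` while keeping the same registration contract.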

Example Log

 sglang-diffusion-routing git:(feat/launcher) sglang-d-router --launcher-config examples/local_launcher.yaml
[local-launcher] launching worker 0: sglang serve --model-path Qwen/Qwen-Image --num-gpus 1 --host 127.0.0.1 --port 10090 --master-port 30005 --scheduler-port 5556 --dit-cpu-offload false --text-encoder-cpu-offload false
[local-launcher] launching worker 1: sglang serve --model-path Qwen/Qwen-Image --num-gpus 1 --host 127.0.0.1 --port 10092 --master-port 31005 --scheduler-port 6555 --dit-cpu-offload false --text-encoder-cpu-offload false
[sglang-d-router] starting router on 0.0.0.0:30080
[sglang-d-router] workers: (none - add via POST /add_worker)
INFO:     Started server process [2190056]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:30080 (Press CTRL+C to quit)
[02-22 21:56:56] Disabling some offloading (except dit, text_encoder) for image generation model
[02-22 21:56:56] server_args: {"model_path": "Qwen/Qwen-Image", "backend": "auto", "attention_backend": null, "attention_backend_config": {}, "cache_dit_config": null, "nccl_port": null, "trust_remote_code": false, "revision": null, "num_gpus": 1, "tp_size": 1, "sp_degree": 1, "ulysses_degree": 1, "ring_degree": 1, "dp_size": 1, "dp_degree": 1, "enable_cfg_parallel": false, "hsdp_replicate_dim": 1, "hsdp_shard_dim": 1, "dist_timeout": 3600, "pipeline_class_name": null, "lora_path": null, "lora_nickname": "default", "lora_scale": 1.0, "vae_path": null, "lora_target_modules": null, "dit_cpu_offload": false, "dit_layerwise_offload": null, "dit_offload_prefetch_size": 0.0, "text_encoder_cpu_offload": false, "image_encoder_cpu_offload": false, "vae_cpu_offload": false, "use_fsdp_inference": false, "pin_cpu_memory": true, "comfyui_mode": false, "enable_torch_compile": false, "warmup": false, "warmup_resolutions": null, "disable_autocast": true, "master_port": 31005, "host": "127.0.0.1", "port": 10092, "webui": false, "webui_port": 12312, "scheduler_port": 6555, "output_path": "outputs/", "prompt_file_path": null, "model_paths": {}, "model_loaded": {"transformer": true, "vae": true, "video_vae": true, "audio_vae": true, "video_dit": true, "audio_dit": true, "dual_tower_bridge": true}, "boundary_ratio": null, "log_level": "info"}
[02-22 21:56:56] Starting server...
[02-22 21:56:56] Disabling some offloading (except dit, text_encoder) for image generation model
[02-22 21:56:56] server_args: {"model_path": "Qwen/Qwen-Image", "backend": "auto", "attention_backend": null, "attention_backend_config": {}, "cache_dit_config": null, "nccl_port": null, "trust_remote_code": false, "revision": null, "num_gpus": 1, "tp_size": 1, "sp_degree": 1, "ulysses_degree": 1, "ring_degree": 1, "dp_size": 1, "dp_degree": 1, "enable_cfg_parallel": false, "hsdp_replicate_dim": 1, "hsdp_shard_dim": 1, "dist_timeout": 3600, "pipeline_class_name": null, "lora_path": null, "lora_nickname": "default", "lora_scale": 1.0, "vae_path": null, "lora_target_modules": null, "dit_cpu_offload": false, "dit_layerwise_offload": null, "dit_offload_prefetch_size": 0.0, "text_encoder_cpu_offload": false, "image_encoder_cpu_offload": false, "vae_cpu_offload": false, "use_fsdp_inference": false, "pin_cpu_memory": true, "comfyui_mode": false, "enable_torch_compile": false, "warmup": false, "warmup_resolutions": null, "disable_autocast": true, "master_port": 30005, "host": "127.0.0.1", "port": 10090, "webui": false, "webui_port": 12312, "scheduler_port": 5556, "output_path": "outputs/", "prompt_file_path": null, "model_paths": {}, "model_loaded": {"transformer": true, "vae": true, "video_vae": true, "audio_vae": true, "video_dit": true, "audio_dit": true, "dual_tower_bridge": true}, "boundary_ratio": null, "log_level": "info"}
[02-22 21:56:56] Starting server...
[02-22 21:57:02] Scheduler bind at endpoint: tcp://127.0.0.1:6555
[02-22 21:57:02] Scheduler bind at endpoint: tcp://127.0.0.1:5556
[02-22 21:57:02] Initializing distributed environment with world_size=1, device=cuda:0, timeout=3600
[02-22 21:57:02] Setting distributed timeout to 3600 seconds
[02-22 21:57:02] Initializing distributed environment with world_size=1, device=cuda:0, timeout=3600
[02-22 21:57:02] Setting distributed timeout to 3600 seconds
[02-22 21:57:02] No pipeline_class_name specified, using model_index.json
[02-22 21:57:02] No pipeline_class_name specified, using model_index.json
[02-22 21:57:02] Downloaded model_index.json for Qwen/Qwen-Image, pipeline: QwenImagePipeline
[02-22 21:57:02] Using native sglang backend for model 'Qwen/Qwen-Image'
[02-22 21:57:02] Found model info: ModelInfo(pipeline_cls=<class 'sglang.multimodal_gen.runtime.pipelines.qwen_image.QwenImagePipeline'>, sampling_param_cls=<class 'sglang.multimodal_gen.configs.sample.qwenimage.QwenImageSamplingParams'>, pipeline_config_cls=<class 'sglang.multimodal_gen.configs.pipeline_configs.qwen_image.QwenImagePipelineConfig'>)
[02-22 21:57:02] Using pipeline from model_index.json: QwenImagePipeline
[02-22 21:57:02] Loading pipeline modules...
[02-22 21:57:02] Checking for cached model in HF Hub cache for Qwen/Qwen-Image...
[02-22 21:57:02] Found complete model in cache at /root/.cache/huggingface/hub/models--Qwen--Qwen-Image/snapshots/75e0b4be04f60ec59a75f475837eced720f823b6
[02-22 21:57:02] Model path: /root/.cache/huggingface/hub/models--Qwen--Qwen-Image/snapshots/75e0b4be04f60ec59a75f475837eced720f823b6
[02-22 21:57:02] Diffusers version: 0.34.0.dev0
[02-22 21:57:02] Loading pipeline modules from config: {'_class_name': 'QwenImagePipeline', '_diffusers_version': '0.34.0.dev0', 'scheduler': ['diffusers', 'FlowMatchEulerDiscreteScheduler'], 'text_encoder': ['transformers', 'Qwen2_5_VLForConditionalGeneration'], 'tokenizer': ['transformers', 'Qwen2Tokenizer'], 'transformer': ['diffusers', 'QwenImageTransformer2DModel'], 'vae': ['diffusers', 'AutoencoderKLQwenImage']}
[02-22 21:57:02] Loading required components: ['text_encoder', 'tokenizer', 'vae', 'transformer', 'scheduler']
Loading required modules:   0%|                                                                                                                                        | 0/5 [00:00<?, ?it/s][02-22 21:57:02] Loading text_encoder from /root/.cache/huggingface/hub/models--Qwen--Qwen-Image/snapshots/75e0b4be04f60ec59a75f475837eced720f823b6/text_encoder. avail mem: 89.63 GB
[02-22 21:57:03] Downloaded model_index.json for Qwen/Qwen-Image, pipeline: QwenImagePipeline
[02-22 21:57:03] Using native sglang backend for model 'Qwen/Qwen-Image'
[02-22 21:57:03] Found model info: ModelInfo(pipeline_cls=<class 'sglang.multimodal_gen.runtime.pipelines.qwen_image.QwenImagePipeline'>, sampling_param_cls=<class 'sglang.multimodal_gen.configs.sample.qwenimage.QwenImageSamplingParams'>, pipeline_config_cls=<class 'sglang.multimodal_gen.configs.pipeline_configs.qwen_image.QwenImagePipelineConfig'>)
[02-22 21:57:03] Using pipeline from model_index.json: QwenImagePipeline
[02-22 21:57:03] Loading pipeline modules...
[02-22 21:57:03] Checking for cached model in HF Hub cache for Qwen/Qwen-Image...
[02-22 21:57:03] Found complete model in cache at /root/.cache/huggingface/hub/models--Qwen--Qwen-Image/snapshots/75e0b4be04f60ec59a75f475837eced720f823b6
[02-22 21:57:03] Model path: /root/.cache/huggingface/hub/models--Qwen--Qwen-Image/snapshots/75e0b4be04f60ec59a75f475837eced720f823b6
[02-22 21:57:03] Diffusers version: 0.34.0.dev0
[02-22 21:57:03] Loading pipeline modules from config: {'_class_name': 'QwenImagePipeline', '_diffusers_version': '0.34.0.dev0', 'scheduler': ['diffusers', 'FlowMatchEulerDiscreteScheduler'], 'text_encoder': ['transformers', 'Qwen2_5_VLForConditionalGeneration'], 'tokenizer': ['transformers', 'Qwen2Tokenizer'], 'transformer': ['diffusers', 'QwenImageTransformer2DModel'], 'vae': ['diffusers', 'AutoencoderKLQwenImage']}
[02-22 21:57:03] Loading required components: ['text_encoder', 'tokenizer', 'vae', 'transformer', 'scheduler']
Loading required modules:   0%|                                                                                                                                        | 0/5 [00:00<?, ?it/s][02-22 21:57:03] Loading text_encoder from /root/.cache/huggingface/hub/models--Qwen--Qwen-Image/snapshots/75e0b4be04f60ec59a75f475837eced720f823b6/text_encoder. avail mem: 89.63 GB
[02-22 21:57:03] Defaulting to Torch SDPA backend on SM12.x
[02-22 21:57:03] Using Torch SDPA backend
[02-22 21:57:03] Defaulting to Torch SDPA backend on SM12.x
[02-22 21:57:03] Using Torch SDPA backend
[02-22 21:57:05] [RunAI Streamer] Overall time to stream 15.4 GiB of all files to cpu: 2.25s, 6.9 GiB/s
[02-22 21:57:05] Loaded text_encoder: Qwen2_5_VLForConditionalGeneration (sgl-diffusion version). model size: 14.19 GB, avail mem: 75.28 GB
Loading required modules:  20%|█████████████████████████▌                                                                                                      | 1/5 [00:02<00:09,  2.45s/it][02-22 21:57:05] Loading tokenizer from /root/.cache/huggingface/hub/models--Qwen--Qwen-Image/snapshots/75e0b4be04f60ec59a75f475837eced720f823b6/tokenizer. avail mem: 75.28 GB
[02-22 21:57:05] [RunAI Streamer] Overall time to stream 15.4 GiB of all files to cpu: 2.34s, 6.6 GiB/s
[02-22 21:57:05] Loaded text_encoder: Qwen2_5_VLForConditionalGeneration (sgl-diffusion version). model size: 14.19 GB, avail mem: 75.28 GB
Loading required modules:  20%|█████████████████████████▌                                                                                                      | 1/5 [00:02<00:10,  2.54s/it][02-22 21:57:05] Loading tokenizer from /root/.cache/huggingface/hub/models--Qwen--Qwen-Image/snapshots/75e0b4be04f60ec59a75f475837eced720f823b6/tokenizer. avail mem: 75.28 GB
[02-22 21:57:05] Loaded tokenizer: Qwen2TokenizerFast (sgl-diffusion version). model size: 0.00 GB, avail mem: 75.28 GB
Loading required modules:  40%|███████████████████████████████████████████████████▏                                                                            | 2/5 [00:02<00:03,  1.19s/it][02-22 21:57:05] Loading vae from /root/.cache/huggingface/hub/models--Qwen--Qwen-Image/snapshots/75e0b4be04f60ec59a75f475837eced720f823b6/vae. avail mem: 75.28 GB
[02-22 21:57:05] Loaded tokenizer: Qwen2TokenizerFast (sgl-diffusion version). model size: 0.00 GB, avail mem: 75.28 GB
Loading required modules:  40%|███████████████████████████████████████████████████▏                                                                            | 2/5 [00:02<00:03,  1.22s/it][02-22 21:57:05] Loading vae from /root/.cache/huggingface/hub/models--Qwen--Qwen-Image/snapshots/75e0b4be04f60ec59a75f475837eced720f823b6/vae. avail mem: 75.28 GB
[02-22 21:57:05] Loaded vae: AutoencoderKLQwenImage (sgl-diffusion version). model size: 0.47 GB, avail mem: 74.80 GB
Loading required modules:  60%|████████████████████████████████████████████████████████████████████████████▊                                                   | 3/5 [00:02<00:01,  1.36it/s][02-22 21:57:05] Loading transformer from /root/.cache/huggingface/hub/models--Qwen--Qwen-Image/snapshots/75e0b4be04f60ec59a75f475837eced720f823b6/transformer. avail mem: 74.80 GB
[02-22 21:57:05] Loading QwenImageTransformer2DModel from 9 safetensors files , param_dtype: torch.bfloat16
[02-22 21:57:05] Defaulting to Torch SDPA backend on SM12.x
[02-22 21:57:05] Using Torch SDPA backend
[02-22 21:57:06] Loaded vae: AutoencoderKLQwenImage (sgl-diffusion version). model size: 0.47 GB, avail mem: 74.80 GB
Loading required modules:  60%|████████████████████████████████████████████████████████████████████████████▊                                                   | 3/5 [00:03<00:01,  1.33it/s][02-22 21:57:06] Loading transformer from /root/.cache/huggingface/hub/models--Qwen--Qwen-Image/snapshots/75e0b4be04f60ec59a75f475837eced720f823b6/transformer. avail mem: 74.80 GB
[02-22 21:57:06] Loading QwenImageTransformer2DModel from 9 safetensors files , param_dtype: torch.bfloat16
[02-22 21:57:06] Defaulting to Torch SDPA backend on SM12.x
[02-22 21:57:06] Using Torch SDPA backend
[02-22 21:57:10] [RunAI Streamer] Overall time to stream 38.1 GiB of all files to cpu: 4.79s, 7.9 GiB/s
[02-22 21:57:10] [RunAI Streamer] Overall time to stream 38.1 GiB of all files to cpu: 4.77s, 8.0 GiB/s
[02-22 21:57:17] Loaded model with 20.43B parameters
[02-22 21:57:17] Loaded transformer: QwenImageTransformer2DModel (sgl-diffusion version). model size: 38.05 GB, avail mem: 36.64 GB
Loading required modules:  80%|██████████████████████████████████████████████████████████████████████████████████████████████████████▍                         | 4/5 [00:14<00:04,  4.85s/it][02-22 21:57:17] Loading scheduler from /root/.cache/huggingface/hub/models--Qwen--Qwen-Image/snapshots/75e0b4be04f60ec59a75f475837eced720f823b6/scheduler. avail mem: 36.64 GB
[02-22 21:57:17] Loaded scheduler: FlowMatchEulerDiscreteScheduler (sgl-diffusion version). model size: 0.00 GB, avail mem: 36.64 GB
Loading required modules: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:14<00:00,  2.82s/it]
[02-22 21:57:17] Creating pipeline stages...
[02-22 21:57:17] Defaulting to Torch SDPA backend on SM12.x
[02-22 21:57:17] Using Torch SDPA backend
[02-22 21:57:17] Pipeline instantiated
[02-22 21:57:17] Worker 0: Initialized device, model, and distributed environment.
[02-22 21:57:17] Worker 0: Scheduler loop started.
[02-22 21:57:17] Starting FastAPI server.
[2026-02-22 21:57:17] INFO:     Started server process [2190153]
[2026-02-22 21:57:17] INFO:     Waiting for application startup.
[02-22 21:57:17] ZMQ Broker is listening for offline jobs on tcp://*:10093
[2026-02-22 21:57:17] INFO:     Application startup complete.
[2026-02-22 21:57:17] INFO:     Uvicorn running on http://127.0.0.1:10092 (Press CTRL+C to quit)
[02-22 21:57:17] Loaded model with 20.43B parameters
[02-22 21:57:17] Loaded transformer: QwenImageTransformer2DModel (sgl-diffusion version). model size: 38.05 GB, avail mem: 36.64 GB
Loading required modules:  80%|██████████████████████████████████████████████████████████████████████████████████████████████████████▍                         | 4/5 [00:14<00:04,  4.86s/it][02-22 21:57:17] Loading scheduler from /root/.cache/huggingface/hub/models--Qwen--Qwen-Image/snapshots/75e0b4be04f60ec59a75f475837eced720f823b6/scheduler. avail mem: 36.64 GB
[02-22 21:57:17] Loaded scheduler: FlowMatchEulerDiscreteScheduler (sgl-diffusion version). model size: 0.00 GB, avail mem: 36.64 GB
Loading required modules: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:14<00:00,  2.84s/it]
[02-22 21:57:17] Creating pipeline stages...
[02-22 21:57:17] Defaulting to Torch SDPA backend on SM12.x
[02-22 21:57:17] Using Torch SDPA backend
[02-22 21:57:17] Pipeline instantiated
[02-22 21:57:17] Worker 0: Initialized device, model, and distributed environment.
[02-22 21:57:17] Worker 0: Scheduler loop started.
[02-22 21:57:17] Starting FastAPI server.
[2026-02-22 21:57:17] INFO:     Started server process [2190152]
[2026-02-22 21:57:17] INFO:     Waiting for application startup.
[02-22 21:57:17] ZMQ Broker is listening for offline jobs on tcp://*:10091
[2026-02-22 21:57:17] INFO:     Application startup complete.
[2026-02-22 21:57:17] INFO:     Uvicorn running on http://127.0.0.1:10090 (Press CTRL+C to quit)
[2026-02-22 21:57:17] INFO:     127.0.0.1:52256 - "GET /health HTTP/1.1" 200 OK
  [sglang-d-router] worker http://127.0.0.1:10092 is healthy (28s)
[sglang-d-router] registered http://127.0.0.1:10092
[2026-02-22 21:57:17] INFO:     127.0.0.1:41704 - "GET /health HTTP/1.1" 200 OK
  [sglang-d-router] worker http://127.0.0.1:10090 is healthy (28s)
[sglang-d-router] registered http://127.0.0.1:10090
[2026-02-22 21:57:19] INFO:     127.0.0.1:52160 - "GET /health HTTP/1.1" 200 OK
[2026-02-22 21:57:19] INFO:     127.0.0.1:42348 - "GET /health HTTP/1.1" 200 OK

@gemini-code-assist
Contributor

Summary of Changes

Hello @dreamyang-liu, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the SGLang diffusion routing system by integrating a robust worker launcher. It allows the router to not only manage incoming requests but also to dynamically spawn, monitor, and gracefully shut down its associated worker processes. This new capability streamlines the deployment and scaling of SGLang workers, particularly in local development environments, while laying the groundwork for more complex multi-node or containerized deployments through its extensible backend architecture.

Highlights

  • Integrated Worker Launcher: A new launcher module has been added, enabling the router CLI to directly spawn and manage SGLang worker processes using a YAML configuration file.
  • Extensible Backend Architecture: Introduced a LauncherBackend abstraction, currently with a LocalLauncher implementation for local subprocesses, designed to be extended for multi-node or Kubernetes deployments.
  • Managed Worker Lifecycle: The launcher manages workers through a three-phase lifecycle: launching processes, concurrently waiting for them to become healthy and registering them with the router, and gracefully shutting them down on router exit.
  • Configuration Management: Utilizes omegaconf for structured configuration loading and validation of launcher settings from YAML files, enhancing flexibility and maintainability.
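A launcher config in this shape might look like the following. These keys are illustrative only; the actual schema is defined in src/sglang_diffusion_routing/launcher/config.py and exercised by examples/local_launcher.yaml in the diff:

```yaml
# Hypothetical launcher config sketch (keys inferred from the PR description
# and the example log above, not copied from the real examples/local_launcher.yaml).
backend: local
model_path: Qwen/Qwen-Image
num_workers: 2
host: 127.0.0.1
port_base: 10090
wait_timeout: 300
extra_args:
  dit-cpu-offload: "false"
  text-encoder-cpu-offload: "false"
```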
Changelog
  • examples/local_launcher.yaml
    • Added a sample YAML configuration file for the local launcher, specifying model, worker counts, ports, and extra arguments.
  • pyproject.toml
    • Added omegaconf as a new dependency for structured configuration management.
  • src/sglang_diffusion_routing/cli/main.py
    • Modified the router's command-line interface to accept a --launcher-config argument.
    • Integrated the launcher backend for worker management.
    • Updated worker registration logic to use the new launcher mechanism.
  • src/sglang_diffusion_routing/launcher/__init__.py
    • Created the launcher package, exposing core classes and functions for backend implementation and configuration.
  • src/sglang_diffusion_routing/launcher/backend.py
    • Defined the LauncherBackend abstract base class, LaunchedWorker, and WorkerLaunchResult data structures, outlining the worker lifecycle.
  • src/sglang_diffusion_routing/launcher/config.py
    • Implemented functions to load and validate launcher configurations from YAML files using omegaconf.
    • Added functionality to instantiate the appropriate LauncherBackend based on configuration.
  • src/sglang_diffusion_routing/launcher/local.py
    • Provided the concrete implementation of LocalLauncher, handling the spawning of sglang serve processes.
    • Managed port allocation and GPU assignment for local workers.
    • Implemented process termination logic for local workers.
  • src/sglang_diffusion_routing/launcher/utils.py
    • Introduced utility functions for network port availability checks.
    • Added GPU detection and resolution logic.
    • Implemented health check polling for worker processes.
    • Provided graceful and forceful process termination utilities.
  • tests/unit/test_cli.py
    • Updated the CLI unit tests to reflect the new router initialization and worker registration flow, specifically checking the router object directly.
  • tests/unit/test_launcher.py
    • Added comprehensive unit tests for the new launcher subsystem.
    • Covered configuration loading, schema validation, backend creation, and local launcher functionality.
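The health-check polling mentioned under launcher/utils.py could be sketched with only the standard library. `wait_until_healthy` is a hypothetical name, not the PR's actual helper; it polls the `GET /health` endpoint seen in the example log until the worker answers 200 or the timeout expires:

```python
import time
import urllib.error
import urllib.request


def wait_until_healthy(url: str, timeout: float = 300.0, interval: float = 1.0) -> bool:
    """Poll GET {url}/health until it returns 200 or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{url}/health", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # worker still starting up; retry after a short sleep
        time.sleep(interval)
    return False
```

Run once per worker in its own thread, this gives the non-blocking "register each as soon as it's ready" behavior the PR describes.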
Activity
  • The pull request introduces a new feature to launch and manage SGLang worker processes directly from the router.
  • An example log demonstrates the router launching two workers, monitoring their health, and registering them upon readiness.
  • The log shows detailed output from worker startup, including model loading, device initialization, and FastAPI server activation.
  • Workers are health-checked and registered concurrently, with the router logging their status.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature              Command              Description
Code Review          /gemini review       Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary      Provides a summary of the current pull request in its current state.
Comment              @gemini-code-assist  Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                 /gemini help         Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@alphabetc1
Collaborator

BTW, could you provide an example in the README?

@alphabetc1
Collaborator

overall, LGTM

Comment on lines +109 to +140
log_prefix = "[sglang-d-router]"
backend = None

try:
    router = DiffusionRouter(args, verbose=args.verbose)

    if args.launcher_config is not None:
        launcher_cfg = _lcfg.load_launcher_config(args.launcher_config)
        wait_timeout = launcher_cfg.wait_timeout
        backend = _lcfg.create_backend(launcher_cfg)
        backend.launch()
        threading.Thread(
            target=backend.wait_ready_and_register,
            kwargs=dict(
                register_fn=router.register_worker,
                timeout=wait_timeout,
                log_prefix=log_prefix,
            ),
            daemon=True,
        ).start()

    _run_router_server(args, router=router, log_prefix=log_prefix)
    return 0
finally:
    try:
        asyncio.run(router.client.aclose())
    except Exception:
        pass
    if backend is not None:
        print(f"{log_prefix} shutting down managed workers...", flush=True)
        backend.shutdown()
        print(f"{log_prefix} all managed workers terminated.", flush=True)
Owner

I left a TODO here to refactor, but we can leave it for now.

Comment on lines +162 to +164
master_port_base = 30005
scheduler_port_base = 5555
internal_port_stride = 1000
Owner

These parameters are concerning. Should they be fixed here, or could they be passed in?
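For context on what making these bases configurable could look like: worker i would probe for a free port starting from base + i * stride (consistent with the log above, where worker 1 gets master_port 31005 and scheduler_port 6555). `find_free_port` and `worker_ports` are hypothetical names for this sketch, not functions in the PR:

```python
import socket


def find_free_port(start: int, max_tries: int = 100) -> int:
    """Return the first TCP port at or above `start` that can be bound."""
    for port in range(start, start + max_tries):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("127.0.0.1", port))
                return port
            except OSError:
                continue  # port in use; try the next one
    raise RuntimeError(f"no free port in [{start}, {start + max_tries})")


def worker_ports(index: int, master_base: int = 30005,
                 scheduler_base: int = 5555, stride: int = 1000) -> tuple[int, int]:
    """Per-worker (master_port, scheduler_port); bases and stride are overridable."""
    return (find_free_port(master_base + index * stride),
            find_free_port(scheduler_base + index * stride))
```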

zhaochenyang20 merged commit 426cd1e into zhaochenyang20:main on Feb 23, 2026
1 check passed
dreamyang-liu mentioned this pull request on Feb 23, 2026

Development

Successfully merging this pull request may close these issues.

[Feature] Let router launching the server
