
[Ming-Omni] Support diffusion based image generation #336

Open
yuan-luo wants to merge 6 commits into sgl-project:main from yuan-luo:support_diffusion

Conversation

@yuan-luo
Collaborator

@yuan-luo yuan-luo commented Apr 23, 2026

Motivation

To close #257
Design Spec: #304

(design diagram image attached)

Modifications

Related Issues

Accuracy Test

Benchmark & Profiling

Checklist

  • Format your code according to pre-commit.
  • Add unit tests.
  • Update documentation / docstrings / example tutorials as needed.
  • Provide throughput / latency benchmark results and accuracy evaluation results as needed.
  • For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.

@yuan-luo
Collaborator Author

yuan-luo commented Apr 23, 2026

python tests/test_production_image_gen_e2e.py --tp-size 4 --thinker-gpu 1 --diffusion-gpu cuda:5

2026-04-23 06:33:35,825 [INFO] __main__: === Phase 1: Loading SemanticConditioner on cuda:5 ===
2026-04-23 06:33:38,806 [INFO] sglang_omni.models.ming_omni.diffusion.semantic_conditioner: [SemanticConditioner] Reading config from
/data/cache/huggingface/hub/models--inclusionAI--Ming-flash-omni-2.0/snapshots/6a2e1dec07066d20f62a743ac7c34284e4a3932d/config.json
2026-04-23 06:33:38,806 [INFO] sglang_omni.models.ming_omni.diffusion.semantic_conditioner: [SemanticConditioner] Scales: [16], total tokens: 256
2026-04-23 06:33:39,556 [INFO] sglang_omni.models.ming_omni.diffusion.semantic_conditioner: [SemanticConditioner] Loading connector from
/data/cache/huggingface/hub/models--inclusionAI--Ming-flash-omni-2.0/snapshots/6a2e1dec07066d20f62a743ac7c34284e4a3932d/connector
`torch_dtype` is deprecated! Use `dtype` instead!

Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00,  4.19it/s]
2026-04-23 06:33:41,252 [INFO] sglang_omni.models.ming_omni.diffusion.semantic_conditioner: [SemanticConditioner] Connector loaded on cuda:5
2026-04-23 06:33:41,253 [INFO] sglang_omni.models.ming_omni.diffusion.semantic_conditioner: [SemanticConditioner] Loading projections from
/data/cache/huggingface/hub/models--inclusionAI--Ming-flash-omni-2.0/snapshots/6a2e1dec07066d20f62a743ac7c34284e4a3932d/mlp/model.safetensors
2026-04-23 06:33:41,296 [INFO] sglang_omni.models.ming_omni.diffusion.semantic_conditioner: [SemanticConditioner] Projections loaded: proj_in=[1536, 4096], proj_out=[2560, 1536], query_tokens=[256, 4096] (256 tokens)
2026-04-23 06:33:41,299 [INFO] sglang_omni.models.ming_omni.diffusion.semantic_conditioner: [SemanticConditioner] All components loaded on cuda:5 (~2.9 GB)
2026-04-23 06:33:41,299 [INFO] __main__: SemanticConditioner loaded in 2.5s
2026-04-23 06:33:41,299 [INFO] __main__:   query_tokens: [256, 4096], img_gen_scales: [16]
2026-04-23 06:33:41,299 [INFO] __main__: === Phase 2: Creating MingPreprocessor ===
2026-04-23 06:33:41,357 [INFO] qwen_vl_utils.vision_process: set VIDEO_TOTAL_PIXELS: 90316800
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'BailingTokenizer'.
The class this function is called from is 'PreTrainedTokenizerFast'.
2026-04-23 06:33:41,651 [INFO] __main__: Preprocessor created, image_patch_token_id=157157
2026-04-23 06:33:41,651 [INFO] __main__: === Phase 3: Running preprocessor on 2 prompts ===
2026-04-23 06:33:41,682 [INFO] __main__:   [0] input_ids: [1, 294], gen_mask sum=256, prefill_only=True
2026-04-23 06:33:41,703 [INFO] __main__:   [1] input_ids: [1, 292], gen_mask sum=256, prefill_only=True
2026-04-23 06:33:41,703 [INFO] __main__: === Phase 4: Loading SGLang thinker (TP=4, gpu=1, capture_hidden=True) ===
2026-04-23 06:33:42,745 [INFO] sglang_omni.models.ming_omni.pipeline.stages: create_sglang_thinker_executor_from_config: server_args_overrides={'tp_size': 4, 'base_gpu_id': 1}
2026-04-23 06:33:45,113 [INFO] sglang.srt.server_args: Attention backend not specified. Use fa3 backend by default.
2026-04-23 06:33:45,113 [INFO] sglang_omni.models.ming_omni.pipeline.stages: ServerArgs: cpu_offload_gb=0, mem_fraction_static=0.879, pre_load_avail_mem=103.21 GB
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'BailingTokenizer'.
The class this function is called from is 'PreTrainedTokenizerFast'.
2026-04-23 06:33:45,450 [INFO] sglang_omni.engines.tp.follower: Spawned follower rank 1 on GPU 2 (pid=101778)
2026-04-23 06:33:45,451 [INFO] sglang_omni.engines.tp.follower: Spawned follower rank 2 on GPU 3 (pid=101779)
2026-04-23 06:33:45,451 [INFO] sglang_omni.engines.tp.follower: Spawned follower rank 3 on GPU 4 (pid=101780)
2026-04-23 06:33:46,202 [WARNING] sglang.srt.models.registry: Ignore import error when loading sglang.srt.models.glmasr: cannot import name 'GlmAsrConfig' from 'transformers'
(/sgl-workspace/sglang-omni-dev/.venv/lib/python3.12/site-packages/transformers/__init__.py)
2026-04-23 06:33:46,405 [INFO] sglang.srt.model_executor.model_runner: Init torch distributed begin.
2026-04-23 06:33:54,254 [INFO] tp_follower.3: Starting follower on GPU 4
2026-04-23 06:33:54,349 [INFO] tp_follower.2: Starting follower on GPU 3
2026-04-23 06:33:54,434 [INFO] tp_follower.1: Starting follower on GPU 2
2026-04-23 06:33:54,878 [WARNING] sglang.srt.models.registry: Ignore import error when loading sglang.srt.models.glmasr: cannot import name 'GlmAsrConfig' from 'transformers'
(/sgl-workspace/sglang-omni-dev/.venv/lib/python3.12/site-packages/transformers/__init__.py)
2026-04-23 06:33:54,972 [WARNING] sglang.srt.models.registry: Ignore import error when loading sglang.srt.models.glmasr: cannot import name 'GlmAsrConfig' from 'transformers'
(/sgl-workspace/sglang-omni-dev/.venv/lib/python3.12/site-packages/transformers/__init__.py)
[Gloo] Rank 0 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 1 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 3 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 2 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
2026-04-23 06:33:55,065 [WARNING] sglang.srt.models.registry: Ignore import error when loading sglang.srt.models.glmasr: cannot import name 'GlmAsrConfig' from 'transformers'
(/sgl-workspace/sglang-omni-dev/.venv/lib/python3.12/site-packages/transformers/__init__.py)
2026-04-23 06:33:55,083 [INFO] qwen_vl_utils.vision_process: set VIDEO_TOTAL_PIXELS: 90316800
2026-04-23 06:33:55,093 [INFO] sglang.srt.model_executor.model_runner: Init torch distributed begin.
2026-04-23 06:33:55,172 [INFO] qwen_vl_utils.vision_process: set VIDEO_TOTAL_PIXELS: 90316800
2026-04-23 06:33:55,183 [INFO] sglang.srt.model_executor.model_runner: Init torch distributed begin.
2026-04-23 06:33:55,265 [INFO] qwen_vl_utils.vision_process: set VIDEO_TOTAL_PIXELS: 90316800
2026-04-23 06:33:55,276 [INFO] sglang.srt.model_executor.model_runner: Init torch distributed begin.
[Gloo] Rank 1 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 2 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 0 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 3 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
2026-04-23 06:33:55,313 [INFO] sglang.srt.distributed.device_communicators.pynccl: sglang is using nccl==2.27.5
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 3 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 0 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 1 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 2 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
2026-04-23 06:33:56,721 [INFO] sglang.srt.model_executor.model_runner: Init torch distributed ends. mem usage=1.00 GB
2026-04-23 06:33:56,721 [INFO] sglang.srt.model_executor.model_runner: Init torch distributed ends. mem usage=3.79 GB
2026-04-23 06:33:56,721 [INFO] sglang.srt.model_executor.model_runner: Init torch distributed ends. mem usage=3.97 GB
2026-04-23 06:33:56,721 [INFO] sglang.srt.model_executor.model_runner: Init torch distributed ends. mem usage=1.00 GB
2026-04-23 06:33:56,722 [INFO] sglang.srt.model_executor.model_runner: Load weight begin. avail mem=107.44 GB
2026-04-23 06:33:56,722 [INFO] sglang.srt.model_executor.model_runner: Load weight begin. avail mem=98.64 GB
2026-04-23 06:33:56,722 [INFO] sglang.srt.model_executor.model_runner: Load weight begin. avail mem=98.51 GB
2026-04-23 06:33:56,722 [INFO] sglang.srt.model_executor.model_runner: Load weight begin. avail mem=102.22 GB
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'BailingTokenizer'.
The class this function is called from is 'PreTrainedTokenizerFast'.
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'BailingTokenizer'.
The class this function is called from is 'PreTrainedTokenizerFast'.
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'BailingTokenizer'.
The class this function is called from is 'PreTrainedTokenizerFast'.
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'BailingTokenizer'.
The class this function is called from is 'PreTrainedTokenizerFast'.

Loading safetensors checkpoint shards: 100% Completed | 42/42 [00:00<00:00, 83.39it/s]

2026-04-23 06:34:19,370 [INFO] sglang.srt.model_executor.model_runner: Load weight end. type=BailingMoeV2ForCausalLM, dtype=torch.bfloat16, avail mem=26.46 GB, mem usage=72.04 GB.
2026-04-23 06:34:19,458 [INFO] sglang.srt.model_executor.model_runner: Load weight end. type=BailingMoeV2ForCausalLM, dtype=torch.bfloat16, avail mem=26.56 GB, mem usage=72.08 GB.
2026-04-23 06:34:19,458 [INFO] sglang.srt.model_executor.model_runner: Load weight end. type=BailingMoeV2ForCausalLM, dtype=torch.bfloat16, avail mem=48.89 GB, mem usage=58.55 GB.
2026-04-23 06:34:19,461 [INFO] sglang.srt.model_executor.model_runner: Load weight end. type=BailingMoeV2ForCausalLM, dtype=torch.bfloat16, avail mem=53.49 GB, mem usage=48.72 GB.
2026-04-23 06:34:19,462 [INFO] sglang.srt.model_executor.model_runner: Using KV cache dtype: torch.bfloat16
2026-04-23 06:34:19,486 [INFO] sglang.srt.mem_cache.memory_pool: KV Cache is allocated. #tokens: 953147, K size: 7.27 GB, V size: 7.27 GB
2026-04-23 06:34:19,487 [INFO] sglang.srt.model_executor.model_runner_kv_cache_mixin: Memory pool end. avail mem=11.82 GB
2026-04-23 06:34:19,491 [INFO] sglang.srt.mem_cache.memory_pool: KV Cache is allocated. #tokens: 953147, K size: 7.27 GB, V size: 7.27 GB
2026-04-23 06:34:19,492 [INFO] sglang.srt.mem_cache.memory_pool: KV Cache is allocated. #tokens: 953147, K size: 7.27 GB, V size: 7.27 GB
2026-04-23 06:34:19,494 [INFO] sglang.srt.model_executor.model_runner_kv_cache_mixin: Memory pool end. avail mem=38.85 GB
2026-04-23 06:34:19,494 [INFO] sglang.srt.model_executor.model_runner_kv_cache_mixin: Memory pool end. avail mem=11.91 GB
2026-04-23 06:34:19,498 [INFO] sglang.srt.mem_cache.memory_pool: KV Cache is allocated. #tokens: 953147, K size: 7.27 GB, V size: 7.27 GB
2026-04-23 06:34:19,500 [INFO] sglang.srt.model_executor.model_runner_kv_cache_mixin: Memory pool end. avail mem=34.21 GB
/sgl-workspace/sglang-omni-dev/.venv/lib/python3.12/site-packages/sglang/srt/utils/common.py:1232: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this
tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered
internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:206.)
 tensor_data = torch.ByteTensor(
2026-04-23 06:34:19,623 [INFO] tp_follower.3: ModelWorker initialized, NCCL group joined
2026-04-23 06:34:19,623 [INFO] tp_follower.2: ModelWorker initialized, NCCL group joined
2026-04-23 06:34:19,623 [INFO] tp_follower.1: ModelWorker initialized, NCCL group joined
2026-04-23 06:34:19,624 [INFO] sglang_omni.models.ming_omni.pipeline.stages: Ming thinker SGLang executor initialized: gpu_id=1 post_load_avail_mem=38.75 GB
2026-04-23 06:34:19,624 [INFO] __main__: SGLang thinker loaded in 36.9s
2026-04-23 06:34:19,624 [INFO] sglang_omni.engines.omni.engine: OmniEngine started (overlap=True)
2026-04-23 06:34:19,625 [INFO] __main__: SGLang thinker executor started
2026-04-23 06:34:19,625 [INFO] __main__: === Phase 5: Running thinker prefill-only for 2 prompts ===
2026-04-23 06:34:19,625 [INFO] __main__:   [0] Submitting prefill-only request...
2026-04-23 06:34:19,684 [INFO] sglang_omni.engines.ar.sglang_backend.scheduler.prefill: Chunked prefill scheduled: rid=img-0 projected=False extend_input_len=128
2026-04-23 06:34:22,504 [WARNING] sglang.srt.layers.moe.fused_moe_triton.fused_moe_triton_config: Using default MoE kernel config. Performance might be sub-optimal! Config file not found at
/sgl-workspace/sglang-omni-dev/.venv/lib/python3.12/site-packages/sglang/srt/layers/moe/fused_moe_triton/configs/triton_3_5_1/E=256,N=256,device_name=NVIDIA_H200.json, you can create them with
https://github.com/sgl-project/sglang/tree/main/benchmark/kernels/fused_moe_triton
2026-04-23 06:34:22,505 [WARNING] sglang.srt.layers.moe.fused_moe_triton.fused_moe_triton_config: Using MoE kernel config with down_moe=False. Performance might be sub-optimal! Config file not found at
/sgl-workspace/sglang-omni-dev/.venv/lib/python3.12/site-packages/sglang/srt/layers/moe/fused_moe_triton/configs/triton_3_5_1/E=256,N=256,device_name=NVIDIA_H200_down.json, you can create them with
https://github.com/sgl-project/sglang/tree/main/benchmark/kernels/fused_moe_triton
2026-04-23 06:34:22,549 [WARNING] sglang.srt.layers.moe.fused_moe_triton.fused_moe_triton_config: Using default MoE kernel config. Performance might be sub-optimal! Config file not found at
/sgl-workspace/sglang-omni-dev/.venv/lib/python3.12/site-packages/sglang/srt/layers/moe/fused_moe_triton/configs/triton_3_5_1/E=256,N=256,device_name=NVIDIA_H200.json, you can create them with
https://github.com/sgl-project/sglang/tree/main/benchmark/kernels/fused_moe_triton
2026-04-23 06:34:22,550 [WARNING] sglang.srt.layers.moe.fused_moe_triton.fused_moe_triton_config: Using MoE kernel config with down_moe=False. Performance might be sub-optimal! Config file not found at
/sgl-workspace/sglang-omni-dev/.venv/lib/python3.12/site-packages/sglang/srt/layers/moe/fused_moe_triton/configs/triton_3_5_1/E=256,N=256,device_name=NVIDIA_H200_down.json, you can create them with
https://github.com/sgl-project/sglang/tree/main/benchmark/kernels/fused_moe_triton
2026-04-23 06:34:22,551 [WARNING] sglang.srt.layers.moe.fused_moe_triton.fused_moe_triton_config: Using default MoE kernel config. Performance might be sub-optimal! Config file not found at
/sgl-workspace/sglang-omni-dev/.venv/lib/python3.12/site-packages/sglang/srt/layers/moe/fused_moe_triton/configs/triton_3_5_1/E=256,N=256,device_name=NVIDIA_H200.json, you can create them with
https://github.com/sgl-project/sglang/tree/main/benchmark/kernels/fused_moe_triton
2026-04-23 06:34:22,552 [WARNING] sglang.srt.layers.moe.fused_moe_triton.fused_moe_triton_config: Using MoE kernel config with down_moe=False. Performance might be sub-optimal! Config file not found at
/sgl-workspace/sglang-omni-dev/.venv/lib/python3.12/site-packages/sglang/srt/layers/moe/fused_moe_triton/configs/triton_3_5_1/E=256,N=256,device_name=NVIDIA_H200_down.json, you can create them with
https://github.com/sgl-project/sglang/tree/main/benchmark/kernels/fused_moe_triton
2026-04-23 06:34:22,554 [WARNING] sglang.srt.layers.moe.fused_moe_triton.fused_moe_triton_config: Using default MoE kernel config. Performance might be sub-optimal! Config file not found at
/sgl-workspace/sglang-omni-dev/.venv/lib/python3.12/site-packages/sglang/srt/layers/moe/fused_moe_triton/configs/triton_3_5_1/E=256,N=256,device_name=NVIDIA_H200.json, you can create them with
https://github.com/sgl-project/sglang/tree/main/benchmark/kernels/fused_moe_triton
2026-04-23 06:34:22,555 [WARNING] sglang.srt.layers.moe.fused_moe_triton.fused_moe_triton_config: Using MoE kernel config with down_moe=False. Performance might be sub-optimal! Config file not found at
/sgl-workspace/sglang-omni-dev/.venv/lib/python3.12/site-packages/sglang/srt/layers/moe/fused_moe_triton/configs/triton_3_5_1/E=256,N=256,device_name=NVIDIA_H200_down.json, you can create them with
https://github.com/sgl-project/sglang/tree/main/benchmark/kernels/fused_moe_triton
2026-04-23 06:34:22,721 [INFO] __main__:   [0] Thinker done in 3.10s, hidden_states: [294, 4096], output_ids: 1 tokens
2026-04-23 06:34:22,721 [INFO] __main__:   [1] Submitting prefill-only request...
2026-04-23 06:34:22,779 [INFO] sglang_omni.engines.ar.sglang_backend.scheduler.prefill: Chunked prefill scheduled: rid=img-1 projected=False extend_input_len=128
2026-04-23 06:34:23,002 [INFO] __main__:   [1] Thinker done in 0.28s, hidden_states: [272, 4096], output_ids: 1 tokens
2026-04-23 06:34:23,002 [INFO] __main__: Stopping thinker executor to free GPU memory...
2026-04-23 06:34:23,003 [INFO] tp_follower.2: Received stop signal after 6 steps
2026-04-23 06:34:23,003 [INFO] tp_follower.2: Follower exiting
2026-04-23 06:34:23,003 [INFO] tp_follower.3: Received stop signal after 6 steps
2026-04-23 06:34:23,003 [INFO] tp_follower.3: Follower exiting
2026-04-23 06:34:23,003 [INFO] tp_follower.1: Received stop signal after 6 steps
2026-04-23 06:34:23,003 [INFO] tp_follower.1: Follower exiting
2026-04-23 06:34:27,202 [INFO] sglang_omni.engines.omni.engine: OmniEngine stopped
2026-04-23 06:34:27,936 [INFO] __main__: GPU 0: 22.9 GiB free
2026-04-23 06:34:27,937 [INFO] __main__: GPU 1: 38.1 GiB free
2026-04-23 06:34:28,258 [INFO] __main__: GPU 2: 77.9 GiB free
2026-04-23 06:34:28,640 [INFO] __main__: GPU 3: 76.3 GiB free
2026-04-23 06:34:29,023 [INFO] __main__: GPU 4: 76.1 GiB free
2026-04-23 06:34:29,034 [INFO] __main__: GPU 5: 73.6 GiB free
2026-04-23 06:34:29,034 [INFO] __main__: === Phase 6: Projecting hidden states through conditioner ===
2026-04-23 06:34:29,034 [INFO] __main__:   [0] Hidden states tensor: shape=[294, 4096], dtype=torch.bfloat16
2026-04-23 06:34:29,034 [INFO] __main__:   [0] Query hidden: [1, 256, 4096] → projecting through conditioner
/sgl-workspace/sglang-omni-dev/sglang_omni/models/ming_omni/diffusion/semantic_conditioner.py:314: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
 with torch.cuda.amp.autocast(dtype=self._dtype):
2026-04-23 06:34:29,242 [INFO] __main__:   [0] Condition embeds: [1, 256, 2560], norm mean=1.0000 std=0.0000, time=0.207s
2026-04-23 06:34:29,243 [INFO] __main__:   [1] Hidden states tensor: shape=[272, 4096], dtype=torch.bfloat16
2026-04-23 06:34:29,243 [INFO] __main__:   [1] gen_mask len=292 > hs seq_len=272, trimming prefix (20 tokens cached)
2026-04-23 06:34:29,243 [INFO] __main__:   [1] Query hidden: [1, 256, 4096] → projecting through conditioner
2026-04-23 06:34:29,262 [INFO] __main__:   [1] Condition embeds: [1, 256, 2560], norm mean=1.0000 std=0.0000, time=0.019s
2026-04-23 06:34:29,263 [INFO] sglang_omni.models.ming_omni.diffusion.semantic_conditioner: [SemanticConditioner] Unloading components
2026-04-23 06:34:29,315 [INFO] sglang_omni.models.ming_omni.diffusion.semantic_conditioner: [SemanticConditioner] Unloaded, GPU cache cleared
2026-04-23 06:34:29,650 [INFO] __main__: === Phase 7: Loading ZImage pipeline on cuda:5 ===

Loading checkpoint shards: 100%|██████████| 5/5 [00:00<00:00,  7.14it/s]
2026-04-23 06:34:36,895 [INFO] __main__: ZImage pipeline loaded in 6.6s
2026-04-23 06:34:36,896 [INFO] __main__: === Phase 8: Generating images ===
2026-04-23 06:34:36,897 [INFO] __main__: Generating image 0: 'A cat sitting on a windowsill watching the sunset'

100%|██████████| 28/28 [00:12<00:00,  2.24it/s]
2026-04-23 06:34:50,549 [INFO] __main__:   [0] 1024x1024 in 13.2s, pixel mean=77.7 std=62.5 → /tmp/production_image_gen_e2e/prod_0_A_cat_sitting_on_a_w.png
2026-04-23 06:34:50,550 [INFO] __main__: Generating image 1: '一幅水墨画,画中有竹子和远山'

100%|██████████| 28/28 [00:24<00:00,  1.17it/s]
2026-04-23 06:35:15,507 [INFO] __main__:   [1] 1024x1024 in 24.6s, pixel mean=221.3 std=48.4 → /tmp/production_image_gen_e2e/prod_1_一幅水墨画画中有竹子和远山.png
2026-04-23 06:35:15,534 [INFO] __main__: ============================================================
2026-04-23 06:35:15,534 [INFO] __main__: === FINAL SUMMARY ===
2026-04-23 06:35:15,534 [INFO] __main__: SemanticConditioner load: 2.5s
2026-04-23 06:35:15,534 [INFO] __main__: SGLang thinker load (TP=4): 36.9s
2026-04-23 06:35:15,534 [INFO] __main__: ZImage pipeline load: 6.6s
2026-04-23 06:35:15,534 [INFO] __main__:   [OK] std= 62.5 mean= 77.7  A cat sitting on a windowsill watching the sunset
2026-04-23 06:35:15,535 [INFO] __main__:         → /tmp/production_image_gen_e2e/prod_0_A_cat_sitting_on_a_w.png
2026-04-23 06:35:15,535 [INFO] __main__:   [OK] std= 48.4 mean=221.3  一幅水墨画,画中有竹子和远山
2026-04-23 06:35:15,535 [INFO] __main__:         → /tmp/production_image_gen_e2e/prod_1_一幅水墨画画中有竹子和远山.png
2026-04-23 06:35:15,535 [INFO] __main__: ============================================================
2026-04-23 06:35:15,535 [INFO] __main__: Output dir: /tmp/production_image_gen_e2e
2026-04-23 06:35:15,535 [INFO] __main__: === TEST PASSED: Production image gen path works ===
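The Phase 6 log above shows query hidden states of shape [1, 256, 4096] being projected to condition embeds of shape [1, 256, 2560]. A hypothetical shape walk-through, based only on the weight sizes logged in Phase 1 (proj_in=[1536, 4096], proj_out=[2560, 1536], query_tokens=[256, 4096]) and not on the PR's actual `semantic_conditioner.py` code:

```python
import numpy as np

# Dummy weights with the logged shapes; values are irrelevant for a shape check.
proj_in = np.zeros((1536, 4096), dtype=np.float32)   # 4096-d thinker hidden -> 1536-d
proj_out = np.zeros((2560, 1536), dtype=np.float32)  # 1536-d -> 2560-d condition embeds
query_hidden = np.zeros((1, 256, 4096), dtype=np.float32)  # [batch, query tokens, hidden]

x = query_hidden @ proj_in.T    # (1, 256, 1536)
cond = x @ proj_out.T           # (1, 256, 2560)
print(cond.shape)
```

This confirms the two projections compose to produce the [1, 256, 2560] condition embeds that the diffusion pipeline consumes in Phase 8.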
(attached image: img_1_cond_一幅水墨画画中有竹子和远山)
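Phase 6 also logs `gen_mask len=292 > hs seq_len=272, trimming prefix (20 tokens cached)`: when prefix caching makes the thinker return fewer hidden-state rows than the prompt has positions, the cached prefix must be dropped from the mask before indexing. A hypothetical helper (`align_gen_mask` is illustrative, not the PR's implementation) sketching that alignment:

```python
def align_gen_mask(gen_mask, hs_seq_len):
    """Drop the leading positions of gen_mask that were served from the
    prefix cache, so the mask lines up with the returned hidden states."""
    cached = len(gen_mask) - hs_seq_len
    if cached > 0:
        return gen_mask[cached:]  # trim the cached prefix
    return gen_mask

mask = [0] * 36 + [1] * 256          # 292 positions, 256 image-generation slots
aligned = align_gen_mask(mask, 272)  # 20-token cached prefix trimmed
print(len(aligned), sum(aligned))    # 272 256
```

The trim only removes prompt-prefix positions, so the 256 generation slots (gen_mask sum) are preserved, matching the unchanged `gen_mask sum=256` in the log.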

@yuan-luo yuan-luo force-pushed the support_diffusion branch 5 times, most recently from d94640a to 0ee9153 Compare April 23, 2026 14:51
@yuan-luo yuan-luo force-pushed the support_diffusion branch from 0ee9153 to 6a9ce34 Compare April 23, 2026 16:05
@zhaochenyang20
Collaborator

Sorry about this CI run; the machine is being used for calibration. You can relaunch it later.



Development

Successfully merging this pull request may close these issues.

[Feature] Ming-Omni diffusion-based image generation
