
[Ming-Omni] Support diffusion based image generation #336

Open
yuan-luo wants to merge 6 commits into sgl-project:main from yuan-luo:support_diffusion

Conversation

@yuan-luo
Collaborator

@yuan-luo yuan-luo commented Apr 23, 2026

Motivation

To close #257
Design Spec: #304

(design diagram image attached)

Modifications

Related Issues

Accuracy Test

Benchmark & Profiling

Checklist

  • Format your code according to pre-commit.
  • Add unit tests.
  • Update documentation / docstrings / example tutorials as needed.
  • Provide throughput / latency benchmark results and accuracy evaluation results as needed.
  • For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.

@yuan-luo
Collaborator Author

yuan-luo commented Apr 23, 2026

python tests/test_production_image_gen_e2e.py --tp-size 4 --thinker-gpu 1 --diffusion-gpu cuda:5

2026-04-23 06:33:35,825 [INFO] __main__: === Phase 1: Loading SemanticConditioner on cuda:5 ===
2026-04-23 06:33:38,806 [INFO] sglang_omni.models.ming_omni.diffusion.semantic_conditioner: [SemanticConditioner] Reading config from
/data/cache/huggingface/hub/models--inclusionAI--Ming-flash-omni-2.0/snapshots/6a2e1dec07066d20f62a743ac7c34284e4a3932d/config.json
2026-04-23 06:33:38,806 [INFO] sglang_omni.models.ming_omni.diffusion.semantic_conditioner: [SemanticConditioner] Scales: [16], total tokens: 256
2026-04-23 06:33:39,556 [INFO] sglang_omni.models.ming_omni.diffusion.semantic_conditioner: [SemanticConditioner] Loading connector from
/data/cache/huggingface/hub/models--inclusionAI--Ming-flash-omni-2.0/snapshots/6a2e1dec07066d20f62a743ac7c34284e4a3932d/connector
`torch_dtype` is deprecated! Use `dtype` instead!

Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00,  4.19it/s]
2026-04-23 06:33:41,252 [INFO] sglang_omni.models.ming_omni.diffusion.semantic_conditioner: [SemanticConditioner] Connector loaded on cuda:5
2026-04-23 06:33:41,253 [INFO] sglang_omni.models.ming_omni.diffusion.semantic_conditioner: [SemanticConditioner] Loading projections from
/data/cache/huggingface/hub/models--inclusionAI--Ming-flash-omni-2.0/snapshots/6a2e1dec07066d20f62a743ac7c34284e4a3932d/mlp/model.safetensors
2026-04-23 06:33:41,296 [INFO] sglang_omni.models.ming_omni.diffusion.semantic_conditioner: [SemanticConditioner] Projections loaded: proj_in=[1536, 4096], proj_out=[2560, 1536], query_tokens=[256, 4096] (256 tokens)
2026-04-23 06:33:41,299 [INFO] sglang_omni.models.ming_omni.diffusion.semantic_conditioner: [SemanticConditioner] All components loaded on cuda:5 (~2.9 GB)
2026-04-23 06:33:41,299 [INFO] __main__: SemanticConditioner loaded in 2.5s
2026-04-23 06:33:41,299 [INFO] __main__:   query_tokens: [256, 4096], img_gen_scales: [16]
2026-04-23 06:33:41,299 [INFO] __main__: === Phase 2: Creating MingPreprocessor ===
2026-04-23 06:33:41,357 [INFO] qwen_vl_utils.vision_process: set VIDEO_TOTAL_PIXELS: 90316800
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'BailingTokenizer'.
The class this function is called from is 'PreTrainedTokenizerFast'.
2026-04-23 06:33:41,651 [INFO] __main__: Preprocessor created, image_patch_token_id=157157
2026-04-23 06:33:41,651 [INFO] __main__: === Phase 3: Running preprocessor on 2 prompts ===
2026-04-23 06:33:41,682 [INFO] __main__:   [0] input_ids: [1, 294], gen_mask sum=256, prefill_only=True
2026-04-23 06:33:41,703 [INFO] __main__:   [1] input_ids: [1, 292], gen_mask sum=256, prefill_only=True
2026-04-23 06:33:41,703 [INFO] __main__: === Phase 4: Loading SGLang thinker (TP=4, gpu=1, capture_hidden=True) ===
2026-04-23 06:33:42,745 [INFO] sglang_omni.models.ming_omni.pipeline.stages: create_sglang_thinker_executor_from_config: server_args_overrides={'tp_size': 4, 'base_gpu_id': 1}
2026-04-23 06:33:45,113 [INFO] sglang.srt.server_args: Attention backend not specified. Use fa3 backend by default.
2026-04-23 06:33:45,113 [INFO] sglang_omni.models.ming_omni.pipeline.stages: ServerArgs: cpu_offload_gb=0, mem_fraction_static=0.879, pre_load_avail_mem=103.21 GB
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'BailingTokenizer'.
The class this function is called from is 'PreTrainedTokenizerFast'.
2026-04-23 06:33:45,450 [INFO] sglang_omni.engines.tp.follower: Spawned follower rank 1 on GPU 2 (pid=101778)
2026-04-23 06:33:45,451 [INFO] sglang_omni.engines.tp.follower: Spawned follower rank 2 on GPU 3 (pid=101779)
2026-04-23 06:33:45,451 [INFO] sglang_omni.engines.tp.follower: Spawned follower rank 3 on GPU 4 (pid=101780)
2026-04-23 06:33:46,202 [WARNING] sglang.srt.models.registry: Ignore import error when loading sglang.srt.models.glmasr: cannot import name 'GlmAsrConfig' from 'transformers'
(/sgl-workspace/sglang-omni-dev/.venv/lib/python3.12/site-packages/transformers/__init__.py)
2026-04-23 06:33:46,405 [INFO] sglang.srt.model_executor.model_runner: Init torch distributed begin.
2026-04-23 06:33:54,254 [INFO] tp_follower.3: Starting follower on GPU 4
2026-04-23 06:33:54,349 [INFO] tp_follower.2: Starting follower on GPU 3
2026-04-23 06:33:54,434 [INFO] tp_follower.1: Starting follower on GPU 2
2026-04-23 06:33:54,878 [WARNING] sglang.srt.models.registry: Ignore import error when loading sglang.srt.models.glmasr: cannot import name 'GlmAsrConfig' from 'transformers'
(/sgl-workspace/sglang-omni-dev/.venv/lib/python3.12/site-packages/transformers/__init__.py)
2026-04-23 06:33:54,972 [WARNING] sglang.srt.models.registry: Ignore import error when loading sglang.srt.models.glmasr: cannot import name 'GlmAsrConfig' from 'transformers'
(/sgl-workspace/sglang-omni-dev/.venv/lib/python3.12/site-packages/transformers/__init__.py)
[Gloo] Rank 0 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 1 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 3 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 2 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
2026-04-23 06:33:55,065 [WARNING] sglang.srt.models.registry: Ignore import error when loading sglang.srt.models.glmasr: cannot import name 'GlmAsrConfig' from 'transformers'
(/sgl-workspace/sglang-omni-dev/.venv/lib/python3.12/site-packages/transformers/__init__.py)
2026-04-23 06:33:55,083 [INFO] qwen_vl_utils.vision_process: set VIDEO_TOTAL_PIXELS: 90316800
2026-04-23 06:33:55,093 [INFO] sglang.srt.model_executor.model_runner: Init torch distributed begin.
2026-04-23 06:33:55,172 [INFO] qwen_vl_utils.vision_process: set VIDEO_TOTAL_PIXELS: 90316800
2026-04-23 06:33:55,183 [INFO] sglang.srt.model_executor.model_runner: Init torch distributed begin.
2026-04-23 06:33:55,265 [INFO] qwen_vl_utils.vision_process: set VIDEO_TOTAL_PIXELS: 90316800
2026-04-23 06:33:55,276 [INFO] sglang.srt.model_executor.model_runner: Init torch distributed begin.
[Gloo] Rank 1 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 2 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 0 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 3 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
2026-04-23 06:33:55,313 [INFO] sglang.srt.distributed.device_communicators.pynccl: sglang is using nccl==2.27.5
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 3 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 0 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 1 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
[Gloo] Rank 2 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
2026-04-23 06:33:56,721 [INFO] sglang.srt.model_executor.model_runner: Init torch distributed ends. mem usage=1.00 GB
2026-04-23 06:33:56,721 [INFO] sglang.srt.model_executor.model_runner: Init torch distributed ends. mem usage=3.79 GB
2026-04-23 06:33:56,721 [INFO] sglang.srt.model_executor.model_runner: Init torch distributed ends. mem usage=3.97 GB
2026-04-23 06:33:56,721 [INFO] sglang.srt.model_executor.model_runner: Init torch distributed ends. mem usage=1.00 GB
2026-04-23 06:33:56,722 [INFO] sglang.srt.model_executor.model_runner: Load weight begin. avail mem=107.44 GB
2026-04-23 06:33:56,722 [INFO] sglang.srt.model_executor.model_runner: Load weight begin. avail mem=98.64 GB
2026-04-23 06:33:56,722 [INFO] sglang.srt.model_executor.model_runner: Load weight begin. avail mem=98.51 GB
2026-04-23 06:33:56,722 [INFO] sglang.srt.model_executor.model_runner: Load weight begin. avail mem=102.22 GB
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'BailingTokenizer'.
The class this function is called from is 'PreTrainedTokenizerFast'.
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'BailingTokenizer'.
The class this function is called from is 'PreTrainedTokenizerFast'.
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'BailingTokenizer'.
The class this function is called from is 'PreTrainedTokenizerFast'.
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'BailingTokenizer'.
The class this function is called from is 'PreTrainedTokenizerFast'.

Loading safetensors checkpoint shards: 100% Completed | 42/42 [00:00<00:00, 83.39it/s]

2026-04-23 06:34:19,370 [INFO] sglang.srt.model_executor.model_runner: Load weight end. type=BailingMoeV2ForCausalLM, dtype=torch.bfloat16, avail mem=26.46 GB, mem usage=72.04 GB.
2026-04-23 06:34:19,458 [INFO] sglang.srt.model_executor.model_runner: Load weight end. type=BailingMoeV2ForCausalLM, dtype=torch.bfloat16, avail mem=26.56 GB, mem usage=72.08 GB.
2026-04-23 06:34:19,458 [INFO] sglang.srt.model_executor.model_runner: Load weight end. type=BailingMoeV2ForCausalLM, dtype=torch.bfloat16, avail mem=48.89 GB, mem usage=58.55 GB.
2026-04-23 06:34:19,461 [INFO] sglang.srt.model_executor.model_runner: Load weight end. type=BailingMoeV2ForCausalLM, dtype=torch.bfloat16, avail mem=53.49 GB, mem usage=48.72 GB.
2026-04-23 06:34:19,462 [INFO] sglang.srt.model_executor.model_runner: Using KV cache dtype: torch.bfloat16
2026-04-23 06:34:19,486 [INFO] sglang.srt.mem_cache.memory_pool: KV Cache is allocated. #tokens: 953147, K size: 7.27 GB, V size: 7.27 GB
2026-04-23 06:34:19,487 [INFO] sglang.srt.model_executor.model_runner_kv_cache_mixin: Memory pool end. avail mem=11.82 GB
2026-04-23 06:34:19,491 [INFO] sglang.srt.mem_cache.memory_pool: KV Cache is allocated. #tokens: 953147, K size: 7.27 GB, V size: 7.27 GB
2026-04-23 06:34:19,492 [INFO] sglang.srt.mem_cache.memory_pool: KV Cache is allocated. #tokens: 953147, K size: 7.27 GB, V size: 7.27 GB
2026-04-23 06:34:19,494 [INFO] sglang.srt.model_executor.model_runner_kv_cache_mixin: Memory pool end. avail mem=38.85 GB
2026-04-23 06:34:19,494 [INFO] sglang.srt.model_executor.model_runner_kv_cache_mixin: Memory pool end. avail mem=11.91 GB
2026-04-23 06:34:19,498 [INFO] sglang.srt.mem_cache.memory_pool: KV Cache is allocated. #tokens: 953147, K size: 7.27 GB, V size: 7.27 GB
2026-04-23 06:34:19,500 [INFO] sglang.srt.model_executor.model_runner_kv_cache_mixin: Memory pool end. avail mem=34.21 GB
/sgl-workspace/sglang-omni-dev/.venv/lib/python3.12/site-packages/sglang/srt/utils/common.py:1232: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this
tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered
internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:206.)
 tensor_data = torch.ByteTensor(
2026-04-23 06:34:19,623 [INFO] tp_follower.3: ModelWorker initialized, NCCL group joined
2026-04-23 06:34:19,623 [INFO] tp_follower.2: ModelWorker initialized, NCCL group joined
2026-04-23 06:34:19,623 [INFO] tp_follower.1: ModelWorker initialized, NCCL group joined
2026-04-23 06:34:19,624 [INFO] sglang_omni.models.ming_omni.pipeline.stages: Ming thinker SGLang executor initialized: gpu_id=1 post_load_avail_mem=38.75 GB
2026-04-23 06:34:19,624 [INFO] __main__: SGLang thinker loaded in 36.9s
2026-04-23 06:34:19,624 [INFO] sglang_omni.engines.omni.engine: OmniEngine started (overlap=True)
2026-04-23 06:34:19,625 [INFO] __main__: SGLang thinker executor started
2026-04-23 06:34:19,625 [INFO] __main__: === Phase 5: Running thinker prefill-only for 2 prompts ===
2026-04-23 06:34:19,625 [INFO] __main__:   [0] Submitting prefill-only request...
2026-04-23 06:34:19,684 [INFO] sglang_omni.engines.ar.sglang_backend.scheduler.prefill: Chunked prefill scheduled: rid=img-0 projected=False extend_input_len=128
2026-04-23 06:34:22,504 [WARNING] sglang.srt.layers.moe.fused_moe_triton.fused_moe_triton_config: Using default MoE kernel config. Performance might be sub-optimal! Config file not found at
/sgl-workspace/sglang-omni-dev/.venv/lib/python3.12/site-packages/sglang/srt/layers/moe/fused_moe_triton/configs/triton_3_5_1/E=256,N=256,device_name=NVIDIA_H200.json, you can create them with
https://github.com/sgl-project/sglang/tree/main/benchmark/kernels/fused_moe_triton
2026-04-23 06:34:22,505 [WARNING] sglang.srt.layers.moe.fused_moe_triton.fused_moe_triton_config: Using MoE kernel config with down_moe=False. Performance might be sub-optimal! Config file not found at
/sgl-workspace/sglang-omni-dev/.venv/lib/python3.12/site-packages/sglang/srt/layers/moe/fused_moe_triton/configs/triton_3_5_1/E=256,N=256,device_name=NVIDIA_H200_down.json, you can create them with
https://github.com/sgl-project/sglang/tree/main/benchmark/kernels/fused_moe_triton
2026-04-23 06:34:22,549 [WARNING] sglang.srt.layers.moe.fused_moe_triton.fused_moe_triton_config: Using default MoE kernel config. Performance might be sub-optimal! Config file not found at
/sgl-workspace/sglang-omni-dev/.venv/lib/python3.12/site-packages/sglang/srt/layers/moe/fused_moe_triton/configs/triton_3_5_1/E=256,N=256,device_name=NVIDIA_H200.json, you can create them with
https://github.com/sgl-project/sglang/tree/main/benchmark/kernels/fused_moe_triton
2026-04-23 06:34:22,550 [WARNING] sglang.srt.layers.moe.fused_moe_triton.fused_moe_triton_config: Using MoE kernel config with down_moe=False. Performance might be sub-optimal! Config file not found at
/sgl-workspace/sglang-omni-dev/.venv/lib/python3.12/site-packages/sglang/srt/layers/moe/fused_moe_triton/configs/triton_3_5_1/E=256,N=256,device_name=NVIDIA_H200_down.json, you can create them with
https://github.com/sgl-project/sglang/tree/main/benchmark/kernels/fused_moe_triton
2026-04-23 06:34:22,551 [WARNING] sglang.srt.layers.moe.fused_moe_triton.fused_moe_triton_config: Using default MoE kernel config. Performance might be sub-optimal! Config file not found at
/sgl-workspace/sglang-omni-dev/.venv/lib/python3.12/site-packages/sglang/srt/layers/moe/fused_moe_triton/configs/triton_3_5_1/E=256,N=256,device_name=NVIDIA_H200.json, you can create them with
https://github.com/sgl-project/sglang/tree/main/benchmark/kernels/fused_moe_triton
2026-04-23 06:34:22,552 [WARNING] sglang.srt.layers.moe.fused_moe_triton.fused_moe_triton_config: Using MoE kernel config with down_moe=False. Performance might be sub-optimal! Config file not found at
/sgl-workspace/sglang-omni-dev/.venv/lib/python3.12/site-packages/sglang/srt/layers/moe/fused_moe_triton/configs/triton_3_5_1/E=256,N=256,device_name=NVIDIA_H200_down.json, you can create them with
https://github.com/sgl-project/sglang/tree/main/benchmark/kernels/fused_moe_triton
2026-04-23 06:34:22,554 [WARNING] sglang.srt.layers.moe.fused_moe_triton.fused_moe_triton_config: Using default MoE kernel config. Performance might be sub-optimal! Config file not found at
/sgl-workspace/sglang-omni-dev/.venv/lib/python3.12/site-packages/sglang/srt/layers/moe/fused_moe_triton/configs/triton_3_5_1/E=256,N=256,device_name=NVIDIA_H200.json, you can create them with
https://github.com/sgl-project/sglang/tree/main/benchmark/kernels/fused_moe_triton
2026-04-23 06:34:22,555 [WARNING] sglang.srt.layers.moe.fused_moe_triton.fused_moe_triton_config: Using MoE kernel config with down_moe=False. Performance might be sub-optimal! Config file not found at
/sgl-workspace/sglang-omni-dev/.venv/lib/python3.12/site-packages/sglang/srt/layers/moe/fused_moe_triton/configs/triton_3_5_1/E=256,N=256,device_name=NVIDIA_H200_down.json, you can create them with
https://github.com/sgl-project/sglang/tree/main/benchmark/kernels/fused_moe_triton
2026-04-23 06:34:22,721 [INFO] __main__:   [0] Thinker done in 3.10s, hidden_states: [294, 4096], output_ids: 1 tokens
2026-04-23 06:34:22,721 [INFO] __main__:   [1] Submitting prefill-only request...
2026-04-23 06:34:22,779 [INFO] sglang_omni.engines.ar.sglang_backend.scheduler.prefill: Chunked prefill scheduled: rid=img-1 projected=False extend_input_len=128
2026-04-23 06:34:23,002 [INFO] __main__:   [1] Thinker done in 0.28s, hidden_states: [272, 4096], output_ids: 1 tokens
2026-04-23 06:34:23,002 [INFO] __main__: Stopping thinker executor to free GPU memory...
2026-04-23 06:34:23,003 [INFO] tp_follower.2: Received stop signal after 6 steps
2026-04-23 06:34:23,003 [INFO] tp_follower.2: Follower exiting
2026-04-23 06:34:23,003 [INFO] tp_follower.3: Received stop signal after 6 steps
2026-04-23 06:34:23,003 [INFO] tp_follower.3: Follower exiting
2026-04-23 06:34:23,003 [INFO] tp_follower.1: Received stop signal after 6 steps
2026-04-23 06:34:23,003 [INFO] tp_follower.1: Follower exiting
2026-04-23 06:34:27,202 [INFO] sglang_omni.engines.omni.engine: OmniEngine stopped
2026-04-23 06:34:27,936 [INFO] __main__: GPU 0: 22.9 GiB free
2026-04-23 06:34:27,937 [INFO] __main__: GPU 1: 38.1 GiB free
2026-04-23 06:34:28,258 [INFO] __main__: GPU 2: 77.9 GiB free
2026-04-23 06:34:28,640 [INFO] __main__: GPU 3: 76.3 GiB free
2026-04-23 06:34:29,023 [INFO] __main__: GPU 4: 76.1 GiB free
2026-04-23 06:34:29,034 [INFO] __main__: GPU 5: 73.6 GiB free
2026-04-23 06:34:29,034 [INFO] __main__: === Phase 6: Projecting hidden states through conditioner ===
2026-04-23 06:34:29,034 [INFO] __main__:   [0] Hidden states tensor: shape=[294, 4096], dtype=torch.bfloat16
2026-04-23 06:34:29,034 [INFO] __main__:   [0] Query hidden: [1, 256, 4096] → projecting through conditioner
/sgl-workspace/sglang-omni-dev/sglang_omni/models/ming_omni/diffusion/semantic_conditioner.py:314: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
 with torch.cuda.amp.autocast(dtype=self._dtype):
2026-04-23 06:34:29,242 [INFO] __main__:   [0] Condition embeds: [1, 256, 2560], norm mean=1.0000 std=0.0000, time=0.207s
2026-04-23 06:34:29,243 [INFO] __main__:   [1] Hidden states tensor: shape=[272, 4096], dtype=torch.bfloat16
2026-04-23 06:34:29,243 [INFO] __main__:   [1] gen_mask len=292 > hs seq_len=272, trimming prefix (20 tokens cached)
2026-04-23 06:34:29,243 [INFO] __main__:   [1] Query hidden: [1, 256, 4096] → projecting through conditioner
2026-04-23 06:34:29,262 [INFO] __main__:   [1] Condition embeds: [1, 256, 2560], norm mean=1.0000 std=0.0000, time=0.019s
2026-04-23 06:34:29,263 [INFO] sglang_omni.models.ming_omni.diffusion.semantic_conditioner: [SemanticConditioner] Unloading components
2026-04-23 06:34:29,315 [INFO] sglang_omni.models.ming_omni.diffusion.semantic_conditioner: [SemanticConditioner] Unloaded, GPU cache cleared
2026-04-23 06:34:29,650 [INFO] __main__: === Phase 7: Loading ZImage pipeline on cuda:5 ===

Loading checkpoint shards: 100%|██████████| 5/5 [00:00<00:00,  7.14it/s]
2026-04-23 06:34:36,895 [INFO] __main__: ZImage pipeline loaded in 6.6s
2026-04-23 06:34:36,896 [INFO] __main__: === Phase 8: Generating images ===
2026-04-23 06:34:36,897 [INFO] __main__: Generating image 0: 'A cat sitting on a windowsill watching the sunset'

100%|██████████| 28/28 [00:12<00:00,  2.24it/s]
2026-04-23 06:34:50,549 [INFO] __main__:   [0] 1024x1024 in 13.2s, pixel mean=77.7 std=62.5 → /tmp/production_image_gen_e2e/prod_0_A_cat_sitting_on_a_w.png
2026-04-23 06:34:50,550 [INFO] __main__: Generating image 1: '一幅水墨画,画中有竹子和远山'

100%|██████████| 28/28 [00:24<00:00,  1.17it/s]
2026-04-23 06:35:15,507 [INFO] __main__:   [1] 1024x1024 in 24.6s, pixel mean=221.3 std=48.4 → /tmp/production_image_gen_e2e/prod_1_一幅水墨画画中有竹子和远山.png
2026-04-23 06:35:15,534 [INFO] __main__: ============================================================
2026-04-23 06:35:15,534 [INFO] __main__: === FINAL SUMMARY ===
2026-04-23 06:35:15,534 [INFO] __main__: SemanticConditioner load: 2.5s
2026-04-23 06:35:15,534 [INFO] __main__: SGLang thinker load (TP=4): 36.9s
2026-04-23 06:35:15,534 [INFO] __main__: ZImage pipeline load: 6.6s
2026-04-23 06:35:15,534 [INFO] __main__:   [OK] std= 62.5 mean= 77.7  A cat sitting on a windowsill watching the sunset
2026-04-23 06:35:15,535 [INFO] __main__:         → /tmp/production_image_gen_e2e/prod_0_A_cat_sitting_on_a_w.png
2026-04-23 06:35:15,535 [INFO] __main__:   [OK] std= 48.4 mean=221.3  一幅水墨画,画中有竹子和远山
2026-04-23 06:35:15,535 [INFO] __main__:         → /tmp/production_image_gen_e2e/prod_1_一幅水墨画画中有竹子和远山.png
2026-04-23 06:35:15,535 [INFO] __main__: ============================================================
2026-04-23 06:35:15,535 [INFO] __main__: Output dir: /tmp/production_image_gen_e2e
2026-04-23 06:35:15,535 [INFO] __main__: === TEST PASSED: Production image gen path works ===
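The Phase 6 log above shows query hidden states of shape [1, 256, 4096] being projected to condition embeds of shape [1, 256, 2560]. A hypothetical shape walk-through, based only on the weight sizes logged in Phase 1 (proj_in=[1536, 4096], proj_out=[2560, 1536], query_tokens=[256, 4096]) and not on the PR's actual `semantic_conditioner.py` code:

```python
import numpy as np

# Dummy weights with the logged shapes; values are irrelevant for a shape check.
proj_in = np.zeros((1536, 4096), dtype=np.float32)   # 4096-d thinker hidden -> 1536-d
proj_out = np.zeros((2560, 1536), dtype=np.float32)  # 1536-d -> 2560-d condition embeds
query_hidden = np.zeros((1, 256, 4096), dtype=np.float32)  # [batch, query tokens, hidden]

x = query_hidden @ proj_in.T    # (1, 256, 1536)
cond = x @ proj_out.T           # (1, 256, 2560)
print(cond.shape)
```

This confirms the two projections compose to produce the [1, 256, 2560] condition embeds that the diffusion pipeline consumes in Phase 8.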
(attached image: img_1_cond_一幅水墨画画中有竹子和远山)
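Phase 6 also logs `gen_mask len=292 > hs seq_len=272, trimming prefix (20 tokens cached)`: when prefix caching makes the thinker return fewer hidden-state rows than the prompt has positions, the cached prefix must be dropped from the mask before indexing. A hypothetical helper (`align_gen_mask` is illustrative, not the PR's implementation) sketching that alignment:

```python
def align_gen_mask(gen_mask, hs_seq_len):
    """Drop the leading positions of gen_mask that were served from the
    prefix cache, so the mask lines up with the returned hidden states."""
    cached = len(gen_mask) - hs_seq_len
    if cached > 0:
        return gen_mask[cached:]  # trim the cached prefix
    return gen_mask

mask = [0] * 36 + [1] * 256          # 292 positions, 256 image-generation slots
aligned = align_gen_mask(mask, 272)  # 20-token cached prefix trimmed
print(len(aligned), sum(aligned))    # 272 256
```

The trim only removes prompt-prefix positions, so the 256 generation slots (gen_mask sum) are preserved, matching the unchanged `gen_mask sum=256` in the log.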

@yuan-luo yuan-luo force-pushed the support_diffusion branch 5 times, most recently from d94640a to 0ee9153 Compare April 23, 2026 14:51
@yuan-luo yuan-luo force-pushed the support_diffusion branch from 0ee9153 to 6a9ce34 Compare April 23, 2026 16:05
@zhaochenyang20
Collaborator

Sorry about this CI run; the machine is being used for calibration. You can relaunch it later.



Development

Successfully merging this pull request may close these issues.

[Feature] Ming-Omni diffusion-based image generation
