The xDiT Unified Runner provides a single entry point for running all supported diffusion models with proper benchmarking and profiling support.
The unified runner provides:
- Single CLI interface for all supported models
- Programmatic API for integration into custom code
- Built-in benchmarking with timing measurements
- Profiling support via PyTorch profiler
- Automatic validation of model capabilities and arguments
- Parallelization across all supported models

Run any supported model using `xdit`:

```bash
xdit --model FLUX.1-dev \
    --prompt "A cat running in a garden" \
    --ulysses_degree 8
```

This generates an image with FLUX.1-dev and uses the model-specific defaults for any parameters that were not provided.

The unified runner consists of three main components:
`xFuserModelRunner` is the main entry point that users interact with. It handles:
- Argument parsing and validation
- Model selection from the registry
- Execution flow (initialization → run/profile → save → cleanup)

```python
# api_example.py
# Usage: torchrun --nproc_per_node=4 api_example.py
from xfuser.runner import xFuserModelRunner

# Programmatic usage
config = {
    "model": "FLUX.1-dev",
    "prompt": "A cat running",
    "ulysses_degree": 4,
}

runner = xFuserModelRunner(config)
input_args = runner.preprocess_args(config)
runner.initialize(input_args)
output, timings = runner.run(input_args)
runner.save(output=output, timings=timings)
runner.cleanup()
```

The `xFuserModel` base class contains all shared logic for model operations, e.g.:
- Model loading and initialization
- Benchmarking and timing
- Profiling with PyTorch profiler
- Output saving
- Torch compilation
- Warmup calls
- All other generic features
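
The warmup and benchmarking behaviour listed above can be sketched in a few lines. This is a minimal illustration, not xDiT's implementation: the `benchmark` helper and its callable argument are hypothetical, and only the parameter names `warmup_calls` and `num_iterations` mirror the CLI flags documented here.

```python
import time
from typing import Callable, List


def benchmark(fn: Callable[[], object], warmup_calls: int = 0,
              num_iterations: int = 1) -> List[float]:
    """Run `fn` untimed `warmup_calls` times, then time `num_iterations` calls.

    Hypothetical helper for illustration; returns one duration (seconds) per
    timed iteration.
    """
    for _ in range(warmup_calls):
        fn()  # warmup: duration is discarded

    timings = []
    for _ in range(num_iterations):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    calls = []
    durations = benchmark(lambda: calls.append(1), warmup_calls=2, num_iterations=3)
    print(f"{len(calls)} calls in total, {len(durations)} of them timed")
```

When timing GPU work, CUDA kernels launch asynchronously, so a real measurement would also need to synchronize the device (for example with `torch.cuda.synchronize()`) before reading the clock.
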
Individual model classes that inherit from xFuserModel:
- Define model-specific loading logic
- Implement the inference pipeline
- Specify default values and capabilities
- Override base methods when needed for custom features
| Model | Valid Model Name(s) |
|---|---|
| FLUX.1-dev | FLUX.1-dev, black-forest-labs/FLUX.1-dev |
| FLUX.1-Kontext | FLUX.1-Kontext-dev, black-forest-labs/FLUX.1-Kontext-dev |
| FLUX.2 | FLUX.2-dev, black-forest-labs/FLUX.2-dev |
| FLUX.2-klein | FLUX.2-klein-9B, black-forest-labs/FLUX.2-klein-9B |
| HunyuanVideo | HunyuanVideo, tencent/HunyuanVideo |
| HunyuanVideo-1.5 | HunyuanVideo-1.5, tencent/HunyuanVideo-1.5 |
| Wan 2.1/2.2 I2V | Wan2.1-I2V, Wan2.2-I2V, Wan-AI/Wan2.1-I2V-14B-720P-Diffusers, Wan-AI/Wan2.2-I2V-A14B-Diffusers |
| Wan 2.1/2.2 T2V | Wan2.1-T2V, Wan2.2-T2V, Wan-AI/Wan2.1-T2V-14B-720P-Diffusers, Wan-AI/Wan2.2-T2V-A14B-Diffusers |
| Wan 2.1 VACE | Wan2.1-VACE-14B, Wan2.1-VACE-1.3B, Wan-AI/Wan2.1-VACE-14B, Wan-AI/Wan2.1-VACE-1.3B |
| Stable Diffusion 3 | SD3.5, stabilityai/stable-diffusion-3.5-large |
| Z-Image-Turbo | Z-Image-Turbo, Tongyi-MAI/Z-Image-Turbo |
| LTX-2 | LTX-2, Lightricks/LTX-2 |
| Qwen-Image | Qwen-Image, Qwen/Qwen-Image, Qwen-Image-2512, Qwen/Qwen-Image-2512 |
| Qwen-Image-Edit | Qwen-Image-Edit, Qwen/Qwen-Image-Edit, Qwen-Image-Edit-2509, Qwen/Qwen-Image-Edit-2509, Qwen-Image-Edit-2511, Qwen/Qwen-Image-Edit-2511 |
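
Because several names map to the same model, name handling amounts to an alias lookup. The sketch below shows one way such a lookup could work, using three rows copied from the table above. `MODEL_ALIASES` and `resolve_model` are hypothetical names for illustration and are not xDiT's actual registry; the sketch also assumes exact, case-sensitive matching.

```python
# Hypothetical alias table built from a few rows of the table above.
# This is an illustration, not xDiT's registry.
MODEL_ALIASES = {
    "FLUX.1-dev": ["FLUX.1-dev", "black-forest-labs/FLUX.1-dev"],
    "HunyuanVideo": ["HunyuanVideo", "tencent/HunyuanVideo"],
    "Stable Diffusion 3": ["SD3.5", "stabilityai/stable-diffusion-3.5-large"],
}

# Invert the table so every valid name points at its model entry.
_LOOKUP = {
    alias: model
    for model, aliases in MODEL_ALIASES.items()
    for alias in aliases
}


def resolve_model(name: str) -> str:
    """Return the table entry for a valid model name, or raise ValueError."""
    try:
        return _LOOKUP[name]
    except KeyError:
        raise ValueError(f"Unsupported model name: {name!r}") from None


if __name__ == "__main__":
    print(resolve_model("black-forest-labs/FLUX.1-dev"))
```
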

| Argument | Description |
|---|---|
| `--model` | Model name or HuggingFace path (required) |
| `--task` | Task type for multi-task models |

| Argument | Description | Default |
|---|---|---|
| `--ulysses_degree` | Ulysses sequence parallel degree | 1 |
| `--ring_degree` | Ring sequence parallel degree | 1 |
| `--pipefusion_parallel_degree` | PipeFusion pipeline stages | 1 |
| `--tensor_parallel_degree` | Tensor parallel degree | 1 |
| `--data_parallel_degree` | Data parallel degree | 1 |
| `--use_cfg_parallel` | Enable CFG parallel | False |
| `--use_parallel_vae` | Enable parallel VAE | False |
| `--use_fsdp` | Enable FSDP | False |
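
When combining parallel degrees, it helps to check how many processes the combination implies before launching. The sketch below assumes that the required number of processes is the product of all degrees and that CFG parallel contributes a factor of 2. This document does not state that rule, so treat it as an assumption to verify against your xDiT version; `required_world_size` is a hypothetical helper.

```python
def required_world_size(
    ulysses_degree: int = 1,
    ring_degree: int = 1,
    pipefusion_parallel_degree: int = 1,
    tensor_parallel_degree: int = 1,
    data_parallel_degree: int = 1,
    use_cfg_parallel: bool = False,
) -> int:
    """Number of processes implied by the parallel settings.

    ASSUMPTION: the degrees multiply, and CFG parallel doubles the count.
    """
    cfg_degree = 2 if use_cfg_parallel else 1
    return (
        ulysses_degree
        * ring_degree
        * pipefusion_parallel_degree
        * tensor_parallel_degree
        * data_parallel_degree
        * cfg_degree
    )


if __name__ == "__main__":
    # Under the stated assumption, Ulysses 4 x Ring 2 needs 8 processes.
    print(required_world_size(ulysses_degree=4, ring_degree=2))
```
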

| Argument | Description | Default |
|---|---|---|
| `--prompt` | Text prompt(s) for generation | - |
| `--negative_prompt` | Negative prompt(s) | - |
| `--height` | Output height | Model-specific |
| `--width` | Output width | Model-specific |
| `--num_frames` | Number of frames for video models | Model-specific |
| `--num_inference_steps` | Denoising steps | Model-specific |
| `--guidance_scale` | Classifier-free guidance scale | Model-specific |
| `--max_sequence_length` | Maximum sequence length | Model-specific |
| `--seed` | Random seed for reproducibility | 42 |
| `--input_images` | Input image paths for image-to-image/video | `[]` |

| Argument | Description | Default |
|---|---|---|
| `--use_torch_compile` | Enable torch.compile acceleration | False |
| `--use_fp8_gemms` | Enable FP8 GEMM quantization | False |
| `--enable_tiling` | Enable VAE tiling | False |
| `--enable_slicing` | Enable VAE slicing | False |
| `--enable_model_cpu_offload` | Enable model CPU offload | False |
| `--enable_sequential_cpu_offload` | Enable sequential CPU offload | False |
| `--attention_backend` | Attention backend selection | None |

| Argument | Description | Default |
|---|---|---|
| `--num_iterations` | Number of benchmark iterations | 1 |
| `--warmup_calls` | Warmup iterations before timing | 0 |
| `--batch_size` | Batch size for dataset inference | None |
| `--dataset_path` | Path to prompt dataset CSV | None |
| `--output_directory` | Output save directory | `.` |
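
The dataset example in this document notes that the file passed to `--dataset_path` needs at least a `prompt` column. A small script like the following could produce such a file. `write_prompt_csv` is a hypothetical helper, and any additional columns xDiT may accept are not covered here.

```python
import csv
from pathlib import Path
from typing import Iterable


def write_prompt_csv(path, prompts: Iterable[str]) -> Path:
    """Write a CSV with a single "prompt" column, one prompt per row."""
    path = Path(path)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["prompt"])
        writer.writeheader()
        for prompt in prompts:
            # The csv module quotes prompts containing commas or quotes.
            writer.writerow({"prompt": prompt})
    return path


if __name__ == "__main__":
    out = write_prompt_csv("prompts.csv", ["A cat running in a garden",
                                           "A mountain landscape, at sunset"])
    print(f"Wrote {out}")
```
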

| Argument | Description | Default |
|---|---|---|
| `--profile` | Enable PyTorch profiler | False |
| `--profile_wait` | Profiler wait steps | 2 |
| `--profile_warmup` | Profiler warmup steps | 2 |
| `--profile_active` | Profiler active steps | 1 |
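
The wait, warmup, and active counts resemble the arguments of PyTorch's `torch.profiler.schedule(wait=..., warmup=..., active=...)`. Assuming the flags are forwarded to that schedule and that the cycle repeats (both are assumptions, not confirmed by this document), the phase of each profiler step can be worked out as follows. `profiler_phase` is a hypothetical, pure-Python helper that does not require PyTorch.

```python
def profiler_phase(step: int, wait: int = 2, warmup: int = 2, active: int = 1) -> str:
    """Return the phase ("wait", "warmup" or "active") of a profiler step.

    ASSUMPTION: steps cycle through wait -> warmup -> active and the cycle
    then starts again.
    """
    cycle = wait + warmup + active
    position = step % cycle
    if position < wait:
        return "wait"      # profiler idle
    if position < wait + warmup:
        return "warmup"    # tracing on, results discarded
    return "active"        # tracing on, results recorded


if __name__ == "__main__":
    for step in range(5):
        print(step, profiler_phase(step))
```

With the default values from the table, one cycle spans five steps, and only the last of them is recorded.
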

```bash
# Image generation
xdit --model FLUX.1-dev \
    --prompt "A majestic mountain landscape at sunset" \
    --height 1024 \
    --width 1024 \
    --ulysses_degree 4 \
    --num_inference_steps 50
```

```bash
# Video generation
xdit --model HunyuanVideo \
    --prompt "A cat playing with a ball" \
    --height 720 \
    --width 1280 \
    --num_frames 49 \
    --ulysses_degree 8
```

```bash
# Benchmarking over several iterations
xdit --model FLUX.1-dev \
    --prompt "Benchmark test image" \
    --ulysses_degree 8 \
    --num_iterations 5 \
    --output_directory ./benchmark_results
```

```bash
# Profiling
xdit --model FLUX.1-dev \
    --prompt "Profile test" \
    --ulysses_degree 8 \
    --profile \
    --output_directory ./profile_results
```

```bash
# torch.compile acceleration
xdit --model FLUX.1-dev \
    --prompt "Compiled inference test" \
    --ulysses_degree 4 \
    --use_torch_compile
```

```bash
# Dataset inference: prompts.csv must contain at least a "prompt" column
xdit --model FLUX.1-dev \
    --dataset_path ./prompts.csv \
    --batch_size 4 \
    --ulysses_degree 8 \
    --output_directory ./dataset_outputs
```

The runner can be imported and used programmatically:

```python
from xfuser.runner import xFuserModelRunner

# Configuration dictionary
config = {
    "model": "FLUX.1-dev",
    "prompt": "A beautiful garden with flowers",
    "height": 1024,
    "width": 1024,
    "ulysses_degree": 4,
    "num_inference_steps": 50,
    "seed": 42,
    "output_directory": "./outputs",
}

# Create runner
runner = xFuserModelRunner(config)

# Preprocess arguments (applies model defaults)
input_args = runner.preprocess_args(config)

# Initialize model
runner.initialize(input_args)

# Run inference
output, timings = runner.run(input_args)

# Save outputs
runner.save(output=output, timings=timings)

# Cleanup
runner.cleanup()
```

```python
# Profiling variant (reuses `config` from the example above)
runner = xFuserModelRunner(config)
input_args = runner.preprocess_args(config)
runner.initialize(input_args)

# Profile instead of run
output, timings, profile = runner.profile(input_args)
runner.save(profile=profile)
runner.cleanup()
```

The runner saves outputs to the specified `--output_directory`:

| File | Description |
|---|---|
| `{model}_u{ulysses}r{ring}_tc_{compile}_{height}x{width}_{index}.png` | Generated images |
| `{model}_u{ulysses}r{ring}_tc_{compile}_{height}x{width}_{index}.mp4` | Generated videos |
| `timings.json` | Timing measurements for each iteration |
| `profile_trace_rank_{rank}.json` | Chrome trace file for profiling |

Saved outputs depend on the input arguments used.
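
To inspect what a run produced, the output directory can be scanned for the file patterns in the table above. The sketch below groups files by kind using only the file extensions and fixed names listed there. `collect_outputs` is a hypothetical helper, and it makes no assumption about the contents of `timings.json` or of the trace files.

```python
from pathlib import Path
from typing import Dict, List


def collect_outputs(output_directory=".") -> Dict[str, List[str]]:
    """Group the files in an output directory by the kinds listed in the table."""
    root = Path(output_directory)
    return {
        "images": sorted(p.name for p in root.glob("*.png")),
        "videos": sorted(p.name for p in root.glob("*.mp4")),
        "timings": sorted(p.name for p in root.glob("timings.json")),
        "profile_traces": sorted(p.name for p in root.glob("profile_trace_rank_*.json")),
    }


if __name__ == "__main__":
    for kind, names in collect_outputs(".").items():
        print(f"{kind}: {names}")
```
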