Merged
2 changes: 1 addition & 1 deletion docs/source/commands/trtllm-serve/trtllm-serve.rst
@@ -218,7 +218,7 @@ Visual Generation Serving
``trtllm-serve`` supports diffusion-based visual generation models (FLUX.1, FLUX.2, Wan2.1, Wan2.2) for image and video generation. When a diffusion model directory is provided (detected by the presence of ``model_index.json``), the server automatically launches in visual generation mode with dedicated endpoints.

.. note::
-   VisualGen is in **prototype** stage. APIs, supported models, and optimization options are actively evolving and may change in future releases.
+   VisualGen is in **beta** stage. APIs, supported models, and optimization options are actively evolving and may change in future releases.

.. code-block:: bash

29 changes: 28 additions & 1 deletion docs/source/models/supported-models.md
@@ -87,4 +87,31 @@ Note:

# Visual Generation Models

-For diffusion-based image and video generation models, see the [Visual Generation](./visual-generation.md) documentation.
+TensorRT-LLM provides beta support for diffusion-based image and video generation.
+For full documentation, see the [Visual Generation](./visual-generation.md) page.

## Supported Models

| HuggingFace Model ID | Tasks |
|---|---|
| `black-forest-labs/FLUX.1-dev` | Text-to-Image |
| `black-forest-labs/FLUX.2-dev` | Text-to-Image |
| `Wan-AI/Wan2.1-T2V-1.3B-Diffusers` | Text-to-Video |
| `Wan-AI/Wan2.1-T2V-14B-Diffusers` | Text-to-Video |
| `Wan-AI/Wan2.1-I2V-14B-480P-Diffusers` | Image-to-Video |
| `Wan-AI/Wan2.1-I2V-14B-720P-Diffusers` | Image-to-Video |
| `Wan-AI/Wan2.2-T2V-A14B-Diffusers` | Text-to-Video |
| `Wan-AI/Wan2.2-I2V-A14B-Diffusers` | Image-to-Video |
| `Lightricks/LTX-2` | Text-to-Video (with Audio), Image-to-Video (with Audio) |

## Feature Matrix

| Model | TeaCache | CFG Parallelism | Ulysses Parallelism | Parallel VAE | CUDA Graph | torch.compile | trtllm-serve |
|---|---|---|---|---|---|---|---|
| **FLUX.1** | Yes | No [^vg1] | Yes | No | Yes | Yes | Yes |
| **FLUX.2** | Yes | No [^vg1] | Yes | No | Yes | Yes | Yes |
| **Wan 2.1** | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| **Wan 2.2** | No | Yes | Yes | Yes | Yes | Yes | Yes |
| **LTX-2** | No | Yes | Yes | No | No | Yes | Yes |

[^vg1]: FLUX models use embedded guidance and do not have a separate negative prompt path, so CFG parallelism is not applicable.
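
To see why CFG creates a parallelizable second pass in the first place: classifier-free guidance runs the denoiser twice per step, once conditioned on the prompt and once unconditioned (or on a negative prompt), then blends the two noise predictions. The two passes are independent, which is what CFG parallelism exploits by placing them on separate devices; models with embedded guidance collapse this into a single conditioned pass, leaving nothing to parallelize. A minimal numpy sketch of the blend step (illustrative only, not the VisualGen API):

```python
import numpy as np

def cfg_blend(cond_pred, uncond_pred, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward (and past) the conditional one."""
    return uncond_pred + guidance_scale * (cond_pred - uncond_pred)

# Toy noise predictions for one denoising step.
cond = np.array([1.0, 2.0, 3.0])
uncond = np.array([0.5, 1.0, 1.5])

# scale = 1.0 recovers the conditional prediction exactly.
assert np.allclose(cfg_blend(cond, uncond, 1.0), cond)

# cond and uncond come from two independent denoiser calls, so CFG
# parallelism can compute them concurrently on different GPUs and
# only synchronize for this cheap blend.
guided = cfg_blend(cond, uncond, 2.0)
```
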
11 changes: 5 additions & 6 deletions docs/source/models/visual-generation.md
@@ -1,7 +1,7 @@
-# Visual Generation (Prototype)
+# Visual Generation (Beta)

```{note}
-This feature is in **prototype** stage. APIs, supported models, and optimization options are
+This feature is in **beta** stage. APIs, supported models, and optimization options are
actively evolving and may change in future releases.
```

@@ -30,7 +30,7 @@ TensorRT-LLM **VisualGen** provides a unified inference stack for diffusion mode
| `Wan-AI/Wan2.1-I2V-14B-720P-Diffusers` | Image-to-Video |
| `Wan-AI/Wan2.2-T2V-A14B-Diffusers` | Text-to-Video |
| `Wan-AI/Wan2.2-I2V-A14B-Diffusers` | Image-to-Video |
-| `Lightricks/LTX-Video` | Text-to-Video (with Audio), Image-to-Video (with Audio) |
+| `Lightricks/LTX-2` | Text-to-Video (with Audio), Image-to-Video (with Audio) |

Models are auto-detected from the checkpoint directory. Diffusers-format models are detected via `model_index.json`; LTX-2 monolithic safetensors checkpoints are detected via embedded metadata. The `AutoPipeline` registry selects the appropriate pipeline class automatically.

@@ -50,9 +50,8 @@ Models are auto-detected from the checkpoint directory. Diffusers-format models

Here is a simple example to generate a video with Wan 2.1:

-```{literalinclude} ../../../examples/visual_gen/quickstart_example.py
-:language: python
-:linenos:
+```bash
+python examples/visual_gen/quickstart_example.py
```

To learn more about VisualGen, see [`examples/visual_gen/`](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/visual_gen) for more examples including text-to-image, image-to-video, and batch generation.
22 changes: 11 additions & 11 deletions security_scanning/docs/poetry.lock
