xinhe-nv · pull · Mar 9, 2026 · Mar 8, 2026 · Mar 9, 2026 · Mar 9, 2026
diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS
@@ -59,8 +59,10 @@
 /tensorrt_llm/_torch/pyexecutor @NVIDIA/trt-llm-torch-runtime-devs
 ## TensorRT-LLM Pytorch backend - AutoDeploy flow
 /tensorrt_llm/_torch/auto_deploy @NVIDIA/trt-llm-torch-autodeploy-devs
-/examples/auto_deploy @NVIDIA/trt-llm-torch-autodeploy-devs @NVIDIA/trt-llm-doc-owners
-/tests/unittest/_torch/auto_deploy @NVIDIA/trt-llm-torch-autodeploy-devs
+/examples/auto_deploy @NVIDIA/trt-llm-torch-autodeploy-devs
+/docs/source/features/auto_deploy @NVIDIA/trt-llm-torch-autodeploy-devs @NVIDIA/trt-llm-doc-owners
+/tests/unittest/auto_deploy @NVIDIA/trt-llm-torch-autodeploy-devs
+/tests/integration/defs/accuracy/test_llm_api_autodeploy.py @NVIDIA/trt-llm-torch-autodeploy-devs @NVIDIA/trt-llm-qa-function
 
 ## TensorRT-LLM Pytorch - Speculative Decoding
 /tensorrt_llm/_torch/speculative @NVIDIA/trt-llm-torch-spec-decoding

diff --git a/docs/source/commands/trtllm-serve/trtllm-serve.rst b/docs/source/commands/trtllm-serve/trtllm-serve.rst
@@ -215,19 +215,24 @@ model.
 Visual Generation Serving
 ~~~~~~~~~~~~~~~~~~~~~~~~~
 
-``trtllm-serve`` supports diffusion-based visual generation models (Wan2.1, Wan2.2) for image and video generation. When a diffusion model directory is provided (detected by the presence of ``model_index.json``), the server automatically launches in visual generation mode with dedicated endpoints.
+``trtllm-serve`` supports diffusion-based visual generation models (FLUX.1, FLUX.2, Wan2.1, Wan2.2) for image and video generation. When a diffusion model directory is provided (detected by the presence of ``model_index.json``), the server automatically launches in visual generation mode with dedicated endpoints.
 
 .. note::
-   This is the initial release of TensorRT-LLM VisualGen. APIs, supported models, and optimization options are actively evolving and may change in future releases.
+   VisualGen is in the **prototype** stage. APIs, supported models, and optimization options are actively evolving and may change in future releases.
 
 .. code-block:: bash
 
-   trtllm-serve Wan-AI/Wan2.1-T2V-1.3B-Diffusers \
+   # Video generation (Wan)
+   trtllm-serve Wan-AI/Wan2.2-T2V-A14B-Diffusers \
+       --extra_visual_gen_options config.yml
+
+   # Image generation (FLUX)
+   trtllm-serve black-forest-labs/FLUX.2-dev \
        --extra_visual_gen_options config.yml
 
 The ``--extra_visual_gen_options`` flag accepts a YAML file that configures quantization, parallelism, and TeaCache. Available visual generation endpoints include ``/v1/images/generations``, ``/v1/videos``, ``/v1/videos/generations``, and video management APIs.
 
-For full details, see the :doc:`../../features/visual-generation` feature documentation. Example client scripts are available in the `examples/visual_gen/serve/ <https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/visual_gen/serve>`_ directory.
+For full details, see the :doc:`../../models/visual-generation` documentation. Example client scripts are available in the `examples/visual_gen/serve/ <https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/visual_gen/serve>`_ directory.
 
 Multi-node Serving with Slurm
 -----------------------------

diff --git a/docs/source/developer-guide/overview.md b/docs/source/developer-guide/overview.md
@@ -73,3 +73,7 @@ if self.previous_batch is not None:
 ```
 
 This approach effectively reduces GPU idle time and improves overall hardware occupancy. While it introduces one extra decoding step into the pipeline, the resulting throughput gain is a significant trade-off. For this reason, the Overlap Scheduler is enabled by default in TensorRT LLM.
+
+## Visual Generation
+
+For diffusion-based visual generation (image/video), TensorRT LLM provides a separate `VisualGen` API and a `DiffusionExecutor` with its own pipeline architecture. See the [Visual Generation](../models/visual-generation.md) documentation.
diff --git a/docs/source/features/visual-generation.md b/docs/source/features/visual-generation.md
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -34,6 +34,7 @@ Welcome to TensorRT LLM's Documentation!
    :name: Models
 
    models/supported-models.md
+   models/visual-generation.md
    models/adding-new-model.md
 
 
@@ -67,7 +68,6 @@ Welcome to TensorRT LLM's Documentation!
    features/long-sequence.md
    features/lora.md
    features/multi-modality.md
-   features/visual-generation.md
    features/overlap-scheduler.md
    features/paged-attention-ifb-scheduler.md
    features/parallel-strategy.md

diff --git a/docs/source/models/supported-models.md b/docs/source/models/supported-models.md
@@ -81,3 +81,7 @@ Note:
 - I: Image
 - V: Video
 - A: Audio
+
+# Visual Generation Models
+
+For diffusion-based image and video generation models, see the [Visual Generation](./visual-generation.md) documentation.