This directory contains example scripts that demonstrate how to use the TensorRT-LLM Visual Generation API endpoints for image and video generation.
These examples show how to interact with the visual generation server using both the OpenAI Python SDK and standard HTTP requests. The API provides endpoints for:
- Image Generation: Text-to-image generation (T2I)
- Video Generation:
  - Text-to-video generation (T2V): generate videos from text prompts only
  - Text+Image-to-video generation (TI2V): generate videos from text plus a reference image
  - Both synchronous and asynchronous modes are supported
  - Multipart/form-data support for file uploads
- Video Management: Retrieving and deleting generated videos
Before running these examples, ensure you have:

1. Required modules: Install the dependencies before running the examples:

   ```bash
   pip install git+https://github.com/huggingface/diffusers.git
   ```

   Optional: for better video compression (H.264/MP4), install ffmpeg:

   ```bash
   # Ubuntu/Debian
   apt-get install ffmpeg
   ```

   If ffmpeg is not available, the server falls back to a pure Python encoder that outputs MJPEG/AVI. See the FFmpeg download page for installation instructions on other platforms.

2. A running server: The TensorRT-LLM visual generation server must be running:

   ```bash
   trtllm-serve <path to your model> --extra_visual_gen_options <path to config yaml>
   ```

   For example:

   ```bash
   trtllm-serve $LLM_MODEL_DIR/Wan2.1-T2V-1.3B-Diffusers --extra_visual_gen_options ./configs/wan21.yml
   trtllm-serve $LLM_MODEL_DIR/Wan2.2-T2V-A14B-Diffusers --extra_visual_gen_options ./configs/wan22.yml
   trtllm-serve $LLM_MODEL_DIR/FLUX.1-dev --extra_visual_gen_options ./configs/flux1.yml
   trtllm-serve $LLM_MODEL_DIR/FLUX.2-dev --extra_visual_gen_options ./configs/flux2.yml
   trtllm-serve $LLM_MODEL_DIR/LTX-2/ --extra_visual_gen_options ./configs/ltx2.yml

   # Run the server in the background:
   trtllm-serve $LLM_MODEL_DIR/Wan2.1-T2V-1.3B-Diffusers --extra_visual_gen_options ./configs/wan21.yml > /tmp/serve.log 2>&1 &
   # Check whether the server is up:
   tail -f /tmp/serve.log
   ```

   For LTX-2, you need to provide a proper `text_encoder_path` in `./configs/ltx2.yml`.
Currently supported and tested models:
- WAN T2V/I2V for video generation (t2v, ti2v, delete_video)
- FLUX.1 for image generation (t2i)
- FLUX.2 for image generation (t2i)
- LTX-2 for video generation with audio (t2v, ti2v)
Demonstrates synchronous text-to-image generation using the OpenAI SDK. Supports FLUX.1 and FLUX.2.
Features:
- Generates images from text prompts
- Supports configurable model, image size, and quality
- Returns base64-encoded images or URLs
- Saves generated images to disk
Usage:

```bash
# FLUX.2 (default)
python sync_image_gen.py

# FLUX.1
python sync_image_gen.py --model flux1

# Custom server and prompt
python sync_image_gen.py --base-url http://your-server:8000/v1 --prompt "A sunset"
```

API Endpoint: POST /v1/images/generations

Output: Saves the generated image to output_generation.png (or numbered files when multiple images are generated)
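The flow that sync_image_gen.py implements can be sketched with the OpenAI SDK roughly as follows; the default base URL and API key come from the configuration section below, and the exact argument handling in the script may differ:

```python
import base64
import pathlib


def save_b64_image(b64_data: str, path: str) -> int:
    """Decode a base64-encoded image and write it to disk; returns byte count."""
    raw = base64.b64decode(b64_data)
    pathlib.Path(path).write_bytes(raw)
    return len(raw)


if __name__ == "__main__":
    # Point the standard OpenAI client at the local TensorRT-LLM server.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="tensorrt_llm")
    result = client.images.generate(
        model="flux2",
        prompt="A sunset over snow-capped mountains",
        n=1,
        size="512x512",
        response_format="b64_json",
    )
    save_b64_image(result.data[0].b64_json, "output_generation.png")
```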
Demonstrates synchronous video generation using direct HTTP requests. Waits for completion and returns the video file directly.
Features:
- T2V Mode: Generate videos from text prompts only
- TI2V Mode: Generate videos from text + reference image (multipart/form-data)
- Waits for video generation to complete before returning
- Returns video file directly in response
- Command-line interface for easy testing
Usage:

```bash
# Text-to-Video (T2V) - no reference image
python sync_video_gen.py --mode t2v \
    --prompt "A cute cat playing with a ball in the park" \
    --duration 4.0 --fps 24 --size 256x256

# Text+Image-to-Video (TI2V) - with reference image
# Note: a longer duration and a larger size lead to a much longer wait
python sync_video_gen.py --mode ti2v \
    --prompt "She turns around and smiles, then slowly walks out of the frame" \
    --image ./media/woman_skyline_original_720p.jpeg \
    --duration 4.0 --fps 24 --size 512x512

# Custom parameters
python sync_video_gen.py --mode t2v \
    --prompt "A serene sunset over the ocean" \
    --duration 5.0 --fps 30 --size 512x512 \
    --output my_video.mp4

# LTX-2: Text-to-Video (generates video with audio)
python sync_video_gen.py --mode t2v \
    --model ltx2 \
    --prompt "A cute cat playing with a ball in the park" \
    --duration 5.0 --fps 24 --size 1280x720

# LTX-2: Image-to-Video
python sync_video_gen.py --mode ti2v \
    --model ltx2 \
    --prompt "She turns around and smiles, then slowly walks out of the frame" \
    --image ./media/woman_skyline_original_720p.jpeg \
    --duration 5.0 --fps 24 --size 1280x720
```

Command-Line Arguments:
- `--mode` - Generation mode: `t2v` or `ti2v` (default: t2v)
- `--prompt` - Text prompt for video generation (required)
- `--image` - Path to reference image (required for ti2v mode)
- `--base-url` - API server URL (default: http://localhost:8000/v1)
- `--model` - Model name (default: wan). Use `ltx2` for LTX-2.
- `--duration` - Video duration in seconds (default: 4.0)
- `--fps` - Frames per second (default: 24)
- `--size` - Video resolution in WxH format (default: 256x256)
- `--output` - Output video file path (default: output_sync.mp4)
API Endpoint: POST /v1/videos/generations

API Details:
- T2V uses JSON (`Content-Type: application/json`)
- TI2V uses multipart/form-data (`Content-Type: multipart/form-data`) with a file upload

Output: Saves the generated video to the specified output file
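A hedged sketch of the synchronous T2V request that sync_video_gen.py makes, using the requests library; the field names follow the video request parameters listed later in this README:

```python
import requests


def build_t2v_payload(prompt: str, seconds: float = 4.0, fps: int = 24,
                      size: str = "256x256", model: str = "wan") -> dict:
    """Assemble the JSON body for a synchronous T2V request."""
    return {"model": model, "prompt": prompt, "seconds": seconds,
            "fps": fps, "size": size}


if __name__ == "__main__":
    payload = build_t2v_payload("A cute cat playing with a ball in the park")
    # The sync endpoint blocks until generation finishes and returns the file.
    resp = requests.post("http://localhost:8000/v1/videos/generations",
                         json=payload, timeout=300)
    resp.raise_for_status()
    with open("output_sync.mp4", "wb") as f:
        f.write(resp.content)
```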
NEW: Enhanced async video generation supporting both Text-to-Video (T2V) and Text+Image-to-Video (TI2V) modes.
Features:
- T2V Mode: Generate videos from text prompts only (JSON request)
- TI2V Mode: Generate videos from text + reference image (multipart/form-data with file upload)
- Command-line interface for easy testing
- Automatic mode detection
- Comprehensive parameter control
Usage:

```bash
# Text-to-Video (T2V) - no reference image
python async_video_gen.py --mode t2v \
    --prompt "A cool cat on a motorcycle in the night" \
    --duration 4.0 --fps 24 --size 256x256

# Text+Image-to-Video (TI2V) - with reference image
python async_video_gen.py --mode ti2v \
    --prompt "She turns around and smiles, then slowly walks out of the frame" \
    --image ./media/woman_skyline_original_720p.jpeg \
    --duration 4.0 --fps 24 --size 512x512

# Custom parameters
python async_video_gen.py --mode t2v \
    --prompt "A serene sunset over the ocean" \
    --duration 5.0 --fps 30 --size 512x512 \
    --output my_video.mp4

# LTX-2: Async Text-to-Video (generates video with audio)
python async_video_gen.py --mode t2v \
    --model ltx2 \
    --prompt "A cool cat on a motorcycle in the night" \
    --duration 5.0 --fps 24 --size 1280x720

# LTX-2: Async Image-to-Video
python async_video_gen.py --mode ti2v \
    --model ltx2 \
    --prompt "She turns around and smiles, then slowly walks out of the frame" \
    --image ./media/woman_skyline_original_720p.jpeg \
    --duration 5.0 --fps 24 --size 1280x720
```

Command-Line Arguments:
- `--mode` - Generation mode: `t2v` or `ti2v` (default: t2v)
- `--prompt` - Text prompt for video generation (required)
- `--image` - Path to reference image (required for ti2v mode)
- `--base-url` - API server URL (default: http://localhost:8000/v1)
- `--model` - Model name (default: wan). Use `ltx2` for LTX-2.
- `--duration` - Video duration in seconds (default: 4.0)
- `--fps` - Frames per second (default: 24)
- `--size` - Video resolution in WxH format (default: 256x256)
- `--output` - Output video file path (default: output_async.mp4)
API Details:
- T2V uses JSON (`Content-Type: application/json`)
- TI2V uses multipart/form-data (`Content-Type: multipart/form-data`) with a file upload

Output: Saves the generated video to the specified output file
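The async flow (submit a job, poll its status, download the result) can be sketched as below. The `id` and `status` field names and the "completed"/"failed" values are assumptions about the server's job schema, so check the actual response if they differ:

```python
import time

import requests

BASE_URL = "http://localhost:8000/v1"  # assumed default server address


def wait_for_completion(fetch_status, timeout: float = 300.0,
                        interval: float = 2.0) -> dict:
    """Poll fetch_status() until the job finishes or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_status()
        if job.get("status") in ("completed", "failed"):
            return job
        time.sleep(interval)
    raise TimeoutError("video generation did not finish in time")


if __name__ == "__main__":
    # Submit the job, poll until done, then download the content.
    created = requests.post(f"{BASE_URL}/videos",
                            json={"prompt": "A cool cat on a motorcycle",
                                  "seconds": 4.0, "fps": 24, "size": "256x256"},
                            timeout=30).json()
    video_id = created["id"]
    job = wait_for_completion(
        lambda: requests.get(f"{BASE_URL}/videos/{video_id}", timeout=30).json())
    if job["status"] == "completed":
        data = requests.get(f"{BASE_URL}/videos/{video_id}/content", timeout=60)
        open("output_async.mp4", "wb").write(data.content)
```

Injecting the status fetcher as a callable keeps the polling loop testable without a live server.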
Demonstrates the complete lifecycle of video generation and deletion.
Features:
- Creates a test video generation job
- Waits for completion
- Deletes the generated video
- Verifies deletion by attempting to retrieve the deleted video
- Tests error handling for non-existent videos
Usage:

```bash
# Use default localhost server
python delete_video.py

# Specify a custom server URL
python delete_video.py http://your-server:8000/v1
```

API Endpoints:
- `POST /v1/videos` - Create video job
- `GET /v1/videos/{video_id}` - Check video status
- `DELETE /v1/videos/{video_id}` - Delete video
Test Flow:
1. Create a video generation job
2. Wait for completion
3. Delete the video
4. Verify that retrieving the deleted video returns NotFoundError
5. Test deletion of a non-existent video
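The delete-and-verify step of this flow could be sketched with raw HTTP calls as follows; here an HTTP 404 status stands in for the SDK's NotFoundError:

```python
import requests


def video_url(base_url: str, video_id: str) -> str:
    """Build the per-video endpoint URL from the API base URL."""
    return f"{base_url.rstrip('/')}/videos/{video_id}"


def delete_and_verify(base_url: str, video_id: str) -> bool:
    """Delete a video, then confirm that a follow-up GET returns 404."""
    requests.delete(video_url(base_url, video_id), timeout=30).raise_for_status()
    check = requests.get(video_url(base_url, video_id), timeout=30)
    return check.status_code == 404


if __name__ == "__main__":
    print(delete_and_verify("http://localhost:8000/v1", "some-video-id"))
```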
All examples use the following default configuration:
- Base URL: `http://localhost:8000/v1`
- API Key: `"tensorrt_llm"` (authentication token)
- Timeout: 300 seconds for async operations

You can customize these by:
- Passing the base URL as a command-line argument
- Modifying the default parameters in each script's functions
Image generation request parameters:
- `model`: Model identifier (e.g., "flux1", "flux2")
- `prompt`: Text description
- `n`: Number of images to generate
- `size`: Image dimensions (e.g., "512x512", "1024x1024")
- `quality`: "standard" or "hd"
- `response_format`: "b64_json" or "url"

Video generation request parameters:
- `model`: Model identifier (e.g., "wan", "ltx2")
- `prompt`: Text description
- `size`: Video resolution (e.g., "256x256", "512x512", "1280x720")
- `seconds`: Duration in seconds
- `fps`: Frames per second
- `input_reference`: Reference image file (for TI2V mode)
Note: LTX-2 generates video with audio. The `ltx2.yml` config must include `text_encoder_path` pointing to a Gemma3 model (e.g., `google/gemma-3-12b-it`).
Create a T2V job:

```bash
curl -X POST "http://localhost:8000/v1/videos" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A cool cat on a motorcycle",
    "seconds": 4.0,
    "fps": 24,
    "size": "256x256"
  }'
```

Create an LTX-2 job:

```bash
curl -X POST "http://localhost:8000/v1/videos" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ltx2",
    "prompt": "A cool cat on a motorcycle",
    "seconds": 5.0,
    "fps": 24,
    "size": "1280x720"
  }'
```

Create a TI2V job (multipart/form-data):

```bash
curl -X POST "http://localhost:8000/v1/videos" \
  -F "prompt=She turns around and smiles" \
  -F "input_reference=@./media/woman_skyline_original_720p.jpeg" \
  -F "seconds=4.0" \
  -F "fps=24" \
  -F "size=256x256" \
  -F "guidance_scale=5.0"
```

Check video status:

```bash
curl -X GET "http://localhost:8000/v1/videos/{video_id}"
```

Download the video:

```bash
# The server returns either MP4 (with ffmpeg) or AVI (without ffmpeg);
# check the Content-Type header to determine the format
curl -X GET "http://localhost:8000/v1/videos/{video_id}/content" -o output.mp4

# Or use -J -O to let curl use the server-provided filename
curl -X GET "http://localhost:8000/v1/videos/{video_id}/content" -J -O
```

Delete the video:

```bash
curl -X DELETE "http://localhost:8000/v1/videos/{video_id}"
```

| Endpoint | Method | Mode | Content-Type | Purpose |
|---|---|---|---|---|
| `/v1/videos` | POST | Async | JSON or Multipart | Create video job (T2V/TI2V) |
| `/v1/videos/generations` | POST | Sync | JSON or Multipart | Generate video sync (T2V/TI2V) |
| `/v1/videos/{id}` | GET | - | - | Get video status/metadata |
| `/v1/videos/{id}/content` | GET | - | - | Download video file |
| `/v1/videos/{id}` | DELETE | - | - | Delete video |
| `/v1/videos` | GET | - | - | List all videos |
| `/v1/images/generations` | POST | - | JSON | Generate images (T2I) |
Note: Both /v1/videos (async) and /v1/videos/generations (sync) support:
- JSON: Standard text-to-video (T2V)
- Multipart/Form-Data: Text+image-to-video (TI2V) with file upload
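For the multipart TI2V case, the same request that the curl example sends can be formed with the requests library roughly as follows; note that the non-file fields go in `data` and the image goes in `files`:

```python
import requests


def build_ti2v_parts(prompt: str, image_path: str, seconds: float = 4.0,
                     fps: int = 24, size: str = "256x256"):
    """Split TI2V parameters into the data/files pair that requests expects."""
    data = {"prompt": prompt, "seconds": str(seconds),
            "fps": str(fps), "size": size}
    files = {"input_reference": open(image_path, "rb")}
    return data, files


if __name__ == "__main__":
    data, files = build_ti2v_parts(
        "She turns around and smiles",
        "./media/woman_skyline_original_720p.jpeg")
    # requests sets the multipart/form-data Content-Type (with boundary) itself.
    resp = requests.post("http://localhost:8000/v1/videos",
                         data=data, files=files, timeout=30)
    print(resp.json())
```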
All examples include comprehensive error handling:
- Connection errors (server not running)
- API errors (invalid parameters, model not found)
- Timeout errors (generation taking too long)
- Resource errors (video not found for deletion)
Errors are displayed with full stack traces for debugging.
Generated files are saved to the current working directory:
- `output_generation.png` - Synchronous image generation (sync_image_gen.py)
- `output_sync.mp4` or `output_sync.avi` - Synchronous video generation (sync_video_gen.py)
- `output_async.mp4` or `output_async.avi` - Asynchronous video generation (async_video_gen.py)

Note: You can customize output filenames using the `--output` parameter in all scripts.
The server supports two video encoding modes:
| Encoder | Format | Requirements | Features |
|---|---|---|---|
| FFmpeg (H.264) | MP4 | ffmpeg installed | Better compression, audio support |
| Pure Python (MJPEG) | AVI | None (built-in) | No external dependencies |
The server automatically selects the best available encoder. The example scripts detect the actual format from the server response and adjust the output filename extension accordingly.
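The format detection the scripts perform can be sketched as a small helper that maps the response's Content-Type to a file extension; the two MIME types follow from the encoder table above, and the fallback extension is an assumption:

```python
def extension_for_content_type(content_type: str) -> str:
    """Pick an output extension from the server's Content-Type header."""
    mime = content_type.split(";")[0].strip().lower()
    if mime == "video/mp4":                        # ffmpeg / H.264 path
        return ".mp4"
    if mime in ("video/x-msvideo", "video/avi"):   # pure-Python MJPEG path
        return ".avi"
    return ".bin"                                  # unknown format: keep raw bytes


def adjust_output_name(requested: str, content_type: str) -> str:
    """Swap the requested filename's extension to match the actual format."""
    ext = extension_for_content_type(content_type)
    stem = requested.rsplit(".", 1)[0] if "." in requested else requested
    return stem + ext
```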