---
title: Text-to-Image
---
The text-to-image pipeline of the Livepeer AI network allows you to generate high-quality images from text descriptions. This pipeline is powered by the latest diffusion models in the Hugging Face text-to-image pipeline.
{/* TODO: Replace with relative url when mintlify fixed issue. */}
graph LR
A["A cool cat on the beach"] --> B[Gateway]
B --> C[Orchestrator]
C --> B
B --> D[<div style="width: 200px;"><img src="https://mintlify.s3-us-west-1.amazonaws.com/na-36/images/ai/cool-cat.png" alt="Image of cool cat"/></div>]
The current warm model requested for the text-to-image pipeline is:
- SG161222/RealVisXL_V4.0_Lightning: A streamlined version of RealVisXL_V4.0, designed for faster inference while still aiming for photorealism.
Furthermore, several Orchestrators are currently maintaining the following model in a ready state:
- ByteDance/SDXL-Lightning: A high-performance diffusion model developed by ByteDance.
The following models have been tested and verified for the text-to-image pipeline:
{/* prettier-ignore */}
- SG161222/Realistic_Vision_V6.0_B1_noVAE: Latest (experimental) release of the Realistic Vision model specialized in creating photorealistic portraits.
- stabilityai/stable-diffusion-xl-base-1.0: A base model for stable diffusion by Stability AI.
- runwayml/stable-diffusion-v1-5: A stable diffusion model by Runway ML.
- prompthero/openjourney-v4: A Stable Diffusion v1.5 fine-tune by PromptHero, trained on Midjourney v4 images.
- ByteDance/SDXL-Lightning: A lightning-fast diffusion model by ByteDance.
- SG161222/RealVisXL_V4.0: A diffusion model that excels in generating high-quality, photorealistic images.
- SG161222/RealVisXL_V4.0_Lightning: A streamlined version of RealVisXL_V4.0, designed for faster inference while still aiming for photorealism.
- stabilityai/sd-turbo: A high-performance diffusion model by Stability AI (limited-commercial use license).
- stabilityai/sdxl-turbo: An extended version of sd-turbo with enhanced capabilities (limited-commercial use license).
- stabilityai/stable-diffusion-3-medium-diffusers: A Multimodal Diffusion Transformer (MMDiT) model with superior image quality, advanced typography, and enhanced prompt comprehension (limited-commercial use license).
To generate an image with the text-to-image pipeline, send a POST request to the Gateway's text-to-image API endpoint:
curl -X POST "https://<GATEWAY_IP>/text-to-image" \
-H "Content-Type: application/json" \
-d '{
"model_id":"ByteDance/SDXL-Lightning",
"prompt":"A cool cat on the beach",
"width": 1024,
"height": 1024
}'
In this command:
- <GATEWAY_IP> should be replaced with your AI Gateway's IP address.
- model_id is the diffusion model for image generation.
- prompt is the text description for the image.
For additional optional parameters, refer to the Livepeer AI API Reference.
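The same request can be assembled programmatically. The following is a minimal Python sketch using only the standard library; the helper names `build_text_to_image_request` and `send` are hypothetical, and the payload contains only the four fields shown in the curl command above.

```python
import json
import urllib.request

# Placeholder from the curl example; replace with your AI Gateway's address.
GATEWAY_URL = "https://<GATEWAY_IP>"


def build_text_to_image_request(prompt, model_id, width=1024, height=1024,
                                gateway_url=GATEWAY_URL):
    """Build (but do not send) a POST request mirroring the curl command.

    Hypothetical helper: only the fields from the curl example are set.
    """
    payload = {
        "model_id": model_id,
        "prompt": prompt,
        "width": width,
        "height": height,
    }
    return urllib.request.Request(
        url=f"{gateway_url}/text-to-image",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and decode the JSON body of the response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it possible to inspect or log the payload before any network call is made. A call such as `send(build_text_to_image_request("A cool cat on the beach", "ByteDance/SDXL-Lightning"))` would then correspond to the curl command, assuming the placeholder has been replaced with a reachable Gateway.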
After execution, the Orchestrator processes the request and returns the response to the Gateway:
{
"images": [
{
"nsfw": false,
"seed": 2562822894,
"url": "https://<GATEWAY_IP>/stream/d0fc1fc6/8fdf5a94.png"
}
]
}
The url in the response is the URL of the generated image. Download the image with:
curl -O "https://<GATEWAY_IP>/stream/d0fc1fc6/8fdf5a94.png"
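When the response is handled in code rather than by hand, the image URLs can be pulled out of the JSON body. The sketch below assumes the response has exactly the shape shown above (an `images` list whose entries carry `nsfw`, `seed`, and `url`); the function names and the choice to skip images flagged as NSFW are illustrative assumptions, not part of the API.

```python
import json
import posixpath
from urllib.parse import urlsplit


def image_urls(response_body, skip_nsfw=True):
    """Return the URLs of the generated images from a JSON response body.

    Assumes the response shape shown in the example above.
    """
    data = json.loads(response_body)
    urls = []
    for image in data.get("images", []):
        # Skipping flagged images is an illustrative choice, not API behavior.
        if skip_nsfw and image.get("nsfw"):
            continue
        urls.append(image["url"])
    return urls


def local_filename(url):
    """Derive a local file name from the URL path, as `curl -O` does."""
    return posixpath.basename(urlsplit(url).path)


# The example response from above, used as sample input.
sample_response = """
{
  "images": [
    {
      "nsfw": false,
      "seed": 2562822894,
      "url": "https://<GATEWAY_IP>/stream/d0fc1fc6/8fdf5a94.png"
    }
  ]
}
"""

urls = image_urls(sample_response)
```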
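To apply LoRA (Low-Rank Adaptation) adapters to an image, include the loras field in your request. As the example shows, its value is a JSON-encoded string that maps each LoRA model ID to its weight: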
curl -X POST "https://<GATEWAY_IP>/text-to-image" \
-H "Content-Type: application/json" \
-d '{
"model_id":"stabilityai/stable-diffusion-xl-base-1.0",
"prompt":"A cool cat on the beach",
"width": 1024,
"height": 1024,
"loras": "{ \"latent-consistency/lcm-lora-sdxl\": 1.0, \"nerijs/pixel-art-xl\": 1.2}"
}'
You can find a list of available LoRA adapters for various base models on lora-studio.
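Note that in the curl example the `loras` value is a string containing JSON, not a nested JSON object, which is why its inner quotes are escaped. A small Python sketch of how such a payload might be built follows; `build_lora_payload` is a hypothetical helper, and the adapter IDs and weights are the ones from the example above.

```python
import json


def build_lora_payload(prompt, model_id, loras, width=1024, height=1024):
    """Build a request payload whose `loras` field is a JSON-encoded string.

    `loras` maps LoRA model IDs to weights, as in the curl example.
    """
    return {
        "model_id": model_id,
        "prompt": prompt,
        "width": width,
        "height": height,
        # Encode the mapping as a string, mirroring the escaped value in curl.
        "loras": json.dumps(loras),
    }


payload = build_lora_payload(
    prompt="A cool cat on the beach",
    model_id="stabilityai/stable-diffusion-xl-base-1.0",
    loras={
        "latent-consistency/lcm-lora-sdxl": 1.0,
        "nerijs/pixel-art-xl": 1.2,
    },
)

# Serializing the whole payload escapes the inner quotes automatically.
request_body = json.dumps(payload)
```

Letting `json.dumps` handle both levels of encoding avoids hand-written backslash escapes, which are easy to get wrong.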
To configure your Orchestrator to serve the text-to-image pipeline, refer to the Orchestrator Configuration guide.
The following system requirements are recommended for optimal performance:
- NVIDIA GPU with at least 24GB of VRAM.
The pricing for the text-to-image pipeline is based on competitor pricing. However, we strongly encourage Orchestrators to set their own pricing based on their costs and requirements. Setting a competitive price will help attract more jobs, as Gateways can set their maximum price for a job. The current recommended pricing for this pipeline is 1.9073484e-08 USD per output pixel (height * width * output images).
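As a worked example of the per-pixel formula, the sketch below estimates the cost of a job at the recommended price. The function name `estimate_cost_usd` is hypothetical, and actual prices are set by each Orchestrator, so treat the result as an estimate only.

```python
# Recommended price from the text above, in USD per output pixel.
RECOMMENDED_PRICE_PER_PIXEL_USD = 1.9073484e-08


def estimate_cost_usd(width, height, num_images=1,
                      price_per_pixel=RECOMMENDED_PRICE_PER_PIXEL_USD):
    """Estimate job cost as height * width * output images * price per pixel."""
    return height * width * num_images * price_per_pixel


# One 1024x1024 image has 1,048,576 output pixels.
single_image_cost = estimate_cost_usd(1024, 1024)
```

At the recommended price, a single 1024x1024 image works out to roughly 0.02 USD, and the cost scales linearly with the number of output images.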
<Card title="API Reference" icon="rectangle-terminal" href="/ai/api-reference/text-to-image">
Explore the text-to-image endpoint and experiment with the API in the Livepeer AI API Reference.
</Card>