title
Text-to-Image

Overview

The text-to-image pipeline of the Livepeer AI network allows you to generate high-quality images from text descriptions. This pipeline is powered by the latest diffusion models in the HuggingFace text-to-image pipeline.

{/* TODO: Replace with relative url when mintlify fixed issue. */}

graph LR
    A["A cool cat on the beach"] --> B[Gateway]
    B --> C[Orchestrator]
    C --> B
    B --> D[<div style="width: 200px;"><img src="https://mintlify.s3-us-west-1.amazonaws.com/na-36/images/ai/cool-cat.png" alt="Image of cool cat"/></div>]

Loading

Models

Warm Models

The current warm model requested for the text-to-image pipeline is:

SG161222/RealVisXL_V4.0_Lightning: A streamlined version of RealVisXL_V4.0, designed for faster inference while still aiming for photorealism.

Furthermore, several Orchestrators are currently maintaining the following model in a ready state:

ByteDance/SDXL-Lightning: A high-performance diffusion model developed by ByteDance.

For faster responses with different [text-to-image](https://huggingface.co/models?pipeline_tag=text-to-image) diffusion models, ask Orchestrators to load it on their GPU via the `ai-video` channel in [Discord Server](https://discord.gg/livepeer).

On-Demand Models

The following models have been tested and verified for the text-to-image pipeline:

If a specific model you wish to use is not listed, please submit a [feature request](https://github.com/livepeer/ai-worker/issues/new?assignees=&labels=enhancement%2Cmodel&projects=&template=model_request.yml) on GitHub to get the model verified and added to the list.

{/* prettier-ignore */}

SG161222/Realistic_Vision_V6.0_B1_noVAE: Latest (experimental) release of the Realistic Vision model specialized in creating photorealistic portraits.
stabilityai/stable-diffusion-xl-base-1.0: A base model for stable diffusion by Stability AI.
runwayml/stable-diffusion-v1-5: A stable diffusion model by Runway ML.
prompthero/openjourney-v4: A model by Prompthero for open-ended journey generation.
ByteDance/SDXL-Lightning: A lightning-fast diffusion model by ByteDance.
SG161222/RealVisXL_V4.0: A diffusion model that excels in generating high-quality, photorealistic images.
SG161222/RealVisXL_V4.0_Lightning: A streamlined version of RealVisXL_V4.0, designed for faster inference while still aiming for photorealism.
stabilityai/sd-turbo: A high-performance diffusion model by Stability AI (limited-commercial use license).
stabilityai/sdxl-turbo: An extended version of sd-turbo with enhanced capabilities (limited-commercial use license).
stabilityai/stable-diffusion-3-medium-diffusers: A Multimodal Diffusion Transformer (MMDiT) model with superior image quality, advanced typography, and enhanced prompt comprehension (limited-commercial use license).

Basic Usage Instructions

For a detailed understanding of the `text-to-image` endpoint and to experiment with the API, see the [Livepeer AI API Reference](/ai/api-reference/text-to-image). For examples of effective prompts, visit [PromptHero](https://prompthero.com/).

To generate an image with the text-to-image pipeline, send a POST request to the Gateway's text-to-image API endpoint:

curl -X POST "https://<GATEWAY_IP>/text-to-image" \
    -H "Content-Type: application/json" \
    -d '{
        "model_id":"ByteDance/SDXL-Lightning",
        "prompt":"A cool cat on the beach",
        "width": 1024,
        "height": 1024
    }'

In this command:

<GATEWAY_IP> should be replaced with your AI Gateway's IP address.
model_id is the diffusion model for image generation.
prompt is the text description for the image.

For additional optional parameters, refer to the Livepeer AI API Reference.

After execution, the Orchestrator processes the request and returns the response to the Gateway:

{
  "images": [
    {
      "nsfw": false,
      "seed": 2562822894,
      "url": "https://<GATEWAY_IP>/stream/d0fc1fc6/8fdf5a94.png"
    }
  ]
}

The url in the response is the URL of the generated image. Download the image with:

curl -O "https://<GATEWAY_IP>/stream/d0fc1fc6/8fdf5a94.png"

Applying LoRa Models

To apply LoRa filters to an image, include the loras field in your request:

curl -X POST "https://<GATEWAY_IP>/text-to-image" \
    -H "Content-Type: application/json" \
    -d '{
        "model_id":"stabilityai/stable-diffusion-xl-base-1.0",
        "prompt":"A cool cat on the beach",
        "width": 1024,
        "height": 1024,
        "loras": "{ \"latent-consistency/lcm-lora-sdxl\": 1.0, \"nerijs/pixel-art-xl\": 1.2}"
    }'

You can find a list of available LoRa models for various models on lora-studio.

Orchestrator Configuration

To configure your Orchestrator to serve the text-to-image pipeline, refer to the Orchestrator Configuration guide.

System Requirements

The following system requirements are recommended for optimal performance:

NVIDIA GPU with at least 24GB of VRAM.

Recommended Pipeline Pricing

We are planning to simplify the pricing in the future so orchestrators can set one AI price per compute unit and have the system automatically scale based on the model's compute requirements.

The pricing for the text-to-image pipeline is based on competitor pricing. However, we strongly encourage orchestrators to set their own pricing based on their costs and requirements. Setting a competitive price will help attract more jobs, as Gateways can set their maximum price for a job. The current recommended pricing for this pipeline is 1.9073484e-08 USD per output pixel (height * width * output images).

API Reference

<Card title="API Reference" icon="rectangle-terminal" href="/ai/api-reference/text-to-image"

Explore the text-to-image endpoint and experiment with the API in the Livepeer AI API Reference.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

text-to-image.mdx

text-to-image.mdx

Overview

Models

Warm Models

On-Demand Models

Basic Usage Instructions

Applying LoRa Models

Orchestrator Configuration

System Requirements

Recommended Pipeline Pricing

API Reference

Files

text-to-image.mdx

Latest commit

History

text-to-image.mdx

File metadata and controls

Overview

Models

Warm Models

On-Demand Models

Basic Usage Instructions

Applying LoRa Models

Orchestrator Configuration

System Requirements

Recommended Pipeline Pricing

API Reference