The kit integrates image creation with generative AI, voice activity detection (VAD), automatic speech recognition (ASR), large language models (LLMs), and natural language processing (NLP). A live voice-transcription pipeline feeds an LLM, which decides whether the user is describing a scene for an adventure game. When the LLM detects a new scene, it produces a detailed text prompt suitable for stable diffusion, which the application then uses to generate an illustration. Built on the OpenVINO™ GenAI framework, this kit demonstrates the text2image, LLM pipeline, and Whisper speech2text APIs.
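For orientation, here is a minimal sketch of the three OpenVINO GenAI pipelines involved; the model directory names, devices, and generation parameters below are illustrative placeholders, not the kit's actual configuration:

# Minimal sketch of the three OpenVINO GenAI pipelines this kit combines.
# Paths, devices, and parameters are placeholders, not the kit's actual settings.
import openvino_genai as ov_genai

# Speech-to-text: transcribe a chunk of 16 kHz mono audio samples.
raw_speech = [0.0] * 16000  # one second of silence; replace with real microphone samples
whisper = ov_genai.WhisperPipeline("models/whisper-base", "CPU")
transcript = whisper.generate(raw_speech).texts[0]

# LLM: decide whether the transcript describes a new scene and, if so,
# turn it into a detailed stable diffusion prompt.
llm = ov_genai.LLMPipeline("models/llama-3-8b-instruct", "GPU")
image_prompt = str(llm.generate(transcript, max_new_tokens=128))

# Text-to-image: illustrate the scene with a latent consistency model.
t2i = ov_genai.Text2ImagePipeline("models/lcm-dreamshaper-v7", "GPU")
image_tensor = t2i.generate(image_prompt, width=512, height=512, num_inference_steps=4)

The application wires these stages together behind a GUI and adds super resolution and depth estimation on top.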
This kit uses the following technology stack:
- OpenVINO Toolkit (docs)
- OpenVINO GenAI
- Whisper
- Llama3-8b-Instruct
- Single Image Super Resolution
- Latent Consistency Models
- Depth Anything V2
Check out our AI Reference Kits repository for other kits.
Contributors: Ryan Metcalfe, Garth Long, Arisha Kumar, Ria Cheruvu, Paula Ramos, Dmitriy Pastushenkov, Zhuo Wu, and Raymond Lo.
New updates will be added here.
Table of Contents
Now, let's dive into the steps starting with installing Python.
Star the repository (optional, but recommended :))
This project requires Python 3.10 or higher and a few libraries. If you don't have Python installed on your machine, go to https://www.python.org/downloads/ and download the latest version for your operating system. Follow the prompts to install Python, making sure to check the option to add Python to your PATH environment variable.
Install libraries and tools:
If you're using Ubuntu, install required dependencies like this:
sudo apt install git git-lfs gcc python3-venv python3-dev portaudio19-dev
NOTE: If you are using Windows, you will probably also need to install the Microsoft Visual C++ Redistributable.
To clone the repository, run the following command:
git clone https://github.com/openvinotoolkit/openvino_build_deploy.git
The above will clone the repository into a directory named "openvino_build_deploy" in the current directory. Then, navigate into the directory using the following command:
cd openvino_build_deploy/ai_ref_kits/multimodal_ai_visual_generator
Next, create a virtual environment, activate it, and install the required dependencies for setting up and running the project.
Linux:
python3 -m venv run_env
source run_env/bin/activate
pip install -r requirements.txt
Windows:
python -m venv run_env
run_env/Scripts/activate
pip install -r requirements.txt
Next, you'll download and optimize the required models by running a download script:
- Whisper: Speech recognition
- Llama3-8b-instruct: Intelligent LLM helper
- Latent Consistency Models: Image generation
- Super Resolution: Increase the resolution of the generated image
- Depth Anything v2: Create 3D parallax animations
To run the download script:
python3 download_and_prepare_models.py
cd ..
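Under the hood, the models are exported to the OpenVINO format. If you ever need to (re)export one manually, the optimum-cli exporter can do it, assuming optimum-intel is installed in your environment; for example, a command along these lines converts and int4-quantizes the LLM (the output directory name is only an example, and the Llama 3 weights require Hugging Face access):

optimum-cli export openvino --model meta-llama/Meta-Llama-3-8B-Instruct --weight-format int4 models/llama-3-8b-instruct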
To interact with the animated GIF outputs, you'll host a simple web server on your system. To do so, install Node.js (via its Downloads page) and http-server.
Run the following command to start an HTTP server within the repository. You can customize index.html with any additional elements you'd like.
http-server -c10
Open a terminal (or reuse the existing one with the run_env environment activated) and start the GUI:
python app.py
This theme is passed as part of the system message to the LLM and helps it make a more educated decision about whether or not you are describing a scene for your story.
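As a rough illustration of how the theme can steer that decision (the actual prompt text lives in the application code and will differ), a system message might be assembled along these lines:

# Illustrative only: the real system prompt in the application differs.
import openvino_genai as ov_genai

theme = "a medieval fantasy adventure"
system_message = (
    f"You are helping illustrate an interactive story with the theme: {theme}. "
    "If the user's latest utterance describes a new scene in that story, reply with a "
    "detailed stable diffusion prompt for it. Otherwise reply with exactly: IGNORE."
)

transcript = "You find yourself at the gates of a large, abandoned castle."
llm = ov_genai.LLMPipeline("models/llama-3-8b-instruct", "GPU")  # example path
reply = str(llm.generate(system_message + "\nUser: " + transcript, max_new_tokens=128))
if reply.strip() != "IGNORE":
    print("New scene prompt:", reply)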
The start button will activate the listening state (Voice Activity Detection & Whisper Transcription pipelines) on the system's default input device (microphone).
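If you want to see what that listening stage boils down to, here is a minimal sketch that records a fixed-length chunk from the default microphone and transcribes it. It assumes the sounddevice package; the app's own VAD-driven loop is more involved and may use a different audio library:

# Sketch only: records a fixed 5-second chunk instead of using VAD.
import sounddevice as sd           # PortAudio-based, hence portaudio19-dev above
import openvino_genai as ov_genai

SAMPLE_RATE = 16000                # Whisper expects 16 kHz mono audio
SECONDS = 5

audio = sd.rec(SECONDS * SAMPLE_RATE, samplerate=SAMPLE_RATE, channels=1, dtype="float32")
sd.wait()                          # block until recording finishes

whisper = ov_genai.WhisperPipeline("models/whisper-base", "CPU")  # example model path
result = whisper.generate(audio.flatten().tolist())
print(result.texts[0])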
Go ahead and describe a scene for your story. For example, "You find yourself at the gates of a large, abandoned castle."
The scene you just described will be passed to the LLM, which should detect it as a new scene in your story. The detailed prompt generated by the LLM will show up in real time in the UI caption box, followed soon after by the illustration generated by the stable diffusion pipeline.
You can test the intelligence of the LLM helper by saying something not relevant to the story. For example, "Hey guys, do you think we should order a pizza?" You should find that the LLM decides to disregard this and does not try to illustrate anything.
To interact with the 3D hoverable parallax animation created from the depth maps, start an HTTP server as explained above.
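For reference, the parallax effect is driven by a per-pixel depth map of the illustration. A simplified sketch of producing one with an OpenVINO-converted Depth Anything V2 model might look like this; the file names, the 518x518 input size, and the minimal preprocessing are assumptions, and the real pipeline normalizes inputs per the model's requirements:

# Simplified sketch: estimate a depth map for the generated illustration.
import cv2
import numpy as np
import openvino as ov

core = ov.Core()
compiled = core.compile_model("models/depth_anything_v2_small.xml", "GPU")  # example path

img = cv2.imread("illustration.png")
inp = cv2.resize(img, (518, 518)).astype(np.float32) / 255.0   # real pipeline also normalizes
inp = inp.transpose(2, 0, 1)[np.newaxis]                        # HWC -> NCHW

depth = compiled(inp)[0].squeeze()                               # relative depth, larger = closer
depth = (255 * (depth - depth.min()) / (depth.max() - depth.min() + 1e-6)).astype(np.uint8)
cv2.imwrite("depth_map.png", cv2.resize(depth, (img.shape[1], img.shape[0])))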
- Feel free to modify main.py to select different OpenVINO devices for the LLM, stable diffusion pipeline, Whisper, etc. Look toward the bottom of the script for a section that looks like this:

if __name__ == "__main__":
    app = QApplication(sys.argv)

    llm_device = 'GPU'
    sd_device = 'GPU'
    whisper_device = 'CPU'
    super_res_device = 'GPU'
    depth_anything_device = 'GPU'
If you're running on an Intel Core Ultra Series 2 laptop and you want to set llm_device = 'NPU', be sure to have the latest Intel NPU driver installed.
Based on the resolution of your display, you may want to tweak the default resolution of the illustrated image, as well as caption font size. To adjust the resolution of the illustrated image, look for and modify this line:
self.image_label.setFixedSize(1216, 684)
It's recommended to choose a 16:9 ratio resolution. You can find a convenient list here.
The caption font size can be adjusted by modifying this line:
fantasy_font = QFont("Papyrus", 18, QFont.Bold)
- Learn more about OpenVINO
- Explore OpenVINO’s documentation