LLMs aren't just for text generation—they can also create images from text descriptions. Image generation is incredibly useful across various fields, including MedTech, architecture, tourism, game development, and more. In this chapter, we'll explore two of the most popular image generation models: DALL-E and Midjourney.
In this lesson, we will cover:
- The concept of image generation and its importance.
- An overview of DALL-E and Midjourney, including how they function.
- Steps to build an image generation application.
By the end of this lesson, you will be able to:
- Develop an image generation application.
- Set boundaries for your application using metaprompts.
- Work with DALL-E and Midjourney.
Image generation applications are a fantastic way to explore the potential of Generative AI. They can be used for:
- Image editing and synthesis: Generate images for various purposes, such as editing or creating entirely new visuals.
- Applications across industries: Create images tailored to industries like MedTech, tourism, game development, and more.
In this lesson, we'll continue working with our startup, Edu4All. Students will create images for their assessments. The type of images is up to them—they could illustrate their own fairytale, design a new character for their story, or visualize their ideas and concepts.
For example, if Edu4All's students are studying monuments, they might generate an image using a prompt like:
"Dog next to Eiffel Tower in early morning sunlight"
DALL-E and Midjourney are two widely-used image generation models that create visuals based on text prompts.
DALL-E is a Generative AI model designed to create images from text descriptions. It combines two models:
- CLIP: A model that generates embeddings (numerical representations of data) from images and text.
- Diffused attention: A model that creates images from embeddings.
DALL-E is trained on a dataset of images and text, enabling it to generate visuals based on descriptions. For instance, it can create an image of a cat wearing a hat or a dog sporting a mohawk.
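To make the idea of embeddings more concrete, here is a minimal sketch of how two embeddings can be compared with cosine similarity. The three-dimensional vectors are made up for illustration and are not real CLIP output; real embeddings have hundreds of dimensions and come from the trained model.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy "embeddings" (hypothetical values, not produced by CLIP)
text_cat_in_hat = [0.9, 0.1, 0.3]
image_cat_in_hat = [0.8, 0.2, 0.4]
image_city_skyline = [0.1, 0.9, 0.2]

matching = cosine_similarity(text_cat_in_hat, image_cat_in_hat)
unrelated = cosine_similarity(text_cat_in_hat, image_city_skyline)
print(f"matching pair:  {matching:.3f}")
print(f"unrelated pair: {unrelated:.3f}")
```

A text and an image that describe the same thing should end up with a higher similarity score than an unrelated pair, which is what lets a model connect descriptions to visuals.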
Midjourney operates similarly to DALL-E, generating images from text prompts. It can create visuals using prompts like "a cat in a hat" or "a dog with a mohawk."
Image credit: Wikipedia, image generated by Midjourney
First, DALL-E. DALL-E is a Generative AI model built on the transformer architecture; specifically, it uses an autoregressive transformer.
An autoregressive transformer generates an image pixel by pixel: each new pixel is predicted from the pixels generated so far. Every prediction passes through multiple layers of a neural network, and the process repeats until the image is complete.
Through this method, DALL-E controls attributes, objects, characteristics, and more in the generated image. DALL-E 2 and 3 offer even greater control over the output.
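The autoregressive idea described above can be sketched with a toy loop in which every new value depends on the values produced before it. This is only an illustration: the random step stands in for a trained transformer, and the function names are hypothetical.

```python
import random


def generate_sequence(length, vocabulary_size, seed=0):
    """Generate values one at a time; each new value depends on the previous one."""
    rng = random.Random(seed)
    tokens = []
    for _ in range(length):
        # Toy "model": stay close to the most recently generated value.
        previous = tokens[-1] if tokens else 0
        step = rng.choice([-1, 0, 1])
        tokens.append((previous + step) % vocabulary_size)
    return tokens


def to_grid(tokens, width):
    """Arrange a flat sequence into rows, like pixels in a tiny image."""
    return [tokens[i:i + width] for i in range(0, len(tokens), width)]


grid = to_grid(generate_sequence(length=16, vocabulary_size=8), width=4)
for row in grid:
    print(row)
```

The key point is the loop structure: nothing is generated in parallel, and the output so far is always the input to the next step.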
To build an image generation application, you'll need the following libraries:
- python-dotenv: Recommended for storing secrets in a .env file separate from your code.
- openai: Used to interact with the OpenAI API.
- pillow: For working with images in Python.
- requests: Facilitates HTTP requests.
If you haven't already, follow the instructions on the Microsoft Learn page to create an Azure OpenAI resource and model. Select DALL-E 3 as the model.
- Create a .env file with the following content:
AZURE_OPENAI_ENDPOINT=<your endpoint>
AZURE_OPENAI_API_KEY=<your key>
AZURE_OPENAI_DEPLOYMENT="dall-e-3"
Locate this information in the Azure OpenAI Foundry Portal under the "Deployments" section.
- List the required libraries in a requirements.txt file like this:
python-dotenv
openai
pillow
requests
- Create a virtual environment and install the libraries:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
For Windows, use these commands to create and activate your virtual environment:
python3 -m venv venv
venv\Scripts\activate.bat
- Add the following code to a file named app.py:
import os

import dotenv
import openai
import requests
from openai import AzureOpenAI
from PIL import Image

# Load environment variables from the .env file
dotenv.load_dotenv()

# Configure the Azure OpenAI service client
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

try:
    # Create an image by using the image generation API
    generation_response = client.images.generate(
        prompt="Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils",
        size="1024x1024",
        n=1,
        model=os.environ["AZURE_OPENAI_DEPLOYMENT"],
    )

    # Set the directory for the stored image
    image_dir = os.path.join(os.curdir, "images")

    # If the directory doesn't exist, create it
    if not os.path.isdir(image_dir):
        os.mkdir(image_dir)

    # Initialize the image path (note the filetype should be png)
    image_path = os.path.join(image_dir, "generated-image.png")

    # Retrieve the generated image
    image_url = generation_response.data[0].url  # extract image URL from response
    generated_image = requests.get(image_url).content  # download the image
    with open(image_path, "wb") as image_file:
        image_file.write(generated_image)

    # Display the image in the default image viewer
    image = Image.open(image_path)
    image.show()

# Catch request errors reported by the API (for example, a rejected prompt)
except openai.BadRequestError as err:
    print(err)
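Before running app.py, it can help to confirm that the three settings from the .env file are actually available. The following is a minimal sketch of such a check; the helper name `missing_variables` is hypothetical and not part of the lesson's code.

```python
import os

# The variable names used in the .env file above
REQUIRED_VARIABLES = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_DEPLOYMENT",
)


def missing_variables(environment, required=REQUIRED_VARIABLES):
    """Return the names in `required` that are unset or empty in `environment`."""
    return [name for name in required if not environment.get(name)]


if __name__ == "__main__":
    missing = missing_variables(os.environ)
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All required settings are present.")
```

If you use python-dotenv, call `dotenv.load_dotenv()` before this check so that values from the .env file are loaded into `os.environ` first.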
Explanation of the code:
- First, import the necessary libraries, including OpenAI, dotenv, requests, and Pillow.
import openai
import os
import requests
from PIL import Image
import dotenv
from openai import AzureOpenAI
- Load environment variables from the .env file.
# load variables from .env into the environment
dotenv.load_dotenv()
- Configure the Azure OpenAI service client.
# Get endpoint and key from environment variables
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ['AZURE_OPENAI_API_KEY'],
    api_version="2024-02-01"
)
- Generate the image:
# Create an image by using the image generation API
generation_response = client.images.generate(
    prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils',
    size='1024x1024',
    n=1,
    model=os.environ['AZURE_OPENAI_DEPLOYMENT']
)
The code returns a JSON object containing the URL of the generated image. You can use this URL to download the image and save it to a file.
- Finally, open the image and display it using the standard image viewer:
image = Image.open(image_path)
image.show()
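The download-and-save step in the walkthrough can also be wrapped in a small helper. This is a sketch using only the standard library; `save_image_bytes` is a hypothetical name, and the demo writes placeholder bytes to a temporary folder instead of a real downloaded image.

```python
import tempfile
from pathlib import Path


def save_image_bytes(data, directory="images", filename="generated-image.png"):
    """Write raw image bytes to directory/filename and return the resulting path."""
    if not data:
        raise ValueError("no image data to save")
    target_dir = Path(directory)
    target_dir.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    target = target_dir / filename
    target.write_bytes(data)
    return target


# Demo with placeholder bytes (not a real image) in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    saved = save_image_bytes(b"placeholder bytes", directory=tmp)
    print(saved.name, saved.stat().st_size)
```

In app.py, the bytes would come from `requests.get(image_url).content`, and the returned path could be passed to `Image.open`.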
Let's examine the image generation code in detail:
generation_response = client.images.generate(
    prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils',
    size='1024x1024',
    n=1,
    model=os.environ['AZURE_OPENAI_DEPLOYMENT']
)
- prompt: The text prompt used to generate the image. For example, "Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils."
- size: Specifies the dimensions of the generated image (here, 1024x1024 pixels).
- n: The number of images to generate (here, one image).
- model: The name of the deployment to use, read here from the AZURE_OPENAI_DEPLOYMENT environment variable.
- temperature: Controls the randomness of the Generative AI model's output. Values range from 0 (deterministic) to 1 (random). The default is 0.7. It is not set in the call above.
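Since the parameters above are plain values, they can be checked before a request is sent. Here is a minimal sketch; the helper `build_image_request` is hypothetical, and the set of sizes is an assumption based on the sizes documented for DALL-E 3, so verify it against the model you deploy.

```python
# Assumption: sizes documented for DALL-E 3; other models accept different sizes
DALLE3_SIZES = {"1024x1024", "1792x1024", "1024x1792"}


def build_image_request(prompt, size="1024x1024", n=1, allowed_sizes=DALLE3_SIZES):
    """Validate the basic parameters and return them as keyword arguments."""
    prompt = prompt.strip()
    if not prompt:
        raise ValueError("prompt must not be empty")
    if size not in allowed_sizes:
        raise ValueError(f"unsupported size: {size}")
    if n < 1:
        raise ValueError("n must be at least 1")
    return {"prompt": prompt, "size": size, "n": n}


request = build_image_request("Bunny on horse, holding a lollipop")
print(request)
```

The returned dictionary could then be unpacked into the call, for example `client.images.generate(model=os.environ['AZURE_OPENAI_DEPLOYMENT'], **request)`.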
There are additional capabilities for working with images, which we'll explore in the next section.
So far, we've seen how to generate an image using a few lines of Python. However, there are more possibilities:
- Perform edits: Modify an existing image by providing a mask and a prompt. For example, you could add a hat to a bunny in an image. This involves supplying the image, a mask (indicating the area to change), and a text prompt describing the alteration.
Note: This feature is not supported in DALL-E 3.
Here's an example using GPT Image:
response = client.images.edit(
    model="gpt-image-1",
    image=open("sunlit_lounge.png", "rb"),
    mask=open("mask.png", "rb"),
    prompt="A sunlit indoor lounge area with a pool containing a flamingo"
)
# gpt-image-1 returns base64-encoded image data rather than a URL
image_b64 = response.data[0].b64_json
The base image might only show a lounge with a pool, but the final image could include a flamingo.
- Create variations: Generate variations of an existing image by providing the image, along with code like this:
response = client.images.create_variation(
    image=open("bunny-lollipop.png", "rb"),
    n=1,
    size="1024x1024"
)
image_url = response.data[0].url
Note: This feature is only supported on OpenAI.
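Image endpoints can return the result either as a URL or as base64-encoded data (the `b64_json` field on response items in the OpenAI SDK). The sketch below shows one way to turn such a payload back into bytes. The helper name is hypothetical, the sample payload is a stand-in rather than a real image, and the PNG check assumes the output format is PNG.

```python
import base64

# Every PNG file starts with these eight bytes
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_image_payload(b64_data):
    """Decode a base64 string into raw bytes and check it starts like a PNG file."""
    raw = base64.b64decode(b64_data, validate=True)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not look like a PNG image")
    return raw


# Stand-in payload: a PNG signature plus filler bytes, not a real image
sample = base64.b64encode(PNG_SIGNATURE + b"demo").decode("ascii")
image_bytes = decode_image_payload(sample)
print(len(image_bytes))
```

The decoded bytes can be written to a file opened in "wb" mode, just like the downloaded content in app.py.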
Temperature controls the randomness of a Generative AI model's output. Values range from 0 (deterministic) to 1 (random). The default is 0.7.
Let's see how temperature affects the output by running this prompt twice:
Prompt: "Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils"
Running the same prompt a second time gives an image that is similar but not identical. Here is the call, which requests two images and does not set the temperature:
generation_response = client.images.generate(
    prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils',  # Enter your prompt text here
    size='1024x1024',
    n=2
)
To make the response more deterministic, we can set the temperature to 0. For example:
generation_response = client.images.generate(
    prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils',  # Enter your prompt text here
    size='1024x1024',
    n=2,
    temperature=0
)
Whether a temperature argument is accepted depends on the API and SDK version you use. It is not among the documented parameters of images.generate in the openai Python package, so verify it against your setup.
Running this code produces these two images:
Notice how the images are much more alike.
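To see why a lower temperature makes output more predictable, here is a general sketch of temperature scaling as used in sampling-based models. It is not the internals of any particular image model: the scores are made up, and a temperature of exactly 0 is excluded because it would divide by zero (in practice it is treated as "always pick the top choice").

```python
import math


def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be greater than 0")
    scaled = [score / temperature for score in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


scores = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate outputs
for temperature in (1.0, 0.5, 0.1):
    probabilities = softmax_with_temperature(scores, temperature)
    print(temperature, [round(p, 3) for p in probabilities])
```

As the temperature drops, almost all of the probability moves to the highest-scoring candidate, so repeated runs pick the same option more often.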
While our demo can already generate images for clients, we need to establish boundaries for the application.
For instance, we want to avoid generating images that are inappropriate or unsafe for work.
This can be achieved using metaprompts. Metaprompts are text prompts that guide the output of a Generative AI model, ensuring it adheres to specific rules (e.g., safe for work or child-appropriate).
Metaprompts are placed before the user's text prompt. The application combines the metaprompt and the user prompt into a single text input, which is what gets sent to the model.
An example of a metaprompt might look like this:
You are an assistant designer that creates images for children.
The image needs to be safe for work and appropriate for children.
The image needs to be in color.
The image needs to be in landscape orientation.
The image needs to be in a 16:9 aspect ratio.
Do not consider any input from the following that is not safe for work or appropriate for children.
(Input)
Now, let's see how metaprompts can be applied in our demo:
disallow_list = "swords, violence, blood, gore, nudity, sexual content, adult content, adult themes, adult language, adult humor, adult jokes, adult situations, adult"
meta_prompt =f"""You are an assistant designer that creates images for children.
The image needs to be safe for work and appropriate for children.
The image needs to be in color.
The image needs to be in landscape orientation.
The image needs to be in a 16:9 aspect ratio.
Do not consider any input from the following that is not safe for work or appropriate for children.
{disallow_list}
"""
prompt = f"""{meta_prompt}
Create an image of a bunny on a horse, holding a lollipop"""
# TODO add request to generate image
With this prompt, every request carries the metaprompt's rules, which steer the generated images toward child-appropriate content.
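A metaprompt guides the model, but an application can also check the user's input before sending anything. The sketch below does a simple whole-word match against a disallow list. The function name is hypothetical, the list is shortened for the example, and keyword matching like this is only a first line of defense, not a complete content filter.

```python
import re


def find_disallowed_terms(user_input, disallow_list):
    """Return the disallowed terms that appear as whole words or phrases in the input."""
    terms = [term.strip().lower() for term in disallow_list.split(",") if term.strip()]
    text = user_input.lower()
    found = []
    for term in terms:
        # \b keeps "blood" from matching inside a longer word such as "bloodhound"
        if re.search(r"\b" + re.escape(term) + r"\b", text) and term not in found:
            found.append(term)
    return found


disallow_list = "swords, violence, blood, gore, nudity"
print(find_disallowed_terms("A knight with two swords", disallow_list))
print(find_disallowed_terms("A bunny on a horse, holding a lollipop", disallow_list))
```

If the returned list is not empty, the application could reject the request or ask the student to rephrase it instead of calling the image API.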
We introduced Edu4All at the beginning of this lesson. Now it's time to empower students to generate images for their assessments.
Students will create images featuring monuments. The choice of monuments is up to them, and they are encouraged to use their creativity to place these monuments in unique contexts.
Here's one possible solution:
import openai
import os
import requests
from PIL import Image
import dotenv
from openai import AzureOpenAI
# import dotenv
dotenv.load_dotenv()
# Get endpoint and key from environment variables
client = AzureOpenAI(
azure_endpoint = os.environ["AZURE_OPENAI_ENDPOINT"],
api_key=os.environ['AZURE_OPENAI_API_KEY'],
api_version = "2024-02-01"
)
disallow_list = "swords, violence, blood, gore, nudity, sexual content, adult content, adult themes, adult language, adult humor, adult jokes, adult situations, adult"
meta_prompt = f"""You are an assistant designer that creates images for children.
The image needs to be safe for work and appropriate for children.
The image needs to be in color.
The image needs to be in landscape orientation.
The image needs to be in a 16:9 aspect ratio.
Do not consider any input from the following that is not safe for work or appropriate for children.
{disallow_list}
"""
prompt = f"""{meta_prompt}
Generate a monument of the Arc of Triumph in Paris, France, in the evening light, with a small child holding a teddy bear looking on.
"""
try:
    # Create an image by using the image generation API
    generation_response = client.images.generate(
        prompt=prompt,  # Enter your prompt text here
        size='1024x1024',
        n=1,
        model=os.environ['AZURE_OPENAI_DEPLOYMENT']  # deployment name from .env
    )
    # Set the directory for the stored image
    image_dir = os.path.join(os.curdir, 'images')
    # If the directory doesn't exist, create it
    if not os.path.isdir(image_dir):
        os.mkdir(image_dir)
    # Initialize the image path (note the filetype should be png)
    image_path = os.path.join(image_dir, 'generated-image.png')
    # Retrieve the generated image
    image_url = generation_response.data[0].url  # extract image URL from response
    generated_image = requests.get(image_url).content  # download the image
    with open(image_path, "wb") as image_file:
        image_file.write(generated_image)
    # Display the image in the default image viewer
    image = Image.open(image_path)
    image.show()
# catch exceptions
except openai.BadRequestError as err:
    print(err)
After finishing this lesson, explore our Generative AI Learning collection to further enhance your knowledge of Generative AI!
Proceed to Lesson 10, where we'll dive into building AI applications with low-code.
Disclaimer:
This document has been translated using the AI translation service Co-op Translator. While we aim for accuracy, please note that automated translations may contain errors or inaccuracies. The original document in its native language should be regarded as the authoritative source. For critical information, professional human translation is recommended. We are not responsible for any misunderstandings or misinterpretations resulting from the use of this translation.








