Building Image Generation Applications


LLMs aren't just for text generation—they can also create images from text descriptions. Image generation is incredibly useful across various fields, including MedTech, architecture, tourism, game development, and more. In this chapter, we'll explore two of the most popular image generation models: DALL-E and Midjourney.

Introduction

In this lesson, we will cover:

  • The concept of image generation and its importance.
  • An overview of DALL-E and Midjourney, including how they function.
  • Steps to build an image generation application.

Learning Goals

By the end of this lesson, you will be able to:

  • Develop an image generation application.
  • Set boundaries for your application using metaprompts.
  • Work with DALL-E and Midjourney.

Why build an image generation application?

Image generation applications are a fantastic way to explore the potential of Generative AI. They can be used for:

  • Image editing and synthesis: Generate images for various purposes, such as editing or creating entirely new visuals.
  • Applications across industries: Create images tailored to industries like MedTech, tourism, game development, and more.

Scenario: Edu4All

In this lesson, we'll continue working with our startup, Edu4All. Students will create images for their assessments. The type of images is up to them—they could illustrate their own fairytale, design a new character for their story, or visualize their ideas and concepts.

For example, if Edu4All's students are studying monuments, they might generate something like this:

Edu4All startup, class on monuments, Eiffel Tower

using a prompt like:

"Dog next to Eiffel Tower in early morning sunlight"

What are DALL-E and Midjourney?

DALL-E and Midjourney are two widely-used image generation models that create visuals based on text prompts.

DALL-E

DALL-E is a Generative AI model designed to create images from text descriptions.

DALL-E combines two models: CLIP and diffused attention.

  • CLIP: A model that generates embeddings (numerical representations of data) from images and text.
  • Diffused attention: A model that creates images from embeddings.

DALL-E is trained on a dataset of images and text, enabling it to generate visuals based on descriptions. For instance, it can create an image of a cat wearing a hat or a dog sporting a mohawk.
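To make the idea of embeddings more concrete, here is a minimal sketch of how a text embedding can be compared with image embeddings using cosine similarity. The vectors, file names, and the `best_match` helper are made up for illustration; real CLIP embeddings have hundreds of dimensions and come from a trained model.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional embeddings, for illustration only.
text_embedding = [0.9, 0.1, 0.2]  # hypothetical embedding of "a cat wearing a hat"
image_embeddings = {
    "cat_with_hat.png": [0.8, 0.2, 0.1],
    "dog_with_mohawk.png": [0.1, 0.9, 0.3],
}

def best_match(text_vec, image_vecs):
    """Pick the image whose embedding points in the most similar direction."""
    return max(image_vecs, key=lambda name: cosine_similarity(text_vec, image_vecs[name]))

print(best_match(text_embedding, image_embeddings))  # prints "cat_with_hat.png"
```

Because text and images share one embedding space, a higher similarity score suggests that the image fits the description better.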

Midjourney

Midjourney operates similarly to DALL-E, generating images from text prompts. It can create visuals using prompts like "a cat in a hat" or "a dog with a mohawk."

Image: mechanical pigeon, generated by Midjourney. Image credit: Wikipedia.

How do DALL-E and Midjourney work?

First, DALL-E. DALL-E is a Generative AI model built on the transformer architecture, specifically an autoregressive transformer.

An autoregressive transformer generates images pixel by pixel, using previously generated pixels to create the next ones. This process involves multiple layers in a neural network until the image is complete.

Through this method, DALL-E controls attributes, objects, characteristics, and more in the generated image. DALL-E 2 and 3 offer even greater control over the output.
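The autoregressive idea described above can be sketched with a toy loop in which each new value depends on the values produced so far. This is only an illustration of the feedback pattern: `toy_next_pixel` is a hypothetical stand-in for a trained neural network, and its arithmetic is arbitrary.

```python
def toy_next_pixel(previous_pixels):
    """Hypothetical stand-in for a trained model: predict the next pixel
    value (0-255) from the pixels generated so far."""
    if not previous_pixels:
        return 128  # arbitrary starting value
    recent = previous_pixels[-3:]  # only look at the last three pixels
    return (sum(recent) // len(recent) + 17) % 256

def generate_autoregressively(num_pixels):
    """Generate pixels one at a time, feeding each result back in as context."""
    pixels = []
    for _ in range(num_pixels):
        pixels.append(toy_next_pixel(pixels))
    return pixels

row = generate_autoregressively(8)
print(row)
```

The key point is the loop structure: the output of one step becomes part of the input to the next step.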

Building your first image generation application

To build an image generation application, you'll need the following libraries:

  • python-dotenv: Recommended for storing secrets in a .env file separate from your code.
  • openai: Used to interact with the OpenAI API.
  • pillow: For working with images in Python.
  • requests: Facilitates HTTP requests.

Create and deploy an Azure OpenAI model

If you haven't already, follow the instructions on the Microsoft Learn page to create an Azure OpenAI resource and model. Select DALL-E 3 as the model.

Create the app

  1. Create a .env file with the following content:

    AZURE_OPENAI_ENDPOINT=<your endpoint>
    AZURE_OPENAI_API_KEY=<your key>
    AZURE_OPENAI_DEPLOYMENT="dall-e-3"
    

    Locate this information in the Azure OpenAI Foundry Portal under the "Deployments" section.

  2. List the required libraries in a requirements.txt file like this:

    python-dotenv
    openai
    pillow
    requests
    
  3. Create a virtual environment and install the libraries:

    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt

    For Windows, use these commands to create and activate your virtual environment:

    python3 -m venv venv
    venv\Scripts\activate.bat
  4. Add the following code to a file named app.py:

    import openai
    import os
    import requests
    from PIL import Image
    import dotenv
    from openai import AzureOpenAI
    
    # load environment variables from the .env file
    dotenv.load_dotenv()
    
    # configure Azure OpenAI service client 
    client = AzureOpenAI(
      azure_endpoint = os.environ["AZURE_OPENAI_ENDPOINT"],
      api_key=os.environ['AZURE_OPENAI_API_KEY'],
      api_version = "2024-02-01"
      )
    try:
        # Create an image by using the image generation API
        generation_response = client.images.generate(
                                prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils',
                                size='1024x1024', n=1,
                                model=os.environ['AZURE_OPENAI_DEPLOYMENT']
                              )
    
        # Set the directory for the stored image
        image_dir = os.path.join(os.curdir, 'images')
    
        # If the directory doesn't exist, create it
        if not os.path.isdir(image_dir):
            os.mkdir(image_dir)
    
        # Initialize the image path (note the filetype should be png)
        image_path = os.path.join(image_dir, 'generated-image.png')
    
        # Retrieve the generated image
        image_url = generation_response.data[0].url  # extract image URL from response
        generated_image = requests.get(image_url).content  # download the image
        with open(image_path, "wb") as image_file:
            image_file.write(generated_image)
    
        # Display the image in the default image viewer
        image = Image.open(image_path)
        image.show()
    
    # catch request errors reported by the API (for example, a rejected prompt)
    except openai.BadRequestError as err:
        print(err)
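Before running app.py, it can help to confirm that the values from the .env file are actually present. The following is a minimal sketch that uses only the standard library; the `missing_settings` helper and the example values are hypothetical and not part of the lesson's code.

```python
import os

# Variable names taken from the .env file shown above.
REQUIRED_VARS = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_DEPLOYMENT",
)

def missing_settings(env=None):
    """Return the names of required settings that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Example with a plain dict instead of the real environment (placeholder values).
example_env = {
    "AZURE_OPENAI_ENDPOINT": "https://example.invalid",
    "AZURE_OPENAI_API_KEY": "",
}
print(missing_settings(example_env))
# prints ['AZURE_OPENAI_API_KEY', 'AZURE_OPENAI_DEPLOYMENT']
```

Calling `missing_settings()` with no argument checks the real environment, so it could run right after `dotenv.load_dotenv()`.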

Explanation of the code:

  • First, import the necessary libraries, including OpenAI, dotenv, requests, and Pillow.

    import openai
    import os
    import requests
    from PIL import Image
    import dotenv
    from openai import AzureOpenAI
  • Load environment variables from the .env file.

    # read key/value pairs from .env into the process environment
    dotenv.load_dotenv()
  • Configure the Azure OpenAI service client.

    # Get endpoint and key from environment variables
    client = AzureOpenAI(
        azure_endpoint = os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ['AZURE_OPENAI_API_KEY'],
        api_version = "2024-02-01"
        )
  • Generate the image:

    # Create an image by using the image generation API
    generation_response = client.images.generate(
                          prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils',
                          size='1024x1024', n=1,
                          model=os.environ['AZURE_OPENAI_DEPLOYMENT']
                        )

    The call returns a response object whose data list contains the URL of the generated image. You can use this URL to download the image and save it to a file.

  • Finally, open the image and display it using the standard image viewer:

    image = Image.open(image_path)
    image.show()
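The directory and file handling in app.py could also be factored into a small helper. This is one possible sketch using `pathlib`; the `save_image_bytes` name and its defaults are assumptions for illustration, and the demo writes placeholder bytes to a temporary directory rather than a real image.

```python
import tempfile
from pathlib import Path

def save_image_bytes(image_bytes, image_dir="images", filename="generated-image.png"):
    """Write raw image bytes to image_dir/filename, creating the directory if needed."""
    directory = Path(image_dir)
    directory.mkdir(parents=True, exist_ok=True)
    image_path = directory / filename
    image_path.write_bytes(image_bytes)
    return image_path

# Demo with placeholder bytes in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_image_bytes(b"placeholder bytes", image_dir=Path(tmp) / "images")
    print(saved.name, saved.stat().st_size)
```

In the app, the bytes returned by `requests.get(image_url).content` would be passed in place of the placeholder.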

More details on generating the image

Let's examine the image generation code in detail:

  generation_response = client.images.generate(
                            prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils',
                            size='1024x1024', n=1,
                            model=os.environ['AZURE_OPENAI_DEPLOYMENT']
                        )
  • prompt: The text prompt used to generate the image. For example, "Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils."
  • size: Specifies the dimensions of the generated image (e.g., 1024x1024 pixels).
  • n: The number of images to generate. The call above requests one image; DALL-E 3 accepts only n=1.
  • temperature: Controls the randomness of the Generative AI model's output. Values range from 0 (deterministic) to 1 (random). The default is 0.7. It is not set in the call above.
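As a rough sketch, the parameters can be checked before a request is sent. The size list and the n=1 limit below reflect DALL-E 3 as documented at the time of writing and should be treated as assumptions to verify against the current API reference; `build_generation_kwargs` is a hypothetical helper.

```python
# Assumed DALL-E 3 limits; verify against the current API reference.
DALLE3_SIZES = {"1024x1024", "1792x1024", "1024x1792"}

def build_generation_kwargs(prompt, size="1024x1024", n=1):
    """Validate the arguments and return them as keyword arguments for the image call."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if size not in DALLE3_SIZES:
        raise ValueError(f"unsupported size: {size}")
    if n != 1:
        raise ValueError("this sketch assumes a model that generates one image per request")
    return {"prompt": prompt, "size": size, "n": n}

kwargs = build_generation_kwargs("Bunny on horse, holding a lollipop")
print(kwargs)
```

The resulting dictionary could then be unpacked into the call, for example `client.images.generate(model=..., **kwargs)`.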

There are additional capabilities for working with images, which we'll explore in the next section.

Additional capabilities of image generation

So far, we've seen how to generate an image using a few lines of Python. However, there are more possibilities:

  • Perform edits: Modify an existing image by providing a mask and a prompt. For example, you could add a hat to a bunny in an image. This involves supplying the image, a mask (indicating the area to change), and a text prompt describing the alteration.

Note: This feature is not supported in DALL-E 3.

Here's an example using GPT Image:

response = client.images.edit(
    model="gpt-image-1",
    image=open("sunlit_lounge.png", "rb"),
    mask=open("mask.png", "rb"),
    prompt="A sunlit indoor lounge area with a pool containing a flamingo"
)
# gpt-image-1 returns the image as base64 data rather than a URL
image_base64 = response.data[0].b64_json

The base image might show only a lounge with a pool, while the edited image would also include a flamingo.

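  • Create variations: Generate variations of an existing image by providing the image itself (no text prompt is needed), with code like this:

    from openai import OpenAI

    # Variations are available on OpenAI only, so this uses the OpenAI client
    # (which reads OPENAI_API_KEY from the environment) instead of AzureOpenAI.
    openai_client = OpenAI()
    response = openai_client.images.create_variation(
      image=open("bunny-lollipop.png", "rb"),
      n=1,
      size="1024x1024"
    )
    image_url = response.data[0].url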

    Note: This feature is only supported on OpenAI.
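Image responses can carry the picture either as a URL or as base64 text in a `b64_json` field, depending on the model and the requested response format. The following sketch shows how base64 text can be turned back into bytes with the standard library; `decode_image_payload` is a hypothetical helper and the sample payload is placeholder data, not a real image.

```python
import base64

def decode_image_payload(b64_data):
    """Decode base64 text (as found in a response's b64_json field) into raw bytes."""
    return base64.b64decode(b64_data)

# Placeholder payload standing in for response.data[0].b64_json.
sample_payload = base64.b64encode(b"placeholder image bytes").decode("ascii")

image_bytes = decode_image_payload(sample_payload)
print(len(image_bytes))  # prints 23
```

The decoded bytes can be written to a .png file in the same way as the downloaded content in app.py.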

Temperature

Temperature controls the randomness of a Generative AI model's output. Values range from 0 (deterministic) to 1 (random). The default is 0.7.

Let's see how temperature affects the output by running this prompt twice:

Prompt: "Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils"

Bunny on a horse holding a lollipop, version 1

Now let's run the same prompt again to observe differences:

Generated image of bunny on horse

The images are similar but not identical. For comparison, here is a baseline request that sets no temperature and asks for two images at once:

 generation_response = client.images.generate(
        prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils',    # Enter your prompt text here
        size='1024x1024',
        n=2    # requires a model that supports several images per request (DALL-E 3 accepts only n=1)
    )

Changing the temperature

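To make the response more deterministic, we can lower the temperature to 0. Recent versions of the openai library do not document a temperature parameter for images.generate, so treat the call below as an illustration and check the API reference for your model:

generation_response = client.images.generate(
        prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils',    # Enter your prompt text here
        size='1024x1024',
        n=2,
        temperature=0    # may be rejected if your SDK version or model does not accept it
    )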

Running this code produces these two images:

  • Temperature 0, v1
  • Temperature 0, v2

Notice how the images are much more alike.
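To build intuition for what temperature does in general, here is a small sketch of temperature-scaled softmax, a common way to turn raw scores into sampling probabilities. It is a generic illustration and not a description of how DALL-E works internally; the scores are made up.

```python
import math

def softmax_with_temperature(scores, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.5]  # made-up scores for three candidate outputs
for t in (1.0, 0.1):
    probs = softmax_with_temperature(scores, t)
    print(t, [round(p, 3) for p in probs])
```

At a temperature of 1.0 the three candidates all keep a noticeable share of the probability, while at 0.1 almost all of it goes to the top-scoring candidate, which is why low temperatures give more repeatable output.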

How to define boundaries for your application with metaprompts

While our demo can already generate images for clients, we need to establish boundaries for the application.

For instance, we want to avoid generating images that are inappropriate or unsafe for work.

This can be achieved using metaprompts. Metaprompts are text prompts that guide the output of a Generative AI model, ensuring it adheres to specific rules (e.g., safe for work or child-appropriate).

How does it work?

Metaprompts are positioned before the main text prompt and embedded in applications to control the model's output. The application combines the metaprompt and the user's prompt into a single text input.

An example of a metaprompt might look like this:

You are an assistant designer that creates images for children.

The image needs to be safe for work and appropriate for children.

The image needs to be in color.

The image needs to be in landscape orientation.

The image needs to be in a 16:9 aspect ratio.

Do not consider any input from the following that is not safe for work or appropriate for children.

(Input)

Now, let's see how metaprompts can be applied in our demo:

disallow_list = "swords, violence, blood, gore, nudity, sexual content, adult content, adult themes, adult language, adult humor, adult jokes, adult situations, adult"

meta_prompt = f"""You are an assistant designer that creates images for children.

The image needs to be safe for work and appropriate for children.

The image needs to be in color.

The image needs to be in landscape orientation.

The image needs to be in a 16:9 aspect ratio.

Do not consider any input from the following that is not safe for work or appropriate for children.
{disallow_list}
"""

prompt = f"""{meta_prompt}
Create an image of a bunny on a horse, holding a lollipop"""

# TODO add request to generate image

With this combined prompt, the metaprompt's rules are sent along with every user request, steering the generated images toward them.
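A metaprompt guides the model, but an application can also check user input itself before sending anything. The sketch below is a deliberately naive keyword filter combined with a prompt builder; `find_disallowed`, `build_prompt`, and the shortened term list are hypothetical and would not replace a proper content-safety service.

```python
import re

# Shortened version of the disallow list used above, for illustration.
DISALLOWED_TERMS = ("swords", "violence", "blood", "gore", "nudity")

def find_disallowed(user_input, terms=DISALLOWED_TERMS):
    """Return the disallowed terms that appear as whole words in user_input."""
    lowered = user_input.lower()
    return [t for t in terms if re.search(rf"\b{re.escape(t)}\b", lowered)]

def build_prompt(meta_prompt, user_input):
    """Place the metaprompt before the user input, refusing input that hits the filter."""
    hits = find_disallowed(user_input)
    if hits:
        raise ValueError(f"Input contains disallowed terms: {', '.join(hits)}")
    return f"{meta_prompt.strip()}\n{user_input.strip()}"

print(build_prompt("The image needs to be appropriate for children.",
                   "Create an image of a bunny on a horse"))
```

A keyword check like this misses synonyms and misspellings, so it is best seen as a first line of defense in addition to the metaprompt.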

Assignment - let's enable students

We introduced Edu4All at the beginning of this lesson. Now it's time to empower students to generate images for their assessments.

Students will create images featuring monuments. The choice of monuments is up to them, and they are encouraged to use their creativity to place these monuments in unique contexts.

Solution

Here's one possible solution:

import openai
import os
import requests
from PIL import Image
import dotenv
from openai import AzureOpenAI
# load environment variables from the .env file
dotenv.load_dotenv()

# Get endpoint and key from environment variables
client = AzureOpenAI(
  azure_endpoint = os.environ["AZURE_OPENAI_ENDPOINT"],
  api_key=os.environ['AZURE_OPENAI_API_KEY'],
  api_version = "2024-02-01"
  )


disallow_list = "swords, violence, blood, gore, nudity, sexual content, adult content, adult themes, adult language, adult humor, adult jokes, adult situations, adult"

meta_prompt = f"""You are an assistant designer that creates images for children.

The image needs to be safe for work and appropriate for children.

The image needs to be in color.

The image needs to be in landscape orientation.

The image needs to be in a 16:9 aspect ratio.

Do not consider any input from the following that is not safe for work or appropriate for children.
{disallow_list}
"""

prompt = f"""{meta_prompt}
Generate an image of the Arc of Triumph monument in Paris, France, in the evening light, with a small child holding a teddy bear looking on.
"""

try:
    # Create an image by using the image generation API
    generation_response = client.images.generate(
        prompt=prompt,    # metaprompt plus the student's request
        size='1024x1024',
        n=1,
        model=os.environ['AZURE_OPENAI_DEPLOYMENT']    # deployment name from the .env file
    )
    # Set the directory for the stored image
    image_dir = os.path.join(os.curdir, 'images')

    # If the directory doesn't exist, create it
    if not os.path.isdir(image_dir):
        os.mkdir(image_dir)

    # Initialize the image path (note the filetype should be png)
    image_path = os.path.join(image_dir, 'generated-image.png')

    # Retrieve the generated image
    image_url = generation_response.data[0].url  # extract image URL from response
    generated_image = requests.get(image_url).content  # download the image
    with open(image_path, "wb") as image_file:
        image_file.write(generated_image)

    # Display the image in the default image viewer
    image = Image.open(image_path)
    image.show()

# catch exceptions
except openai.BadRequestError as err:
    print(err)
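Since every student run in the solution writes to the same generated-image.png, one possible extension is to derive the file name from the prompt. This is a minimal sketch; the `image_filename` helper and its naming scheme are assumptions, not part of the lesson's solution.

```python
import re

def image_filename(prompt, index=1, max_length=40):
    """Build a file-system-friendly .png name from a prompt and a running index."""
    slug = re.sub(r"[^a-z0-9]+", "-", prompt.lower()).strip("-")
    slug = slug[:max_length].rstrip("-") or "image"  # fall back if nothing usable remains
    return f"{slug}-{index:02d}.png"

print(image_filename("Arc of Triumph in Paris, evening light"))
# prints arc-of-triumph-in-paris-evening-light-01.png
```

The returned name could be used in place of 'generated-image.png' when building `image_path`.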

Great Work! Keep Learning

After finishing this lesson, explore our Generative AI Learning collection to further enhance your knowledge of Generative AI!

Proceed to Lesson 10, where we'll dive into building AI applications with low-code.


Disclaimer:
This document has been translated using the AI translation service Co-op Translator. While we aim for accuracy, please note that automated translations may contain errors or inaccuracies. The original document in its native language should be regarded as the authoritative source. For critical information, professional human translation is recommended. We are not responsible for any misunderstandings or misinterpretations resulting from the use of this translation.