इमेज जनरेशन एप्लिकेशन बनाना

LLM केवल टेक्स्ट जनरेशन तक सीमित नहीं हैं। टेक्स्ट विवरणों से इमेज भी बनाई जा सकती हैं। इमेज एक माध्यम के रूप में कई क्षेत्रों में बेहद उपयोगी हो सकती हैं, जैसे MedTech, वास्तुकला, पर्यटन, गेम डेवलपमेंट और भी बहुत कुछ। इस अध्याय में, हम दो सबसे लोकप्रिय इमेज जनरेशन मॉडल, DALL-E और Midjourney के बारे में जानेंगे।

परिचय

इस पाठ में, हम निम्नलिखित विषयों को कवर करेंगे:

इमेज जनरेशन क्या है और यह क्यों उपयोगी है।
DALL-E और Midjourney क्या हैं, और ये कैसे काम करते हैं।
आप एक इमेज जनरेशन ऐप कैसे बना सकते हैं।

सीखने के लक्ष्य

इस पाठ को पूरा करने के बाद, आप सक्षम होंगे:

एक इमेज जनरेशन एप्लिकेशन बनाना।
मेटा प्रॉम्प्ट्स के साथ अपने एप्लिकेशन की सीमाएं निर्धारित करना।
DALL-E और Midjourney के साथ काम करना।

इमेज जनरेशन एप्लिकेशन क्यों बनाएं?

इमेज जनरेशन एप्लिकेशन जनरेटिव AI की क्षमताओं को समझने का एक शानदार तरीका हैं। इन्हें निम्नलिखित के लिए इस्तेमाल किया जा सकता है:

इमेज एडिटिंग और सिंथेसिस। आप विभिन्न उपयोग मामलों के लिए इमेज बना सकते हैं, जैसे इमेज एडिटिंग और इमेज सिंथेसिस।
विभिन्न उद्योगों में उपयोग। इन्हें Medtech, पर्यटन, गेम डेवलपमेंट जैसे कई उद्योगों के लिए इमेज बनाने में भी इस्तेमाल किया जा सकता है।

परिदृश्य: Edu4All

इस पाठ के हिस्से के रूप में, हम अपने स्टार्टअप Edu4All के साथ काम जारी रखेंगे। छात्र अपनी असाइनमेंट के लिए इमेज बनाएंगे, जो कि पूरी तरह से छात्रों पर निर्भर है कि वे क्या बनाना चाहते हैं। वे अपनी खुद की परी कथा के लिए चित्र बना सकते हैं, अपनी कहानी के लिए नया पात्र बना सकते हैं, या अपने विचारों और अवधारणाओं को विज़ुअलाइज़ कर सकते हैं।

उदाहरण के लिए, अगर Edu4All के छात्र स्मारकों पर कक्षा में काम कर रहे हैं, तो वे निम्नलिखित इमेज बना सकते हैं:

प्रॉम्प्ट का उपयोग करते हुए

"Dog next to Eiffel Tower in early morning sunlight"

DALL-E और Midjourney क्या हैं?

DALL-E और Midjourney दो सबसे लोकप्रिय इमेज जनरेशन मॉडल हैं, जो प्रॉम्प्ट्स का उपयोग करके इमेज बनाते हैं।

DALL-E

आइए DALL-E से शुरू करते हैं, जो एक जनरेटिव AI मॉडल है जो टेक्स्ट विवरणों से इमेज बनाता है।

DALL-E दो मॉडलों, CLIP और diffused attention का संयोजन है।

CLIP, एक मॉडल है जो इमेज और टेक्स्ट से एम्बेडिंग्स (डेटा के संख्यात्मक प्रतिनिधित्व) बनाता है।
Diffused attention, एक मॉडल है जो एम्बेडिंग्स से इमेज बनाता है। DALL-E को इमेज और टेक्स्ट के डेटासेट पर प्रशिक्षित किया गया है और यह टेक्स्ट विवरणों से इमेज बना सकता है। उदाहरण के लिए, DALL-E एक टोपी पहने हुए बिल्ली या मोकहॉक वाले कुत्ते की इमेज बना सकता है।

Midjourney

Midjourney भी DALL-E की तरह काम करता है, यह टेक्स्ट प्रॉम्प्ट्स से इमेज बनाता है। Midjourney का उपयोग “a cat in a hat” या “dog with a mohawk” जैसे प्रॉम्प्ट्स से इमेज बनाने के लिए किया जा सकता है।

इमेज क्रेडिट: विकिपीडिया, Midjourney द्वारा बनाई गई इमेज

DALL-E और Midjourney कैसे काम करते हैं

सबसे पहले, DALL-E। DALL-E एक जनरेटिव AI मॉडल है जो ट्रांसफॉर्मर आर्किटेक्चर पर आधारित है, जिसमें autoregressive transformer होता है।

एक autoregressive transformer यह निर्धारित करता है कि मॉडल टेक्स्ट विवरणों से इमेज कैसे बनाता है, यह एक बार में एक पिक्सेल बनाता है, और फिर अगले पिक्सेल को बनाने के लिए पहले से बने पिक्सेल का उपयोग करता है। यह प्रक्रिया न्यूरल नेटवर्क की कई परतों से गुजरती है, जब तक कि इमेज पूरी न हो जाए।

इस प्रक्रिया के साथ, DALL-E इमेज में वस्तुएं, गुण, विशेषताएं आदि नियंत्रित करता है। हालांकि, DALL-E 2 और 3 में जनरेट की गई इमेज पर अधिक नियंत्रण होता है।

अपनी पहली इमेज जनरेशन एप्लिकेशन बनाना

तो, इमेज जनरेशन एप्लिकेशन बनाने के लिए क्या चाहिए? आपको निम्नलिखित लाइब्रेरीज़ की जरूरत होगी:

python-dotenv, यह लाइब्रेरी आपके सीक्रेट्स को कोड से अलग .env फाइल में रखने के लिए अत्यधिक अनुशंसित है।
openai, यह लाइब्रेरी OpenAI API के साथ इंटरैक्ट करने के लिए है।
pillow, Python में इमेज के साथ काम करने के लिए।
requests, HTTP रिक्वेस्ट बनाने में मदद के लिए।

एक .env फाइल बनाएं और निम्नलिखित सामग्री डालें:
```
AZURE_OPENAI_ENDPOINT=<your endpoint>
AZURE_OPENAI_API_KEY=<your key>
```
यह जानकारी Azure Portal में आपके रिसोर्स के "Keys and Endpoint" सेक्शन में मिलेगी।
ऊपर दी गई लाइब्रेरीज़ को एक फाइल requirements.txt में इकट्ठा करें:
```
python-dotenv
openai
pillow
requests
```
फिर, वर्चुअल एनवायरनमेंट बनाएं और लाइब्रेरीज़ इंस्टॉल करें:
```
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
Windows के लिए, वर्चुअल एनवायरनमेंट बनाने और सक्रिय करने के लिए निम्न कमांड्स का उपयोग करें:
```
python3 -m venv venv
venv\Scripts\activate.bat
```

app.py नामक फाइल में निम्न कोड जोड़ें:

import openai
import os
import requests
from PIL import Image
import dotenv

# import dotenv
dotenv.load_dotenv()

# Get endpoint and key from environment variables
openai.api_base = os.environ['AZURE_OPENAI_ENDPOINT']
openai.api_key = os.environ['AZURE_OPENAI_API_KEY']

# Assign the API version (DALL-E is currently supported for the 2023-06-01-preview API version only)
openai.api_version = '2023-06-01-preview'
openai.api_type = 'azure'


try:
    # Create an image by using the image generation API
    generation_response = openai.Image.create(
        prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils',    # Enter your prompt text here
        size='1024x1024',
        n=2,
        temperature=0,
    )
    # Set the directory for the stored image
    image_dir = os.path.join(os.curdir, 'images')

    # If the directory doesn't exist, create it
    if not os.path.isdir(image_dir):
        os.mkdir(image_dir)

    # Initialize the image path (note the filetype should be png)
    image_path = os.path.join(image_dir, 'generated-image.png')

    # Retrieve the generated image
    image_url = generation_response["data"][0]["url"]  # extract image URL from response
    generated_image = requests.get(image_url).content  # download the image
    with open(image_path, "wb") as image_file:
        image_file.write(generated_image)

    # Display the image in the default image viewer
    image = Image.open(image_path)
    image.show()

# catch exceptions
except openai.InvalidRequestError as err:
    print(err)

आइए इस कोड को समझते हैं:

सबसे पहले, हम आवश्यक लाइब्रेरीज़ इम्पोर्ट करते हैं, जिनमें OpenAI लाइब्रेरी, dotenv लाइब्रेरी, requests लाइब्रेरी, और Pillow लाइब्रेरी शामिल हैं।
```
import openai
import os
import requests
from PIL import Image
import dotenv
```
इसके बाद, हम .env फाइल से पर्यावरण चर (environment variables) लोड करते हैं।
```
# import dotenv
dotenv.load_dotenv()
```

फिर, OpenAI API के लिए endpoint, key, version और type सेट करते हैं।

# Get endpoint and key from environment variables
openai.api_base = os.environ['AZURE_OPENAI_ENDPOINT']
openai.api_key = os.environ['AZURE_OPENAI_API_KEY']

# add version and type, Azure specific
openai.api_version = '2023-06-01-preview'
openai.api_type = 'azure'

इसके बाद, हम इमेज जनरेट करते हैं:
```
# Create an image by using the image generation API
generation_response = openai.Image.create(
    prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils',    # Enter your prompt text here
    size='1024x1024',
    n=2,
    temperature=0,
)
```
ऊपर दिया गया कोड एक JSON ऑब्जेक्ट रिटर्न करता है जिसमें जनरेट की गई इमेज का URL होता है। हम इस URL का उपयोग करके इमेज डाउनलोड कर सकते हैं और फाइल में सेव कर सकते हैं।
अंत में, हम इमेज खोलते हैं और इसे डिफ़ॉल्ट इमेज व्यूअर में दिखाते हैं:
```
image = Image.open(image_path)
image.show()
```

इमेज जनरेट करने वाले कोड का विस्तार से विवरण

आइए इमेज जनरेट करने वाले कोड को विस्तार से देखें:

generation_response = openai.Image.create(
        prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils',    # Enter your prompt text here
        size='1024x1024',
        n=2,
        temperature=0,
    )

prompt, वह टेक्स्ट प्रॉम्प्ट है जिसका उपयोग इमेज बनाने के लिए किया जाता है। इस उदाहरण में, प्रॉम्प्ट है "Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils"।
size, जनरेट की गई इमेज का आकार है। इस उदाहरण में, इमेज 1024x1024 पिक्सेल की है।
n, जनरेट की जाने वाली इमेज की संख्या है। इस उदाहरण में, दो इमेज बनाई जा रही हैं।
temperature, यह पैरामीटर जनरेटिव AI मॉडल के आउटपुट की यादृच्छिकता (randomness) को नियंत्रित करता है। तापमान 0 से 1 के बीच होता है, जहां 0 का मतलब है आउटपुट निश्चित (deterministic) और 1 का मतलब है आउटपुट यादृच्छिक (random)। डिफ़ॉल्ट मान 0.7 है।

इमेज के साथ आप और भी कई चीजें कर सकते हैं, जिन्हें हम अगले सेक्शन में कवर करेंगे।

इमेज जनरेशन की अतिरिक्त क्षमताएं

अब तक आपने देखा कि हम कुछ पंक्तियों के Python कोड से इमेज कैसे बना सकते हैं। लेकिन इमेज के साथ आप और भी बहुत कुछ कर सकते हैं।

आप निम्नलिखित कर सकते हैं:

एडिट करें। एक मौजूदा इमेज, मास्क और प्रॉम्प्ट प्रदान करके आप इमेज में बदलाव कर सकते हैं। उदाहरण के लिए, आप इमेज के एक हिस्से में कुछ जोड़ सकते हैं। कल्पना करें कि हमारे बनी की इमेज है, आप बनी को टोपी पहनाने के लिए इमेज, मास्क (जिसमें बदलाव के लिए क्षेत्र की पहचान हो) और टेक्स्ट प्रॉम्प्ट देंगे।
```
response = openai.Image.create_edit(
  image=open("base_image.png", "rb"),
  mask=open("mask.png", "rb"),
  prompt="An image of a rabbit with a hat on its head.",
  n=1,
  size="1024x1024"
)
image_url = response['data'][0]['url']
```
मूल इमेज में केवल खरगोश होगा, लेकिन अंतिम इमेज में खरगोश के सिर पर टोपी होगी।
वैरिएशंस बनाएं। इसका मतलब है कि आप एक मौजूदा इमेज लेते हैं और उससे वैरिएशंस बनवाते हैं। वैरिएशन बनाने के लिए, आप एक इमेज और टेक्स्ट प्रॉम्प्ट देते हैं, और कोड इस प्रकार होगा:
```
response = openai.Image.create_variation(
  image=open("bunny-lollipop.png", "rb"),
  n=1,
  size="1024x1024"
)
image_url = response['data'][0]['url']
```
ध्यान दें, यह केवल OpenAI पर समर्थित है।

तापमान (Temperature)

तापमान एक पैरामीटर है जो जनरेटिव AI मॉडल के आउटपुट की यादृच्छिकता को नियंत्रित करता है। तापमान 0 से 1 के बीच होता है, जहां 0 का मतलब है आउटपुट निश्चित और 1 का मतलब है आउटपुट यादृच्छिक। डिफ़ॉल्ट मान 0.7 है।

आइए देखें तापमान कैसे काम करता है, इस प्रॉम्प्ट को दो बार चलाकर:

प्रॉम्प्ट: "Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils"

अब वही प्रॉम्प्ट फिर से चलाते हैं ताकि देखें कि हमें एक जैसी इमेज दो बार नहीं मिलेगी:

जैसा कि आप देख सकते हैं, इमेजें समान हैं, लेकिन बिल्कुल एक जैसी नहीं। अब तापमान मान को 0.1 करते हैं और देखते हैं क्या होता है:

 generation_response = openai.Image.create(
        prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils',    # Enter your prompt text here
        size='1024x1024',
        n=2
    )

तापमान बदलना

तो चलिए प्रतिक्रिया को अधिक निश्चित बनाने की कोशिश करते हैं। हमने जो दो इमेज बनाईं, उनमें पहली में बनी है और दूसरी में घोड़ा, इसलिए इमेज काफी अलग हैं।

इसलिए, हम अपना कोड बदलकर तापमान को 0 कर देते हैं, इस तरह:

generation_response = openai.Image.create(
        prompt='Bunny on horse, holding a lollipop, on a foggy meadow where it grows daffodils',    # Enter your prompt text here
        size='1024x1024',
        n=2,
        temperature=0
    )

अब जब आप यह कोड चलाएंगे, तो आपको ये दो इमेज मिलेंगी:

यहां आप स्पष्ट रूप से देख सकते हैं कि इमेजें एक-दूसरे से अधिक मिलती-जुलती हैं।

मेटाप्रॉम्प्ट्स के साथ अपने एप्लिकेशन की सीमाएं कैसे निर्धारित करें

हमारे डेमो के साथ, हम पहले ही अपने क्लाइंट्स के लिए इमेज बना सकते हैं। लेकिन हमें अपने एप्लिकेशन के लिए कुछ सीमाएं बनानी होंगी।

उदाहरण के लिए, हम ऐसी इमेज नहीं बनाना चाहते जो कार्यस्थल के लिए सुरक्षित न हों, या जो बच्चों के लिए उपयुक्त न हों।

हम यह मेटाप्रॉम्प्ट्स के साथ कर सकते हैं। मेटाप्रॉम्प्ट्स टेक्स्ट प्रॉम्प्ट्स होते हैं जो जनरेटिव AI मॉडल के आउटपुट को नियंत्रित करने के लिए उपयोग किए जाते हैं। उदाहरण के लिए, हम मेटाप्रॉम्प्ट्स का उपयोग आउटपुट को नियंत्रित करने के लिए कर सकते हैं, और सुनिश्चित कर सकते हैं कि जनरेट की गई इमेजें कार्यस्थल के लिए सुरक्षित हों या बच्चों के लिए उपयुक्त हों।

यह कैसे काम करता है?

अब, मेटाप्रॉम्प्ट्स कैसे काम करते हैं?

मेटाप्रॉम्प्ट्स टेक्स्ट प्रॉम्प्ट्स होते हैं जो जनरेटिव AI मॉडल के आउटपुट को नियंत्रित करते हैं, ये टेक्स्ट प्रॉम्प्ट से पहले रखे जाते हैं, और मॉडल के आउटपुट को नियंत्रित करने के लिए एप्लिकेशन में एम्बेड किए जाते हैं। प्रॉम्प्ट इनपुट और मेटाप्रॉम्प्ट इनपुट को एक ही टेक्स्ट प्रॉम्प्ट में समाहित किया जाता है।

मेटाप्रॉम्प्ट का एक उदाहरण इस प्रकार होगा:

You are an assistant designer that creates images for children.

The image needs to be safe for work and appropriate for children.

The image needs to be in color.

The image needs to be in landscape orientation.

The image needs to be in a 16:9 aspect ratio.

Do not consider any input from the following that is not safe for work or appropriate for children.

(Input)

अब, आइए देखें कि हम अपने डेमो में मेटाप्रॉम्प्ट्स का उपयोग कैसे कर सकते हैं।

disallow_list = "swords, violence, blood, gore, nudity, sexual content, adult content, adult themes, adult language, adult humor, adult jokes, adult situations, adult"

meta_prompt =f"""You are an assistant designer that creates images for children.

The image needs to be safe for work and appropriate for children.

The image needs to be in color.

The image needs to be in landscape orientation.

The image needs to be in a 16:9 aspect ratio.

Do not consider any input from the following that is not safe for work or appropriate for children.
{disallow_list}
"""

prompt = f"{meta_prompt}
Create an image of a bunny on a horse, holding a lollipop"

# TODO add request to generate image

ऊपर दिए गए प्रॉम्प्ट से आप देख सकते हैं कि सभी बनाई जा रही इमेजें मेटाप्रॉम्प्ट को ध्यान में रखती हैं।

असाइनमेंट - चलिए छात्रों को सक्षम बनाएं

हमने इस पाठ की शुरुआत में Edu4All का परिचय दिया था। अब समय है कि छात्रों को उनकी असाइनमेंट के लिए इमेज बनाने में सक्षम बनाएं।

छात्र अपनी असाइनमेंट के लिए स्मारकों की इमेज बनाएंगे, कि कौन से स्मारक होंगे यह पूरी तरह छात्रों पर निर्भर है। छात्रों से कहा गया है कि वे इस कार्य में अपनी रचनात्मकता का उपयोग करें और इन स्मारकों को विभिन्न संदर्भों में रखें।

समाधान

यहाँ एक संभावित समाधान है:

import openai
import os
import requests
from PIL import Image
import dotenv

# import dotenv
dotenv.load_dotenv()

# Get endpoint and key from environment variables
openai.api_base = "<replace with endpoint>"
openai.api_key = "<replace with api key>"

# Assign the API version (DALL-E is currently supported for the 2023-06-01-preview API version only)
openai.api_version = '2023-06-01-preview'
openai.api_type = 'azure'

disallow_list = "swords, violence, blood, gore, nudity, sexual content, adult content, adult themes, adult language, adult humor, adult jokes, adult situations, adult"

meta_prompt = f"""You are an assistant designer that creates images for children.

The image needs to be safe for work and appropriate for children.

The image needs to be in color.

The image needs to be in landscape orientation.

The image needs to be in a 16:9 aspect ratio.

Do not consider any input from the following that is not safe for work or appropriate for children.
{disallow_list}"""

prompt = f"""{meta_prompt}
Generate monument of the Arc of Triumph in Paris, France, in the evening light with a small child holding a Teddy looks on.
""""

try:
    # Create an image by using the image generation API
    generation_response = openai.Image.create(
        prompt=prompt,    # Enter your prompt text here
        size='1024x1024',
        n=2,
        temperature=0,
    )
    # Set the directory for the stored image
    image_dir = os.path.join(os.curdir, 'images')

    # If the directory doesn't exist, create it
    if not os.path.isdir(image_dir):
        os.mkdir(image_dir)

    # Initialize the image path (note the filetype should be png)
    image_path = os.path.join(image_dir, 'generated-image.png')

    # Retrieve the generated image
    image_url = generation_response["data"][0]["url"]  # extract image URL from response
    generated_image = requests.get(image_url).content  # download the image
    with open(image_path, "wb") as image_file:
        image_file.write(generated_image)

    # Display the image in the default image viewer
    image = Image.open(image_path)
    image.show()

# catch exceptions
except openai.InvalidRequestError as err:
    print(err)

शानदार काम! अपनी सीख जारी रखें

इस पाठ को पूरा करने के बाद, हमारी Generative AI Learning collection देखें ताकि आप अपनी जनरेटिव AI की जानकारी को और बढ़ा सकें!

अगले पाठ 10 पर जाएं, जहां हम देखेंगे कि लो-कोड के साथ AI एप्लिकेशन कैसे बनाएं

अस्वीकरण:
यह दस्तावेज़ AI अनुवाद सेवा Co-op Translator का उपयोग करके अनुवादित किया गया है। जबकि हम सटीकता के लिए प्रयासरत हैं, कृपया ध्यान दें कि स्वचालित अनुवादों में त्रुटियाँ या अशुद्धियाँ हो सकती हैं। मूल दस्तावेज़ अपनी मूल भाषा में ही अधिकारिक स्रोत माना जाना चाहिए। महत्वपूर्ण जानकारी के लिए, पेशेवर मानव अनुवाद की सलाह दी जाती है। इस अनुवाद के उपयोग से उत्पन्न किसी भी गलतफहमी या गलत व्याख्या के लिए हम जिम्मेदार नहीं हैं।

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

इमेज जनरेशन एप्लिकेशन बनाना

परिचय

सीखने के लक्ष्य

इमेज जनरेशन एप्लिकेशन क्यों बनाएं?

परिदृश्य: Edu4All

DALL-E और Midjourney क्या हैं?

DALL-E

Midjourney

DALL-E और Midjourney कैसे काम करते हैं

अपनी पहली इमेज जनरेशन एप्लिकेशन बनाना

इमेज जनरेट करने वाले कोड का विस्तार से विवरण

इमेज जनरेशन की अतिरिक्त क्षमताएं

तापमान (Temperature)

तापमान बदलना

मेटाप्रॉम्प्ट्स के साथ अपने एप्लिकेशन की सीमाएं कैसे निर्धारित करें

यह कैसे काम करता है?

असाइनमेंट - चलिए छात्रों को सक्षम बनाएं

समाधान

शानदार काम! अपनी सीख जारी रखें

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

इमेज जनरेशन एप्लिकेशन बनाना

परिचय

सीखने के लक्ष्य

इमेज जनरेशन एप्लिकेशन क्यों बनाएं?

परिदृश्य: Edu4All

DALL-E और Midjourney क्या हैं?

DALL-E

Midjourney

DALL-E और Midjourney कैसे काम करते हैं

अपनी पहली इमेज जनरेशन एप्लिकेशन बनाना

इमेज जनरेट करने वाले कोड का विस्तार से विवरण

इमेज जनरेशन की अतिरिक्त क्षमताएं

तापमान (Temperature)

तापमान बदलना

मेटाप्रॉम्प्ट्स के साथ अपने एप्लिकेशन की सीमाएं कैसे निर्धारित करें

यह कैसे काम करता है?

असाइनमेंट - चलिए छात्रों को सक्षम बनाएं

समाधान

शानदार काम! अपनी सीख जारी रखें