Generating Detailed Images with OpenAI DALL-E and ChatGPT in Python: A Step-By-Step API Tutorial

In this article, we will explore how to automate the creation of AI-generated art by integrating DALL-E with ChatGPT using their respective APIs in Python. ChatGPT, the language model developed by OpenAI, has recently made waves in the tech community for its language abilities, such as code generation, question answering, and text completion. DALL-E, another model developed by OpenAI, specializes in generating images from text prompts. This tutorial uses the OpenAI GPT-3 API to generate a detailed and specific prompt for DALL-E. We then send this prompt to the DALL-E API to generate images, which we display and save for future use.

Throughout the tutorial, we will provide clear explanations and code snippets to guide you through the process. By the end of this tutorial, you will have a comprehensive understanding of how to automate prompt generation for DALL-E with Python. So, let’s get started!

If you’re new to the OpenAI API, my recent API tutorial on ChatGPT and other OpenAI language models might be helpful to check out.

Also: Mastering Prompt Engineering for ChatGPT for Business Use

Images created with automated prompt generation for OpenAI DALL-E using ChatGPT in Python

Generating AI-Art using DALL-E: How it Works

DALL-E is a deep learning model developed by OpenAI that generates images from text prompts. It is trained on a massive dataset of images and their text descriptions, which allows it to interpret a prompt and generate a wide range of images related to it.

The input to the model is what we call a prompt. A prompt to DALL-E typically includes a general instruction, a specific topic, and additional keywords (for example, digital art, oil painting, etc.). The DALL-E model then uses this information to generate the images. Usually, four different images are generated per request. DALL-E can do various other things, such as completing or altering existing images. However, this article will focus on image generation.

The images generated by DALL-E can be of various types, such as illustrations, drawings, and photographs. While the quality of the images is generally surprisingly good, the model handles some subjects better than others. For example, human faces and bodies sometimes lead to odd results. In general, however, the images are remarkably coherent with the provided text prompts.

AI-generated images can be useful for various applications. These include creating AI-generated art, creating illustrations for books, creating images for social media, creating product images for e-commerce, and more. In many cases, generating images from text prompts can save time and resources, as it eliminates the need for manual image creation.

UI of the OpenAI DALL-E2 service for AI-generated images. The AI services generate images based on a manual prompt defined by the user.

Automated DALL-E Prompts using ChatGPT

The usual way to generate images with DALL-E is via a manual prompt on the OpenAI website. However, DALL-E also offers an API, which allows for automating image generation. Of course, you could send a manually written prompt to the API and automatically process the response. However, if you want to optimize the quality of the generated images or consistently automate the whole process, there is a better way of doing this using ChatGPT.

You may have heard of ChatGPT’s ability to complete and generate high-quality text. However, few people know that ChatGPT can also generate prompts for DALL-E. This works by simply telling the model to generate a prompt for DALL-E and then specifying the topic for which you want to create the prompt. An example prompt could be:

"generate a prompt for DALL-E on a robot on the beach".

And the response from ChatGPT:

The gleeful robot lounged on the sun-drenched beach, soaking up the warm rays and listening to the soothing crash of the waves. It wore a bright, multicolored swimsuit and a wide-brimmed hat to protect its circuitry from the intense heat. It smiled contentedly as it watched the seagulls soar through the azure sky and the cheerful children playing in the foamy surf. The salty-sweet scent of the sea filled its senses, and it felt truly relaxed and rejuvenated.

As you can see, the response is very detailed and uses a lot of adjectives. As a result, the images generated with these prompts are often highly creative and detailed. Below is the result from DALL-E for this specific prompt:

Example DALL-E creation for a ChatGPT-generated prompt using the command: “generate a prompt for DALL-E on a robot on the beach.”

Why You May Want to Automate Prompt Generation for DALL-E

Automating prompt generation for DALL-E has several benefits. Some of the reasons why you may want to automate prompt generation include the following:

  • Efficiency: Automating the prompt generation process can save time and resources as it eliminates the need for manual input.
  • More Details: The prompts generated by ChatGPT are typically more detailed than the prompts humans write by hand.
  • Consistency: By using a language model like ChatGPT to generate prompts, you can ensure that the prompts are grammatically correct and well-formed, which can improve the quality of the generated images.
  • Variety: By using ChatGPT to generate prompts, you can include additional keywords, making the generation more diverse and less repetitive.
  • Automation: When incorporating AI-generated images into an integrated process, utilizing APIs to automate the process is essential. For instance, an integration with Twitter can be implemented where ChatGPT automatically picks up keywords from tweets and generates images based on those keywords, which can be published on Twitter.

Using ChatGPT to generate DALL-E prompts allows for more efficient and accurate image generation and the ability to generate more detailed images. It also reduces the need for manual input and allows you to integrate image generation models into complex processes.

The images generated based on the ChatGPT prompts are often superior in detail and creativity.

Automated DALL-E Prompt Generation using ChatGPT in Python

In the following, we will write a Python script that integrates DALL-E with ChatGPT to create AI-generated images from keywords or short descriptions.

Here are the general steps involved in generating DALL-E prompts using ChatGPT in Python:

  1. Authenticate with the OpenAI API: To use the OpenAI models, we first need to authenticate with the OpenAI API by providing our API key. In the next section, we will look at how you can register for a key. We will also briefly discuss the costs of using OpenAI models.
  2. Define a ChatGPT Prompt: An image prompt is the text input that the DALL-E model uses to generate images. We will use ChatGPT to generate this prompt, so we first define a request that asks the language model for it.
  3. Generate a Prompt Design with ChatGPT: We send the request to the GPT-3 model, which returns a detailed prompt for DALL-E.
  4. Send the Prompt to the DALL-E API: The response obtained from the above step is sent to the DALL-E API to generate the images.
  5. Process the Image Response from DALL-E: Once we have the images from the DALL-E API, we display them and save them to a local folder.
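
The steps above can be sketched as a single function before we look at each step in detail. This is a minimal illustration only: the two API calls are passed in as plain functions, and the names run_pipeline, fake_language_model, and fake_image_model are hypothetical stand-ins that do not contact OpenAI.

```python
from typing import Callable, List


def run_pipeline(
    topic: str,
    generate_prompt: Callable[[str], str],
    generate_images: Callable[[str], List[str]],
    prompt_base: str = "Generate a detailed Dall-E prompt with several adjectives for ",
    additional_keywords: str = "digital art",
) -> List[str]:
    """Chain the steps: topic -> detailed prompt -> image URLs."""
    # Steps 2 and 3: ask the language model for a detailed image prompt
    detailed_prompt = generate_prompt(prompt_base + topic).strip()
    # Step 4: append the style keywords and request the images
    image_prompt = f"{detailed_prompt} {additional_keywords}".strip()
    # Step 5 (display and save) would consume the returned URLs
    return generate_images(image_prompt)


# Stand-ins for the real API calls, for illustration only
def fake_language_model(request: str) -> str:
    return f"\n\nA majestic, sparkling rendition of: {request.split(' for ')[-1]}"


def fake_image_model(prompt: str) -> List[str]:
    return [f"https://example.com/image-0.png?prompt_length={len(prompt)}"]


urls = run_pipeline("sugar castle", fake_language_model, fake_image_model)
print(urls)
```

Passing the API calls in as arguments keeps the flow easy to test without spending any tokens. In the remainder of the tutorial, the real OpenAI calls take the place of the two stand-ins.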

Let’s get started!

Register for an OpenAI API Key

To use the OpenAI API, you will first need to register for an API key by visiting the OpenAI website and creating an account. During the registration process, you will be required to provide some basic information about yourself and the project you are working on. In addition, you will need to add a payment method. The cost per request is a couple of cents, depending on which model you use. In this tutorial, we will use the Davinci model, which costs $0.02 per 1,000 tokens.

It’s important to note that while GPT-3 is currently available in a free test version, the OpenAI API itself is not free. If you only plan to send a few test requests, the costs will be minimal, but if you integrate the API with a successful application that runs in production, the costs can quickly accumulate.

Each language model offered by OpenAI has a different price tag, and charges depend on various factors. For language models, charges are based on the number of tokens sent to the model and the type of the model.

Prices for image models depend on the resolution at which you generate the images.

I recommend monitoring usage and keeping track of costs to avoid unexpected charges. To manage costs, you can set up a quota on the costs in the OpenAI portal under your profile. This will help you to keep an eye on the costs and keep them within your budget.
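
To get a feeling for the numbers, here is a rough back-of-the-envelope sketch. It assumes the Davinci price of $0.02 per 1,000 tokens mentioned above; the function name estimate_completion_cost and the example token counts are made up for illustration, and image costs are not included.

```python
def estimate_completion_cost(total_tokens: int, price_per_1k_tokens: float = 0.02) -> float:
    """Return the estimated cost in USD for a given number of tokens."""
    if total_tokens < 0:
        raise ValueError("total_tokens must not be negative")
    return total_tokens / 1000 * price_per_1k_tokens


# Example: 100 requests of roughly 150 tokens each (prompt plus response)
print(f"${estimate_completion_cost(100 * 150):.2f}")  # prints $0.30
```

If you want to track real usage instead of estimates, the completion response of the API should include a usage section with the token counts of each request, which you could feed into a function like this.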

Prices of different language models offered by OpenAI.
Overview of prices for OpenAI language models (as of 2023-01-22).
Overview of prices for OpenAI image models (as of 2023-01-22).

Technical Setup

Before diving into the code, it’s essential to ensure that you have the proper setup for your Python 3 environment and have installed all the necessary packages. If you do not have a Python environment, you can follow the instructions in this tutorial to set up the Anaconda Python environment. This will provide you with a robust and versatile environment that is well-suited for machine learning and data science tasks.

Before you can start generating DALL-E prompts, you’ll need to install the OpenAI library for Python, which provides access to the GPT-3 model. You can do this by running “pip install openai” in your command line.

In addition, this tutorial works with matplotlib, PIL (installed as the Pillow package), and yaml (installed as the PyYAML package), as well as the standard libraries os, datetime, and urllib. You can install the required packages using console commands:

  • pip install <package name>
  • conda install <package name> (if you are using the Anaconda package manager)

Step #1 Imports and Authentication with the OpenAI API

To begin, we import the required libraries and provide our API key for authorization. I recommend storing the API key in a separate file, such as api_config_openai.yml. The code below reads the API key from this file and makes it accessible in the code. You can also place the API key directly in the code, but be careful not to make it publicly accessible.

With the API key set up, you can use the OpenAI API’s “Model.list()” function to retrieve a list of available models. To do so, uncomment the last three lines in the following code snippet.

import openai  # note: this tutorial uses the legacy interface of the openai package (versions before 1.0)
import yaml
import urllib.request
from PIL import Image
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import os
from datetime import datetime
# set the API key (the file is expected to contain an entry named api_key)
with open('API Keys/api_config_openai.yml', 'r') as yaml_file:
    p = yaml.safe_load(yaml_file)
openai.api_key = p['api_key']
# show available openai language models (this tutorial uses text-davinci-003)
# modellist = openai.Model.list()
# for i in modellist.data:
#     print(i.id)
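
If you prefer not to keep a key file next to your code, an environment variable is a common alternative. The following is a minimal sketch under that assumption; the helper name load_api_key is hypothetical, and OPENAI_API_KEY is used here simply as a conventional variable name.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Read the API key from an environment variable and fail early if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return key


# Usage (assuming the variable has been set in your shell beforehand):
# openai.api_key = load_api_key()
```

Failing early with a clear message is usually easier to debug than an authentication error that only appears with the first request.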

Step #2 Define a ChatGPT Prompt

Next, we define the prompt for ChatGPT in which we request a prompt for DALL-E.

Defining Prompts

When we define prompts for ChatGPT, there are a few things to keep in mind. The primary focus should be to provide a clear and specific task or question. Although the AI can still function with incomplete or incorrect information, providing it with detailed instructions will improve its performance. In addition, keyword relevance is essential.

To ensure the AI tool produces the desired results, it’s vital to use relevant keywords in your input. The tool must first understand the input accurately before it can generate the expected output. A well-crafted prompt can improve the tool’s performance and accelerate your progress.

DALL-E also recognizes certain keywords that steer the model in one direction or another. For example, you can define the type of image you want to be generated by adding keywords such as oil painting, watercolor painting, digital art, or Van Gogh style. This article provides a good overview of these keywords.
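
As a small illustration of how such style keywords could be attached to a prompt, consider the following sketch. The helper add_style_keywords is hypothetical and not part of any library; it only cleans up the keywords and appends them as a comma-separated suffix.

```python
from typing import Iterable


def add_style_keywords(prompt: str, keywords: Iterable[str]) -> str:
    """Append style keywords to an image prompt as a comma-separated suffix."""
    # drop empty entries and surrounding whitespace
    cleaned = [k.strip() for k in keywords if k.strip()]
    base = prompt.strip()
    if not cleaned:
        return base
    return f"{base} {', '.join(cleaned)}"


print(add_style_keywords("A sugar castle with crystal spires.", ["digital art", "high quality"]))
# prints: A sugar castle with crystal spires. digital art, high quality
```

Keeping the keywords in a list makes it easy to try the same topic in several styles by swapping out a single argument.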

Send the Request to the OpenAI Language Model

We encapsulate this request in a function called “send_openai_request” that takes in three parameters: “engine,” “prompt,” and “max_tokens”. The function uses the OpenAI API’s “Completion.create()” method to send a request to the specified engine with the provided prompt and maximum token limit.

In the code below, we define a simple start phrase called “prompt_base”: “Generate a detailed Dall-E prompt with several adjectives for”. You then add the topic for which you want to generate the images as “prompt_details”. In addition, you can define “additional_keywords”, which are appended after the language model has generated the prompt for DALL-E.

There are different language models available, but the one that creates the most detailed results and is closest to ChatGPT is “text-davinci-003”. This is the version used in this tutorial.

In this example, the request consists of the base prompt, which is a general instruction to the model, and the specific topic “sugar castle”. The additional keywords “digital art” come into play in the next step. We send the request to the OpenAI API, store the result in the variable “response”, and print out the generated prompt.

# define the request
def send_openai_request(engine, prompt, max_tokens=1024):
    response = openai.Completion.create(
        engine=engine,
        prompt=prompt,
        max_tokens=max_tokens,
        n=1,
        stop=None,
        temperature=0.7
    )
    return response
# define the prompt to the language model
prompt_base = "Generate a detailed Dall-E prompt with several adjectives for " # an introduction text telling the language model what to do
prompt_details = "sugar castle" # the topic for which you wish to generate the images 
additional_keywords = "digital art" # these keywords will be added after the language model generated the prompt. Example: "digital art", "oil painting", "water color painting", "high quality"
model="text-davinci-003" # the version of the openai language model
# generate a response
response = send_openai_request(model, prompt_base + prompt_details)
# print the response from the language model
generated_prompt = response["choices"][0]["text"]
print(generated_prompt)
Make me a picture of a majestic, shimmering, sparkling sugar castle with a dazzling crystal spire and enchanting turrets, surrounded by an emerald green moat and a towering rainbow-hued wall.
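
The text returned by the language model often starts with line breaks or contains extra whitespace. A minimal sketch of how the text could be tidied before it is passed on is shown below. The function extract_prompt is a hypothetical helper, and mock_response is a hand-written dictionary that only imitates the part of the response used in this tutorial.

```python
def extract_prompt(response: dict) -> str:
    """Return the first completion as a single, trimmed line of text."""
    text = response["choices"][0]["text"]
    # collapse line breaks and repeated spaces into single spaces
    return " ".join(text.split())


# hand-written stand-in for an API response, for illustration only
mock_response = {
    "choices": [
        {"text": "\n\nA majestic, shimmering sugar castle\nwith enchanting turrets."}
    ]
}
print(extract_prompt(mock_response))
# prints: A majestic, shimmering sugar castle with enchanting turrets.
```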

Step #3 Generate a Prompt Design with ChatGPT

Next, we specify the parameters for generating images with the OpenAI DALL-E API. We set the variable “number_of_images” to 2, so DALL-E will generate two images. In addition, we set “image_size” to “512x512”, which is the resolution of the generated images in pixels.

We create the final prompt for DALL-E, called “image_generation_prompt”, by combining the “generated_prompt” variable obtained in the previous step with the “additional_keywords” variable.

Finally, we print a message indicating that DALL-E will generate the specified number of images at the specified size, using the provided prompt.

# image parameters
number_of_images = 2 # how many images you want to generate
image_size = "512x512" # the size of the images
image_generation_prompt = f"{generated_prompt} {additional_keywords}"
print(f"Dall-e will generate {number_of_images} images {image_size} using the following prompt: {image_generation_prompt}")
Dall-e will generate 2 images 512x512 using the following prompt: 
Make me a picture of a majestic, shimmering, sparkling sugar castle with a dazzling crystal spire and enchanting turrets, surrounded by an emerald green moat and a towering rainbow-hued wall. digital art
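
Before spending money on a request, it can help to check the parameters locally. The sketch below is an illustration only: the function validate_image_request is hypothetical, and the limits it encodes (three square sizes, one to ten images, at most 1,000 prompt characters) are assumptions based on the DALL-E 2 API at the time of writing, so please verify them against the current documentation.

```python
# assumed limits of the image API, not taken from this tutorial
ALLOWED_SIZES = {"256x256", "512x512", "1024x1024"}
MAX_IMAGES = 10
MAX_PROMPT_CHARS = 1000


def validate_image_request(prompt: str, n: int, size: str) -> list:
    """Return a list of problems; an empty list means the request looks fine."""
    problems = []
    if not prompt.strip():
        problems.append("prompt is empty")
    if len(prompt) > MAX_PROMPT_CHARS:
        problems.append(f"prompt is longer than {MAX_PROMPT_CHARS} characters")
    if not 1 <= n <= MAX_IMAGES:
        problems.append(f"number of images must be between 1 and {MAX_IMAGES}")
    if size not in ALLOWED_SIZES:
        problems.append(f"unsupported size: {size}")
    return problems


print(validate_image_request("A sparkling sugar castle, digital art", 2, "512x512"))  # prints []
```

A check like this would also catch a size string written with a multiplication sign instead of the letter x, which is easy to introduce when copying text from a web page.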

Step #4 Send the Generated Prompt to the DALL-E API

Next, we send the request to the OpenAI DALL-E API using the “image_generation_prompt” variable created earlier. We send the request with OpenAI’s “Image.create()” method, which takes the prompt, the number of images, and the size as parameters. The response from the API is stored in the “response” variable. We also read the creation timestamp from the response, which we will use later to name the saved files.

The function “get_images” loops through the response data and appends the URL of each image to a list. Finally, we call this function on the response, store the resulting image URLs in the “image_list” variable, and print them.

# define and send the request to dall-e with the generated prompt
response = openai.Image.create(
    prompt=image_generation_prompt,
    n=number_of_images,
    size=image_size,
)
# set the timestamp for data processing
timestamp_string = response.created
datetime_string = datetime.fromtimestamp(timestamp_string).strftime("%Y%m%d%H%M%S")
# get the image(s) from the response
def get_images(response):
    # generate an empty list for the image urls
    image_list = []
    # store the image urls in the list
    for imgurl in response.data:
        image_list.append(imgurl.url)
    return image_list
image_list = get_images(response)
#display image urls
print(image_list)
['https://oaidalleapiprodscus.blob.core.windows.net/private/org-eO72e4aFm9XJBw4sb91Z8XEX/user-9ZRwLxYMDBxvw6gLEywP44xa/img-BdkqLm65g8QLIpLbgHhDkxbf.png?st=2023-01-22T19%3A03%3A51Z&se=2023-01-22T21%3A03%3A51Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-01-22T17%3A19%3A13Z&ske=2023-01-23T17%3A19%3A13Z&sks=b&skv=2021-08-06&sig=vQrKcGSKmxXKDikYpx9xvmHeBRcdQoxRH%2B8%2BjYogjz8%3D', 'https://oaidalleapiprodscus.blob.core.windows.net/private/org-eO72e4aFm9XJBw4sb91Z8XEX/user-9ZRwLxYMDBxvw6gLEywP44xa/img-YnicWoD32S1DBBYzyhMnEHjo.png?st=2023-01-22T19%3A03%3A51Z&se=2023-01-22T21%3A03%3A51Z&sp=r&sv=2021-08-06&sr=b&rscd=inline&rsct=image/png&skoid=6aaadede-4fb3-4698-a8f6-684d7786b067&sktid=a48cca56-e6da-484e-a814-9c849652bcb3&skt=2023-01-22T17%3A19%3A13Z&ske=2023-01-23T17%3A19%3A13Z&sks=b&skv=2021-08-06&sig=o2kK5GIKX7lIqUaFQs8e1Sa3JWRyEZr6cVfKYPAjH%2BY%3D']
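
The URLs in the output above carry the query parameters “st” and “se”, which look like the start and expiry timestamps of a time-limited download link. Assuming that reading is correct, the following sketch extracts the expiry time so that you know how long you have to download an image. The function url_expiry is a hypothetical helper, and the example URL is shortened and made up.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse, parse_qs


def url_expiry(url: str):
    """Return the expiry time from the 'se' query parameter, or None if it is absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("se")
    if not values:
        return None
    # parse_qs has already decoded %3A to ':'
    expiry = datetime.strptime(values[0], "%Y-%m-%dT%H:%M:%SZ")
    return expiry.replace(tzinfo=timezone.utc)


example_url = (
    "https://example.com/img.png"
    "?st=2023-01-22T19%3A03%3A51Z&se=2023-01-22T21%3A03%3A51Z&sp=r"
)
print(url_expiry(example_url))  # prints 2023-01-22 21:03:51+00:00
```

Because the links are temporary, it makes sense to download and store the images right away, which is what the next step does.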

Step #5 Process the Image Response from DALL-E

Now that we have the image URLs, let’s see what DALL-E has generated. We use the Matplotlib library to display the generated images. Then we define a save path for the images using the timestamp and the details of the prompt. As the following code iterates over the URLs of the images, it stores the images with a unique filename for each image and saves them in the created directory.

# download the images (recent matplotlib versions no longer read URLs directly)
images = [Image.open(urllib.request.urlopen(imgurl)) for imgurl in image_list]
# display the images (squeeze=False keeps axs two-dimensional, even for a single image)
fig, axs = plt.subplots(nrows=1, ncols=len(images), figsize=(10, 10), squeeze=False)
for ax, img in zip(axs[0], images):
    ax.imshow(img)
    ax.set_xticks([]); ax.set_yticks([])
plt.show()
# define and create the save path
save_path = f"dall-e_images/{datetime_string}_{prompt_details.replace(' ', '-')}"
os.makedirs(save_path, exist_ok=True)
print(f"images stored under the following path: {save_path}")
# store the images
for i, img in enumerate(images):
    # set the file name
    filename = f"{datetime_string}_dall-e{i}.png"
    # save the image
    img.save(f"{save_path}/{filename}")
images stored under the following path: dall-e_images/20230122210351_sugar-castle
Automatically generated AI art using DALL-E and OpenAI ChatGPT via Python APIs.
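
The save path above is built directly from the topic, which works well for simple topics such as “sugar castle”. For topics that contain punctuation or slashes, a small clean-up step can avoid invalid folder names. The following is a minimal sketch; slugify and build_save_path are hypothetical helpers, not part of the code above.

```python
import re


def slugify(text: str, max_length: int = 50) -> str:
    """Turn free text into a lowercase, hyphen-separated name that is safe for folders."""
    # replace every run of characters other than a-z and 0-9 with a single hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug[:max_length].rstrip("-") or "untitled"


def build_save_path(datetime_string: str, topic: str, root: str = "dall-e_images") -> str:
    """Combine the root folder, the timestamp, and the cleaned topic."""
    return f"{root}/{datetime_string}_{slugify(topic)}"


print(build_save_path("20230122210351", "sugar castle"))
# prints: dall-e_images/20230122210351_sugar-castle
```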

Wow, what a beautiful sugar castle! That’s it, now the images are stored on your local computer, and you can process them further.

Summary

Generating DALL-E prompts using ChatGPT in Python is a powerful and flexible way to create unique images from text prompts. By following the steps outlined in this tutorial, you can use the OpenAI library and GPT-3.5 model to create AI-generated images for various topics. This process can be repeated with different prompts to generate a wide variety of images.

You can now experiment with the latest advancements in natural language processing and image generation to create your own unique and captivating images.

I hope this article has provided a useful introduction to working with OpenAI’s language models in Python and that you will continue to explore the full range of capabilities offered by the API.

If you have any questions, let me know in the comments.

Author

  • Florian Follonier

    Hi, I am Florian, a Zurich-based Cloud Solution Architect for AI and Data. Since the completion of my Ph.D. in 2017, I have been working on the design and implementation of ML use cases in the Swiss financial sector. I started this blog in 2020 with the goal in mind to share my experiences and create a place where you can find key concepts of machine learning and materials that will allow you to kick-start your own Python projects.
