ChatGPT + DALL-E, finally

This entry is part 8 of 11 in the series Artificial Intelligence

My very first post in this series tried combining ChatGPT’s language processing with DALL-E: I wanted ChatGPT to generate DALL-E prompts, complete with the quirky keywords that make sense to robots (I guess) but not to humans. The disappointing answer at the time was that ChatGPT didn’t know about DALL-E and therefore was not effective at building good prompts for it.

Here we are, today, less than a year later, and that’s no longer true.

What does it mean?

Asking the robot directly is not necessarily the best way to learn its capabilities, but occasionally it will give you decent information, so I started there.

Me: I see that you are now equipped with DALL-e 3 capabilities, but I’m not sure what that means, exactly. How are you able to interact with Dall-e? Does that mean you can interpret more complex prompts for image generation than what’s possible with Dall-e itself?

ChatGPT: Yes, you’re right! With the integration of DALL·E 3 capabilities, I can create images based on text descriptions you provide. Here’s how it works:

  1. Description: You provide a text description of the image you’d like to see. It can be anything from a simple object to a complex scene.
  2. Processing: I interpret your description and, if necessary, refine it to ensure it aligns with the image generation capabilities and policies. I can create up to four different captions to generate diverse images based on your description.
  3. Image Generation: I send the processed description to DALL·E 3, which then generates the image(s) based on the caption(s).
  4. Delivery: You receive the generated image(s) based on your description.

The integration enhances the user experience by allowing me to handle more intricate and nuanced prompts, ensuring they’re transformed into appropriate and detailed captions for DALL·E 3 to interpret. This means you can be as imaginative as you want, and I’ll do my best to translate your vision into captivating images!

Would you like to give it a try?
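To make those four steps concrete, here is a minimal Python sketch of the flow as ChatGPT described it. Nothing here is the real integration: `refine_description`, `generate_image`, and `describe_to_images` are hypothetical names, the "refinement" is just string concatenation, and the image call is a stub that returns a placeholder. The only detail taken from the conversation is the cap of four captions.

```python
from dataclasses import dataclass

MAX_CAPTIONS = 4  # from ChatGPT's answer: "up to four different captions"


@dataclass
class GeneratedImage:
    caption: str      # the caption the image was generated from
    placeholder: str  # stands in for real image data in this sketch


def refine_description(description: str, variations: list[str]) -> list[str]:
    """Step 2 (hypothetical): expand one description into at most four captions."""
    base = description.strip()
    captions = [f"{base}, {variation}" for variation in variations]
    return captions[:MAX_CAPTIONS]


def generate_image(caption: str) -> GeneratedImage:
    """Step 3 (stub): a stand-in for the image model, not a real API call."""
    return GeneratedImage(caption=caption, placeholder=f"<image for: {caption}>")


def describe_to_images(description: str, variations: list[str]) -> list[GeneratedImage]:
    """Steps 1-4: one description in, one image per caption out."""
    return [generate_image(c) for c in refine_description(description, variations)]


images = describe_to_images(
    "a tiefling with spiraling horns",
    ["portrait", "action pose", "wide shot", "close-up", "silhouette"],
)
for image in images:
    print(image.caption)
```

Five variations go in, but only four captions (and four images) come out, which matches the limit ChatGPT mentioned.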

I hadn’t tested the new DALL-E functionality out yet, so I figured I’d start with something I’ve done already, namely, the D&D character I’ve been working on and am now playing in a campaign my son is running.

photorealistic digital photograph of an attractive olive bronze skinned male desi tiefling with dark hair and purple eyes and spiraling horns

Taresh Rajakumar, my tiefling D&D character

To start, I gave ChatGPT the rundown:

I have an image that I like for a character I am playing in a Dungeon’s and Dragons campaign. the original prompt that I used to generate the image is “photorealistic digital photograph of an attractive olive bronze skinned male desi tiefling with dark hair and purple eyes and spiraling horns trending on artstation”

The character’s name is Taresh Rajakumar and the setting is broadly Eberron, but with some interplanar connections to Wildemount. Here is his backstory:

I then pasted the entirety of the (fairly extensive) notes I have about him, his backstory, his abilities, his relationships, etc. ChatGPT turned that into prompts and created a series of images, and, let me tell you, the initial results were really good.

We went back and forth a bit, making tweaks to his appearance, the background, his cards, his features, his pose, until I finally got this:

Sidenote: Along with this one, came this option, which is too hot to not include, even if it wasn’t exactly what I wanted:

Each of these images is generated from a prompt that ChatGPT comes up with. When you click on the image, you see (and can copy) the prompt. For example, hot-boy Taresh had the prompt: “Fantasy portrait of Taresh Rajakumar exhibiting energy and motion. This tiefling, with features echoing southeast Asian descent, has bronze-hued skin, dark hair, mesmerizing purple eyes, and ornate horns. In an animated pose, wearing a lustrous purple jacket, he flourishes a luminous card from his arcane deck. The luxurious surroundings highlight his noble background.” These prompts are a lot longer than what the DALL-E UI allows (at least when I last used it), so one benefit of doing this via ChatGPT is the ability to use more complex prompts.

Can ChatGPT tell us how DALL-E works?

Once I got the image I was looking for, I decided to take a step back and revisit some of the questions I had a year ago when I was trying to get ChatGPT to give me good prompts for DALL-E. Would it know any better this time around since ChatGPT is the one creating the prompts?

Me: previously, with DALL-E and on blogs like AI Weirdness I have observed a noticeable difference in how slight prompt tweaks affect how a produced image looks. Specifically the phrase “trending on artstation” had a significant, observable difference in what was produced as compared to prompts that did not include “trending on artstation”. At the time, I asked an earlier iteration of you if you could help me craft DALL-E prompts with these “secret phrases” to produce better results. I’m wondering if you can speak to that phenomenon now that DALL-E is integrated into ChatGPT. Are there specific phrases that will change the quality of the images that might not be explicitly obvious?

ChatGPT: Certainly! The phenomenon you’re referring to is quite fascinating. In DALL·E’s training data, certain phrases might be associated with specific styles, qualities, or trends, leading to different outputs even with slight prompt variations. For example, the phrase “trending on artstation” could guide DALL·E to generate images that match what’s considered high-quality or popular on platforms like ArtStation, which is known for professional digital art.

Here are some insights:

  1. Qualifiers: Adding qualifiers like “photorealistic,” “stylized,” “cartoonish,” or “minimalistic” can drastically change the style of the output.
  2. Platform-specific phrases: As you’ve observed, mentioning a specific platform or publication like “ArtStation,” “DeviantArt,” “The New Yorker,” etc., can alter the resulting style to match what’s typical or trending on that platform.
  3. Historical or cultural references: Mentioning a specific era, like “Renaissance” or “Cyberpunk,” or a cultural reference, can guide the image towards that theme.
  4. Explicitness: The more explicit and detailed the description, the more closely the generated image will likely match the prompt. For example, specifying lighting, mood, or perspective can give you more control over the result.
  5. Trending phrases: Sometimes, certain phrases become associated with high-quality or specific styles due to trends in the training data. These can be more experimental and may change over time.

While these insights can guide image generation, it’s essential to note that DALL·E’s behavior can sometimes be unpredictable due to the vast and diverse training data. Experimentation is key, and slight tweaks to prompts can lead to vastly different and exciting results.

Would you like to experiment further with prompt crafting or have any other questions?
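If you want to test these categories systematically rather than by feel, one approach is to hold the subject fixed and vary a single kind of modifier at a time. The sketch below only builds the prompt strings; it does not call any image service, and `build_prompt`, `SUBJECT`, and the example modifiers are my own illustrative choices, not anything ChatGPT or DALL-E prescribes.

```python
def build_prompt(subject: str, modifiers: list[str]) -> str:
    """Join a fixed subject with zero or more style modifiers."""
    parts = [subject.strip()] + [m.strip() for m in modifiers if m.strip()]
    return ", ".join(parts)


# One fixed subject, so any difference in the output comes from the modifiers.
SUBJECT = "portrait of a male tiefling with dark hair and purple eyes and spiraling horns"

# One entry per category from the list above (example values are assumptions).
STYLE_EXPERIMENTS = {
    "baseline": [],
    "qualifier": ["photorealistic"],
    "platform": ["trending on artstation"],
    "era": ["Renaissance"],
    "explicit": ["soft window light", "three-quarter view"],
}

prompts = {name: build_prompt(SUBJECT, mods) for name, mods in STYLE_EXPERIMENTS.items()}

for name, prompt in prompts.items():
    print(f"{name}: {prompt}")
```

Pasting each variant into the same chat and comparing the images against the baseline is a cheap way to see which phrases actually change anything.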

I decided to dive down the rabbit hole, using Taresh as my guinea pig, to test out different styles. The results were…interesting. Since The New Yorker was specifically referenced, I started there.

That just looked like a sort of cartoony version of him, so I asked what made that style unique to The New Yorker.

“The New Yorker” is known for its distinctive illustration style that has evolved over the decades. While there’s no single “New Yorker style” due to the variety of artists who have contributed over the years, there are some common characteristics and qualities often associated with the magazine’s illustrations:

  1. Simplicity and Elegance: Illustrations often have a clean, refined look with a clear focal point. They tend to avoid overly complex or busy designs.
  2. Subtle Color Palettes: The colors used are often muted and sophisticated, steering clear of overly bright or flashy hues.
  3. Expressive Lines: The artwork often has a hand-drawn quality with expressive, flowing lines.
  4. Narrative Depth: Many illustrations convey a deeper story or message, often with a touch of humor, irony, or social commentary.
  5. Stylized Characters: Characters in the illustrations might be slightly exaggerated or stylized, but in a way that’s more whimsical than cartoonish.
  6. Timelessness: The style often feels timeless, straddling both contemporary and classic aesthetics.
  7. Cultural Relevance: Given the magazine’s emphasis on culture, many illustrations touch on current events, societal trends, or urban life.

For the image of Taresh Rajakumar styled after “The New Yorker,” you might notice some of these elements, such as a refined and clean design, a subtle color palette, expressive lines, and a certain elegance that aligns with the magazine’s aesthetic.

Fair enough, can we put Taresh on the cover of Wired?

This one was interesting, because you can see that DALL-E is getting better about text. It almost nails the Wired logo at the top, and it almost gets his name at the top right!

How about Home & Garden?

How about a steampunk Taresh?

For my last experiment, I wanted to play with a whim. In a lot of these pictures, Taresh…kinda looked like Prince.

…so maybe I should just go with that. With some back and forth, here’s what ChatGPT + DALL-E came up with when it cast Taresh as Prince in Purple Rain.

Why is this cool?

The most interesting part of this process is the collaborative element of it. ChatGPT conversations are often just that — conversations. You ask something, the robot responds to the best of its ability, and you’re able to correct it as you go, ultimately creating a dialogue with the goal of producing whatever it is that you’re looking to accomplish.

DALL-E was distinct from that. You feed it a prompt, you get four results, and the only thing you can do is adjust your prompt to try to get something better. By using ChatGPT in the middle, you are better able to fine-tune the DALL-E prompts without having to know the magic prompt-generating techniques. The robot takes care of that for you, and you can simply say “I want it more like x” or “this one has too much y, can we make this adjustment.” Regardless of whether the AI actually understands what you have in mind, it’s able to process your language and make inferences based on its training data to produce a result that’s more in line with what you’re going for.

And if you really want to, you can take the prompts yourself — which it provides you — and say “actually, can we use a prompt like n” and combine elements from different prompts that seemed to work.

I’m not sure why anyone would want to use DALL-E as a standalone tool at this point, since — based on my initial experiments — the results you get with DALL-E + ChatGPT are so much better than anything I was able to get from DALL-E directly.
