DALL-E is a neural network introduced by OpenAI in January 2021 that creates images from text descriptions. OpenAI has since refined the technology, releasing DALL-E 2 in April 2022 and announcing DALL-E 3 in September 2023.
In this article, we'll look at the main differences between DALL-E 2 and DALL-E 3 and how they affect the quality and variety of the images each model generates.
1. Image Resolution
Output resolution is often cited as a key difference, but the gap is smaller than is sometimes claimed. The original DALL-E produced 256×256 images, whereas DALL-E 2 already generates images at up to 1024×1024 pixels, and its API offers 256×256, 512×512, and 1024×1024 sizes. DALL-E 3 keeps 1024×1024 as its square size and adds wide (1792×1024) and tall (1024×1792) formats, so its advantage lies in more aspect ratios and finer, more coherent detail rather than in a higher square resolution.
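In OpenAI's Images API, resolution is chosen per request through the `size` parameter, and the accepted values differ by model. The sketch below checks a request against the limits OpenAI documented for `dall-e-2` and `dall-e-3` (the `size`, `n`, `quality`, and `style` parameters) before anything is sent. `build_image_request` and `MODEL_OPTIONS` are hypothetical names, the helper makes no network call, and the accepted values may have changed since, so treat it as an illustration rather than a reference.

```python
import json

# Per-model options for OpenAI's Images API (POST /v1/images/generations),
# as documented when DALL-E 3 reached the API; check the current reference.
MODEL_OPTIONS = {
    "dall-e-2": {
        "sizes": {"256x256", "512x512", "1024x1024"},
        "max_n": 10,
        "qualities": {"standard"},
        "styles": set(),
    },
    "dall-e-3": {
        "sizes": {"1024x1024", "1792x1024", "1024x1792"},
        "max_n": 1,
        "qualities": {"standard", "hd"},
        "styles": {"vivid", "natural"},
    },
}


def build_image_request(model, prompt, size="1024x1024", n=1, quality=None, style=None):
    """Validate parameters and return a JSON-ready request body.

    Hypothetical helper: it performs no network call and covers only a
    subset of the endpoint's parameters.
    """
    opts = MODEL_OPTIONS.get(model)
    if opts is None:
        raise ValueError(f"unknown model: {model!r}")
    if size not in opts["sizes"]:
        raise ValueError(f"{model} does not support size {size}")
    if not 1 <= n <= opts["max_n"]:
        raise ValueError(f"{model} accepts n from 1 to {opts['max_n']}")
    body = {"model": model, "prompt": prompt, "size": size, "n": n}
    # Optional fields are added only when set, and only if the model accepts them.
    if quality is not None:
        if quality not in opts["qualities"]:
            raise ValueError(f"{model} does not support quality {quality!r}")
        body["quality"] = quality
    if style is not None:
        if style not in opts["styles"]:
            raise ValueError(f"{model} does not support style {style!r}")
        body["style"] = style
    return body


# Usage: a wide, high-detail DALL-E 3 request; DALL-E 2 would reject this size.
print(json.dumps(build_image_request(
    "dall-e-3", "a blue car parked in front of a red house",
    size="1792x1024", quality="hd"), sort_keys=True))
```

Encoding the limits as data rather than branching on model names keeps the checks in one place, so adding another model only means adding another entry.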
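2. Image Synthesis Method
Another frequently cited contrast is how each model synthesizes images, although the two generations are closer here than is often claimed. It was the original DALL-E that used a discrete variational autoencoder (dVAE) to compress images into discrete latent codes, which a transformer then generated token by token. DALL-E 2 had already moved to diffusion: a prior maps the text to a CLIP image embedding, and a diffusion decoder creates the image from noise by reversing a stochastic noising process. DALL-E 3 is also diffusion-based; according to OpenAI's research paper on the model, much of its improvement comes from training on highly descriptive synthetic captions, which helps it follow detailed prompts describing complex scenes and textures.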
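The phrase "reversing a stochastic process" can be made concrete with the standard DDPM formulation (Ho et al., 2020). The sketch below is a toy on a short list of numbers standing in for pixels, not OpenAI's implementation, whose full architecture has not been published: the forward process mixes data with Gaussian noise, and ancestral sampling walks back from pure noise one step at a time. In a real model a trained network predicts the noise at each step; here a hypothetical oracle (`oracle_eps`) that already knows the target stands in for it, and the linear schedule from 1e-4 to 0.02 over 1000 steps is an assumed setting.

```python
import math
import random


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T, linearly spaced as in the DDPM paper."""
    if T < 2:
        raise ValueError("T must be at least 2")
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]


def forward_noise(x0, alpha_bar_t, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    return [math.sqrt(alpha_bar_t) * v + math.sqrt(1.0 - alpha_bar_t) * rng.gauss(0.0, 1.0)
            for v in x0]


def oracle_eps(x_t, x0, alpha_bar_t):
    """Hypothetical stand-in for the trained network: the exact noise that maps x0 to x_t."""
    return [(xt - math.sqrt(alpha_bar_t) * v) / math.sqrt(1.0 - alpha_bar_t)
            for xt, v in zip(x_t, x0)]


def reverse_sample(target, T=1000, seed=0):
    """DDPM ancestral sampling from pure noise, with the oracle as noise predictor."""
    rng = random.Random(seed)
    betas = linear_beta_schedule(T)
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)

    x = [rng.gauss(0.0, 1.0) for _ in target]  # x_T: pure Gaussian noise
    for t in reversed(range(T)):
        ab_t = alpha_bars[t]
        ab_prev = alpha_bars[t - 1] if t > 0 else 1.0
        eps = oracle_eps(x, target, ab_t)
        # Mean of p(x_{t-1} | x_t): remove the predicted noise, then rescale.
        mean = [(xi - betas[t] / math.sqrt(1.0 - ab_t) * ei) / math.sqrt(alphas[t])
                for xi, ei in zip(x, eps)]
        # Posterior variance beta_tilde_t; it is exactly zero on the last step.
        var = betas[t] * (1.0 - ab_prev) / (1.0 - ab_t)
        x = [m + math.sqrt(var) * rng.gauss(0.0, 1.0) for m in mean]
    return x


target = [0.5, -1.0, 0.25, 2.0]
print([round(v, 6) for v in reverse_sample(target)])
```

Because the oracle's noise estimate is exact, each step samples from the true posterior q(x_{t-1} | x_t, x_0) and the sampler lands on the target; a learned predictor only approximates this noise, which is where model quality matters.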
3. DALL-E 3 ChatGPT Integration
DALL-E 3 is integrated with ChatGPT, OpenAI's conversational AI system, which generates natural-language responses to user input. This lets users brainstorm and refine prompts for DALL-E 3 with ChatGPT's help.
Users can describe their ideas to ChatGPT, anything from a single sentence to a detailed paragraph, and ChatGPT turns them into tailored prompts for DALL-E 3. Users can also ask ChatGPT to make tweaks to a generated image with just a few words, which gives them more creative control.
4. Prompt Adherence
DALL-E 3 follows complex prompts much more faithfully than DALL-E 2. For instance, it more reliably depicts scenes with specific objects and the relationships between them, such as "a cat sitting on a couch next to a lamp" or "a blue car parked in front of a red house." DALL-E 2, by contrast, sometimes misinterprets or ignores parts of a prompt, so getting good results from it often requires skill in prompt engineering.
5. Text Generation
DALL-E 3 is markedly better at rendering text within images, such as labels, signs, logos, or captions, and is more likely to produce legible text that fits the image's content and style. For example, when asked for "a poster for a movie called The Matrix," DALL-E 3 is more likely to render the title legibly, in a typeface, color, and layout suited to a movie poster. DALL-E 2, in contrast, often produces blurry, garbled, or irrelevant lettering.
6. Human Details
DALL-E 3 renders human details, including faces, hands, hair, and clothing, more convincingly. It can create realistic and diverse faces across a range of expressions, poses, angles, and lighting conditions.
It is also better at hands in different gestures and orientations, including hands holding or wearing accessories, and at realistic hair and clothing. DALL-E 2 struggles with these details and sometimes produces distorted or unnatural results.
7. Engaging Images
By default, DALL-E 3 tends to produce engaging images without prompt hacks or extensive prompt engineering. Its images are often creative, humorous, surprising, or emotionally resonant even when the user does not explicitly ask for that.
For example, a prompt for "a cute dog" can yield dogs with varied expressions, poses, accessories, or settings that emphasize their cuteness. DALL-E 2's output for similar prompts is often less engaging.
8. Safety Mitigations
DALL-E 3 includes more extensive safety mitigations than DALL-E 2 to reduce the risk of generating harmful content. It is designed to decline requests for violent, adult, hateful, or political content, as well as requests for images of public figures by name.
According to OpenAI, safeguards in areas such as the depiction of public figures and harmful biases in visual representation were developed together with domain experts and red teamers. DALL-E 2, with fewer safeguards, is more likely to produce inappropriate or offensive images.
9. Provenance Classifier
Alongside DALL-E 3, OpenAI announced that it is researching a provenance classifier: an internal tool intended to identify whether an image was generated by DALL-E 3. OpenAI has said it hopes to use the tool to better understand how generated images might be used or abused; because the tool is internal, it is not something users can run to verify an image's source themselves.
OpenAI did not announce a comparable classifier for DALL-E 2, which may make its images harder to trace if they are misused.
10. Creative Control
DALL-E 3 places more emphasis on the creative rights of living artists and creators. It is designed to decline requests for images in the style of a living artist, and creators can opt their images out of being used to train OpenAI's future image generation models.
DALL-E 2 lacks these measures, so it can generate images that may infringe on the intellectual property or moral rights of living artists and creators.
DALL-E 3 is the latest step in OpenAI's text-to-image work, improving on DALL-E 2 in detail, prompt adherence, text rendering, human details, engagement, safety, provenance, and creative control.
Its integration with ChatGPT lets users refine both their prompts and the resulting images through natural-language conversation. DALL-E 3 launched as a research preview, with availability for ChatGPT Plus and Enterprise customers planned for October 2023 and access via the API and in Labs to follow later that fall.