Topics

Text-to-Image

Models that generate or edit images from natural-language prompts.

Imagen: Why Text Understanding Matters for Image Generation

Imagen showed that scaling a frozen, text-only language encoder such as T5-XXL can materially improve text-to-image diffusion models, especially in prompt alignment and photorealism.

Text-to-Image · OpenAI

DALL·E 2: Text-to-Image Generation Through CLIP Latents

DALL·E 2 splits text-to-image generation into a prior that predicts a CLIP image embedding and a decoder that turns that embedding into an image.

Diffusion Models · CompVis

Latent Diffusion: The Paper Behind Practical High-Resolution Image Generation

Latent diffusion moves denoising from pixel space into the compressed latent space of a pretrained autoencoder. This makes high-resolution image generation far cheaper, and cross-attention layers let the same model condition on inputs such as text or bounding boxes.
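To make the cost saving concrete, here is a small back-of-the-envelope sketch. It assumes a 512x512 RGB image and an autoencoder with downsampling factor f=8 and 4 latent channels, a configuration commonly used by Stable Diffusion. Other latent-diffusion models choose different factors. The helper names are hypothetical, and raw value counts are only a rough proxy for compute.

```python
# Hypothetical illustration: how many values a denoiser must process per step
# in pixel space versus in an autoencoder latent space.
# Assumption: 512x512 RGB input, downsampling factor f=8, 4 latent channels.

def num_elements(height: int, width: int, channels: int) -> int:
    """Number of scalar values in a height x width x channels tensor."""
    return height * width * channels


def latent_shape(height: int, width: int, f: int, latent_channels: int):
    """Spatial size of the latent after downsampling each side by factor f."""
    if height % f or width % f:
        raise ValueError("image size must be divisible by the downsampling factor")
    return height // f, width // f, latent_channels


H, W = 512, 512
pixel_elems = num_elements(H, W, 3)

lh, lw, lc = latent_shape(H, W, f=8, latent_channels=4)
latent_elems = num_elements(lh, lw, lc)

ratio = pixel_elems / latent_elems

print(f"pixel space:  {H}x{W}x3 = {pixel_elems:,} values")    # 786,432
print(f"latent space: {lh}x{lw}x{lc} = {latent_elems:,} values")  # 16,384
print(f"reduction:    {ratio:.0f}x fewer values per denoising step")  # 48x
```

The raw value count understates the saving for attention layers. Self-attention cost grows with the square of the number of spatial positions, and that count drops from 512x512 to 64x64 in this example.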