Text-to-Image · Google Research
Imagen: Why Text Understanding Matters for Image Generation
Imagen showed that stronger language encoders can materially improve text-to-image diffusion models, especially for prompt alignment and photorealism.
Topics
Models that generate or edit images from natural-language prompts.
Text-to-Image · Google Research
Imagen showed that stronger language encoders can materially improve text-to-image diffusion models, especially for prompt alignment and photorealism.
DALL·E 2 splits text-to-image generation into a prior that predicts a CLIP image embedding and a decoder that turns that embedding into an image.
Latent diffusion moves denoising from pixel space into a compressed autoencoder latent space, making high-resolution image generation far cheaper while preserving flexibility.