Diffusion Models

Generative models that synthesize data through iterative denoising.

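Diffusion models changed image generation by turning synthesis into iterative denoising. Instead of producing pixels in a single pass, the model learns to reverse a gradual corruption process, removing a little noise at each step. Because generation unfolds over many steps, each step can be steered, which is what gives control over fidelity, diversity, and conditioning, and what later made editing workflows possible.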
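To make the corruption-and-reversal idea concrete, here is a minimal sketch in plain Python. It assumes a DDPM-style linear noise schedule and Gaussian noise, neither of which this page specifies, and the function names are illustrative only. No network is trained here: the true noise stands in for what a denoiser would have to predict.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, linearly spaced (an assumed, DDPM-style choice)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_signal(betas):
    """alpha_bar[t]: running product of (1 - beta), the signal variance left at step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def corrupt(x0, alpha_bar_t, rng):
    """Corruption process in closed form: blend clean data with Gaussian noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    signal_scale = math.sqrt(alpha_bar_t)
    noise_scale = math.sqrt(1.0 - alpha_bar_t)
    xt = [signal_scale * x + noise_scale * e for x, e in zip(x0, noise)]
    return xt, noise


def recover(xt, predicted_noise, alpha_bar_t):
    """Undo the blend given a noise estimate; a trained model would supply the estimate."""
    signal_scale = math.sqrt(alpha_bar_t)
    noise_scale = math.sqrt(1.0 - alpha_bar_t)
    return [(x - noise_scale * e) / signal_scale for x, e in zip(xt, predicted_noise)]


rng = random.Random(0)
x0 = [0.5, -1.0, 0.25, 0.8]  # toy "image" of four pixel values
alpha_bars = cumulative_signal(linear_beta_schedule(1000))
t = 500
xt, noise = corrupt(x0, alpha_bars[t], rng)
x0_hat = recover(xt, noise, alpha_bars[t])  # true noise used as a stand-in for a model
```

With a perfect noise estimate the clean values come back exactly, which is why training a network to predict the noise is enough to reverse the corruption. In practice the estimate is imperfect, so samplers take many small steps rather than one large jump.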

The key distinction is that diffusion is not only a text-to-image trick. Latent Diffusion made high-resolution generation practical by moving denoising into a compressed latent space. Imagen showed that text understanding is a major driver of prompt alignment. DALL·E 2 connected language-image representations with generation. Together these papers explain why modern creative AI is built around both denoising and strong conditioning.
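A quick piece of arithmetic shows why working in a compressed latent space matters. The numbers below (a 512x512 RGB image, a downsampling factor of 8, and 4 latent channels) are illustrative assumptions rather than figures taken from this page, and the helper names are hypothetical.

```python
def latent_shape(height, width, downsample_factor, latent_channels):
    """Shape of the autoencoder latent for an image of the given size."""
    if height % downsample_factor or width % downsample_factor:
        raise ValueError("image size must be divisible by the downsampling factor")
    return (height // downsample_factor, width // downsample_factor, latent_channels)


def element_count(shape):
    """Number of values the denoiser has to process at each step."""
    total = 1
    for dim in shape:
        total *= dim
    return total


pixel_shape = (512, 512, 3)  # assumed RGB input
z_shape = latent_shape(512, 512, downsample_factor=8, latent_channels=4)
ratio = element_count(pixel_shape) / element_count(z_shape)

print(z_shape)  # (64, 64, 4)
print(ratio)    # 48.0
```

Under these assumptions every denoising step touches 48 times fewer values than it would in pixel space, and that saving repeats at each of the many steps in the sampling loop.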

Start here

Text-to-Image · Google Research

Imagen: Why Text Understanding Matters for Image Generation

Imagen showed that stronger language encoders can materially improve text-to-image diffusion models, especially for prompt alignment and photorealism.

Text-to-Image · OpenAI

DALL·E 2: Text-to-Image Generation Through CLIP Latents

DALL·E 2 splits text-to-image generation into a prior that predicts a CLIP image embedding and a decoder that turns that embedding into an image.

Diffusion Models · CompVis

Latent Diffusion: The Paper Behind Practical High-Resolution Image Generation

Latent diffusion moves denoising from pixel space into a compressed autoencoder latent space, making high-resolution image generation far cheaper while preserving flexibility.

Foundational papers

Diffusion Models · CompVis

Latent Diffusion: The Paper Behind Practical High-Resolution Image Generation

Latent diffusion moves denoising from pixel space into a compressed autoencoder latent space, making high-resolution image generation far cheaper while preserving flexibility.

Text-to-Image · OpenAI

DALL·E 2: Text-to-Image Generation Through CLIP Latents

DALL·E 2 splits text-to-image generation into a prior that predicts a CLIP image embedding and a decoder that turns that embedding into an image.

Text-to-Image · Google Research

Imagen: Why Text Understanding Matters for Image Generation

Imagen showed that stronger language encoders can materially improve text-to-image diffusion models, especially for prompt alignment and photorealism.

Recent papers

Text-to-Image · Google Research

Imagen: Why Text Understanding Matters for Image Generation

Imagen showed that stronger language encoders can materially improve text-to-image diffusion models, especially for prompt alignment and photorealism.

Text-to-Image · OpenAI

DALL·E 2: Text-to-Image Generation Through CLIP Latents

DALL·E 2 splits text-to-image generation into a prior that predicts a CLIP image embedding and a decoder that turns that embedding into an image.

Diffusion Models · CompVis

Latent Diffusion: The Paper Behind Practical High-Resolution Image Generation

Latent diffusion moves denoising from pixel space into a compressed autoencoder latent space, making high-resolution image generation far cheaper while preserving flexibility.
