Topics

Diffusion Language Models

Language models that generate text by iterative denoising rather than left-to-right decoding, producing tokens in parallel instead of autoregressively.

Diffusion Language Models · Independent Researcher

Diffusion Language Modeling: Promises and Challenges

This survey lays out the current state of diffusion language modeling, covering the supporting evidence, method tradeoffs, and limits for practical use.

Diffusion Language Models · Independent Researcher

Factorization-Error-Free Decoding for Diffusion LMs

Factorization-error-free decoding examines speculative decoding for discrete diffusion LMs, covering the supporting evidence, method tradeoffs, and limits for practical use.

Diffusion Language Models · Independent Researcher

SEDD: Discrete Diffusion Language Modeling by Ratios

SEDD approaches discrete diffusion language modeling by estimating ratios of the data distribution. This entry covers the supporting evidence, method tradeoffs, and limits for practical use.

Diffusion Language Models · Stanford University

Diffusion-LM: Controllable Text from Denoising

Diffusion-LM uses continuous denoising over word vectors so gradient guidance can control syntax and other fine-grained attributes without retraining the LM.

Mixture of Experts · National University of Singapore

dMoE: Block-Level Expert Routing for Diffusion LLMs

dMoE aligns token-level MoE routing with block-parallel decoding in diffusion LLMs. On LLaDA2.0-mini it cuts unique experts per block from 69.5 to 14.6, keeps 99.11% accuracy, and frees 76-80% of expert memory.

Diffusion Language Models · Renmin University of China

LLaDA: An 8B Diffusion Language Model That Rivals LLaMA3

LLaDA trains an 8B language model by masked diffusion instead of next-token prediction, matches LLaMA3 8B on in-context learning, scores 70.7 on GSM8K, and beats GPT-4o on the reversal-curse poem task.