Diffusion Language Models
Text generation by iterative denoising instead of left-to-right decoding — parallel, non-autoregressive language models.
Diffusion Language Models · Independent Researcher
A survey of diffusion language modeling that lays out the current state of the field, with evidence anchors, method tradeoffs, and limits for practical use.
Diffusion Language Models · Independent Researcher
Factorization-error-free decoding examines speculative decoding for discrete diffusion LMs, with evidence anchors, method tradeoffs, and limits for practical use.
Diffusion Language Models · Independent Researcher
A study of SEDD (Score Entropy Discrete Diffusion) as an approach to discrete diffusion language modeling, with evidence anchors, method tradeoffs, and limits for practical use.
Diffusion Language Models · Stanford University
Diffusion-LM uses continuous denoising over word vectors so gradient guidance can control syntax and other fine-grained attributes without retraining the LM.
Mixture of Experts · National University of Singapore
dMoE aligns token-level MoE routing with block-parallel decoding in diffusion LLMs. On LLaDA2.0-mini it cuts unique experts per block from 69.5 to 14.6, keeps 99.11% accuracy, and frees 76-80% of expert memory.
Diffusion Language Models · Renmin University of China
LLaDA trains an 8B language model by masked diffusion instead of next-token prediction, matches LLaMA3 8B in in-context learning, hits 70.7 on GSM8K, and beats GPT-4o on the reversal-curse poem task.