LLM Reasoning

Eliciting and improving step-by-step reasoning in large language models.

AlphaGeometry: Olympiad Geometry Without Human Proof Demonstrations

AlphaGeometry combines a neural language model with symbolic deduction, using synthetic theorems and proofs to reach near gold-medal performance on olympiad geometry.

Alignment · Stanford University

DPO: The Alignment Trick That Removed the RL Loop

Direct Preference Optimization turns preference tuning into a simple classification-style objective, avoiding an explicit reward model and reinforcement learning loop.

Multimodal Models · OpenAI

GPT-4: The Report That Made Frontier Models Feel Measurable

The GPT-4 technical report was less a full recipe than a measurement document: it presented a multimodal Transformer whose benchmark performance, scaling predictability, and post-training alignment reset expectations for frontier AI.

Open Models · Meta AI

Llama 3: Meta Turns Open Weights Into a Full Model System

Llama 3 is not just a bigger open-weight model; it is Meta's attempt to package multilingual, coding, reasoning, tool use, and safety into a coherent public model family.

LLM Reasoning · DeepSeek

DeepSeek-R1: Teaching a Model to Reason With Almost No Human Labels

Reinforcement learning alone, with no supervised reasoning traces, can make a base language model develop strong step-by-step reasoning, rivaling top closed models.