Alignment · Stanford University
DPO: The Alignment Trick That Removed the RL Loop
Direct Preference Optimization turns preference tuning into a simple classification-style objective, avoiding both an explicit reward model and a reinforcement learning loop.
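The "classification-style objective" is a logistic loss on how much more the policy prefers the chosen response over the rejected one, measured relative to a frozen reference model: L = -log σ(β·[log π(y_w|x)/π_ref(y_w|x) - log π(y_l|x)/π_ref(y_l|x)]). Below is a minimal sketch for a single preference pair. It assumes the summed sequence log-probabilities of the chosen and rejected responses under the policy and the reference model are already computed. The function names and the default beta = 0.1 are illustrative choices, not taken from any particular library.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative default; controls strength of the implicit KL constraint
) -> float:
    """DPO loss for one (chosen, rejected) pair; inputs are sequence log-probs (hypothetical helper)."""
    # Log-ratio of policy to reference for each response (the "implicit reward" up to beta).
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    # Binary-classification logit: how strongly the chosen response is preferred.
    logit = beta * (chosen_logratio - rejected_logratio)
    # Negative log-likelihood of "chosen beats rejected" under a logistic model.
    return -log_sigmoid(logit)
```

When the policy equals the reference, the margin is zero and the loss is log 2 (about 0.693). Training pushes the loss below that by raising the policy's relative likelihood of the chosen response and lowering it for the rejected one, with no sampling or reward model in the loop.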
Institution
A leading research university with major contributions in AI, systems, language, and robotics.
Efficient AI · Stanford University
FlashAttention keeps attention exact but makes it IO-aware, using tiling to reduce reads and writes to slow GPU high-bandwidth memory (HBM), which makes long-sequence Transformers faster and cheaper.
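Tiling stays exact because softmax can be computed block by block with an "online softmax": keep a running maximum and running normalizer, and rescale the partial output whenever a new block raises the maximum. The sketch below shows only that numerical idea, in plain Python for a single query vector. It does not model GPU memory, on-chip SRAM, or the real fused kernel. The function names and the `block_size` parameter are hypothetical.

```python
import math


def naive_attention(q, keys, values):
    """Reference: full softmax(q.K^T / sqrt(d)) V for one query, all scores materialized."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


def tiled_attention(q, keys, values, block_size=2):
    """Same result, processing keys/values one block at a time with an online softmax."""
    scale = 1.0 / math.sqrt(len(q))
    dim = len(values[0])
    running_max = -math.inf  # largest score seen so far
    running_sum = 0.0        # softmax denominator, relative to running_max
    acc = [0.0] * dim        # unnormalized output, relative to running_max
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results so everything shares the new maximum.
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            for j in range(dim):
                acc[j] += w * v[j]
        running_max = new_max
    # Normalize once at the end.
    return [a / running_sum for a in acc]
```

For any block size, `tiled_attention` matches `naive_attention` up to floating-point rounding; the difference is that it never needs the full row of scores at once, which is what lets the real kernel keep each tile in fast on-chip memory.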