Language Models

Start here

BERT: The Bidirectional Pretraining Recipe That Rewired NLP

BERT made deep bidirectional Transformer pretraining practical, letting one pretrained encoder be fine-tuned into strong task-specific NLP systems with minimal architecture changes.

Language Models · Google DeepMind

Chinchilla: The Compute-Optimal Scaling Wake-Up Call

Chinchilla showed that many large language models were undertrained for their size, and that spending a fixed compute budget on more training tokens can beat simply adding parameters.

Language Models · OpenAI

GPT-3: The Moment Few-Shot Prompting Became the Interface

GPT-3 showed that a 175B autoregressive language model could perform many tasks from examples in the prompt, without gradient updates or task-specific fine-tuning.

Foundational papers

Language Models · Google Research

BERT: The Bidirectional Pretraining Recipe That Rewired NLP

BERT made deep bidirectional Transformer pretraining practical, letting one pretrained encoder be fine-tuned into strong task-specific NLP systems with minimal architecture changes.

Language Models · Google Research

T5: Turning Every NLP Task Into Text-to-Text

T5 unified NLP transfer learning by casting every task as text input to text output, then systematically studying objectives, data, scale, and fine-tuning choices.

Language Models · OpenAI

GPT-3: The Moment Few-Shot Prompting Became the Interface

GPT-3 showed that a 175B autoregressive language model could perform many tasks from examples in the prompt, without gradient updates or task-specific fine-tuning.

Alignment · OpenAI

InstructGPT: Why Bigger Models Still Needed Human Feedback

InstructGPT showed that human preference data and RLHF could make smaller models more helpful and aligned than much larger raw language models.

Recent papers

Code Generation · Google DeepMind

AlphaCode: Competitive Programming as a Code Generation Test

BERT: The Bidirectional Pretraining Recipe That Rewired NLP

Chinchilla: The Compute-Optimal Scaling Wake-Up Call

Code Llama: Open Code Models Built from Llama

GPT-3: The Moment Few-Shot Prompting Became the Interface

InstructGPT: Why Bigger Models Still Needed Human Feedback

PaLM: Scaling a Dense Language Model to 540 Billion Parameters

T5: Turning Every NLP Task Into Text-to-Text
