Transformers · Google Research
Attention Is All You Need: The Architecture Under Modern AI
The Transformer removed recurrence and convolution from sequence transduction, replacing them with attention and parallel training; almost every modern LLM stands on that move.