Sequence Modeling · Carnegie Mellon University
Mamba: Selective State Spaces for Linear-Time Sequence Modeling
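Mamba makes the state space model (SSM) parameters functions of the input, so the model can selectively propagate or forget information depending on the current token. It scales linearly in sequence length, achieves 5x higher inference throughput than Transformers, and Mamba-3B matches Transformers twice its size in both pretraining and downstream evaluation.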
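The selection mechanism can be illustrated with a minimal pure-Python sketch of an input-dependent SSM recurrence for a single input channel. It rests on several assumptions. A is diagonal. The step size Δ and the matrices B and C come from hypothetical scalar projections of the current token (`w_delta`, `b_delta`, `w_b`, `w_c` are illustrative names, not the paper's API). The discretization is simplified to Ā = exp(ΔA) and B̄ ≈ ΔB. This is a sequential reference loop, not the paper's hardware-aware parallel scan.

```python
import math
from typing import List


def softplus(z: float) -> float:
    # Numerically stable log(1 + e^z); keeps the step size delta positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(
    xs: List[float],
    a: List[float],
    w_delta: float,
    b_delta: float,
    w_b: List[float],
    w_c: List[float],
) -> List[float]:
    """Sketch of a selective SSM over one channel (hypothetical parameterization).

    a    -- diagonal of A (negative entries keep the state stable)
    w_*  -- illustrative per-token projection weights, assumed for this sketch
    """
    n = len(a)
    h = [0.0] * n  # hidden state, one entry per state dimension
    ys: List[float] = []
    for x in xs:  # one left-to-right pass: O(L * N) time for L tokens
        # Input-dependent step size: a large delta lets x overwrite the state,
        # a small delta leaves the state mostly unchanged.
        delta = softplus(w_delta * x + b_delta)
        b_t = [wb * x for wb in w_b]  # input-dependent B_t
        c_t = [wc * x for wc in w_c]  # input-dependent C_t
        y = 0.0
        for i in range(n):
            a_bar = math.exp(delta * a[i])  # zero-order-hold discretization of A
            b_bar = delta * b_t[i]          # simplified discretization of B
            h[i] = a_bar * h[i] + b_bar * x
            y += c_t[i] * h[i]
        ys.append(y)
    return ys
```

For example, `selective_scan([1.0], a=[-1.0], w_delta=0.0, b_delta=0.0, w_b=[1.0], w_c=[1.0])` gives Δ = softplus(0) = log 2, so the single output is log 2 ≈ 0.693. Because Δ, B, and C are recomputed from every token, the model can no longer be precomputed as one fixed convolution kernel. It is evaluated as a scan instead, which remains a single linear-time pass over the sequence.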