Institution

EleutherAI

Grassroots open-research collective focused on large language models and interpretability; origin of the Pile dataset, GPT-Neo/GPT-J, and early mechanistic-interpretability work.

Interpretability · EleutherAI

Sparse Autoencoders Find Interpretable Features in LLMs

Training a sparse autoencoder on a language model's activations separates concepts stored in 'superposition' into single-meaning features that are more interpretable than individual neurons. It also lets you edit a single concept and observe the resulting change in behavior.
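The idea in the summary above can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes the common formulation in which features are a ReLU of a linear encoding of the activation, the reconstruction is a linear decoding, and the training loss is reconstruction error plus an L1 sparsity penalty. The weights are set by hand rather than trained, and all names (`encode`, `decode`, `sae_loss`, `l1_coeff`) and numbers are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Multiply matrix W (rows x cols) by vector x (length cols)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def encode(x: Vector, W_enc: Matrix, b_enc: Vector) -> Vector:
    """Feature activations: ReLU(W_enc @ x + b_enc), so every value is >= 0."""
    return [max(0.0, p + b) for p, b in zip(matvec(W_enc, x), b_enc)]


def decode(f: Vector, W_dec: Matrix, b_dec: Vector) -> Vector:
    """Reconstructed activation: W_dec @ f + b_dec."""
    return [r + b for r, b in zip(matvec(W_dec, f), b_dec)]


def sae_loss(x: Vector, x_hat: Vector, f: Vector, l1_coeff: float) -> float:
    """Mean squared reconstruction error plus an L1 penalty on the features."""
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    l1 = sum(abs(v) for v in f)
    return mse + l1_coeff * l1


# Hand-set toy weights (assumption: illustrative only, not trained).
# A 2-dimensional "activation" is expanded into 3 candidate features.
W_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b_enc = [0.0, 0.0, -2.0]  # negative bias keeps the third feature off here
W_dec = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
b_dec = [0.0, 0.0]

x = [1.0, 0.5]                      # stand-in for one model activation
f = encode(x, W_enc, b_enc)         # [1.0, 0.5, 0.0]: one feature is inactive
x_hat = decode(f, W_dec, b_dec)     # [1.0, 0.5]: reconstructs x
loss = sae_loss(x, x_hat, f, l1_coeff=0.1)  # approximately 0.15

# "Editing one concept": zero out feature 1 and decode the edited features.
f_edit = list(f)
f_edit[1] = 0.0
x_hat_edit = decode(f_edit, W_dec, b_dec)   # [1.0, 0.0]
```

In a real setting the edited reconstruction would be substituted back into the model's forward pass so that its effect on the output can be observed; that step depends on the model and is omitted here.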