MatterGen Explained: Diffusion for Inverse Materials Design

Quick answer

MatterGen is a diffusion model that designs new inorganic crystals by working backward from a desired property instead of screening a fixed list of known candidates. It jointly denoises three things at once (atom types, atomic coordinates, and the periodic lattice), so it generates a complete, periodic crystal structure rather than a molecule. Microsoft trained it on roughly 608,000 stable structures from the Materials Project and Alexandria, and the headline test is concrete: asked for materials with a bulk modulus above 400 GPa, it produced 106 stable, unique, novel structures within a budget of 180 DFT checks, versus only 2 such entries in the reference dataset. The paper was published in Nature in January 2025, and the code is released under the MIT license.

Screening vs. generation: the actual shift

The standard way to find a material with property X is virtual screening: take a database of known compounds, compute property X for each, keep the winners. The ceiling is the database — you can only discover what someone already listed. MatterGen inverts this. You specify the property target up front and the model samples crystals conditioned on it, drawing from the full combinatorial space of the periodic table rather than a catalog.
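The contrast can be made concrete with a minimal sketch. Everything here is illustrative: the toy catalog values are made up, and `sample` and `verify` are hypothetical stand-ins for a conditional generator and a DFT check, not MatterGen's API.

```python
from __future__ import annotations

from typing import Callable


def screen(catalog: list[dict], threshold_gpa: float) -> list[dict]:
    """Virtual screening: the result is always a subset of the catalog."""
    return [entry for entry in catalog if entry["bulk_modulus_gpa"] > threshold_gpa]


def generate_and_verify(
    sample: Callable[[float], dict],
    verify: Callable[[dict], bool],
    target_gpa: float,
    budget: int,
) -> list[dict]:
    """Inverse design: draw candidates for a target and verify at most `budget` of them."""
    hits = []
    for _ in range(budget):
        candidate = sample(target_gpa)  # hypothetical conditional generator
        if verify(candidate):           # hypothetical stand-in for a DFT check
            hits.append(candidate)
    return hits


# Toy catalog with made-up values, for illustration only.
TOY_CATALOG = [
    {"name": "toy-A", "bulk_modulus_gpa": 120.0},
    {"name": "toy-B", "bulk_modulus_gpa": 310.0},
    {"name": "toy-C", "bulk_modulus_gpa": 415.0},
]

screened = screen(TOY_CATALOG, threshold_gpa=400.0)
print([entry["name"] for entry in screened])  # prints ['toy-C']
```

However the threshold is set, `screen` can never return more entries than the catalog holds. `generate_and_verify` is bounded only by the verification budget and by how often the generator's proposals pass.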

That inversion is the contribution, and the 400 GPa result is the cleanest evidence for it. The reference dataset contained 2 ultra-stiff materials above 400 GPa; MatterGen surfaced more than 250 candidates, of which 106 passed the stable-unique-novel bar after DFT. Screening literally could not have returned what was not in the list.

How the diffusion process is built for crystals

A crystal is not a point cloud. It is an infinite periodic tiling, so an off-the-shelf molecular diffusion model breaks on it. MatterGen runs three coupled corruption-and-denoising processes tailored to crystal structure: a categorical diffusion over atom types, a coordinate diffusion that respects the wrap-around periodic boundary, and a lattice diffusion over the unit-cell shape. A learned score network reverses all three together. Because the training set contains only stable structures, the denoised samples tend to land close to a local energy minimum.
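To make the three corruption processes tangible, here is a simplified sketch of one forward noising step for each component. It is an illustration under loud assumptions: the masking scheme, the Gaussian noise scales, and the symmetric lattice perturbation are placeholders chosen for clarity, not MatterGen's actual noise schedules or limiting distributions.

```python
import random

MASK = "?"  # hypothetical placeholder for a corrupted atom type


def wrap(x: float) -> float:
    """Map a fractional coordinate into [0, 1)."""
    w = x % 1.0
    return 0.0 if w >= 1.0 else w  # guard against float rounding up to 1.0


def periodic_delta(a: float, b: float) -> float:
    """Shortest separation between two fractional coordinates along one axis."""
    d = abs(wrap(a) - wrap(b))
    return min(d, 1.0 - d)


def noise_types(species, mask_prob, rng):
    """Categorical corruption: mask each atom type with probability mask_prob."""
    return [MASK if rng.random() < mask_prob else s for s in species]


def noise_coords(frac_coords, sigma, rng):
    """Coordinate corruption: Gaussian kicks, wrapped back into the unit cell."""
    return [[wrap(x + rng.gauss(0.0, sigma)) for x in site] for site in frac_coords]


def noise_lattice(lattice, sigma, rng):
    """Lattice corruption: add a symmetric 3x3 Gaussian perturbation."""
    noisy = [row[:] for row in lattice]
    for i in range(3):
        for j in range(i, 3):
            eps = rng.gauss(0.0, sigma)
            noisy[i][j] += eps
            if i != j:
                noisy[j][i] += eps  # mirror the off-diagonal term
    return noisy


rng = random.Random(0)
species = ["A", "B", "C"]  # toy labels, not real elements
frac_coords = [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5], [0.98, 0.25, 0.75]]
lattice = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]  # toy cubic cell

noisy_species = noise_types(species, mask_prob=0.5, rng=rng)
noisy_coords = noise_coords(frac_coords, sigma=0.05, rng=rng)
noisy_lattice = noise_lattice(lattice, sigma=0.1, rng=rng)
```

The `periodic_delta` helper shows why the wrap-around matters: fractional coordinates 0.98 and 0.02 are about 0.04 apart across the cell boundary, not 0.96. A model that treated them as far apart would learn the wrong geometry.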

This geometry-awareness is what makes the close-to-minimum geometries and high novelty rates in the next section possible: a generator that ignores periodicity produces structures that DFT then has to drag a long way to any stable configuration, wasting the compute budget. Treating the lattice as a first-class variable is the difference between proposals worth verifying and noise.

Key results

The numbers that matter cluster in three places. Stiffness on demand: conditioned at 400 GPa bulk modulus, MatterGen surfaced 250+ candidates and 106 stable-unique-novel structures within 180 DFT checks, against 2 in the reference dataset. Geometric fidelity: generated structures sit an average of 0.021 angstrom from their relaxed energy minimum, roughly an order of magnitude closer than prior generative models. Genuine novelty: the base model’s stable-unique-novel (S.U.N.) rate is 38.57%, with conditional rates of 83% / 65% / 49% on well-explored, partially explored, and unexplored chemical systems. That last gradient is the honest headline: the model is most reliable where chemistry is already well-mapped.
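A S.U.N. rate is just the fraction of samples that pass three filters in sequence. The sketch below assumes each structure carries a hashable `fingerprint` (a hypothetical stand-in for a real structure-matching step) and an `e_above_hull` value in eV/atom. The 0.1 eV/atom stability cutoff is an assumption of this example, not a number stated in this article.

```python
def sun_filter(samples, training_fingerprints, max_e_above_hull=0.1):
    """Keep samples that are stable, unique among the samples, and novel vs. training."""
    seen = set()
    kept = []
    for sample in samples:
        fp = sample["fingerprint"]
        stable = sample["e_above_hull"] <= max_e_above_hull  # assumed cutoff, eV/atom
        unique = fp not in seen                    # first occurrence in this batch
        novel = fp not in training_fingerprints    # absent from the training set
        seen.add(fp)
        if stable and unique and novel:
            kept.append(sample)
    return kept


def sun_rate(samples, training_fingerprints, max_e_above_hull=0.1):
    """Fraction of all samples that survive the S.U.N. filter."""
    if not samples:
        return 0.0
    kept = sun_filter(samples, training_fingerprints, max_e_above_hull)
    return len(kept) / len(samples)


# Toy data with made-up values, for illustration only.
TOY_SAMPLES = [
    {"fingerprint": "fp-1", "e_above_hull": 0.02},  # passes all three checks
    {"fingerprint": "fp-1", "e_above_hull": 0.02},  # duplicate of the first
    {"fingerprint": "fp-2", "e_above_hull": 0.01},  # already in training
    {"fingerprint": "fp-3", "e_above_hull": 0.35},  # not stable
]
TOY_TRAINING = {"fp-2"}

print(sun_rate(TOY_SAMPLES, TOY_TRAINING))  # prints 0.25
```

In practice the hard part is the fingerprint: deciding whether two relaxed crystals are "the same" requires a tolerance-based structure comparison, and the choice of tolerance directly moves the uniqueness and novelty counts.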

Steering it with property adapters

You do not retrain MatterGen for each design goal. The base model is fine-tuned with lightweight adapter modules on a labeled property, then conditioned at sampling time. Microsoft demonstrated this for mechanical (bulk modulus), electronic (band gap), magnetic (magnetic density), and chemistry/symmetry targets, and even for a multi-property objective combining high magnetic density with low supply-chain-risk elements.
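A minimal sketch of the two ideas in that paragraph follows. It assumes a common adapter pattern (a property embedding mixed into a hidden vector through zero-initialized weights) and a common way to steer diffusion sampling (classifier-free guidance). The class, the toy embedding weights, and the function names are hypothetical and do not reproduce MatterGen's layers.

```python
from typing import Optional, Sequence


class PropertyAdapter:
    """Toy adapter: adds a property-dependent term to a hidden vector."""

    def __init__(self, hidden_dim: int) -> None:
        # Fixed toy embedding weights; a real adapter would learn these.
        self.embed_w = [0.1 * (i + 1) for i in range(hidden_dim)]
        # Zero-initialized mixing weights: before fine-tuning, the adapter
        # leaves the base model's hidden state untouched.
        self.mix_w = [0.0] * hidden_dim

    def __call__(self, hidden: Sequence[float], prop: Optional[float]) -> list:
        if prop is None:  # no condition given: behave like the base model
            return list(hidden)
        return [h + m * (e * prop) for h, m, e in zip(hidden, self.mix_w, self.embed_w)]


def guided_score(uncond: Sequence[float], cond: Sequence[float], guidance: float) -> list:
    """Classifier-free guidance: extrapolate from the unconditional score
    toward the conditional one. guidance=0 ignores the condition and
    guidance=1 uses the conditional score as-is."""
    return [u + guidance * (c - u) for u, c in zip(uncond, cond)]


adapter = PropertyAdapter(hidden_dim=3)
hidden = [1.0, 2.0, 3.0]
print(adapter(hidden, prop=400.0))  # prints [1.0, 2.0, 3.0]
print(guided_score([1.0, 1.0], [3.0, 0.0], guidance=2.0))  # prints [5.0, -1.0]
```

The zero initialization is the design point worth noticing: fine-tuning starts from exactly the base model's behavior, so a small labeled dataset only has to teach the adapter how the property shifts the output, not how to draw crystals.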

The practical implication of the conditional S.U.N. gradient above: MatterGen is most reliable exactly where you need it least, and its hit rate falls from 83% to 49% as you push into genuinely new chemistry. The multi-property case is the more interesting demonstration. Conditioning on two properties at once is what an actual design brief (“strong magnet, no rare-earth bottleneck”) looks like, and it is far harder to fake by retrieval than a single-property target.

The one experiment that grounds it

A computational paper claiming new materials is cheap; MatterGen’s most important paragraph is the synthesis. Conditioned on a 200 GPa bulk modulus, it proposed TaCr2O6, which collaborators at the Shenzhen Institutes of Advanced Technology actually made. The synthesized crystal showed compositional disorder between Ta and Cr (the lab made a disordered variant of the ordered structure MatterGen drew), and the measured bulk modulus landed within about 20% of the 200 GPa target. One compound is not proof of a discovery engine, but it converts the claim from “plausible structures” to “a real material that hit its mechanical spec to first order.”

Limits and open questions

The most pointed critique came after release: an analysis in Materials Horizons argued MatterGen often re-predicts compounds already in its training data, questioning how much is genuinely new versus retrieved. The S.U.N. metric is designed to filter that, but the debate is real and unsettled. Beyond novelty, the loop still leans on DFT to verify every candidate, so “generate to order” is bounded by how many density-functional checks you can afford — 180 calculations bought 106 hits in the stiffness case, which is efficient but not free. Synthesizability is also not guaranteed: a structure can be DFT-stable and still resist being made, and the single TaCr2O6 success came back disordered rather than as the exact target. Treat MatterGen as a strong proposer that compresses the search space, not as an oracle that hands you finished materials.
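The cost side of that trade-off is simple arithmetic on the numbers already quoted. The helper names below are hypothetical, and the calculation treats all 180 checks as spent.

```python
def hit_rate(hits: int, budget: int) -> float:
    """Fraction of verification checks that produced a qualifying structure."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    return hits / budget


def checks_per_hit(hits: int, budget: int) -> float:
    """Average number of verification checks spent per qualifying structure."""
    if hits <= 0:
        raise ValueError("hits must be positive")
    return budget / hits


# Stiffness case from the text: 106 hits from 180 DFT checks.
print(f"{hit_rate(106, 180):.1%}")        # prints 58.9%
print(f"{checks_per_hit(106, 180):.2f}")  # prints 1.70
```

Roughly 1.7 DFT calculations per qualifying structure is cheap next to blind search, but each of those calculations is still a real compute cost that scales with how many candidates you want.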

FAQ

What does MatterGen actually generate?

MatterGen generates complete inorganic crystal structures — the atom types, their 3D coordinates, and the periodic unit-cell lattice — conditioned on a target property. Unlike molecular generators, its diffusion process is built around crystal periodicity and symmetry, so outputs are full periodic structures ready for DFT relaxation, not isolated molecules.

How is MatterGen different from materials screening?

Screening computes properties for compounds already in a database and keeps the best, so it can only return known entries. MatterGen inverts the search: you set the property target and it samples new crystals to match. Asked for bulk modulus above 400 GPa, it returned 106 stable-unique-novel structures versus 2 in the reference dataset.

What is MatterGen’s S.U.N. rate?

The base model’s stable-unique-novel rate is 38.57% — over a third of unconditional samples are stable, distinct, and absent from training, with an average 0.021 Angstrom displacement from the energy minimum. Conditional generation ranges from 83% on well-explored chemistries down to 49% on unexplored ones.

Has any MatterGen material been synthesized?

Yes — TaCr2O6, designed for a 200 GPa bulk modulus, was synthesized by collaborators at the Shenzhen Institutes of Advanced Technology. The lab made a compositionally disordered variant of the predicted structure, and the measured bulk modulus came within roughly 20% of the target.

Is MatterGen open source?

Yes. Microsoft released the MatterGen code on GitHub under the MIT license, together with training and fine-tuning data, when the Nature paper was published in January 2025.

MatterGen reframes materials discovery from “search the catalog” to “design to specification” — and the open question is no longer whether diffusion can draw a plausible crystal, but how much of what it draws is truly new and how much you can actually make. Read the original in Nature.