Mellum 2: A 12B MoE Code Model Running at 2.5B Compute
Mellum 2 is JetBrains' open-weight 12B-parameter Mixture-of-Experts (MoE) code model. It activates only 2.5B parameters per token, matching dense 4B–14B baselines on software tasks at a fraction of their per-token compute.
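To make "a fraction of the per-token compute" concrete, here is a rough back-of-the-envelope sketch using only the parameter counts quoted above. It assumes the common rule of thumb that a forward pass costs about 2 FLOPs per active parameter per token; that approximation is an assumption of this sketch, not a figure from JetBrains, and it ignores attention cost over long contexts, router overhead, and memory bandwidth. The function names are illustrative.

```python
# Back-of-the-envelope estimate, NOT a measured benchmark.
# Assumption: forward-pass FLOPs per token ~= 2 * (active parameters).

TOTAL_PARAMS = 12e9     # total parameters (from the text)
ACTIVE_PARAMS = 2.5e9   # parameters activated per token (from the text)
DENSE_BASELINES = {"dense 4B": 4e9, "dense 14B": 14e9}  # range named in the text


def forward_flops_per_token(active_params: float) -> float:
    """Approximate forward FLOPs per token as 2 * active parameters."""
    return 2.0 * active_params


def active_fraction(active: float, total: float) -> float:
    """Share of the model's weights that participate in each token."""
    return active / total


def compute_ratio(moe_active: float, dense_params: float) -> float:
    """Per-token FLOPs of the MoE relative to a dense model of the given size."""
    return forward_flops_per_token(moe_active) / forward_flops_per_token(dense_params)


if __name__ == "__main__":
    print(f"active fraction: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")
    for name, params in DENSE_BASELINES.items():
        ratio = compute_ratio(ACTIVE_PARAMS, params)
        print(f"{name}: MoE needs about {ratio:.1%} of its per-token FLOPs")
```

Under these assumptions, roughly 21% of the weights are active per token, and the per-token cost is about 62.5% of a dense 4B model and about 18% of a dense 14B model. Note that this saving applies to compute only: all 12B parameters generally still have to be held in memory (or offloaded) for the router to select among them.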