Mixture of Experts · Renmin University of China
Manifold Power Iteration: A Better Router for MoE Models
Manifold Power Iteration (MPI) redesigns the MoE router by aligning each router row with the weight direction of its corresponding expert. On an 11B MoE model, average benchmark accuracy rises from 40.92 to 42.76 (+1.84 points), while training slows down by only 0.2%.
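The summary does not spell out the MPI algorithm itself. The sketch below shows one possible reading of "aligning router rows with expert weight directions," and it is based on our own assumptions rather than the authors' method. Each router row is set to the dominant input direction of one expert's weight matrix, meaning the top right singular vector, which is found by power iteration on W^T W and renormalized onto the unit sphere after every step. The names `dominant_input_direction`, `build_router` and `route`, the sign-orientation rule, and the top-k selection are all hypothetical illustrations.

```python
import math
import random


def matvec(m, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def normalize(v):
    # Project a vector back onto the unit sphere.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def orient(v):
    # Hypothetical convention: power iteration leaves the sign arbitrary,
    # so flip v to make its largest-magnitude component positive.
    i = max(range(len(v)), key=lambda j: abs(v[j]))
    return v if v[i] >= 0 else [-x for x in v]


def dominant_input_direction(w, iters=100, seed=0):
    """Power iteration on W^T W; returns the top right singular vector of w.

    w has shape (d_out, d_in) as a list of rows; the result has length d_in.
    """
    rng = random.Random(seed)
    wt = transpose(w)
    v = normalize([rng.gauss(0.0, 1.0) for _ in range(len(w[0]))])
    for _ in range(iters):
        # One step v <- W^T W v, then renormalize to unit length.
        v = normalize(matvec(wt, matvec(w, v)))
    return orient(v)


def build_router(expert_weights, iters=100):
    # Hypothetical: one unit-norm router row per expert.
    return [dominant_input_direction(w, iters, seed=i)
            for i, w in enumerate(expert_weights)]


def route(router, x, k=1):
    # Score each expert by the alignment of its row with the token x,
    # then return the indices of the top-k experts.
    logits = matvec(router, x)
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]


if __name__ == "__main__":
    experts = [
        [[3.0, 0.0], [0.0, 1.0]],  # expert 0 mostly reads input axis 0
        [[1.0, 0.0], [0.0, 3.0]],  # expert 1 mostly reads input axis 1
    ]
    router = build_router(experts)
    print(route(router, [1.0, 0.1]), route(router, [0.1, 1.0]))
```

Because every row has unit norm, the router logits measure only how well a token aligns with each expert's preferred input direction. An expert with large weights cannot win simply because of their magnitude. The paper's actual MPI update may differ in where it iterates, on which manifold it does so, and how the router is trained.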