Vision-Language-Action · Allen Institute for AI
MolmoAct2: An Open Action Reasoning Stack for Real Robots
MolmoAct2 is an open vision-language-action stack that reasons in 3D before acting. In real-world DROID evaluations it reaches 87.1% success, 38.7 points ahead of the runner-up, and its Molmo2-ER brain beats GPT-5 and Gemini Robotics ER.