World Models · Alibaba Qwen Team
ABot-Earth 0.5: Generating 3D Cities From Satellite Images
ABot-Earth 0.5 uses satellite imagery to generate 3D Gaussian Splatting city scenes, reporting under 10 minutes per square kilometer and FID 16.1.
Institution
Alibaba's Qwen team releases the open Qwen family of language and multimodal models.
StreamMA pipes each reasoning step to the next agent the moment it is written, not after the full chain. Across 8 benchmarks it gains +7.3 pp on average (max +22.4 pp on HMMT 2026) and runs up to 26.9x faster.
Text-to-Image · Alibaba Qwen Team
Qwen-Image-Flash distills Qwen-Image-2.0 to 4 sampling steps for both text-to-image and editing. The Alibaba Qwen team shows the training recipe — data, teachers, task mix — matters as much as the distillation objective.
LLM Reasoning · Alibaba Qwen Team
DVAO weights each reward by its in-group variance instead of fixed coefficients, lifting Qwen3-4B-Base from 38.99% to 42.19% average accuracy and length compliance to 99.91% in math-plus-tool-use RL.
Diffusion Models · Alibaba Qwen Team
MIGA turns a fixed-length video diffusion model into a generator of 1000+ frames with no training and constant memory, hitting 97.82 overall on VBench with VideoCrafter2, about 2.8 points over FIFO-Diffusion.
Efficient AI · Alibaba Qwen Team
RTPurbo converts a trained full-attention LLM into a sparse one with about 600+600 adaptation steps, keeping LongBench accuracy (54.24 vs 53.80) while hitting 9.36x prefill speedup at 1M context.
Text-to-Image · Alibaba Qwen Team
Qwen-Image-2.0 from Alibaba unifies text-to-image generation and editing in one diffusion transformer, renders up to 1K-token instructions for slides and posters, and adds native 2K photorealism via a 16x VAE.
Vision-Language-Action · Alibaba Qwen Team
Qwen-VLA extends Qwen's vision-language stack with a DiT action decoder and embodiment-aware prompts to run manipulation, navigation, and trajectory prediction in one model — 97.9% on LIBERO and 69.0% OSR on R2R.
Open Models · Alibaba Qwen Team
Qwen2.5 is Alibaba's open-weight LLM family spanning 0.5B–72B, pretrained on 18T tokens; the 72B-Instruct flagship rivals Llama-3-405B-Instruct, a model roughly 5x larger.
Diffusion Models · Alibaba Qwen Team
DAR replaces the residual add in diffusion transformers with timestep-adaptive aggregation of past sublayer outputs, cutting SiT-XL/2's ImageNet FID from 9.67 to 7.56 with 8.75x fewer iterations.
Language Models · Alibaba Qwen Team
TransitLM is a 13M-record corpus from four Chinese cities (120,845 stations) that trains a language model to plan transit routes with no map engine — a 4B model hits 97.0% connectivity and 71.0% exact match.