Institution

Alibaba Qwen Team

Alibaba's Qwen team releases the open Qwen family of language and multimodal models.

ABot-Earth 0.5: Generating 3D Cities From Satellite Images

ABot-Earth 0.5 uses satellite imagery to generate 3D Gaussian Splatting city scenes, reporting under 10 minutes per square kilometer and FID 16.1.

AI Agents · HKUST

StreamMA: Streaming Beats Waiting in Multi-Agent Reasoning

StreamMA pipes each reasoning step to the next agent the moment it is written, rather than waiting for the full chain to finish. Across 8 benchmarks it gains 7.3 pp on average (up to 22.4 pp on HMMT 2026) and runs up to 26.9x faster.
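The difference between streaming and waiting can be illustrated with a minimal Python sketch. It is not StreamMA's implementation: the agent functions, the log format, and the use of generators are all assumptions made for illustration, and the single-threaded example only shows the ordering of events, not the reported speedup.

```python
from typing import Iterable, Iterator, List


def upstream_agent(n_steps: int, log: List[str]) -> Iterator[str]:
    """Hypothetical upstream agent: emits one reasoning step at a time."""
    for i in range(1, n_steps + 1):
        log.append(f"wrote {i}")
        yield f"step {i}"


def downstream_agent(steps: Iterable[str], log: List[str]) -> None:
    """Hypothetical downstream agent: consumes steps in the order received."""
    for step in steps:
        log.append(f"read {step.split()[1]}")


def run_streaming(n_steps: int) -> List[str]:
    """Hand each step downstream as soon as it is produced (lazy generator)."""
    log: List[str] = []
    downstream_agent(upstream_agent(n_steps, log), log)
    return log


def run_waiting(n_steps: int) -> List[str]:
    """Materialize the full chain first, then hand it downstream."""
    log: List[str] = []
    full_chain = list(upstream_agent(n_steps, log))
    downstream_agent(full_chain, log)
    return log


if __name__ == "__main__":
    print(run_streaming(2))
    print(run_waiting(2))
```

In the streaming run the log interleaves writes and reads, while in the waiting run every write precedes the first read. A real multi-agent system would overlap the two agents' work concurrently, which this sketch does not attempt.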

Text-to-Image · Alibaba Qwen Team

Qwen-Image-Flash: Beyond Objective Design in Few-Step Distillation

Qwen-Image-Flash distills Qwen-Image-2.0 to 4 sampling steps for both text-to-image generation and editing. The Alibaba Qwen team shows that the training recipe (data, teachers, and task mix) matters as much as the distillation objective.

LLM Reasoning · Alibaba Qwen Team

DVAO: Variance-Adaptive Advantage Weighting for Multi-Reward RL

DVAO weights each reward by its in-group variance instead of fixed coefficients, lifting Qwen3-4B-Base from 38.99% to 42.19% average accuracy and length compliance to 99.91% in math-plus-tool-use RL.

Diffusion Models · Alibaba Qwen Team

MIGA: Train-Free Infinite-Frame Generation for Consistent Long Videos

MIGA turns a fixed-length video diffusion model into a generator of 1,000+ frames with no training and constant memory, scoring 97.82 overall on VBench with VideoCrafter2, about 2.8 points above FIFO-Diffusion.

Efficient AI · Alibaba Qwen Team

Full Attention Strikes Back: RTPurbo Sparsifies LLMs in Hundreds of Steps

RTPurbo converts a trained full-attention LLM into a sparse one with about 600+600 adaptation steps, keeping LongBench accuracy (54.24 vs 53.80) while hitting 9.36x prefill speedup at 1M context.

Text-to-Image · Alibaba Qwen Team

Qwen-Image-2.0: One Model for High-Fidelity Generation and Editing

Qwen-Image-2.0 from Alibaba unifies text-to-image generation and editing in one diffusion transformer, renders up to 1K-token instructions for slides and posters, and adds native 2K photorealism via a 16x VAE.

Vision-Language-Action · Alibaba Qwen Team

Qwen-VLA: One Model for Manipulation, Navigation, and Trajectories

Qwen-VLA extends Qwen's vision-language stack with a DiT action decoder and embodiment-aware prompts to run manipulation, navigation, and trajectory prediction in one model, reporting 97.9% on LIBERO and 69.0% OSR on R2R.

Open Models · Alibaba Qwen Team

Qwen2.5 Explained: Alibaba's Open LLM Family, 0.5B to 72B

Qwen2.5 is Alibaba's open-weight LLM family spanning 0.5B–72B parameters, pretrained on 18T tokens; the 72B-Instruct flagship rivals Llama-3-405B-Instruct, a model roughly 5.6x larger.

Diffusion Models · Alibaba Qwen Team

Rethinking Cross-Layer Information Routing in Diffusion Transformers

DAR replaces the residual add in diffusion transformers with timestep-adaptive aggregation of past sublayer outputs, cutting SiT-XL/2's ImageNet FID from 9.67 to 7.56 with 8.75x fewer iterations.

Language Models · Alibaba Qwen Team

TransitLM: A Map-Free Transit Routing Dataset and Benchmark

TransitLM is a 13M-record corpus from four Chinese cities (120,845 stations) that trains a language model to plan transit routes with no map engine; a 4B model reaches 97.0% connectivity and 71.0% exact match.