World Models · Alibaba Qwen Team
ABot-Earth 0.5: Generating 3D Cities From Satellite Images
ABot-Earth 0.5 uses satellite imagery to generate 3D Gaussian Splatting city scenes, reporting under 10 minutes per square kilometer and FID 16.1.
Topics
Generative models that simulate consistent, controllable environments over time.
World Models · Independent Researcher
AnchorWorld (Egocentric World Simulation for Embodied AI) turns egocentric world simulation into a checkable test, with concrete failure signals, benchmark limits, and builder takeaways.
World Models · JD.com (Joy Future Academy)
When a camera revisits an old spot, block-wise state-space recurrence scored 69.0 open-domain VLM consistency vs 12.25 for the no-memory baseline; aggressive compression and spatial summaries mostly collapsed.
World Models · Independent Researcher
Function2Scene (3D Indoor Layout from Functional Specs) turns functional 3D scene layout into a checkable test, with concrete failure signals, benchmark limits, and builder takeaways.
World Models · University of Macau
PF-OPSD teaches a Qwen3.5-9B MLLM to decide when to simulate the future with a video world model, verify the rollout, and fold it into its answer, lifting accuracy +10.6 and +10.9 points on two new QA benchmarks.
Echo-Infinity is an autoregressive video model with a learnable evolving memory that compresses any-length history at constant cost, hitting 24-hour rollouts (over 1.3M frames) in real time at 18.5 FPS on an H100.
Diffusion Models · Tsinghua University
Causal Forcing++ distills bidirectional video diffusion into a 1-2 step frame-wise autoregressive generator at 14.1 FPS, halves first-frame latency, and cuts few-step training cost ~4x (11,600 to 2,900 A800 GPU hours).
Diffusion Models · Alibaba Qwen Team
MIGA turns a fixed-length video diffusion model into a 1000+-frame generator with no training and constant memory, hitting 97.82 overall on VBench with VideoCrafter2 — about 2.8 points over FIFO-Diffusion.
Gamma-World is NVIDIA's video world model for multiplayer simulation that runs at 24 FPS and generalizes from two to four players with no retraining, cutting Solaris's FVD roughly in half.
World Models · Microsoft Research
Mirage stores a video world model's 3D memory inside diffusion latent space instead of an RGB point cloud, hitting state-of-the-art WorldScore (70.36) while running 10.57x faster and using 55x less GPU memory.
LongLive-2.0 runs a 5B long-video model end to end in NVFP4 4-bit, hitting 45.7 FPS at 720p, 2.1x faster training and 1.84x faster inference, while VBench total drops only ~0.5 points from BF16.
Diffusion Models · University of Science and Technology of China
Stream-R1 reweights DMD losses by video reward scores and per-region perplexity instead of treating signals equally. Its 1.3B streaming model hits 84.40 VBench at 23.1 FPS, beating its 14B teacher's 84.26 for free.
Diffusion Models · University of Science and Technology of China
Stream-T1 adds test-time search to streaming video generation without retraining, lifting VideoAlign motion quality from 0.350 to 0.629 at 5s and cutting the drift that wrecks 30-second clips.
World Models · Fudan University
WBench scores interactive video world models on five axes — quality, setting, interaction, consistency, physics — across 289 cases and 1,058 turns, and finds no single model wins on all five.