Institution
Microsoft Research
Microsoft's research division, contributing foundational work from ResNet to the Phi small-model family.
AI Agents · Renmin University of China
Arbor stores research attempts in a persistent hypothesis tree, then admits changes only through held-out evaluation. It reports best held-out results on six AO tasks and 86.36% Any Medal on MLE-Bench Lite.
Text Embeddings · Microsoft Research
E5 is a family of general-purpose text embeddings trained with weakly supervised contrastive pre-training on curated text pairs. This entry covers its evidence anchors, method tradeoffs, and limits for practical use.
AI for Science · Microsoft Research
MatterGen is a diffusion model that generates inorganic crystals matching a target property. The one candidate the team actually synthesized, TaCr2O6, had a measured bulk modulus within 20% of its 200 GPa target.
Speech Synthesis · Microsoft Research
NaturalSpeech 2 uses latent diffusion over neural-audio-codec vectors and scales to 44K hours of speech and singing, aiming for stronger zero-shot prosody than token LMs.
Speech Synthesis · Microsoft Research
VALL-E reframes TTS as codec-token language modeling: 60K hours of training speech plus a 3-second prompt produce personalized zero-shot speech, though misuse risks such as voice impersonation constrain how it can be released.
World Models · Microsoft Research
Mirage stores a video world model's 3D memory inside diffusion latent space instead of an RGB point cloud, hitting state-of-the-art WorldScore (70.36) while running 10.57x faster and using 55x less GPU memory.
Text-to-Image · Microsoft Research
Microsoft's Lens is a 3.8B-parameter text-to-image diffusion model that matches 6B+ rivals while using about 19.3% of Z-Image's training compute, mostly by feeding it longer, denser captions.
Multimodal Models · Microsoft Research
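LLaVA bolts a CLIP vision encoder onto a Vicuna LLM with one linear projection, then trains on image instructions generated by language-only GPT-4. It reaches an 85.1% relative score against GPT-4 on a synthetic multimodal instruction-following set, and 92.53% on ScienceQA when combined with GPT-4.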
Efficient AI · Microsoft Research
LoRA freezes a pretrained model and trains a pair of small low-rank matrices in each adapted layer, cutting trainable parameters by up to 10,000x and GPU memory by 3x versus full fine-tuning of GPT-3 175B, with no added inference latency.
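The low-rank update described in the LoRA entry can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy dimensions, the helper names (`matmul`, `add`, `lora_forward`, `merge`), and the omission of the paper's scaling factor are all choices made here for brevity.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Toy sizes (assumed for illustration; real layers are far larger).
d_out, d_in, r = 16, 16, 2
rng = random.Random(0)

# Frozen pretrained weight: never updated during adaptation.
W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
# Trainable low-rank factors: A starts random, B starts at zero,
# so the adapted layer initially behaves exactly like the frozen one.
A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]

def lora_forward(x):
    """Compute W x + B (A x) for a column vector x of shape d_in x 1."""
    return add(matmul(W, x), matmul(B, matmul(A, x)))

def merge():
    """Fold the update into a single matrix W + B A for deployment."""
    return add(W, matmul(B, A))

full_params = d_out * d_in        # parameters touched by full fine-tuning
lora_params = r * (d_in + d_out)  # parameters trained in this sketch
```

Because `merge()` produces one ordinary weight matrix, the adapted layer costs the same at inference as the original, which is the sense in which the entry says no latency is added.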
Efficient AI · Microsoft Research
Phi-3-mini is a 3.8B-parameter model trained on 3.3T heavily filtered and synthetic tokens that hits 69% on MMLU and 8.38 on MT-bench — matching Mixtral 8x7B and GPT-3.5 while small enough to run on a phone.
Vision Foundation Models · Microsoft Research
ResNet adds skip connections so a layer learns a residual instead of a full mapping, making 152-layer networks trainable. An ensemble hit 3.57% top-5 error on ImageNet and won ILSVRC 2015.
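The skip connection in the ResNet entry can be shown with a toy block. This is a simplified sketch that assumes fully connected layers in place of the paper's convolutions and batch normalization; the names `linear`, `relu`, `plain_block`, and `residual_block` are illustrative only.

```python
def relu(v):
    """Element-wise rectifier on a vector."""
    return [max(0.0, x) for x in v]

def linear(w, v):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(wi * xi for wi, xi in zip(row, v)) for row in w]

def plain_block(x, w1, w2):
    """Two stacked layers that must learn the full mapping H(x)."""
    return relu(linear(w2, relu(linear(w1, x))))

def residual_block(x, w1, w2):
    """Same two layers, but they learn only the residual F(x) = H(x) - x."""
    f = linear(w2, relu(linear(w1, x)))
    # The skip connection adds the input back before the final activation.
    return relu([fi + xi for fi, xi in zip(f, x)])
```

With all weights at zero, `residual_block` passes a non-negative input through unchanged while `plain_block` outputs zeros. That is one way to see why extra residual layers can default to doing nothing rather than degrading the signal.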
AI Agents · Microsoft Research
SkillOpt trains a single skill document for a frozen LLM agent with bounded add/delete/replace edits and a held-out gate, lifting GPT-5.5 by 23.5 points in direct chat across six benchmarks.
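The general pattern of bounded edits behind a held-out gate can be sketched as follows. This is a hypothetical illustration of the idea only, not SkillOpt's actual procedure: the edit encoding, the `max_edits` bound, the function names `apply_edits` and `gated_update`, and the caller-supplied `score_fn` are all assumptions.

```python
def apply_edits(doc, edits):
    """Return a copy of doc (a list of lines) with the edits applied in order.

    Assumed encoding: ("add", line), ("delete", index), ("replace", index, line).
    """
    out = list(doc)
    for op in edits:
        kind = op[0]
        if kind == "add":
            out.append(op[1])
        elif kind == "delete":
            del out[op[1]]
        elif kind == "replace":
            out[op[1]] = op[2]
        else:
            raise ValueError(f"unknown edit kind: {kind!r}")
    return out

def gated_update(doc, proposals, score_fn, max_edits=3):
    """Keep a proposed edit batch only if it improves the held-out score.

    score_fn is assumed to evaluate a document on data that was not used
    to generate the proposals.
    """
    best_doc, best_score = list(doc), score_fn(doc)
    for edits in proposals:
        if len(edits) > max_edits:
            continue  # bounded edits: skip oversized batches
        candidate = apply_edits(best_doc, edits)
        score = score_fn(candidate)
        if score > best_score:  # the gate: a strict improvement is required
            best_doc, best_score = candidate, score
    return best_doc, best_score
```

Requiring a strict improvement means ties are rejected, so the document changes only when the held-out score gives a reason to change it.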