Institution

Shanghai Jiao Tong University

A leading Chinese research university; its EPIC Lab works on efficient AI and LLM inference acceleration.

Claw-SWE-Bench: Why Coding Agent Harnesses Matter

Claw-SWE-Bench evaluates OpenClaw-style coding-agent harnesses on 350 GitHub issue tasks. OpenClaw jumps from 19.1% to 73.4% Pass@1 with a full adapter.

AI Agents · Shanghai Jiao Tong University

LatentSkill: Bake Agent Skills Into LoRA Weights, Not the Prompt

A hypernetwork compiles a textual skill into a LoRA adapter in one forward pass. On ALFWorld, LatentSkill lifts success by 21.4 points (seen) with 64.1% fewer prefill tokens.

AI Agents · Shanghai Jiao Tong University

SWE-Explore: Can Coding Agents Find the Right Code?

SWE-Explore isolates the repo-exploration stage of coding agents over 848 issues. Agentic explorers crush BM25 (HitFile 0.65 vs 0.08), but line-level recall stalls at 0.15-0.20, and that gap is what limits repairs.

Language Models · Xiaohongshu

NITP: Predict the Next Token's Meaning, Not Just Its ID

NITP adds a dense target to next-token prediction: forecast a shallow-layer embedding of the next token. On a 9B MoE it lifts MMLU-Pro by 5.71 points for about 2 percent extra training FLOPs and zero inference cost.

AI Agents · Shanghai Jiao Tong University

ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration

ARIS is an open-source autonomous-research harness pairing a Claude-family executor with a GPT-family reviewer to attack the failure it calls 'plausible unsupported success', with 65+ skills and a three-stage audit.

Efficient AI · Shanghai Jiao Tong University

Domino: Splitting the Draft and the Causal Fix in Speculative Decoding

Domino lets a parallel drafter propose a whole block at once, then a lightweight head adds back the token-to-token dependencies — reaching up to 5.49x speedup on Transformers and 5.8x throughput on SGLang.

AI Agents · Shanghai Jiao Tong University

MMSkills: Multimodal Skill Packages for General Visual Agents

MMSkills packages textual procedures, runtime state cards, and keyframes into reusable skills for visual agents, lifting Qwen3-VL-235B from 21.34% to 39.17% on OSWorld and a small 8B model from 10.78% to 25.40%.