Institution

Skywork AI

A Chinese AI lab (part of Kunlun Tech) building multimodal foundation models across language, audio, and video.

Audio Interaction Model: A Streaming Audio LLM That Decides When to Speak

The Audio Interaction Model runs a perceive-decide-respond loop: an audio LLM listens to the incoming stream, decides whether and when to reply, and answers on the fly. It is trained on StreamAudio-2M and is competitive across 8 benchmarks.
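
To make the perceive-decide-respond loop concrete, here is a minimal toy sketch in Python. It is not the model's actual implementation: the class `ToyInteractionLoop`, the energy-based silence rule, the thresholds, and the placeholder reply string are all hypothetical, chosen only to show how a streaming system might stay silent on most chunks and speak only when a decision step fires.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional


@dataclass
class ToyInteractionLoop:
    """Hypothetical stand-in for a streaming audio model (illustration only)."""
    silence_threshold: float = 0.1    # assumed: energy below this counts as silence
    silence_chunks_to_reply: int = 2  # assumed: silent chunks needed before replying
    _heard: List[float] = field(default_factory=list)
    _silent_run: int = 0

    def perceive(self, chunk_energy: float) -> None:
        # Record voiced chunks and track the current run of silent chunks.
        if chunk_energy < self.silence_threshold:
            self._silent_run += 1
        else:
            self._heard.append(chunk_energy)
            self._silent_run = 0

    def decide(self) -> bool:
        # Speak only if something was heard and the speaker appears to have paused.
        return bool(self._heard) and self._silent_run >= self.silence_chunks_to_reply

    def respond(self) -> str:
        # Placeholder reply; a real model would generate text or speech here.
        reply = f"reply after {len(self._heard)} voiced chunks"
        self._heard.clear()
        self._silent_run = 0
        return reply


def run_stream(loop: ToyInteractionLoop,
               chunks: Iterable[float]) -> Iterator[Optional[str]]:
    """Yield one item per input chunk: a reply string, or None to stay silent."""
    for energy in chunks:
        loop.perceive(energy)
        yield loop.respond() if loop.decide() else None


if __name__ == "__main__":
    # Each number is a made-up per-chunk energy value.
    stream = [0.0, 0.5, 0.6, 0.0, 0.0, 0.7, 0.0, 0.0]
    for step, out in enumerate(run_stream(ToyInteractionLoop(), stream)):
        print(step, out)
```

In this sketch the decision step is a hand-written pause detector; in a trained model that decision would be learned from data rather than set by a threshold.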