Fine-Tuning & Adaptation · The Hong Kong Polytechnic University
Token Teachability: Distilling LLMs on Just 5% of Tokens
Teachability-Aware OPD (on-policy distillation) supervises only about 5% of tokens: those where the teacher's correction lands inside the student's top-K support. Distilling Qwen3-4B into Qwen3-1.7B, it matches or beats full-token distillation (44.89 vs. 42.37).
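The selection rule summarized above can be illustrated with a small sketch. The summary does not spell out the exact criterion, so the code below rests on an assumption: a token counts as "teachable" when the teacher's most likely token differs from the student's most likely token (a correction) and still falls within the student's top-K candidates. The function names `top_k_indices` and `teachable_mask`, the toy logits, and the choice K = 2 are all hypothetical and not taken from the paper.

```python
from typing import List


def top_k_indices(logits: List[float], k: int) -> List[int]:
    """Return the indices of the k largest logits, highest first."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]


def teachable_mask(
    student_logits: List[List[float]],
    teacher_logits: List[List[float]],
    k: int,
) -> List[bool]:
    """Mark positions where the teacher's top token corrects the student
    and lies inside the student's top-k support.

    Assumed criterion for illustration only; the paper's rule may differ.
    """
    mask = []
    for s_row, t_row in zip(student_logits, teacher_logits):
        s_top = top_k_indices(s_row, k)      # student's top-k support
        t_best = top_k_indices(t_row, 1)[0]  # teacher's preferred token
        is_correction = t_best != s_top[0]   # teacher disagrees with student
        mask.append(is_correction and t_best in s_top)
    return mask


# Toy example: 3 token positions, vocabulary of 4 tokens.
student = [
    [2.0, 1.0, 0.1, -1.0],  # student prefers token 0, runner-up is token 1
    [0.5, 3.0, 0.2, 0.1],   # student prefers token 1
    [1.0, 0.9, 0.8, -2.0],  # student prefers token 0, runner-up is token 1
]
teacher = [
    [0.1, 2.5, 0.0, -1.0],  # teacher prefers token 1: reachable correction
    [0.2, 2.8, 0.1, 0.0],   # teacher prefers token 1: already agrees
    [0.0, 0.1, 0.2, 3.0],   # teacher prefers token 3: outside student top-2
]

mask = teachable_mask(student, teacher, k=2)
print(mask)
```

In this toy case only the first position would receive a distillation loss. The second is skipped because the student already agrees with the teacher, and the third because the teacher's preferred token lies outside the student's top-K support.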