Multimodal Models · The University of Tokyo
Perception or Prejudice: Can MLLMs Ground Personality in Real Evidence?
MM-OCEAN tests whether multimodal LLMs justify their Big Five personality ratings with real video evidence. Across 27 models, 51.3% of correct ratings rest on wrong cues, and even the best model fully grounds only 33.5%.