AI Mental Status Exam Benchmark Shows Qwen3-Omni Reasoning Gap

A 2026 medRxiv benchmark found that Qwen3-Omni reached only moderate agreement with expert mental status examination panels: AC1 = 0.70 versus UTHealth and 0.72 versus Yale, below the expert agreement of 0.87. Its aggregate performance masked a clinical reasoning gap: the model overcalled visible signs such as speech and affect while missing delusions and perceptual abnormalities.1 Research Highlights Expert …
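For readers unfamiliar with the agreement statistic quoted above, here is a minimal sketch of how an AC1 value can be computed for two raters. It assumes that "AC1" refers to Gwet's first-order agreement coefficient, which the excerpt does not state explicitly. The function name `gwet_ac1`, the `categories` parameter, and the sample ratings are hypothetical illustrations, not data or code from the benchmark.

```python
from collections import Counter


def gwet_ac1(rater_a, rater_b, categories=None):
    """Two-rater Gwet's AC1 (assumed to be the 'AC1' the article reports).

    rater_a, rater_b: equal-length sequences of categorical labels.
    categories: optional full list of labels on the rating scale; if omitted,
    the labels observed in the data are used (an assumption of this sketch).
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")

    if categories is None:
        categories = sorted(set(rater_a) | set(rater_b))
    k = len(categories)
    if k < 2:
        raise ValueError("AC1 needs at least two categories")

    n = len(rater_a)
    # Observed agreement: share of items on which both raters gave the same label.
    p_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: based on each category's average prevalence across raters.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = 0.0
    for c in categories:
        pi_c = (counts_a[c] + counts_b[c]) / (2 * n)
        p_e += pi_c * (1 - pi_c)
    p_e /= (k - 1)

    return (p_a - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical ratings: 1 = sign present, 0 = sign absent.
    model = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]
    expert = [1, 1, 1, 1, 0, 0, 0, 1, 1, 0]
    print(round(gwet_ac1(model, expert), 3))
```

A single pooled coefficient like this can hide the pattern the article describes: computing it separately per exam domain (for example speech, affect, delusions, perception) would show where the model overcalls and where it misses findings.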