AI Rater Discrimination Depends on Scoring Protocol in Complex Clinical Decision-Making

Sangwon Baek, Kyu Yeon Hur, Kyunga Kim

Jun 3, 2026 at 04:00

arXiv:2606.03198v1 (Announce Type: cross)

Abstract: Clinical AI evaluation increasingly delegates scoring to large language models (LLMs) acting as AI raters, yet their scoring behavior across evaluation conditions has not been quantitatively characterized. We address this gap through a factorial study of AI rater behavior in adult type 2 diabetes...