arXiv:2606.05183v1 Announce Type: cross Abstract: Large language models are increasingly deployed as high-stakes advisors, yet standard alignment benchmarks treat sycophancy as a binary failure mode. We introduce the Granularity Gap: coarse binary metrics mask substantial social-compliance behaviors where models capitulate to user framing,...
Läs hela artikeln hos källan.
Kommentarer (0)
Inga kommentarer ännu. Bli först med att kommentera!