arXiv:2606.05970v1 Announce Type: cross Abstract: Large language models are increasingly used for structured extraction from clinical free-text notes, but the sensitivity of their output to upstream configuration choices is less understood than their accuracy on fixed benchmarks. This work measures that sensitivity without human-annotated ground...