arXiv:2605.08747v4 Announce Type: replace

Abstract: Standard embodied evaluations do not independently score whether an agent correctly commits to task completion at episode closure, a capacity we call terminal commitment. Behaviorally distinct failures--never completing the task, completing it but failing to stop, and reporting success without...