arXiv:2606.02132v2 Announce Type: replace Abstract: Agentic reinforcement learning can induce tool abuse, where models overuse external tools even for queries solvable by internal reasoning. Existing approaches mitigate this issue with uniform tool-use penalties or hard limits, which reduce tool frequency but may also suppress useful...
Les hele artikkelen hos kilden.
Kommentarer (0)
Ingen kommentarer ennå. Bli den første til å kommentere!