Technology, from arXiv cs.AI

Generalization Hacking: Models Can Game Reinforcement Learning by Preventing Behavioral Generalization

Frank Xiao, Mary Phuong
Thursday at 04:00

arXiv:2606.12016v1 Announce Type: cross Abstract: Model post-training, and in particular reinforcement learning (RL), is one of the primary mechanisms by which developers can shape models' values and behaviors. However, as models become increasingly evaluation- and training-aware, they may be motivated to resist training when the perceived...

