arXiv:2606.11270v1 Announce Type: cross Abstract: Distilling a language model to transfer benign behavior to a student model may also transfer undesirable characteristics present in the teacher model, a phenomenon known as subliminal learning. While qualitative evidence supports the existence of this effect, its...