arXiv:2606.03648v1 Announce Type: cross Abstract: Adapting foundation large language models to a user's task or preferred style through fine-tuning can compromise the model's safety. Previous work has examined the effects of fine-tuning on model safety only in limited and seemingly arbitrary experimental settings. We argue that anchoring...