When Context Returns: Toward Robust Internalization in On-Policy Distillation

Xun Wang, Ruishuo Chen, Zhuoran Li, Yu Chen, Longbo Huang

Thursday at 04:00

arXiv:2606.11627v1 (announce type: cross)

Abstract: Recent work has shown that on-policy distillation can internalize privileged context, such as system prompts or task hints, into a student model so that the context is no longer needed at inference time. Although this approach successfully improves the student's no-context performance, we identify...
