arXiv:2606.11627v1 Announce Type: cross Abstract: Recent work has shown that on-policy distillation can internalize privileged context, such as system prompts or task hints, into a student model so that the context is no longer needed at inference time. Although this approach successfully improves the student's no-context performance, we identify...
Læs hele artiklen hos kilden.
Kommentarer (0)
Ingen kommentarer ennå. Bli den første til å kommentere!