arXiv:2606.05609v1 Announce Type: cross Abstract: As large language models (LLMs) are widely deployed, identifying their vulnerability through jailbreak attacks becomes increasingly critical. Optimization-based attacks like Greedy Coordinate Gradient (GCG) have focused on inserting adversarial tokens to the end of prompts. However, GCG restricts...
Läs hela artikeln hos källan.
Kommentarer (0)
Inga kommentarer ännu. Bli först med att kommentera!