Doublespeak

The authors present Doublespeak, a simple new attack that hijacks the model’s internal representations in context.

10 December 2025