Doublespeak
The authors present a new attack called Doublespeak: a simple attack that hijacks the model's internal representations in context.