Defending MLLMs from Implicit Jailbreak Attacks
A new class of attacks in which the text and the image each look safe on their own, but their combination carries malicious meaning
The paper studies a new class of attacks against retrieval-augmented generation (RAG) systems
The paper discusses vulnerabilities in fine-tuning systems for large language models under conditions close to real-world operation