Jailbreak

Defending MLLMs from Implicit Jailbreak Attacks

The paper describes a new class of attacks in which the text and the image each look safe on their own, but their combination carries malicious meaning

The paper studies a new class of attacks against retrieval-augmented generation (RAG) systems

The paper examines vulnerabilities in fine-tuning systems for large language models under conditions close to real-world deployment