OpenRT - An Open Framework for Red Teaming Multimodal LLMs

OpenRT is a modular and extensible environment for systematic safety evaluation of large language models

23 December 2025

Small Language Models

A note on the "Small Language Model for AI Agents HandBook"

15 December 2025

Doublespeak

The authors present Doublespeak, a simple attack that hijacks the model's internal representations in context

10 December 2025

FineSec

A new framework for building compact models that find vulnerabilities in C/C++ code

6 December 2025

Whisper Leak

A new attack that infers the topic of an LLM query from encrypted traffic

4 December 2025

Breaking Agent Backbones

How LLM selection affects agent security

2 December 2025

LOTL Attacks Using Local LLMs

How future devices with built-in LLMs could become a security problem: attackers will be able to "live off the LLM" (LOLLM)

30 November 2025

Architecting secure enterprise AI agents with MCP

IBM's guide to designing secure enterprise AI agents with MCP, verified by Anthropic

25 November 2025

Defending MLLMs from Implicit Jailbreak Attacks

A new class of attacks in which the text and the image each look safe on their own, but their combination carries a malicious meaning

22 November 2025

Pruning-Activated Attack

How an attacker can exploit model pruning to activate hidden malicious behavior

17 November 2025