Notes

OpenRT - An Open Framework for Red Teaming Multimodal LLMs

OpenRT is a modular and extensible environment for systematic safety evaluation of large language models

Note on the document Small Language Model for AI Agents HandBook

The authors present Doublespeak, a simple new attack that hijacks the model’s internal representations in context

A new framework for creating compact models for finding vulnerabilities in C/C++ code

A new attack that infers the topic of an LLM query from encrypted traffic

How LLM selection affects agent security

How future devices with built-in LLMs will become a security problem: attackers will be able to "live off the LLM" (LOLLM)

A guide from IBM, with verification from Anthropic, to designing secure enterprise AI agents using MCP

A new class of attacks in which the text and the image each look safe on their own, but their combination carries a malicious meaning

Model pruning can be used by an attacker