Block external posts, restrict file paths, and gate sensitive capabilities.
Runtime defense for AI agent traps.
Inspired by Google DeepMind's AI Agent Traps research, TrapDefense provides practical runtime defense for hidden content injection, unsafe tool execution, data exfiltration, and sensitive-data exposure.
from asr import Guard
from asr.mcp import mcp_guard

guard = Guard(
    mode="shadow",
    domain_allowlist=["api.internal.com"],
    block_egress=True,
    pii_action="block",
    capability_policy={"shell_exec": "block"},
)

# protect MCP tool handlers before damage happens
@server.tool()
@mcp_guard(guard, capabilities=["network_send"])
async def send_email(to, subject, body):
    ...
Record structured JSONL events for scans, tool decisions, and redaction.
Protect sensitive data even when you are still rolling out policies in shadow mode.
Use the open-source SDK now, then add enterprise workflows when your team needs them.
The model is not the only risk.
Prompt filters can catch suspicious text, but they do not stop the final action. TrapDefense focuses on the last line of defense: runtime control at tool execution time, plus audit evidence for what happened and why.
Damage happens when the agent actually uses a tool.
Hidden instructions in content are dangerous, but the real business risk appears when an agent sends email, calls an external API, reads the wrong file, or leaks sensitive data through a connected workflow.
Built for the traps that cause real-world damage.
Google DeepMind's AI Agent Traps paper maps six attack surfaces across perception, reasoning, memory, action, multi-agent dynamics, and human oversight. TrapDefense focuses on the layers where practical runtime controls help most today: perception and action.
Practical controls for real agent workflows.
TrapDefense stays focused on the controls teams actually need first: policy enforcement, sensitive data protection, gradual rollout, and auditability across AI agent execution paths.
Guard risky tool use
Control outbound requests, file access, sensitive capabilities, and unknown tools before they execute.
Redact sensitive data
Detect and redact PII in tool arguments and outputs so agents do not leak secrets by accident.
Protect MCP workflows
Wrap MCP tool handlers with policy checks and audit logging using a lightweight adapter.
Ship policies safely
Roll out new rules with shadow, warn, and enforce modes rather than turning on hard blocks from day one.
Scan for basic injection signals
Catch hidden text, metadata payloads, HTML comment attacks, base64 instructions, and prompt-injection keywords.
Keep audit trails
Write structured JSONL events for findings, tool decisions, result redaction, and error events.
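To make the audit-trail idea concrete, here is a minimal sketch of JSONL event logging using only the Python standard library. The `write_event` helper and the field names (`ts`, `type`, `tool`, `action`, `reason`, `redacted_fields`) are assumptions for illustration, not TrapDefense's actual event schema.

```python
import io
import json
import time
from typing import Any, TextIO


def write_event(stream: TextIO, event_type: str, **fields: Any) -> dict:
    """Append one audit event as a single JSON line and return the record."""
    # Hypothetical schema: a timestamp, an event type, and free-form fields.
    record = {"ts": time.time(), "type": event_type, **fields}
    stream.write(json.dumps(record, sort_keys=True) + "\n")
    return record


# Demo: write two hypothetical events to an in-memory buffer instead of a file.
buffer = io.StringIO()
write_event(buffer, "tool_decision", tool="send_email", action="block",
            reason="domain not in allowlist")
write_event(buffer, "redaction", tool="lookup_user", redacted_fields=["email"])

# Each line parses on its own, which keeps the log easy to tail and filter.
events = [json.loads(line) for line in buffer.getvalue().splitlines()]
```

In a real deployment the stream would typically be an append-mode file or a log shipper rather than an in-memory buffer.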
A runtime defense layer built for rollout.
TrapDefense is designed to fit the way real teams ship: observe first, tighten gradually, and keep the evidence needed to tune policies without breaking production.
Scan inbound content
Detect basic content-injection patterns before suspicious material gets normalized inside agent workflows.
Evaluate tool calls
Apply policy checks in sequence: blocklists, egress, file paths, PII, capability fallback, and default actions.
Protect outputs and record evidence
Redact sensitive data, log decisions, and leave a trace your team can review later.
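The "checks in sequence" step above can be sketched as an ordered list of check functions where the first one that returns a decision wins. This is an illustrative sketch, not TrapDefense's implementation: the `Policy`, `Decision`, and `decide` names are hypothetical, the file-path and PII checks are omitted for brevity, and the `url` argument name is an assumption about how a tool exposes its destination.

```python
from dataclasses import dataclass, field
from typing import Optional
from urllib.parse import urlparse


@dataclass
class Policy:
    # Hypothetical policy shape, loosely modeled on the Guard options shown earlier.
    tool_blocklist: set = field(default_factory=set)
    domain_allowlist: set = field(default_factory=set)
    capability_policy: dict = field(default_factory=dict)
    default_action: str = "allow"


@dataclass
class Decision:
    action: str
    reason: str
    policy_id: str


def check_blocklist(policy, tool_name, args, capabilities) -> Optional[Decision]:
    if tool_name in policy.tool_blocklist:
        return Decision("block", "tool is blocklisted", "tool_blocklist")
    return None


def check_egress(policy, tool_name, args, capabilities) -> Optional[Decision]:
    url = args.get("url")  # assumed argument name for outbound destinations
    if url is None:
        return None
    if urlparse(url).hostname not in policy.domain_allowlist:
        return Decision("block", "domain not in allowlist", "egress_control")
    return None


def check_capabilities(policy, tool_name, args, capabilities) -> Optional[Decision]:
    for capability in capabilities:
        if policy.capability_policy.get(capability) == "block":
            return Decision("block", f"capability {capability} is blocked",
                            "capability_policy")
    return None


# Order matters: earlier checks take precedence over later ones.
CHECKS = [check_blocklist, check_egress, check_capabilities]


def decide(policy, tool_name, args, capabilities=()) -> Decision:
    for check in CHECKS:
        decision = check(policy, tool_name, args, capabilities)
        if decision is not None:
            return decision
    # Nothing matched, so fall through to the default action.
    return Decision(policy.default_action, "no rule matched", "default")
```

Keeping each check as a small function makes the evaluation order explicit and lets a team add or reorder checks without touching the others.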
Built for teams operating agents in the real world.
TrapDefense is especially strong where actions matter more than text: MCP servers, internal assistants, and tool-calling systems connected to sensitive company workflows.
Protect MCP servers
Add policy enforcement and audit logging to MCP tool handlers without standing up a separate proxy first.
Control internal agents
Restrict where assistants can send data, which files they can touch, and what capabilities they can invoke.
Reduce exfiltration risk
Block suspicious destinations, catch risky outputs, and redact sensitive data before it leaves the system.
Roll out security safely
Start with observation, move into warning, then enforce with confidence after policy tuning.
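As a rough illustration of the observe, warn, enforce progression, the sketch below shows how one policy decision could be handled differently per mode. The mode names come from this page; the `apply_mode` function and `ToolBlocked` exception are hypothetical, and the sketch does not model the PII handling that the page says stays active in shadow mode.

```python
import logging

logger = logging.getLogger("rollout_sketch")


class ToolBlocked(Exception):
    """Raised when a tool call is stopped in enforce mode."""


def apply_mode(mode: str, decision: str, tool_name: str) -> bool:
    """Return True if the tool call may proceed under the given rollout mode."""
    if decision != "block":
        return True
    if mode == "shadow":
        # Observe only: record what would have happened, then let the call run.
        logger.info("shadow: would block %s", tool_name)
        return True
    if mode == "warn":
        # Surface the violation loudly, but still let the call run.
        logger.warning("warn: policy violation in %s, allowing", tool_name)
        return True
    if mode == "enforce":
        raise ToolBlocked(f"{tool_name} blocked by policy")
    raise ValueError(f"unknown mode: {mode!r}")
```

Because the same decision flows through every mode, the logs gathered in shadow mode show what enforce mode would have blocked before anything breaks in production.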
TrapDefense API — check tool calls before they run.
No SDK install needed. Call our hosted API to scan for threats, get policy decisions, and redact sensitive data — from any language or framework.
Threat Detection
Detect hidden injection in CSS, HTML, markdown, shortened URLs, exfiltration phrases, and encoded bypass attempts. Covers perception-layer traps.
Policy Decision
Block unsafe tool calls, risky destinations, and unauthorized actions before execution. Covers action-layer traps — where real damage happens.
PII Redaction
Mask emails, SSNs, IBANs, Korean PII, and API keys in tool outputs. Regional profiles cover KR, US, and EU. Reduces sensitive-data exposure.
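For a sense of what pattern-based redaction involves, here is a minimal sketch covering two of the data types listed above. The regular expressions and the `[REDACTED:...]` placeholder format are assumptions for illustration. Real detectors need stricter validation (for example, IBAN checksums) and are not represented here.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; they are deliberately simple and will miss edge cases.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> Tuple[str, List[str]]:
    """Replace matches with a labeled placeholder and report which types were found."""
    found = []
    for label, pattern in PII_PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED:{label}]", text)
        if count:
            found.append(label)
    return text, found


clean, found = redact("Contact jane@example.com, SSN 123-45-6789.")
```

Returning the list of matched types alongside the cleaned text makes it straightforward to record a redaction event without logging the sensitive value itself.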
Sample Request
POST /api/v1/decide
{
  "tool_name": "send_email",
  "args": { "to": "external@unknown.com" },
  "capabilities": ["network_send"]
}
Response
{
  "ok": true,
  "data": {
    "action": "block",
    "reason": "domain not in allowlist",
    "policy_id": "egress_control"
  }
}
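The request and response above could be handled from Python roughly as follows, using only the standard library. Several details are assumptions: the base URL `https://trapdefense.example` is a placeholder (the page does not state the host), authentication is not shown because it is not documented here, and `"allow"` as the permitted action value is a guess based on the `"block"` value in the sample response.

```python
import json
import urllib.request


def build_decide_request(base_url, tool_name, args, capabilities):
    """Build (but do not send) a POST to the /api/v1/decide endpoint."""
    payload = {"tool_name": tool_name, "args": args, "capabilities": capabilities}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/decide",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def should_run(response_body: str) -> bool:
    """Fail closed: only an explicit, well-formed allow lets the tool run."""
    try:
        body = json.loads(response_body)
        # "allow" is an assumed action value; the sample only shows "block".
        return bool(body.get("ok")) and body["data"]["action"] == "allow"
    except (ValueError, KeyError, TypeError, AttributeError):
        return False


request = build_decide_request(
    "https://trapdefense.example",  # placeholder host, not a real endpoint
    "send_email",
    {"to": "external@unknown.com"},
    ["network_send"],
)
# To send it: urllib.request.urlopen(request, timeout=5).read().decode("utf-8")
```

Treating malformed or unexpected responses as a block keeps the tool from running when the policy service is unreachable or returns something the client does not understand.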
Open source first. Enterprise when your team needs more.
TrapDefense starts as an open-source SDK for runtime security in AI agent systems. Enterprise expands the operational layer: shared policies, audit workflows, and support for serious teams.
Ship the runtime layer now.
Use the SDK for policy enforcement, MCP integration, basic content scanning, PII redaction, and audit logging.
Operate policies across a team.
TrapDefense Enterprise is the next layer for teams that need centralized governance, onboarding help, and security operations around agent runtime controls.
Start an enterprise conversation.
Tell us a bit about your team and what you want to protect. We'll route the inquiry to hellocosmos@gmail.com.
What teams usually ask first.
The goal is not to pretend every security problem is solved. The goal is to be crisp about what TrapDefense already does well.
What does TrapDefense protect against?
TrapDefense is strongest at action-layer traps: data exfiltration, unsafe tool execution, and jailbreak sequences. It also detects content-injection traps hidden in CSS, HTML, markdown, and encoded payloads. It does not claim to solve every trap family, especially long-term memory poisoning or multi-agent systemic failures.
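To illustrate what scanning for hidden content can look like, here is a small sketch that flags instruction-like phrases inside HTML comments, inside base64-encoded runs, and in visible text. The keyword list, the finding labels, and the 24-character base64 threshold are all assumptions chosen for the example. This is not TrapDefense's detection logic, and it does not cover CSS or markdown tricks.

```python
import base64
import binascii
import re
from typing import List

HTML_COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{24,}={0,2}")
# Hypothetical keyword list; a real scanner would use a much broader set.
KEYWORDS = ("ignore previous instructions", "disregard the above")


def has_keyword(text: str) -> bool:
    lowered = text.lower()
    return any(keyword in lowered for keyword in KEYWORDS)


def scan(content: str) -> List[str]:
    """Return a list of finding labels for basic injection signals."""
    findings = []
    # 1. Instructions hidden inside HTML comments.
    if any(has_keyword(body) for body in HTML_COMMENT.findall(content)):
        findings.append("html_comment_injection")
    # 2. Instructions hidden inside base64-encoded runs.
    for run in BASE64_RUN.findall(content):
        try:
            decoded = base64.b64decode(run, validate=True).decode("utf-8", "ignore")
        except (binascii.Error, ValueError):
            continue  # not valid base64, so nothing to inspect
        if has_keyword(decoded):
            findings.append("base64_injection")
            break
    # 3. Instructions in the visible text, with comments stripped out.
    if has_keyword(HTML_COMMENT.sub("", content)):
        findings.append("keyword_injection")
    return findings
```

Keyword matching like this catches only the most obvious cases, which is why runtime controls at tool execution remain the backstop.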
Is this a full agent security platform?
No. TrapDefense is an open-source runtime security SDK focused on tool-execution control and audit evidence. It covers practical risks in the action and perception layers identified in Google DeepMind's AI Agent Traps research.
Why not just use prompt filtering?
Prompt filtering can catch some suspicious input, but it does not stop the final action. TrapDefense adds policy enforcement at the moment a tool is about to run.
Do I need MCP to use TrapDefense?
No. MCP is a strong starting point, but TrapDefense can also be used directly in Python agent workflows with decorators and policy evaluation hooks.
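As a sketch of the decorator pattern in a plain Python workflow, the example below wraps a tool function with a policy check that runs before the function body. The `guarded` decorator, `ToolBlocked` exception, and `demo_decide` callback are hypothetical names for illustration; the SDK's own decorator API may differ.

```python
import functools


class ToolBlocked(Exception):
    """Raised when the policy check refuses a tool call."""


def guarded(decide, capabilities=()):
    """Wrap a tool so decide() runs before the tool body does."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            action = decide(func.__name__, kwargs, list(capabilities))
            if action == "block":
                raise ToolBlocked(f"{func.__name__} blocked by policy")
            return func(*args, **kwargs)
        return wrapper
    return decorator


def demo_decide(tool_name, kwargs, capabilities):
    # Toy policy: refuse anything that declares the shell_exec capability.
    return "block" if "shell_exec" in capabilities else "allow"


@guarded(demo_decide, capabilities=["shell_exec"])
def run_shell(command):
    return f"ran {command}"


@guarded(demo_decide)
def add(a, b):
    return a + b
```

Because the check lives in the wrapper, the tool body never executes when the decision is a block, which is the property that prompt filtering alone cannot provide.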