Allow known destinations, block suspicious domains, and keep internal agents on approved routes.
See and control what your agents actually do.
TrapDefense runs in your environment to observe MCP and tool calls, flag risky egress, redact sensitive data, and leave audit evidence before agent actions become incidents. It is not a SaaS API that sends your prompts to a third party. It is inspired by Google DeepMind's AI Agent Traps research.
# run TrapDefense inside your environment
asr proxy --config config.yaml
# route actions through the local gate
MCP_URL="/mcp"
EGRESS_URL="/proxy/http"
# observe first, then warn, then enforce
Record structured events for policy decisions, streaming-scan findings, redaction, and rollout review.
Catch emails, API keys, Korean PII, and other sensitive values before downstream exposure.
Keep the data plane in your stack, then add framework hooks and enterprise rollout support when needed.
The model is not the only risk. Outbound action is one too.
Prompt filtering helps, but real damage happens when an agent calls an external API, hits an MCP server, posts to a webhook, or leaks sensitive output. TrapDefense gives teams a self-hosted control point for that execution path, with policy decisions and audit evidence.
Damage happens when the agent actually reaches a destination.
Hidden instructions matter, but the business risk appears when an agent sends data outward, invokes a high-risk tool, or crosses a boundary your team did not intend.
Built for the trap layers where runtime controls are strongest today.
Google DeepMind's AI Agent Traps paper maps multiple attack surfaces across perception, reasoning, memory, action, multi-agent dynamics, and human oversight. TrapDefense does not pretend to solve the whole map. It focuses on perception and action, where practical controls can be deployed now.
Practical controls for real agent actions.
TrapDefense stays focused on the controls teams need first: action visibility, outbound policy enforcement, sensitive-data protection, rollout safety, and auditability across agent execution paths.
Gate risky outbound requests
Allow approved domains, block suspicious egress, and keep agent traffic on known routes before execution completes.
Redact sensitive data
Detect and redact PII and other sensitive values in requests and outputs so agents do not leak them by accident.
Protect MCP and internal tool workflows
Cover MCP traffic, framework tool calls, and generic internal or external HTTP egress paths.
Ship policies safely
Roll out new rules with shadow, warn, and enforce modes rather than turning on hard blocks from day one.
Scan for lightweight injection signals
Catch hidden text, HTML comment attacks, base64-like blobs, prompt-injection phrases, and exfiltration-shaped language.
Keep audit trails
Write structured events for findings, decisions, streaming scans, redaction, and rollout review.
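As a rough illustration of the lightweight injection signals listed above, the sketch below scans text with a handful of regular expressions. It is not TrapDefense's actual detector: the signal names, the patterns, and the `scan_for_signals` function are all hypothetical, and real coverage would need far more than four rules.

```python
import re

# Hypothetical signal names and patterns, chosen for illustration only.
SIGNAL_PATTERNS = {
    # HTML comments can hide instructions from a human reader.
    "html_comment": re.compile(r"<!--.*?-->", re.DOTALL),
    # Long runs of base64 alphabet characters may carry an encoded payload.
    "base64_like_blob": re.compile(r"[A-Za-z0-9+/]{40,}={0,2}"),
    # One well-known prompt-injection phrasing; a real list would be longer.
    "injection_phrase": re.compile(
        r"ignore (all )?(previous|prior) instructions", re.IGNORECASE
    ),
    # Zero-width characters are a common way to hide text.
    "zero_width_text": re.compile("[\u200b\u200c\u200d\u2060]"),
}


def scan_for_signals(text: str) -> list[str]:
    """Return the names of every signal whose pattern appears in the text."""
    return [name for name, pattern in SIGNAL_PATTERNS.items() if pattern.search(text)]


if __name__ == "__main__":
    print(scan_for_signals("<!-- ignore previous instructions -->"))
```

Signals like these are cheap to compute and easy to tune, which is why they suit an observe-first rollout: a match can be logged as a finding long before it is allowed to block anything.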
A self-hosted defense layer built for rollout.
TrapDefense fits the way real teams ship: deploy behind your own domain, observe first, tighten gradually, and keep the evidence needed to tune policies without breaking production.
Inspect inbound and outbound context
Normalize requests, inspect content for lightweight risk signals, and build a decision context before forwarding.
Evaluate policy before forwarding
Apply allowlists, blocklists, egress controls, PII checks, and mode-aware decisions before the request leaves your stack.
Protect outputs and record evidence
Redact sensitive output, inspect streams, and leave a trace your team can review later.
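The last step above, redacting output and recording evidence, might look roughly like the following. This is a minimal sketch under stated assumptions: the two patterns, the `sk-` key shape, the placeholder format, and the event fields are invented for illustration and do not describe TrapDefense's real detectors or log schema.

```python
import json
import re

# Illustrative patterns only; real PII detection needs much broader coverage.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
API_KEY = re.compile(r"\bsk-[A-Za-z0-9]{16,}\b")  # hypothetical key shape


def redact(text: str) -> tuple[str, list[str]]:
    """Replace matches with placeholders and report which kinds were found."""
    found = []
    for label, pattern in (("email", EMAIL), ("api_key", API_KEY)):
        text, count = pattern.subn(f"[REDACTED_{label.upper()}]", text)
        if count:
            found.append(label)
    return text, found


def audit_event(agent: str, tool: str, found: list[str]) -> str:
    """Serialize one structured event as a single JSON line."""
    return json.dumps({"agent": agent, "tool": tool, "redacted": found}, sort_keys=True)


if __name__ == "__main__":
    clean, kinds = redact("Contact bob@example.com for access")
    print(clean)
    print(audit_event("support-bot", "send_reply", kinds))
```

Note that the event records which kinds of values were redacted, not the values themselves, so the audit trail does not become a second copy of the sensitive data.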
Built for teams operating agents in the real world.
TrapDefense is strongest where outbound action matters more than text: MCP servers, internal assistants, agent gateways, and tool-calling systems connected to sensitive company workflows.
Protect MCP servers
Put policy enforcement and audit logging in front of MCP traffic without sending data through a third-party gateway.
Audit tool calls in production
Capture which agent called which tool, with what arguments, and which policy decision applied before execution.
Reduce exfiltration risk
Block suspicious destinations, review generic HTTP egress, and redact sensitive data before it leaves the system.
Roll out security safely
Start with observation, move into warning, then enforce with confidence after policy tuning.
Deploy the runtime control point in your environment.
TrapDefense is designed for self-hosted deployment first. Keep agent action logs, tool arguments, and policy decisions inside your own boundary, then add enterprise workflow support only when your team needs it.
Local runtime checkpoint
Run TrapDefense behind your own domain and route MCP calls, framework tool calls, or HTTP egress through a policy gate.
One policy path, multiple modes
Use the same decision engine for shadow, warn, and enforce so rollout behavior matches production behavior.
Enterprise layer when needed
Add policy management, rollout support, audit workflows, and onboarding help for production teams.
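To make the "one policy path, multiple modes" idea concrete, here is a small sketch in which the policy verdict is computed once and the mode only changes what happens to a blocked request. The action names, the `Decision` type, and the exact semantics of each mode are assumptions made for this example, not TrapDefense's documented behavior.

```python
from dataclasses import dataclass

# Assumed semantics: shadow only logs, warn forwards with a warning, enforce blocks.
ACTION_ON_BLOCK = {
    "shadow": "forward_and_log",
    "warn": "forward_and_warn",
    "enforce": "block",
}


@dataclass(frozen=True)
class Decision:
    verdict: str  # what the policy concluded: "allow" or "block"
    action: str   # what the gate does with the request in the current mode


def apply_mode(verdict: str, mode: str) -> Decision:
    """Map one policy verdict to a mode-specific action.

    Any verdict other than "allow" is treated as a block verdict.
    """
    if mode not in ACTION_ON_BLOCK:
        raise ValueError(f"unknown mode: {mode!r}")
    if verdict == "allow":
        return Decision(verdict, "forward")
    return Decision(verdict, ACTION_ON_BLOCK[mode])


if __name__ == "__main__":
    for mode in ("shadow", "warn", "enforce"):
        print(mode, apply_mode("block", mode))
```

Because the verdict is identical in every mode, the events recorded in shadow mode show what enforce mode would have blocked, which is what makes gradual tightening safe.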
Runtime path
agent -> TrapDefense checkpoint -> MCP server / HTTP tool / webhook
# observe, audit, warn, then enforce
Policy config
{
  "mode": "enforce",
  "block_egress": true,
  "domain_allowlist": ["api.internal.com", "hooks.slack.com"],
  "pii_action": "warn",
  "redact_response_pii": true
}
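One plausible reading of the config above is sketched below: when `block_egress` is on, a request is forwarded only if its host exactly matches an entry in `domain_allowlist`. The field names come from the example config, but the `egress_allowed` function and the exact-match rule are assumptions; the real engine may treat subdomains or wildcards differently.

```python
import json
from urllib.parse import urlsplit

# The example policy config from above, parsed as JSON.
CONFIG = json.loads("""
{
  "mode": "enforce",
  "block_egress": true,
  "domain_allowlist": ["api.internal.com", "hooks.slack.com"],
  "pii_action": "warn",
  "redact_response_pii": true
}
""")


def egress_allowed(url: str, config: dict) -> bool:
    """Return True if the URL's host may be reached under this config.

    Assumption: with block_egress enabled, only exact allowlist matches pass.
    """
    if not config.get("block_egress", False):
        return True
    host = (urlsplit(url).hostname or "").lower()
    return host in config.get("domain_allowlist", [])


if __name__ == "__main__":
    print(egress_allowed("https://hooks.slack.com/services/example", CONFIG))
    print(egress_allowed("https://api.internal.com.evil.example/", CONFIG))
```

Matching on the parsed hostname rather than on a substring of the URL matters here: a lookalike such as `api.internal.com.evil.example` contains an allowlisted name but is a different host.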
Open source first. Enterprise when your team needs an operating layer.
TrapDefense starts as an open-source, self-hosted runtime control layer. Enterprise expands the operating layer with shared policies, audit workflows, rollout support, and onboarding for production teams.
Deploy the runtime guard now.
Use the data plane and core policy engine for egress control, MCP integration, content scanning, PII redaction, and audit logs.
Operate policies across a team.
TrapDefense Enterprise is the next layer for teams that need centralized governance, onboarding help, and security operations around agent runtime controls.
Start an enterprise conversation.
Tell us about your stack, your outbound risk, and what you want to protect. We'll route the inquiry to hellocosmos@gmail.com.
What teams usually ask first.
The goal is not to pretend every security problem is solved. The goal is to be crisp about where TrapDefense is strongest today.
What does TrapDefense protect against?
TrapDefense is strongest at action-layer risk: unsafe destinations, risky egress, sensitive output exposure, and policy enforcement around MCP, LLM, and internal HTTP traffic. It also adds lightweight detection for content-injection signals. It does not claim to solve every agent-security problem.
Is this a full agent security platform?
No. TrapDefense is a self-hosted runtime control layer and policy engine, not a complete security platform. It focuses on practical runtime controls teams can deploy now.
Why not just use prompt filtering?
Prompt filtering can catch some suspicious input, but it does not control the final tool call or outbound request. TrapDefense adds policy enforcement at the point where an agent action is about to cross a boundary.
Do I need MCP to use TrapDefense?
No. MCP is a strong use case, but TrapDefense also works for framework tool calls and internal or external HTTP egress workflows.