Self-hosted runtime control for agent actions

See and control what your agents actually do.

TrapDefense runs in your environment to observe MCP and tool calls, flag risky egress, redact sensitive data, and leave audit evidence before agent actions become incidents. It is not a SaaS API for sending your prompts away. Inspired by DeepMind's AI Agent Traps research.

View on GitHub Docs Talk to Us

Self-hosted by default MCP + tool calls Shadow -> Warn -> Enforce Audit-ready

runtime action checkpoint shadow mode active

# run TrapDefense inside your environment
asr proxy --config config.yaml

# route actions through the local gate
MCP_URL="/mcp"
EGRESS_URL="/proxy/http"

# observe first, then warn, then enforce

egress

Control risky outbound traffic

Allow known destinations, block suspicious domains, and keep internal agents on approved routes.

audit

Keep evidence your security team can use

Record structured events for policy decisions, stream findings, redaction, and rollout review.

data protection

Redact sensitive outputs before they leave

Catch emails, API keys, Korean PII, and other sensitive values before downstream exposure.

deployment

Start as a local control point

Keep the data plane in your stack, then add framework hooks and enterprise rollout support when needed.

focus

Agent egress control

integration

MCP, tool calls, HTTP

rollout

Shadow, warn, enforce

evidence

Structured audit evidence

Why TrapDefense

The model is not the only risk. Outbound action is.

Prompt filtering helps, but real damage happens when an agent calls an external API, hits an MCP server, posts to a webhook, or leaks sensitive output. TrapDefense gives teams a self-hosted control point for that execution path, with policy decisions and audit evidence.

the real problem

Damage happens when the agent actually reaches a destination.

Hidden instructions matter, but the business risk appears when an agent sends data outward, invokes a high-risk tool, or crosses a boundary your team did not intend.

Filters do not enforce egress Scanning text helps, but it does not stop the final network call or tool invocation.

Security teams need evidence Blocking without logs creates blind spots. Logs without enforcement arrive after the data already moved.

Production rollouts need safety rails Start in shadow mode, tune on real traffic, and enforce when teams are ready.

research context

Built for the trap layers where runtime controls are strongest today.

Google DeepMind's AI Agent Traps paper maps multiple attack surfaces across perception, reasoning, memory, action, multi-agent dynamics, and human oversight. TrapDefense does not pretend to solve the whole map. It focuses on perception and action, where practical controls can be deployed now.

Strongest: action-layer traps, risky egress, unsafe destinations, and data exfiltration.

Strong: content-injection signals embedded in HTML, markdown, encoded payloads, and prompt-shaped text.

Partial: prompt-injection keywords and lightweight content scanning.

Honest limit: not a full solution for memory poisoning, model internals, or multi-agent systemic failure.

Core capabilities

Practical controls for real agent actions.

TrapDefense stays focused on the controls teams need first: action visibility, outbound policy enforcement, sensitive-data protection, rollout safety, and auditability across agent execution paths.

Gate risky outbound requests

Allow approved domains, block suspicious egress, and keep agent traffic on known routes before execution completes.

Redact sensitive data

Detect and redact PII in requests and outputs so agents do not leak secrets by accident.

Protect MCP and internal tool workflows

Cover MCP traffic, framework tool calls, and generic internal or external HTTP egress paths.

Ship policies safely

Roll out new rules with shadow, warn, and enforce modes rather than turning on hard blocks from day one.

Scan for lightweight injection signals

Catch hidden text, HTML comment attacks, base64-like blobs, prompt-injection phrases, and exfiltration-shaped language.

Keep audit trails

Write structured events for findings, decisions, streaming scans, redaction, and rollout review.

How it works

A self-hosted defense layer built for rollout.

TrapDefense fits the way real teams ship: deploy behind your own domain, observe first, tighten gradually, and keep the evidence needed to tune policies without breaking production.

Inspect inbound and outbound context

Normalize requests, inspect content for lightweight risk signals, and build a decision context before forwarding.

Evaluate policy before forwarding

Apply allowlists, blocklists, egress controls, PII checks, and mode-aware decisions before the request leaves your stack.

Protect outputs and record evidence

Redact sensitive output, inspect streams, and leave a trace your team can review later.

Shadow -> Warn -> Enforce lets teams deploy runtime controls without pretending they already know every safe rule on day one.

Use cases

Built for teams operating agents in the real world.

TrapDefense is strongest where outbound action matters more than text: MCP servers, internal assistants, agent gateways, and tool-calling systems connected to sensitive company workflows.

MCP

Protect MCP servers

Put policy enforcement and audit logging in front of MCP traffic without sending data through a third-party gateway.

TOOL

Audit tool calls in production

Capture which agent called which tool, with what arguments, and which policy decision applied before execution.

HTTP

Reduce exfiltration risk

Block suspicious destinations, review generic HTTP egress, and redact sensitive data before it leaves the system.

OPS

Roll out security safely

Start with observation, move into warning, then enforce with confidence after policy tuning.

Deployment model

Deploy the runtime control point in your environment.

TrapDefense is designed for self-hosted deployment first. Keep agent action logs, tool arguments, and policy decisions inside your own boundary, then add enterprise workflow support only when your team needs it.

Data plane

Local runtime checkpoint

Run TrapDefense behind your own domain and route MCP calls, framework tool calls, or HTTP egress through a policy gate.

Policy

One policy path, multiple modes

Use the same decision engine for shadow, warn, and enforce so rollout behavior matches production behavior.

Ops

Enterprise layer when needed

Add policy management, rollout support, audit workflows, and onboarding help for production teams.

Runtime path

agent
  -> TrapDefense checkpoint
  -> MCP server / HTTP tool / webhook

# observe, audit, warn, then enforce

Policy config

{
  "mode": "enforce",
  "block_egress": true,
  "domain_allowlist": ["api.internal.com", "hooks.slack.com"],
  "pii_action": "warn",
  "redact_response_pii": true
}

Read the Quickstart

Open source + enterprise

Open source first. Enterprise when your team needs an operating layer.

TrapDefense starts as an open-source, self-hosted runtime control layer. Enterprise expands the operating layer with shared policies, audit workflows, rollout support, and onboarding for serious teams.

Open source

Deploy the runtime guard now.

Use the data plane and core policy engine for egress control, MCP integration, content scanning, PII redaction, and audit logs.

MCP gateway and generic HTTP policy gate

Framework hooks for tool-call enforcement

Scanner for lightweight content-injection detection

PII redaction and structured audit logs

Policy files and shadow/warn/enforce rollout

Enterprise

Operate policies across a team.

TrapDefense Enterprise is the next layer for teams that need centralized governance, onboarding help, and security operations around agent runtime controls.

Project and tenant policy management

Shared audit workflows and review processes

Operational rollout guidance and exception handling

Enterprise onboarding and support

Custom integration help for production environments

Talk to us

Start an enterprise conversation.

Tell us about your stack, your outbound risk, and what you want to protect. We'll route the inquiry to hellocosmos@gmail.com.

Name

Work Email

Company

Team Size

What are you protecting?

Timeline

What should we know?

Submissions are delivered to email, and users may see a captcha depending on spam protection.

FAQ

What teams usually ask first.

The goal is not to pretend every security problem is solved. The goal is to be crisp about where TrapDefense is strongest today.

What does TrapDefense protect against?

TrapDefense is strongest at action-layer risk: unsafe destinations, risky egress, sensitive output exposure, and policy enforcement around MCP, LLM, and internal HTTP traffic. It also adds lightweight detection for content-injection signals. It does not claim to solve every agent-security problem.

Is this a full agent security platform?

No. TrapDefense is a self-hosted runtime control layer and policy engine, not a complete security platform. It focuses on practical runtime controls teams can deploy now.

Why not just use prompt filtering?

Prompt filtering can catch some suspicious input, but it does not control the final tool call or outbound request. TrapDefense adds policy enforcement at the point where an agent action is about to cross a boundary.

Do I need MCP to use TrapDefense?

No. MCP is a strong use case, but TrapDefense also works for framework tool calls and internal or external HTTP egress workflows.

Self-hosted by default

Deploy the runtime guard now. Add the operating layer when you need it.

Start with the docs, quickstart, and open-source data plane. When your team needs rollout help, audit workflows, or enterprise support, bring us in.

Read Docs View on GitHub Enterprise