Self-hosted security proxy for agent traffic

Stop unsafe agent traffic before it leaves your stack.

TrapDefense is a self-hosted security proxy for OpenAI, Anthropic, MCP, and internal HTTP tools. Control risky destinations, redact sensitive data, and keep audit evidence without routing production traffic through a third-party SaaS. Inspired by Google DeepMind's AI Agent Traps research.

Self-hosted | OpenAI + Anthropic + MCP | Shadow -> Warn -> Enforce | Audit-ready
reverse proxy checkpoint | enforce mode active
# run TrapDefense on a private host
asr proxy --config config.yaml

# point your model client at the proxy
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.company.com/v1",
    api_key="unused-if-proxy-injects-key",
)

# policy controls egress, PII redaction, and audit logs
egress
Control risky outbound traffic

Allow known destinations, block suspicious domains, and keep internal agents on approved routes.

audit
Keep evidence your security team can use

Record structured events for policy decisions, stream findings, redaction, and rollout review.

data protection
Redact sensitive outputs before they leave

Catch emails, API keys, Korean PII, and other sensitive values before downstream exposure.

deployment
Deploy locally first, expand safely

Start with the self-hosted proxy, then add SDK hooks and enterprise rollout support when needed.

focus: Agent egress control
integration: OpenAI, Anthropic, MCP, HTTP
rollout: Shadow, warn, enforce
evidence: Structured audit evidence
Why TrapDefense

The model is not the only risk. Outbound action is.

Prompt filtering helps, but real damage happens when an agent calls an external API, hits an MCP server, posts to a webhook, or leaks sensitive output. TrapDefense sits in that execution path as a policy gate with audit evidence.

the real problem

Damage happens when the agent actually reaches a destination.

Hidden instructions matter, but the business risk appears when an agent sends data outward, invokes a high-risk tool, or crosses a boundary your team did not intend.

Filters do not enforce egress. Scanning text helps, but it does not stop the final network call or tool invocation.
Security teams need evidence. Blocking without logs creates blind spots. Logs without enforcement arrive after the data already moved.
Production rollouts need safety rails. Start in shadow mode, tune on real traffic, and enforce when teams are ready.
research context

Built for the trap layers where runtime controls are strongest today.

Google DeepMind's AI Agent Traps paper maps multiple attack surfaces across perception, reasoning, memory, action, multi-agent dynamics, and human oversight. TrapDefense does not pretend to solve the whole map. It focuses on perception and action, where practical controls can be deployed now.

Strongest: action-layer traps, risky egress, unsafe destinations, and data exfiltration.
Strong: content-injection signals embedded in HTML, markdown, encoded payloads, and prompt-shaped text.
Partial: prompt-injection keywords and lightweight content scanning.
Honest limit: not a full solution for memory poisoning, model internals, or multi-agent systemic failure.
Core capabilities

Practical controls for real agent traffic.

TrapDefense stays focused on the controls teams need first: outbound policy enforcement, sensitive-data protection, rollout safety, and auditability across agent execution paths.

01

Gate risky outbound requests

Allow approved domains, block suspicious egress, and keep agent traffic on known routes before execution completes.
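
To make the allowlist idea concrete, here is a minimal sketch of an exact-match host check. It is not TrapDefense's actual implementation: the function name is_allowed and the ALLOWED_HOSTS constant are hypothetical, and the two hosts simply mirror the sample policy on this page.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; the entries mirror the sample policy on this page.
ALLOWED_HOSTS = {"api.openai.com", "api.anthropic.com"}


def is_allowed(url: str, allowed_hosts: set[str] = ALLOWED_HOSTS) -> bool:
    """Return True only when the URL's host exactly matches an allowlisted host."""
    host = urlsplit(url).hostname  # lowercased by urlsplit; None when absent
    if host is None:
        return False
    # Strip a trailing dot so "api.openai.com." is treated like "api.openai.com".
    return host.rstrip(".") in allowed_hosts
```

Matching the parsed hostname exactly, rather than searching the URL string for a substring, is what keeps a lookalike such as api.openai.com.evil.example from slipping through.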

02

Redact sensitive data

Detect and redact PII in requests and outputs so agents do not leak secrets by accident.
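
As a rough illustration of pattern-based redaction, the sketch below replaces a few sensitive shapes with placeholders. The patterns, labels, and the redact function are assumptions made for this example, not the detectors TrapDefense ships. In particular, the "sk-" key prefix is only one common key shape, and the Korean pattern matches the 6-digit, hyphen, 7-digit layout of a resident registration number without validating it.

```python
import re

# Illustrative patterns only; real detectors need broader coverage and validation.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}\b"),
    # Korean resident registration number shape: 6 digits, hyphen, 7 digits.
    "KR_RRN": re.compile(r"\b\d{6}-\d{7}\b"),
}


def redact(text: str) -> tuple[str, list[str]]:
    """Replace each match with a placeholder and report which labels fired."""
    findings = []
    for label, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED_{label}]", text)
        if count:
            findings.append(label)
    return text, findings
```

Returning the list of labels alongside the redacted text lets the caller log what was found without logging the sensitive value itself.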

03

Protect MCP and internal tool workflows

Cover MCP traffic, OpenAI-compatible traffic, Anthropic traffic, and generic internal HTTP paths.

04

Ship policies safely

Roll out new rules with shadow, warn, and enforce modes rather than turning on hard blocks from day one.
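
One way to picture the three rollout modes is as a single verdict mapped to different actions. The sketch below assumes that shadow only records, warn forwards with a warning, and enforce blocks. The Mode enum, the decide function, and the action names are hypothetical, and the real semantics may differ.

```python
from enum import Enum


class Mode(Enum):
    SHADOW = "shadow"
    WARN = "warn"
    ENFORCE = "enforce"


def decide(mode: Mode, violation: bool) -> dict:
    """Map one policy verdict to an action for the given rollout mode."""
    if not violation:
        return {"forward": True, "action": "allow"}
    if mode is Mode.SHADOW:
        # Record what would have been blocked without affecting traffic.
        return {"forward": True, "action": "would_block"}
    if mode is Mode.WARN:
        # Forward the request, but surface a warning for review.
        return {"forward": True, "action": "warn"}
    # Enforce: the request is not forwarded.
    return {"forward": False, "action": "block"}
```

Because the verdict is computed once and only the final action depends on the mode, a rule tuned in shadow mode behaves the same way when it is later enforced.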

05

Scan for lightweight injection signals

Catch hidden text, HTML comment attacks, base64-like blobs, prompt-injection phrases, and exfiltration-shaped language.
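
A lightweight scanner of this kind can be approximated with a handful of regular expressions. The sketch below is illustrative only: the signal names, the 40-character threshold for base64-like runs, and the single injection phrase are assumptions, and a heuristic like this will produce both false positives and misses.

```python
import re

# Hypothetical signal set; thresholds and phrases are chosen for illustration.
SIGNALS = {
    "html_comment": re.compile(r"<!--.*?-->", re.DOTALL),
    "base64_blob": re.compile(r"[A-Za-z0-9+/]{40,}={0,2}"),
    "injection_phrase": re.compile(
        r"ignore (all )?(previous|prior) instructions", re.IGNORECASE
    ),
    # Zero-width characters are a common way to hide text from human readers.
    "zero_width": re.compile("[\u200b\u200c\u200d\u2060]"),
}


def scan(text: str) -> list[str]:
    """Return the sorted names of every signal that appears in the text."""
    return sorted(name for name, pattern in SIGNALS.items() if pattern.search(text))
```

The output is a list of signal names rather than a verdict, so the policy layer can decide how much weight each signal carries.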

06

Keep audit trails

Write structured events for findings, decisions, streaming scans, redaction, and rollout review.
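
For a sense of what a structured event might contain, the sketch below builds one record and serializes it as a single JSON line. The field names and both helper functions are hypothetical and do not describe TrapDefense's actual log schema.

```python
import json
from datetime import datetime, timezone


def build_event(decision: str, mode: str, destination: str, findings: list[str]) -> dict:
    """Assemble one audit record; every field name here is illustrative."""
    return {
        "ts": datetime.now(timezone.utc).isoformat(),
        "mode": mode,
        "decision": decision,
        "destination": destination,
        "findings": sorted(findings),
    }


def to_json_line(event: dict) -> str:
    """Serialize an event as one JSON Lines record (no embedded newlines)."""
    return json.dumps(event, sort_keys=True, ensure_ascii=False)
```

One JSON object per line keeps the log appendable and easy to filter with standard tools during rollout review.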

How it works

A self-hosted defense layer built for rollout.

TrapDefense fits the way real teams ship: deploy behind your own domain, observe first, tighten gradually, and keep the evidence needed to tune policies without breaking production.

Inspect inbound and outbound context

Normalize requests, inspect content for lightweight risk signals, and build a decision context before forwarding.

Evaluate policy before forwarding

Apply allowlists, blocklists, egress controls, PII checks, and mode-aware decisions before the request leaves your stack.

Protect outputs and record evidence

Redact sensitive output, inspect streams, and leave a trace your team can review later.

Shadow -> Warn -> Enforce lets teams deploy runtime controls without pretending they already know every safe rule on day one.
Use cases

Built for teams operating agents in the real world.

TrapDefense is strongest where outbound action matters more than text: MCP servers, internal assistants, agent gateways, and tool-calling systems connected to sensitive company workflows.

MCP

Protect MCP servers

Put policy enforcement and audit logging in front of MCP traffic without sending data through a third-party gateway.

LLM

Control model traffic

Route OpenAI-compatible and Anthropic traffic through one policy gate with tenant-aware controls.

HTTP

Reduce exfiltration risk

Block suspicious destinations, review generic HTTP egress, and redact sensitive data before it leaves the system.

OPS

Roll out security safely

Start with observation, move into warning, then enforce with confidence after policy tuning.

Deployment model

Deploy the proxy in your environment, not ours.

TrapDefense is designed for self-hosted deployment first. Put a reverse proxy on your domain, keep provider traffic on your own path, and add enterprise workflow support only when your team needs it.

Proxy

Self-hosted data plane

Run TrapDefense behind your own domain and forward traffic to OpenAI, Anthropic, MCP servers, or internal HTTP targets.

Policy

One policy path, multiple modes

Use the same decision engine for shadow, warn, and enforce so rollout behavior matches production behavior.

Ops

Enterprise layer when needed

Add policy management, rollout support, audit workflows, and onboarding help for production teams.

Client

from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.company.com/v1",
    api_key="unused-if-proxy-injects-key",
)

Policy config

{
  "mode": "enforce",
  "block_egress": true,
  "domain_allowlist": ["api.openai.com", "api.anthropic.com"],
  "pii_action": "warn",
  "redact_response_pii": true
}
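
The sample policy above can be read into a typed object before use. The sketch below is one possible shape, not the proxy's real loader: the Policy dataclass and load_policy function are hypothetical, the field names are taken from the sample, and the set of valid modes is assumed from the shadow, warn, and enforce rollout described on this page.

```python
import json
from dataclasses import dataclass

# Assumed from the rollout modes named on this page.
VALID_MODES = {"shadow", "warn", "enforce"}


@dataclass(frozen=True)
class Policy:
    mode: str
    block_egress: bool
    domain_allowlist: tuple[str, ...]
    pii_action: str
    redact_response_pii: bool


def load_policy(raw: str) -> Policy:
    """Parse a JSON policy document and reject unknown rollout modes."""
    data = json.loads(raw)
    if data["mode"] not in VALID_MODES:
        raise ValueError(f"unknown mode: {data['mode']!r}")
    return Policy(
        mode=data["mode"],
        block_egress=bool(data["block_egress"]),
        # Hostnames are case-insensitive, so normalize them once at load time.
        domain_allowlist=tuple(host.lower() for host in data["domain_allowlist"]),
        pii_action=data["pii_action"],
        redact_response_pii=bool(data["redact_response_pii"]),
    )
```

Failing at load time on an unrecognized mode is a deliberate choice in this sketch: a typo in the policy file should stop the proxy from starting instead of silently disabling enforcement.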
Open source + enterprise

Open source first. Enterprise when your team needs an operating layer.

TrapDefense starts as an open-source, self-hosted security proxy. Enterprise expands the operating layer with shared policies, audit workflows, rollout support, and onboarding for serious teams.

Open source

Deploy the proxy now.

Use the proxy and core policy engine for egress control, MCP integration, content scanning, PII redaction, and audit logs.

OpenAI-compatible and Anthropic proxy routing
MCP gateway and generic HTTP policy gate
Scanner for lightweight content-injection detection
PII redaction and structured audit logs
Policy files and shadow/warn/enforce rollout
Enterprise

Operate policies across a team.

TrapDefense Enterprise is the next layer for teams that need centralized governance, onboarding help, and security operations around agent runtime controls.

Project and tenant policy management
Shared audit workflows and review processes
Operational rollout guidance and exception handling
Enterprise onboarding and support
Custom integration help for production environments
Talk to us

Start an enterprise conversation.

Tell us about your stack, your outbound risk, and what you want to protect. We'll route the inquiry to hellocosmos@gmail.com.

Submissions are delivered by email. You may see a captcha, depending on spam protection.
FAQ

What teams usually ask first.

The goal is not to pretend every security problem is solved. The goal is to be crisp about where TrapDefense is strongest today.

What does TrapDefense protect against?

TrapDefense is strongest at action-layer risk: unsafe destinations, risky egress, sensitive output exposure, and policy enforcement around MCP, LLM, and internal HTTP traffic. It also adds lightweight detection for content-injection signals. It does not claim to solve every agent-security problem.

Is this a full agent security platform?

No. TrapDefense is a self-hosted security proxy and policy engine, not a complete security platform. It focuses on practical runtime controls teams can deploy now.

Why not just use prompt filtering?

Prompt filtering can catch some suspicious input, but it does not control the final outbound request. TrapDefense adds policy enforcement at the point where traffic is about to leave your stack.

Do I need MCP to use TrapDefense?

No. MCP is a strong use case, but TrapDefense also works as a proxy in front of OpenAI-compatible traffic, Anthropic traffic, and internal HTTP workflows.

Self-hosted by default

Deploy the proxy now. Add the operating layer when you need it.

Start with the docs, quickstart, and open-source proxy. When your team needs rollout help, audit workflows, or enterprise support, bring us in.