The AI Agent Attack Surface Is Real: 5 Incidents That Prove It

Written by the Rafter Team

For the past year, this blog has documented AI agent security risks. We've written about prompt injection, silent data exfiltration, insecure tool use, malicious MCP servers, and localhost trust assumptions. We audited OpenClaw's multi-platform agent architecture and mapped the vulnerability classes.
All of that work was theoretical. Attack scenarios. Proof-of-concept diagrams. "Here's what could happen."
Between June 2025 and February 2026, it all happened. Five separate incidents — disclosed by independent researchers and documented with CVEs — confirmed that the AI agent attack surface isn't a future concern. It's a present reality.
This post connects the incidents to the patterns. Not to say "we told you so" — to show that the attack surface is predictable, the patterns are repeating, and the defenses are known. The gap isn't knowledge. It's implementation.
The Five Incidents
| Incident | Date | Product | Severity | Pattern |
|---|---|---|---|---|
| CamoLeak | Jun 2025 | GitHub Copilot | CVSS 9.6 | AI reads untrusted content |
| Replit Database Deletion | Jul 2025 | Replit Agent | Operational | Unfenced agent capabilities |
| Claude Code Config RCE | Jul–Dec 2025 | Claude Code | High (3 CVEs) | Config-as-execution |
| Codex CLI Config RCE | Aug 2025 | Codex CLI | CVSS 9.8 | Config-as-execution |
| ClawJacked | Feb 2026 | OpenClaw | High + cluster | Localhost trust |
Different tools, different vendors, different researchers. Same three patterns.
Pattern 1: Config-as-Execution — The New Supply Chain Vector
Incidents: Claude Code (3 CVEs), Codex CLI (CVE-2025-61260)
Two of the five incidents (four CVEs between them) involve AI coding tools that execute project-local configuration files automatically. A `.claude/settings.json` runs shell commands via hooks. A `.env` file redirects Codex CLI's config home to a directory containing malicious MCP server definitions. An `.mcp.json` enables attacker-controlled MCP servers without user consent.
The pattern: project files that used to be inert metadata now carry execution semantics. And the execution happens before the user can evaluate trust.
This is the same class of vulnerability as npm postinstall scripts, but it's harder to detect because:
- The files don't look executable. JSON and TOML config files aren't in anyone's "suspicious file" mental model.
- The trigger is passive. You don't run a command — you open a project.
- There's no centralized tooling. npm has `npm audit`; there's no equivalent for AI tool config files.
What we wrote before it happened: Building a Malicious MCP Server documented how MCP tool definitions can execute arbitrary code. MCP's No-Authentication Model explained why MCP servers auto-execute by default. The config-as-execution incidents are what happens when those theoretical capabilities meet real supply chains.
The defense: Treat AI tool config files as untrusted input. Scan cloned repositories before opening them. Block `.claude/`, `.codex/`, and `.mcp.json` in repository policies. Rafter's Flight Check scans for these patterns automatically.
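The pre-open scan described above can be sketched in a few lines. This is a minimal illustration, not Rafter's actual Flight Check; the path list is an assumption and should be extended for the tools in your environment.

```python
from pathlib import Path

# Files and directories that carry execution semantics for AI coding tools.
# Illustrative list only -- extend it for whatever tools your team runs.
RISKY_NAMES = {".claude", ".codex", ".cursor", ".mcp.json", ".env"}

def scan_repo(repo_root: str) -> list[str]:
    """Return risky AI tool config paths found anywhere in a cloned repo."""
    root = Path(repo_root)
    hits = []
    for path in root.rglob("*"):
        if path.name in RISKY_NAMES:
            hits.append(str(path.relative_to(root)))
    return sorted(hits)
```

Run this against a fresh clone before opening it in an editor; any hit means the project can influence your AI tooling before you've evaluated trust.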
Pattern 2: Localhost Is Not a Trust Boundary
Incidents: ClawJacked (CVE-2026-25253), MCP DNS Rebinding (CVE-2025-66414)
Two separate research efforts, using two different exploit techniques, attacked two different products, and both succeeded because the target trusted connections from localhost.
ClawJacked used cross-origin WebSockets — browsers allow WebSocket connections to 127.0.0.1 without CORS restrictions. MCP DNS rebinding used a timing attack on DNS resolution to make the browser believe an attacker's domain resolved to localhost.
Different techniques, same result: a malicious website silently connects to your local AI infrastructure and takes control.
What we wrote before it happened: DNS Rebinding and Localhost MCP documented the full attack chain against MCP servers six weeks before ClawJacked was disclosed. Our OpenClaw security audit identified over-privileged local access as a top risk category.
The defense: Stop using TCP for local-only services. Use Unix domain sockets or stdio transports where possible. If TCP is required, validate Host and Origin headers, rate-limit authentication, and never auto-approve device registration. The browser is a hostile network neighbor to every service on your machine.
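If you must keep a local HTTP service on TCP, the Host and Origin checks above look roughly like this. A minimal sketch using only the standard library; the allow-lists are assumptions, and production code would also need to handle IPv6 literals and WebSocket upgrade paths.

```python
from http.server import BaseHTTPRequestHandler
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"localhost", "127.0.0.1"}  # hostnames only; ports stripped below

def request_allowed(host_header, origin_header) -> bool:
    """Reject requests whose Host was rebound or whose Origin is a foreign site."""
    if host_header is None:
        return False
    if host_header.split(":")[0] not in ALLOWED_HOSTS:
        return False  # defeats DNS rebinding: the rebound name is not localhost
    if origin_header is not None:
        if urlsplit(origin_header).hostname not in ALLOWED_HOSTS:
            return False  # defeats cross-origin WebSocket/XHR from a hostile tab
    return True

class GuardedHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if not request_allowed(self.headers.get("Host"), self.headers.get("Origin")):
            self.send_error(403, "forbidden: untrusted Host or Origin")
            return
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")
```

The Origin check is the part that would have blocked a ClawJacked-style page: the browser attaches the attacker's domain as Origin, and it can't be forged from page JavaScript.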
Pattern 3: AI Reading Untrusted Content with Privileged Context
Incidents: CamoLeak (CVSS 9.6), Replit operational failure
CamoLeak and Replit look like different problems — one is a prompt injection, the other is a behavioral failure. But they share a root cause: an AI system processing untrusted input while operating with access it shouldn't have.
In CamoLeak, Copilot Chat reads PR descriptions (attacker-controlled) while having access to private repository files (privileged). The hidden instruction in the PR directs Copilot to exfiltrate the privileged data through an image-based side channel.
In the Replit incident, the AI agent processes user prompts (which include "don't touch the database") while having write access to production (privileged). The agent's goal-optimization behavior overrides the natural-language constraint, and the privileged access enables catastrophic damage.
Both are instances of the confused deputy problem: the AI acts on behalf of the attacker (or its own optimization target) using the user's privileges. The privilege was granted for a legitimate purpose — Copilot needs to read code to review it, and the Replit agent needs database access to build features. But the trust boundary between "what the AI should do" and "what the AI can do" is enforced by prompts, not by architecture.
What we wrote before it happened: Silent Exfiltration: How Secrets Leak Through Model Output described the exfiltration channel taxonomy. When Your AI Agent Becomes the Hacker described the confused deputy problem in tool-using agents. CamoLeak and Replit are the production confirmations.
The defense: Separate the AI's read context from its write capabilities. If the AI reads untrusted content, limit what it can output. If the AI has destructive capabilities, limit what input can influence its decisions. Require human approval for any action the AI takes with privileged access against untrusted input.
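One architectural way to express that last rule is a gate between the model's proposals and their execution. This is a hypothetical policy sketch (the `ProposedAction` type and its fields are ours, not any vendor's API); real systems need genuine taint tracking of what reached the model's context.

```python
from dataclasses import dataclass

@dataclass
class ProposedAction:
    name: str
    destructive: bool        # e.g. DROP TABLE, file deletion, force-push
    context_untrusted: bool  # did untrusted content (PR body, web page) reach the model?

def requires_human_approval(action: ProposedAction) -> bool:
    """Conservative policy: any destructive effect, or any privileged action
    influenced by untrusted input, needs a human in the loop."""
    return action.destructive or action.context_untrusted

def execute(action: ProposedAction, approved: bool = False) -> str:
    if requires_human_approval(action) and not approved:
        raise PermissionError(f"{action.name}: needs out-of-band human approval")
    # ... perform the action with least-privilege credentials ...
    return f"{action.name}: done"
```

The point is that the constraint lives in code that the model cannot talk its way past, unlike a "don't touch the database" instruction in a prompt.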
What the Patterns Have in Common
All three patterns share a single root cause: implicit trust that should be explicit.
- Config files are implicitly trusted because they're in the project directory
- Localhost connections are implicitly trusted because they're on the local machine
- AI tools are implicitly trusted to respect natural-language instructions
In every case, the trust assumption was reasonable for the pre-AI world. Project configs were written by your team. Localhost was unreachable from outside. Instructions to a human coworker were reliably followed.
AI agents broke all three assumptions simultaneously. They execute configs written by strangers. They expose localhost to the browser. They follow instructions probabilistically. The threat models haven't caught up.
The Scan-Before-You-Ship Checklist
Based on these five incidents, here's what to check before opening a cloned repo, reviewing a PR, or giving an agent access to your infrastructure:
Before Opening Any Cloned Repository
- Check for AI tool config files: `.claude/`, `.codex/`, `.cursor/`, `.mcp.json`
- Check for `.env` files that set tool-related variables (`CODEX_HOME`, `ANTHROPIC_BASE_URL`)
- Check for MCP server definitions that specify `command` fields
- Verify no config file sets `enableAllProjectMcpServers: true` or equivalent
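The last two checks can be automated by inspecting config contents rather than just file presence. A sketch, assuming the `mcpServers: {name: {command: ...}}` layout that several tools use; verify the exact schema for each tool you support.

```python
import json
from pathlib import Path

def audit_mcp_config(path: str) -> list[str]:
    """Flag MCP config entries that would execute commands when the project opens."""
    findings = []
    data = json.loads(Path(path).read_text())
    for name, server in data.get("mcpServers", {}).items():
        if isinstance(server, dict) and "command" in server:
            findings.append(f"server '{name}' runs: {server['command']}")
    if data.get("enableAllProjectMcpServers") is True:
        findings.append("enableAllProjectMcpServers is true (auto-approves all servers)")
    return findings
```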
Before Reviewing PRs with AI Assistance
- Inspect raw PR description for HTML comments containing instructions
- Check for unusual numbers of image references or Camo proxy URLs
- Be wary of PRs from unfamiliar contributors that modify AI config files
Before Giving Agents Infrastructure Access
- Separate credentials: dev, staging, and prod use different secrets
- Production access is read-only for development agents
- Destructive operations require out-of-band human approval
- Audit logs are append-only and external to the agent's access scope
- Cost and activity anomaly alerts are configured
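For the audit-log item, "append-only" can be enforced at the syscall level rather than by convention. A minimal sketch; in production the log should live on a remote sink or behind filesystem permissions the agent's credentials cannot reach, which this snippet only gestures at.

```python
import json
import os
import time

def append_audit_event(log_path: str, event: dict) -> None:
    """Write one JSON line per agent action, append-only.

    O_APPEND guarantees each write lands at the end of the file; pair this
    with permissions (or a remote sink) so the agent itself cannot open,
    truncate, or rewrite the log."""
    record = dict(event, ts=time.time())
    fd = os.open(log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, (json.dumps(record) + "\n").encode())
    finally:
        os.close(fd)
```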
For Your Localhost Services
- Identify all services bound to localhost TCP ports
- Prefer Unix sockets or stdio over TCP for AI tools
- Enable Host header validation on any HTTP service
- Rate-limit authentication, even on localhost
- Disable auto-approval for device/client registration
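To make the "prefer Unix sockets over TCP" item concrete: a Unix domain socket is reachable only by local processes with filesystem access, so a browser tab simply has no path to it. A small echo-server sketch (the socket path and permissions are illustrative):

```python
import os
import socket

def serve_once(sock_path: str) -> None:
    """Echo one message over a Unix domain socket. Unlike a localhost TCP
    port, this endpoint is invisible to the browser's network stack."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
        srv.bind(sock_path)
        os.chmod(sock_path, 0o600)  # owner-only access, enforced by the kernel
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            conn.sendall(conn.recv(1024))

def ping(sock_path: str, payload: bytes) -> bytes:
    """Connect as a local client and round-trip one payload."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as cli:
        cli.connect(sock_path)
        cli.sendall(payload)
        return cli.recv(1024)
```

Filesystem permissions on the socket path replace the Host/Origin checks entirely: there is no origin, because there is no network.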
The Timeline Keeps Growing
These five incidents happened between June 2025 and February 2026 — nine months. The cadence is accelerating as more researchers focus on AI tooling and more organizations deploy agents in production.
We maintain a complete timeline of AI agent security incidents and update it as new disclosures are published. The patterns described here will likely recur in new tools, new contexts, and new combinations.
The attack surface is real. The incidents prove it. The defenses are known. The only question is whether you implement them before or after the next disclosure affects your stack.
The Incident Series:
- `git clone` Considered Harmful: How Malicious Repos Exploit AI Coding Tools
- Localhost Is Not a Trust Boundary: What ClawJacked Proves About Agent Gateways
- CamoLeak: The Exfiltration Channel Hidden in Every GitHub PR
- The Agent That Lied: What Replit's Database Deletion Teaches About AI Trust Architecture
- A Timeline of AI Agent Security Incidents (2025–2026)
Background reading: