AI Agent Incident Response: Containment and Recovery Playbook

Written by Rafter Team
February 12, 2026

Your AI agent gets compromised at 3 AM on a Saturday. A prompt injection is actively exfiltrating customer data to an external server. Every minute the agent runs, more data leaks. Do you know exactly what to do?
Most teams don't. They scramble to find the kill switch, argue about whether to shut everything down, and waste hours on containment that should take minutes. The difference between a contained incident and a catastrophic breach is preparation. This playbook gives you predefined containment steps for every major AI agent incident type, from prompt injection to runaway cost explosions.
GDPR requires breach notification within 72 hours. If you're figuring out your response plan during an active incident, you've already lost critical time.
Prerequisite: The Kill Switch
Before anything else, build a kill switch. Every AI agent deployment needs the ability to immediately halt all agent operations.
# ✓ Kill switch implementation
import json
from redis import Redis  # shared state across all agent workers

class AgentKillSwitch:
    def __init__(self):
        self.redis = Redis()  # shared state

    def activate(self, reason: str, scope: str = "global"):
        self.redis.set(f"kill_switch:{scope}", json.dumps({
            "active": True,
            "reason": reason,
            "activated_by": get_current_user(),  # your auth helper
            "timestamp": utc_now(),              # e.g. an ISO-8601 UTC timestamp
        }))
        # Immediately notify on-call
        pagerduty.trigger(f"Kill switch activated: {reason}")

    def check(self, scope: str = "global") -> bool:
        state = self.redis.get(f"kill_switch:{scope}")
        if state and json.loads(state)["active"]:
            return True  # STOP - do not execute any actions
        return False

# In agent execution loop:
kill_switch = AgentKillSwitch()

def execute_step(task):
    if kill_switch.check():
        return "Agent operations suspended. Contact security team."
    if kill_switch.check(scope=f"user:{task.user_id}"):
        return "Your agent session has been suspended."
    # ... proceed with normal execution
Requirements:
- Accessible to on-call staff (not just the engineering lead)
- Works at global level (stop everything) and per-session level
- Takes effect within seconds, not minutes
- Logs who activated it and why
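Activating the switch from an incident-response script or console should then be a one-liner against the class above; the scope strings below are illustrative.
# Suspend a single session while you investigate, or stop everything
kill_switch = AgentKillSwitch()
kill_switch.activate("suspected prompt injection", scope="session:abc123")
kill_switch.activate("active data exfiltration")  # defaults to scope="global"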
Incident Type 1: Prompt Injection / Data Exfiltration
Signals: Alert on outbound data to a non-allowlisted domain. Unusual agent behavior. User report of unexpected actions.
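The outbound-traffic alert is the signal most teams can automate. Here is a minimal sketch of that check, assuming a hypothetical stream of egress events and an allowlist of approved destinations (the event shape and domain names are illustrative):
# Hypothetical egress check: alert when the agent sends data to a non-allowlisted domain
ALLOWLIST = {"api.openai.com", "api.stripe.com", "internal.example.com"}  # illustrative

def check_egress(event: dict) -> str | None:
    """event is one row from your network logs: {"process", "destination", "bytes_sent"}."""
    if event.get("process") != "agent":
        return None
    if event["destination"] not in ALLOWLIST:
        return (f"ALERT: agent egress to non-allowlisted domain "
                f"{event['destination']} ({event['bytes_sent']} bytes)")
    return None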
Containment (First 15 Minutes)
1. Activate kill switch for the affected session(s):
# Suspend specific user session
curl -X POST https://agent-api/kill-switch \
-d '{"scope": "session:abc123", "reason": "suspected prompt injection"}'
2. Block the exfiltration endpoint at the network level:
# Add to firewall deny list immediately
iptables -A OUTPUT -d attacker-ip -j DROP
# Or via cloud security group / WAF rule
3. Revoke exposed credentials:
# Rotate any API keys the agent had access to
vault write -f secrets/stripe/rotate
vault write -f secrets/aws/rotate
# Revoke OAuth tokens for the affected user
oauth_service.revoke_all_tokens(user_id="affected-user")
4. Preserve evidence:
# Snapshot logs before they rotate
cp /var/log/agent/*.log /incident/IR-2026-001/
# Capture container state if still running
docker commit compromised-container forensic-snapshot
Investigation (Hours 1-4)
Determine scope:
-- What did the agent do during the incident window?
SELECT timestamp, action, tool, params, result
FROM audit_log
WHERE session_id = 'abc123'
AND timestamp BETWEEN '2026-02-05 03:00' AND '2026-02-05 03:30'
ORDER BY timestamp;
-- Did the agent contact external endpoints?
SELECT timestamp, destination, bytes_sent
FROM network_log
WHERE source_process = 'agent'
AND destination NOT IN (SELECT domain FROM allowlist)
AND timestamp > '2026-02-05 03:00';
Key questions to answer:
- What was the injection vector? (user input, fetched content, document?)
- What data was accessed or exfiltrated?
- How long was the agent compromised?
- Were other users/sessions affected?
- Did the injection persist in memory?
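For the last question, scan the session's stored memory for the injected text before you flush it. A minimal sketch, assuming you can pull the stored entries out of your memory backend (the marker strings are examples, not a complete list):
# Scan a session's persisted memory entries for known injection markers
INJECTION_MARKERS = ["ignore previous instructions", "attacker-domain.example"]

def injection_persisted(messages: list[str]) -> bool:
    """messages: the session's stored conversation/memory entries, however your backend returns them."""
    return any(marker in m.lower() for m in messages for marker in INJECTION_MARKERS)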
Recovery
1. Flush agent memory to remove any persistent injection:
# Clear vector DB entries from compromised session
vector_db.delete(filter={"session_id": "abc123"})
# Reset conversation history
memory_store.clear(session_id="abc123")
2. Patch the vulnerability:
- If input filter bypass: add the new pattern to detection rules (a sketch follows this list)
- If content sanitization gap: fix the sanitizer
- If tool permission gap: tighten the permission boundary
3. Re-enable with monitoring:
- Resume the agent with heightened alerting thresholds
- Watch for recurrence of the same attack pattern
- Keep network block on the exfiltration endpoint
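For the input-filter case referenced above, the patch usually lands as a new entry in a pattern list. A minimal sketch of that kind of rule set; the patterns are illustrative and no substitute for layered defenses:
import re

# Minimal pattern-based input filter; extend after each incident
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"you are now in developer mode", re.IGNORECASE),
    re.compile(r"forward .* to https?://", re.IGNORECASE),  # pattern added after IR-2026-001
]

def flag_suspicious_input(text: str) -> list[str]:
    """Return the patterns that matched, for logging and alerting."""
    return [p.pattern for p in INJECTION_PATTERNS if p.search(text)]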
Incident Type 2: Malicious Plugin / Supply Chain Compromise
Signals: Plugin making unexpected network calls. Anomalous syscalls from plugin container. User reports of strange behavior after plugin update.
Containment (First 15 Minutes)
1. Disable the compromised plugin across all instances:
# Blacklist the plugin immediately
plugin_registry.disable("suspicious-plugin-v2.1")
plugin_registry.block_version("suspicious-plugin", ">=2.1")
2. Quarantine affected agents:
# Stop all agent containers running the compromised plugin
docker ps --filter "label=plugin=suspicious-plugin" -q | xargs docker stop
3. Capture forensic data:
# Memory map of the agent process (PID 1) in the affected container
docker exec compromised-agent cat /proc/1/maps > /incident/memory-map.txt
# Network connections
docker exec compromised-agent ss -tanp > /incident/connections.txt
# File system changes
docker diff compromised-agent > /incident/fs-changes.txt
Investigation
Analyze the plugin:
# Check what the plugin actually does
grep -r "eval\|exec\|subprocess\|__import__" plugin-dir/
grep -r "http://\|https://" plugin-dir/ | grep -v localhost
# Compare current version against last known-good
diff -r plugin-v2.0/ plugin-v2.1/
Check for persistence:
- Did the plugin modify any agent configuration?
- Did it write files outside its sandbox?
- Did it install any cron jobs or background processes?
- Did it modify other plugins or the agent core?
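Part of this checklist can be automated by scanning docker diff output for writes outside the plugin's expected paths. A sketch; the suspicious-path prefixes are assumptions about your container layout:
# Flag filesystem changes in locations a well-behaved plugin should never touch
import subprocess

SUSPICIOUS_PREFIXES = ("/etc/cron", "/var/spool/cron", "/etc/systemd",
                       "/root/.ssh", "/app/agent_core", "/app/plugins")

def suspicious_changes(container: str) -> list[str]:
    diff = subprocess.run(["docker", "diff", container],
                          capture_output=True, text=True, check=True).stdout
    hits = []
    for line in diff.splitlines():
        # docker diff lines look like "A /path", "C /path", "D /path"
        change_type, _, path = line.partition(" ")
        if path.startswith(SUSPICIOUS_PREFIXES):
            hits.append(line)
    return hits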
Recovery
1. Roll back to last known-good version of the plugin (or remove entirely).
2. Rotate all credentials the plugin had access to. Even with sandboxing, assume the worst.
3. Re-vet the entire plugin before re-enabling (see the verification sketch after this list). If from a community source, report the issue and consider alternatives.
4. Tighten sandboxing if the plugin escaped its container or accessed unauthorized resources.
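To hold that line going forward, pin vetted plugin versions by digest and refuse to load anything else. A sketch, assuming you record a SHA-256 digest whenever a version is vetted (the value below is a placeholder):
import hashlib

# Digests recorded when each plugin version was vetted (placeholder value)
KNOWN_GOOD_DIGESTS = {
    "suspicious-plugin-2.0": "<sha256-recorded-at-vetting-time>",
}

def verify_plugin(name_version: str, artifact_path: str) -> bool:
    with open(artifact_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    expected = KNOWN_GOOD_DIGESTS.get(name_version)
    return expected is not None and digest == expected  # refuse unknown or mismatched artifacts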
Incident Type 3: Multi-Tenant Data Leak
Signals: User reports seeing another user's data. Monitoring detects cross-tenant access in audit logs. QA finds tenant isolation failure in testing.
Containment (First 15 Minutes)
1. Disable the leaking component:
# If cache-related, flush the cache
cache.flush_all()
# If vector DB related, disable vector search
feature_flags.disable("vector_search")
# If API-related, switch to strict tenant validation mode
config.set("tenant_validation", "strict")
2. Determine scope:
-- Find all cross-tenant access events
SELECT requester_tenant_id, accessed_tenant_id, action, timestamp
FROM audit_log
WHERE requester_tenant_id != accessed_tenant_id
AND timestamp > '2026-02-01';
3. Notify affected tenants within 72 hours (GDPR requirement). To draft the notification, determine:
- Which tenants' data was exposed?
- Which tenants saw data they shouldn't have?
- What specific data types were leaked?
Investigation
Root cause analysis:
- Missing tenant_id in a database query?
- Cache key collision (keys not namespaced by tenant)?
- Vector search returning results across tenant boundaries?
- Race condition in session management?
# Common root causes to check:
# 1. Missing tenant filter
# BAD: db.query("SELECT * FROM data WHERE user_id = ?", user_id)
# GOOD: db.query("SELECT * FROM data WHERE user_id = ? AND tenant_id = ?",
# user_id, tenant_id)
# 2. Cache key collision
# BAD: cache_key = f"user:{user_id}"
# GOOD: cache_key = f"tenant:{tenant_id}:user:{user_id}"
# 3. Vector search without filter
# BAD: vector_db.search(embedding, top_k=10)
# GOOD: vector_db.search(embedding, filter={"tenant_id": tid}, top_k=10)
Recovery
1. Fix the root cause (add missing tenant filter, fix cache keys, etc.)
2. Purge leaked data from caches, logs, and any intermediate storage where cross-tenant data may have been cached.
3. Run the full isolation test suite before re-enabling the affected component (a sketch of one such test follows this list).
4. Engage legal/compliance for breach notification requirements.
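A sketch of the kind of regression test that suite should include; client and seed_document are assumed test fixtures, not a specific framework's API:
# Data written by tenant A must never surface in tenant B's search results
def test_search_respects_tenant_boundary(client, seed_document):
    seed_document(tenant_id="tenant-a", text="tenant A confidential notes")

    results = client.search("confidential notes", tenant_id="tenant-b")

    assert all(r["tenant_id"] == "tenant-b" for r in results)
    assert not any("tenant A confidential" in r["text"] for r in results)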
Incident Type 4: Runaway Agent / Cost Explosion
Signals: Cost monitoring alarm. Agent process consuming excessive CPU/memory. API usage spike. Agent stuck in retry loop.
Containment (Immediate)
1. Kill the runaway process:
# Terminate the specific agent task
agent-cli kill-task --task-id runaway-123
# Or kill the container
docker stop runaway-agent-container
# Or global kill switch if multiple agents affected
curl -X POST https://agent-api/kill-switch \
-d '{"scope": "global", "reason": "cost explosion"}'
2. Set hard budget caps to prevent further spending:
# AWS budget alarm (if not already set); the account ID is a placeholder
aws budgets create-budget \
  --account-id 123456789012 \
  --budget '{"BudgetName":"agent-emergency","BudgetLimit":{"Amount":"100","Unit":"USD"},"TimeUnit":"MONTHLY","BudgetType":"COST"}'
# API provider: reduce the agent's request throttle (illustrative client call;
# hard caps usually live in the provider dashboard or your own API gateway)
openai_client.set_rate_limit(requests_per_minute=10)
3. Assess the damage:
-- What did the agent do during the runaway period?
SELECT tool, COUNT(*), SUM(estimated_cost)
FROM audit_log
WHERE task_id = 'runaway-123'
GROUP BY tool;
Recovery
1. Implement circuit breakers if they weren't already in place (a sketch follows this list):
- Maximum iterations per task
- Maximum wall-clock time per task
- Maximum cost per task
- Maximum concurrent tool calls
2. Contact cloud/API provider if charges are clearly from abuse (some providers will work with you on credits).
3. Add loop detection to catch similar patterns in the future.
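A minimal per-task breaker covering the first three limits above; the thresholds are illustrative, and a concurrency cap is easiest to add with a semaphore around the tool dispatcher:
# Per-task circuit breaker: raise as soon as any limit is exceeded
import time

class TaskBreaker:
    def __init__(self, max_iterations=50, max_seconds=300, max_cost_usd=5.0):
        self.max_iterations = max_iterations
        self.max_seconds = max_seconds
        self.max_cost_usd = max_cost_usd
        self.iterations = 0
        self.cost_usd = 0.0
        self.started = time.monotonic()

    def record_step(self, step_cost_usd: float) -> None:
        self.iterations += 1
        self.cost_usd += step_cost_usd
        if self.iterations > self.max_iterations:
            raise RuntimeError("circuit breaker: iteration limit exceeded")
        if time.monotonic() - self.started > self.max_seconds:
            raise RuntimeError("circuit breaker: wall-clock limit exceeded")
        if self.cost_usd > self.max_cost_usd:
            raise RuntimeError("circuit breaker: cost limit exceeded")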
Post-Incident Process
After containment and recovery, every incident gets the same follow-up:
Forensic Report
Document:
- Timeline of events (when it started, when detected, when contained)
- Root cause analysis
- Data impact assessment (what was exposed, to whom)
- Actions taken during response
- Gaps identified in detection and response
Fix Verification
- Root cause patched and verified in staging
- Regression test added to CI/CD
- Red team test confirms the attack vector is closed
Process Improvements
- Update this playbook with lessons learned
- Adjust monitoring rules to catch similar incidents earlier
- Review and tighten relevant security controls
- Conduct drill with updated procedures
Communication
Internal:
- Incident report to security team and engineering leadership
- Lessons-learned session with broader team
External (if required):
- Breach notification to affected users (within legal timeframes)
- Regulatory notification (GDPR: 72 hours, varies by jurisdiction)
- Status page update if service was impacted
Incident Response Readiness Checklist
Before your next incident:
- Kill switch implemented and accessible to on-call staff
- On-call rotation established with escalation paths
- Credential rotation procedures documented and tested
- Log retention sufficient for forensic investigation (30+ days)
- Breach notification templates drafted
- Communication channels established (war room, Slack channel)
- Forensic data collection procedures documented
- Tabletop exercises conducted (at least quarterly)
- Contact list current: security team, legal, PR, executive sponsor
Conclusion
Incidents are inevitable. The question isn't whether your AI agent will be compromised, but whether you'll contain it in minutes or hours. A prepared team with a tested playbook reduces breach impact by orders of magnitude.
Act now:
- Build and test your kill switch today
- Set up cost and behavior monitoring alerts
- Document credential rotation procedures
- Run a tabletop exercise with your team this month
- Pre-draft breach notification templates
The worst time to write an incident response plan is during an incident.
Related Resources
- Open Claw Security Audit: Full Series Overview
- AI Agent Security Controls: A Defense-in-Depth Architecture
- Red Teaming AI Agents: A Testing and Validation Playbook
- AI Agent Data Leakage: Secrets Management and Privacy Risks
- Multi-Tenant Isolation for AI Agents
- AI Agent Architecture: Threat Modeling Your Attack Surface