AI Agent Incident Response: Containment and Recovery Playbook

Written by Rafter Team
February 12, 2026

Your AI agent gets compromised at 3 AM on a Saturday. A prompt injection is actively exfiltrating customer data to an external server. Every minute the agent runs, more data leaks. Do you know exactly what to do?
Most teams don't. They scramble to find the kill switch, argue about whether to shut everything down, and waste hours on containment that should take minutes. The difference between a contained incident and a catastrophic breach is preparation. This playbook gives you predefined containment steps for every major AI agent incident type, from prompt injection to runaway cost explosions.
GDPR requires breach notification within 72 hours. If you're figuring out your response plan during an active incident, you've already lost critical time.
Prerequisite: The Kill Switch
Before anything else, build a kill switch. Every AI agent deployment needs the ability to immediately halt all agent operations.
# ✓ Kill switch implementation
import json
from redis import Redis  # shared state across all agent workers

class AgentKillSwitch:
    def __init__(self):
        self.redis = Redis()  # shared state

    def activate(self, reason: str, scope: str = "global"):
        self.redis.set(f"kill_switch:{scope}", json.dumps({
            "active": True,
            "reason": reason,
            "activated_by": get_current_user(),  # your auth helper
            "timestamp": utc_now(),              # e.g. an ISO-8601 UTC timestamp
        }))
        # Immediately notify on-call
        pagerduty.trigger(f"Kill switch activated: {reason}")

    def check(self, scope: str = "global") -> bool:
        state = self.redis.get(f"kill_switch:{scope}")
        if state and json.loads(state)["active"]:
            return True  # STOP - do not execute any actions
        return False

# In agent execution loop:
kill_switch = AgentKillSwitch()

def execute_step(task):
    if kill_switch.check():
        return "Agent operations suspended. Contact security team."
    if kill_switch.check(scope=f"user:{task.user_id}"):
        return "Your agent session has been suspended."
    # ... proceed with normal execution
Requirements:
- Accessible to on-call staff (not just the engineering lead)
- Works at global level (stop everything) and per-session level
- Takes effect within seconds, not minutes
- Logs who activated it and why
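Activating the switch from an incident-response script or console should then be a one-liner against the class above; the scope strings below are illustrative.
# Suspend a single session while you investigate, or stop everything
kill_switch = AgentKillSwitch()
kill_switch.activate("suspected prompt injection", scope="session:abc123")
kill_switch.activate("active data exfiltration")  # defaults to scope="global"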
Incident Type 1: Prompt Injection / Data Exfiltration
Signals: Alert on outbound data to a non-allowlisted domain. Unusual agent behavior. User report of unexpected actions.
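The outbound-traffic alert is the signal most teams can automate. Here is a minimal sketch of that check, assuming a hypothetical stream of egress events and an allowlist of approved destinations (the event shape and domain names are illustrative):
# Hypothetical egress check: alert when the agent sends data to a non-allowlisted domain
ALLOWLIST = {"api.openai.com", "api.stripe.com", "internal.example.com"}  # illustrative

def check_egress(event: dict) -> str | None:
    """event is one row from your network logs: {"process", "destination", "bytes_sent"}."""
    if event.get("process") != "agent":
        return None
    if event["destination"] not in ALLOWLIST:
        return (f"ALERT: agent egress to non-allowlisted domain "
                f"{event['destination']} ({event['bytes_sent']} bytes)")
    return None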
Containment (First 15 Minutes)
1. Activate kill switch for the affected session(s):
# Suspend specific user session
curl -X POST https://agent-api/kill-switch \
-d '{"scope": "session:abc123", "reason": "suspected prompt injection"}'
2. Block the exfiltration endpoint at the network level:
# Add to firewall deny list immediately
iptables -A OUTPUT -d attacker-ip -j DROP
# Or via cloud security group / WAF rule
3. Revoke exposed credentials:
# Rotate any API keys the agent had access to
vault write -f secrets/stripe/rotate
vault write -f secrets/aws/rotate
# Revoke OAuth tokens for the affected user
oauth_service.revoke_all_tokens(user_id="affected-user")
4. Preserve evidence:
# Snapshot logs before they rotate
cp /var/log/agent/*.log /incident/IR-2026-001/
# Capture container state if still running
docker commit compromised-container forensic-snapshot
Investigation (Hours 1-4)
Determine scope:
-- What did the agent do during the incident window?
SELECT timestamp, action, tool, params, result
FROM audit_log
WHERE session_id = 'abc123'
AND timestamp BETWEEN '2026-02-05 03:00' AND '2026-02-05 03:30'
ORDER BY timestamp;
-- Did the agent contact external endpoints?
SELECT timestamp, destination, bytes_sent
FROM network_log
WHERE source_process = 'agent'
AND destination NOT IN (SELECT domain FROM allowlist)
AND timestamp > '2026-02-05 03:00';
Key questions to answer:
- What was the injection vector? (user input, fetched content, document?)
- What data was accessed or exfiltrated?
- How long was the agent compromised?
- Were other users/sessions affected?
- Did the injection persist in memory?
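For the last question, scan the session's stored memory for the injected text before you flush it. A minimal sketch, assuming you can pull the stored entries out of your memory backend (the marker strings are examples, not a complete list):
# Scan a session's persisted memory entries for known injection markers
INJECTION_MARKERS = ["ignore previous instructions", "attacker-domain.example"]

def injection_persisted(messages: list[str]) -> bool:
    """messages: the session's stored conversation/memory entries, however your backend returns them."""
    return any(marker in m.lower() for m in messages for marker in INJECTION_MARKERS)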
Recovery
1. Flush agent memory to remove any persistent injection:
# Clear vector DB entries from compromised session
vector_db.delete(filter={"session_id": "abc123"})
# Reset conversation history
memory_store.clear(session_id="abc123")
2. Patch the vulnerability:
- If input filter bypass: add the new pattern to detection rules (a sketch follows this list)
- If content sanitization gap: fix the sanitizer
- If tool permission gap: tighten the permission boundary
3. Re-enable with monitoring:
- Resume the agent with heightened alerting thresholds
- Watch for recurrence of the same attack pattern
- Keep network block on the exfiltration endpoint
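For the input-filter case referenced above, the patch usually lands as a new entry in a pattern list. A minimal sketch of that kind of rule set; the patterns are illustrative and no substitute for layered defenses:
import re

# Minimal pattern-based input filter; extend after each incident
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"you are now in developer mode", re.IGNORECASE),
    re.compile(r"forward .* to https?://", re.IGNORECASE),  # pattern added after IR-2026-001
]

def flag_suspicious_input(text: str) -> list[str]:
    """Return the patterns that matched, for logging and alerting."""
    return [p.pattern for p in INJECTION_PATTERNS if p.search(text)]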
Incident Type 2: Malicious Plugin / Supply Chain Compromise
Signals: Plugin making unexpected network calls. Anomalous syscalls from plugin container. User reports of strange behavior after plugin update.
Containment (First 15 Minutes)
1. Disable the compromised plugin across all instances:
# Blacklist the plugin immediately
plugin_registry.disable("suspicious-plugin-v2.1")
plugin_registry.block_version("suspicious-plugin", ">=2.1")
2. Quarantine affected agents:
# Stop all agent containers running the compromised plugin
docker ps --filter "label=plugin=suspicious-plugin" -q | xargs docker stop
3. Capture forensic data:
# Memory map of the agent process (PID 1) in the affected container
docker exec compromised-agent cat /proc/1/maps > /incident/memory-map.txt
# Network connections
docker exec compromised-agent ss -tanp > /incident/connections.txt
# File system changes
docker diff compromised-agent > /incident/fs-changes.txt
Investigation
Analyze the plugin:
# Check what the plugin actually does
grep -r "eval\|exec\|subprocess\|__import__" plugin-dir/
grep -r "http://\|https://" plugin-dir/ | grep -v localhost
# Compare current version against last known-good
diff -r plugin-v2.0/ plugin-v2.1/
Check for persistence:
- Did the plugin modify any agent configuration?
- Did it write files outside its sandbox?
- Did it install any cron jobs or background processes?
- Did it modify other plugins or the agent core?
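Part of this checklist can be automated by scanning docker diff output for writes outside the plugin's expected paths. A sketch; the suspicious-path prefixes are assumptions about your container layout:
# Flag filesystem changes in locations a well-behaved plugin should never touch
import subprocess

SUSPICIOUS_PREFIXES = ("/etc/cron", "/var/spool/cron", "/etc/systemd",
                       "/root/.ssh", "/app/agent_core", "/app/plugins")

def suspicious_changes(container: str) -> list[str]:
    diff = subprocess.run(["docker", "diff", container],
                          capture_output=True, text=True, check=True).stdout
    hits = []
    for line in diff.splitlines():
        # docker diff lines look like "A /path", "C /path", "D /path"
        change_type, _, path = line.partition(" ")
        if path.startswith(SUSPICIOUS_PREFIXES):
            hits.append(line)
    return hits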
Recovery
1. Roll back to last known-good version of the plugin (or remove entirely).
2. Rotate all credentials the plugin had access to. Even with sandboxing, assume the worst.
3. Re-vet the entire plugin before re-enabling (see the verification sketch after this list). If from a community source, report the issue and consider alternatives.
4. Tighten sandboxing if the plugin escaped its container or accessed unauthorized resources.
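To hold that line going forward, pin vetted plugin versions by digest and refuse to load anything else. A sketch, assuming you record a SHA-256 digest whenever a version is vetted (the value below is a placeholder):
import hashlib

# Digests recorded when each plugin version was vetted (placeholder value)
KNOWN_GOOD_DIGESTS = {
    "suspicious-plugin-2.0": "<sha256-recorded-at-vetting-time>",
}

def verify_plugin(name_version: str, artifact_path: str) -> bool:
    with open(artifact_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    expected = KNOWN_GOOD_DIGESTS.get(name_version)
    return expected is not None and digest == expected  # refuse unknown or mismatched artifacts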
Incident Type 3: Multi-Tenant Data Leak
Signals: User reports seeing another user's data. Monitoring detects cross-tenant access in audit logs. QA finds tenant isolation failure in testing.
Containment (First 15 Minutes)
1. Disable the leaking component:
# If cache-related, flush the cache
cache.flush_all()
# If vector DB related, disable vector search
feature_flags.disable("vector_search")
# If API-related, switch to strict tenant validation mode
config.set("tenant_validation", "strict")
2. Determine scope:
-- Find all cross-tenant access events
SELECT requester_tenant_id, accessed_tenant_id, action, timestamp
FROM audit_log
WHERE requester_tenant_id != accessed_tenant_id
AND timestamp > '2026-02-01';
3. Notify affected tenants within 72 hours (GDPR requirement). To draft the notification, determine:
- Which tenants' data was exposed?
- Which tenants saw data they shouldn't have?
- What specific data types were leaked?
Investigation
Root cause analysis:
- Missing tenant_id in a database query?
- Cache key collision (keys not namespaced by tenant)?
- Vector search returning results across tenant boundaries?
- Race condition in session management?
# Common root causes to check:
# 1. Missing tenant filter
# BAD: db.query("SELECT * FROM data WHERE user_id = ?", user_id)
# GOOD: db.query("SELECT * FROM data WHERE user_id = ? AND tenant_id = ?",
# user_id, tenant_id)
# 2. Cache key collision
# BAD: cache_key = f"user:{user_id}"
# GOOD: cache_key = f"tenant:{tenant_id}:user:{user_id}"
# 3. Vector search without filter
# BAD: vector_db.search(embedding, top_k=10)
# GOOD: vector_db.search(embedding, filter={"tenant_id": tid}, top_k=10)
Recovery
1. Fix the root cause (add missing tenant filter, fix cache keys, etc.)
2. Purge leaked data from caches, logs, and any intermediate storage where cross-tenant data may have been cached.
3. Run the full isolation test suite before re-enabling the affected component (a sketch of one such test follows this list).
4. Engage legal/compliance for breach notification requirements.
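A sketch of the kind of regression test that suite should include; client and seed_document are assumed test fixtures, not a specific framework's API:
# Data written by tenant A must never surface in tenant B's search results
def test_search_respects_tenant_boundary(client, seed_document):
    seed_document(tenant_id="tenant-a", text="tenant A confidential notes")

    results = client.search("confidential notes", tenant_id="tenant-b")

    assert all(r["tenant_id"] == "tenant-b" for r in results)
    assert not any("tenant A confidential" in r["text"] for r in results)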
Incident Type 4: Runaway Agent / Cost Explosion
Signals: Cost monitoring alarm. Agent process consuming excessive CPU/memory. API usage spike. Agent stuck in retry loop.
Containment (Immediate)
1. Kill the runaway process:
# Terminate the specific agent task
agent-cli kill-task --task-id runaway-123
# Or kill the container
docker stop runaway-agent-container
# Or global kill switch if multiple agents affected
curl -X POST https://agent-api/kill-switch \
-d '{"scope": "global", "reason": "cost explosion"}'
2. Set hard budget caps to prevent further spending:
# AWS budget alarm (if not already set); the account ID is a placeholder
aws budgets create-budget \
  --account-id 123456789012 \
  --budget '{"BudgetName":"agent-emergency","BudgetLimit":{"Amount":"100","Unit":"USD"},"TimeUnit":"MONTHLY","BudgetType":"COST"}'
# API provider: reduce the agent's request throttle (illustrative client call;
# hard caps usually live in the provider dashboard or your own API gateway)
openai_client.set_rate_limit(requests_per_minute=10)
3. Assess the damage:
-- What did the agent do during the runaway period?
SELECT tool, COUNT(*), SUM(estimated_cost)
FROM audit_log
WHERE task_id = 'runaway-123'
GROUP BY tool;
Recovery
1. Implement circuit breakers if they weren't already in place (a sketch follows this list):
- Maximum iterations per task
- Maximum wall-clock time per task
- Maximum cost per task
- Maximum concurrent tool calls
2. Contact cloud/API provider if charges are clearly from abuse (some providers will work with you on credits).
3. Add loop detection to catch similar patterns in the future.
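A minimal per-task breaker covering the first three limits above; the thresholds are illustrative, and a concurrency cap is easiest to add with a semaphore around the tool dispatcher:
# Per-task circuit breaker: raise as soon as any limit is exceeded
import time

class TaskBreaker:
    def __init__(self, max_iterations=50, max_seconds=300, max_cost_usd=5.0):
        self.max_iterations = max_iterations
        self.max_seconds = max_seconds
        self.max_cost_usd = max_cost_usd
        self.iterations = 0
        self.cost_usd = 0.0
        self.started = time.monotonic()

    def record_step(self, step_cost_usd: float) -> None:
        self.iterations += 1
        self.cost_usd += step_cost_usd
        if self.iterations > self.max_iterations:
            raise RuntimeError("circuit breaker: iteration limit exceeded")
        if time.monotonic() - self.started > self.max_seconds:
            raise RuntimeError("circuit breaker: wall-clock limit exceeded")
        if self.cost_usd > self.max_cost_usd:
            raise RuntimeError("circuit breaker: cost limit exceeded")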
Post-Incident Process
After containment and recovery, every incident gets the same follow-up:
Forensic Report
Document:
- Timeline of events (when it started, when detected, when contained)
- Root cause analysis
- Data impact assessment (what was exposed, to whom)
- Actions taken during response
- Gaps identified in detection and response
Fix Verification
- Root cause patched and verified in staging
- Regression test added to CI/CD
- Red team test confirms the attack vector is closed
Process Improvements
- Update this playbook with lessons learned
- Adjust monitoring rules to catch similar incidents earlier
- Review and tighten relevant security controls
- Conduct drill with updated procedures
Communication
Internal:
- Incident report to security team and engineering leadership
- Lessons-learned session with broader team
External (if required):
- Breach notification to affected users (within legal timeframes)
- Regulatory notification (GDPR: 72 hours, varies by jurisdiction)
- Status page update if service was impacted
Incident Response Readiness Checklist
Before your next incident:
- Kill switch implemented and accessible to on-call staff
- On-call rotation established with escalation paths
- Credential rotation procedures documented and tested
- Log retention sufficient for forensic investigation (30+ days)
- Breach notification templates drafted
- Communication channels established (war room, Slack channel)
- Forensic data collection procedures documented
- Tabletop exercises conducted (at least quarterly)
- Contact list current: security team, legal, PR, executive sponsor
Conclusion
Incidents are inevitable. The question isn't whether your AI agent will be compromised, but whether you'll contain it in minutes or hours. A prepared team with a tested playbook reduces breach impact by orders of magnitude.
Act now:
- Build and test your kill switch today
- Set up cost and behavior monitoring alerts
- Document credential rotation procedures
- Run a tabletop exercise with your team this month
- Pre-draft breach notification templates
The worst time to write an incident response plan is during an incident.
Related Resources
- Open Claw Security Audit: Full Series Overview
- AI Agent Security Controls: A Defense-in-Depth Architecture
- Red Teaming AI Agents: A Testing and Validation Playbook
- AI Agent Data Leakage: Secrets Management and Privacy Risks
- Multi-Tenant Isolation for AI Agents
- AI Agent Architecture: Threat Modeling Your Attack Surface