AI Incident Response: What to Do When You Are Jailbroken

Written by Rafter Team
January 27, 2026

You're watching logs late one night when you notice something strange: your production chatbot has started printing out your internal system prompt and database schema in plain text. Someone dropped a malicious jailbreak prompt into the chat — and your model followed it.
Your keys are exposed. Logs are a mess. There's no playbook. Panic sets in.
When — not if — your AI app is compromised, what's your first move?
AI security incidents are happening every week in production systems. Unlike traditional web exploits, these incidents often emerge through natural language interfaces and can be hard to detect. Most teams are not prepared.
Introduction
As AI apps become mainstream, security incidents are shifting. Jailbreaks, prompt injections, data exfiltration, and key abuse are happening in production systems every week — and most teams are not prepared.
According to OWASP's 2026 LLM Top 10, over 60% of AI apps tested were vulnerable to prompt injection.
Unlike classic web exploits, these incidents often emerge through natural language interfaces, can be hard to detect, and may involve models acting as unwitting accomplices in their own compromise.
In this post, we'll cover a step-by-step incident response playbook for AI apps. You'll learn how to:
- Detect and confirm AI jailbreaks or prompt injection incidents
- Rotate keys and quarantine models to stop the bleeding
- Log, analyze, and preserve evidence for later investigation
- Run a structured post-mortem so it doesn't happen again
Step 1 — Detect and Contain the Incident Quickly
The first step in AI incident response is recognition. Jailbreaks and prompt injections often leave telltale signs:
- Unexpected outputs (e.g., system prompts, API keys, vector DB content)
- Guardrails suddenly bypassed
- Weird or malformed responses from the model
- Sudden spikes in token usage or external API calls
- Suspicious entries in logs (e.g., long "prompt chains")
Containment Actions
Once detected, speed matters:
- Disable or throttle the affected endpoints temporarily
- Block or throttle offending user sessions / IPs if identifiable
- Alert the security or on-call team immediately
This is about stopping further damage, not fully understanding it yet.
Takeaway: Contain first, investigate second. Speed matters more than perfect understanding in the initial response phase.
Step 2 — Rotate Keys and Secrets Immediately
Jailbreak incidents often involve prompt exfiltration of secrets. Attackers know that many AI apps store keys in prompts or accessible contexts.
Rotate immediately:
- LLM provider API keys (OpenAI, Anthropic, Mistral, etc.)
- Vector DB keys (Pinecone, Weaviate, pgvector, etc.)
- Any downstream service credentials exposed to the model (e.g., plugins, internal APIs)
Automate rotation where possible — e.g., via CI/CD pipelines and environment variables.
Example: Rotating an OpenAI key
# Revoke old key in provider dashboard first, then:
export OPENAI_API_KEY="new-key"
vercel deploy --prod
Don't forget preview environments and dev keys — attackers often pivot there next after initial exfiltration.
For more comprehensive guidance, see our API key management best practices.
Step 3 — Shut Down or Quarantine Affected Models
Sometimes the safest move is to temporarily disable affected models, endpoints, or pipelines.
This can:
- Prevent cascading leaks
- Stop attackers from chaining prompts to gain more control
- Buy you time to investigate properly
Techniques
- Use feature flags or config toggles to disable endpoints quickly
- Switch to a "maintenance mode" response temporarily
{
"message": "We're performing maintenance due to unusual activity. Please check back shortly."
}
Plan these toggles ahead of time so you're not redeploying in a panic. Build incident response into your architecture, not just your processes.
Step 4 — Log, Analyze, and Preserve Evidence
Good logging turns a crisis into a solvable problem.
Log the Right Things
- Prompt logs (input and output)
- Retrieval queries (if using RAG)
- API usage data (tokens, calls, anomalies)
- Error traces and stack logs
Identify Entry Points
- Which prompt triggered the jailbreak?
- Which session/user initiated it?
- Did it involve indirect prompt injection through RAG?
Preserve Evidence
Save the relevant:
- Request/response traces
- Vector DB queries
- Model outputs before mitigation
Make sure logs themselves don't leak additional sensitive data during analysis. Sanitize before storing incident logs.
For comprehensive pipeline security, see our guide on securing AI pipelines end-to-end.
Step 5 — Report, Patch, and Post-Mortem
Report
- Internal security teams — so they can coordinate broader response
- Possibly affected users — transparency builds trust
- Providers (OpenAI, Anthropic, etc.) if the incident exposed model vulnerabilities
Patch
Apply prompt hardening (e.g., better segmentation, contextual isolation):
- Implement sandboxing for high-risk operations
- Tighten access controls (e.g., minimize secret exposure in system prompts)
- Update jailbreak filters or retrieval sanitizers
Post-Mortem
Like any other security incident, document what happened:
Suggested Template:
- Summary — high-level overview
- Timeline — detection → containment → resolution
- Detection & Containment — how it was found and stopped
- Root Cause — why it happened
- Mitigations & Improvements — what will prevent recurrence
Treat AI incidents like any other security breach — with structured documentation and learning. The principles haven't changed, just the attack vectors.
Conclusion
AI apps face new kinds of incidents, but the principles of good security response haven't changed:
- Detect early — watch for unusual model outputs and behavior patterns
- Contain fast — disable endpoints and rotate keys immediately
- Rotate and quarantine — treat exposed models as compromised
- Log and preserve evidence — capture everything for analysis
- Report, patch, and learn — document and improve
The difference is where and how these incidents unfold — through language interfaces, model contexts, and RAG pipelines.
Next steps:
- Start by drafting a one-page incident response checklist your team can use tomorrow
- Run tabletop drills for jailbreak or prompt injection scenarios
- Integrate incident detection and key rotation into your CI pipelines before you need them
Preparation turns panic into process.
Related Resources
- AI Agent Incident Response Playbook
- Prompt Injection 101: How Attackers Hijack Your LLM
- Real-World AI Jailbreaks: How Innocent Prompts Become Exploits
- Silent Exfiltration: How Secrets Leak Through Model Output
- AI Builder Security: 7 New Attack Surfaces You Need to Know
- API Keys Explained: Secure Usage for Developers
- Securing AI-Generated Code: Best Practices
- Security Tool Comparisons: Choosing the Right Scanner