Two AI Agent Frameworks, Five CVEs, One Week: When the Bug Classes Are Twenty Years Old and the Frameworks Are Still Shipping Them

Written by the Rafter Team

Microsoft Semantic Kernel disclosed two high-severity CVEs on May 7. Spring AI disclosed three on May 8. Five high-severity findings, in two flagship AI agent frameworks, in a single week. Map the CVEs to bug classes and the diagnosis is uncomfortable: eval() on attacker input, SQL-style filter injection, insecure-by-default cross-user data sharing, arbitrary file write through a tool exposed to the model, and stored injection rendered into a trusted consumer.
Each has been on the OWASP Top 10 for between ten and twenty years.
The Microsoft post says the lesson out loud in a single sentence: Your LLM is not a security boundary. The tools you expose define your attacker's affected scope. It is the most important line written about AI agent security this year, and it belongs in every agent-design review.
Upgrade today if you use either framework. Semantic Kernel to 1.39.4 (Python) or 1.71.0 (.NET). Spring AI to 1.0.7 or 1.1.6. The Spring AI patch is a breaking change by design — applications that don't explicitly pass a ChatMemory.CONVERSATION_ID will throw an exception on upgrade. Treat that as the framework helping you, not as a regression.
Microsoft Semantic Kernel — May 7
Microsoft published the disclosure on its own security blog under the title "When prompts become shells: RCE vulnerabilities in AI agent frameworks." Two CVEs.
CVE-2026-26030 — RCE via Search Plugin
Semantic Kernel's Search Plugin uses an In-Memory Vector Store with a default filter implemented as a Python lambda — and the filter calls eval() on parameters the AI model can control.
The attack chain:
- Prompt injection vector. A malicious prompt directs the agent to invoke a tool like
search_hotels()with crafted input. - Template-string break. Input like
' or MALICIOUS_CODE or 'breaks the lambda's template string, letting the attacker append arbitrary Python. - Type-system traversal. The payload reaches
tuple(), then.__class__, then.__subclasses__(), walks toBuiltinImporter, dynamically loads theosmodule, and executes shell commands likecalc.exe(Microsoft's published PoC) — or anything else.
This is eval() on attacker input. CWE-95 (improper neutralization of directives in dynamically evaluated code). The bug class compilers and code scanners have warned about for two decades.
The attacker didn't break the LLM. The attacker exploited the framework's decision to interpolate untrusted input into an eval() call inside a default filter function that the agent could invoke through normal tool use. The LLM was the delivery mechanism, not the failure point.
CVE-2026-25592 — Sandbox-to-host file write via SessionsPythonPlugin
The SessionsPythonPlugin exposes a DownloadFileAsync function to the AI model. Microsoft's writeup describes the chain:
- Injected prompt tells the agent to use
ExecuteCodeto generate a malicious script inside the container. - Second injection instructs the model to call
DownloadFileAsync, writing the sandbox payload to the host'sWindows\Start Menu\Programs\Startupfolder. - On the next user login, the script executes with full host privileges.
The vulnerability is not in the sandbox itself. It is in the tool the framework exposed to the model. The sandbox was assumed to be a security boundary; the tool that could write outside the sandbox was assumed to be safe to expose. Both assumptions were wrong, in opposite directions.
Microsoft's own framing
Three pull-quotes from the disclosure post, each worth holding onto.
"The AI model itself isn't the issue as it's behaving exactly as designed by parsing language into tool schemas. The vulnerability lies in how the framework and tools trust the parsed data."
"The overarching lesson from both vulnerabilities is that both aren't bugs in the AI model itself, but rather issues in agent architecture and tool design."
"Your LLM is not a security boundary. The tools you expose define your attacker's affected scope."
The last line is the keeper.
Affected and patched
| Variant | Vulnerable | Patched |
|---|---|---|
Python semantic-kernel | < 1.39.4 | ≥ 1.39.4 |
| .NET Semantic Kernel | < 1.71.0 | ≥ 1.71.0 |
Microsoft has also signaled this is the first post in a research series — "Stay tuned for upcoming blogs where we'll dive into similar vulnerabilities found in frameworks beyond the Microsoft ecosystem." LangChain and CrewAI are mentioned by name as the kind of frameworks the series will examine.
Spring AI — May 8
The Spring team disclosed three HIGH-severity CVEs on the same day, each a different bug class. Spring AI 1.0.7 and 1.1.6 patch all three.
CVE-2026-41712 — Cross-user chat memory leak
The worst of the three. Spring AI's chat memory used an implicit DEFAULT_CONVERSATION_ID constant. If a developer didn't explicitly override it on each ChatClient call, all chat memory landed in the same conversation bucket — across users.
// Vulnerable (pre-1.0.7):
chatClient.prompt(userMessage)
.advisors(MessageChatMemoryAdvisor.builder(chatMemory).build())
.call();
// Patched (post-upgrade):
chatClient.prompt(userMessage)
.advisors(a -> a.param(ChatMemory.CONVERSATION_ID, userId))
.call();
The patched code throws an exception if the application does not explicitly pass a conversation ID. That is, by design, a breaking change. The Spring team is forcing developers to set the boundary themselves rather than continuing to ship an insecure default.
CVSS vector: AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N — network-accessible, no auth needed, no user interaction, high confidentiality impact. CVSS 7.5. Pure cross-user data leak.
This is CWE-668: exposure of resource to wrong sphere. The bug class is older than the language Spring AI is written in.
Credit to Ahmed Sekka and sharlongwen for responsible disclosure.
CVE-2026-41705 — MilvusVectorStore filter injection
MilvusVectorStore#doDelete(List) builds a Milvus filter expression by concatenating user-supplied document IDs without sanitization. An attacker who controls document IDs can inject filter operators and delete records beyond their authorized scope.
CVSS vector: AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:L/A:L. Network-accessible, no auth, with confidentiality, integrity, and availability impacts.
The mechanism is CWE-89 in spirit — query injection into a data store through unsanitized input. The data store happens to be a vector database rather than a SQL database, but the bug class is identical. "Sanitize input before it reaches a query parser" has been correct advice for thirty years; the parser changed, the advice didn't.
Credit to sharlongwen for disclosure.
CVE-2026-41713 — Stored prompt injection via conversation memory
Malicious input written into Spring AI's conversation memory persists and is later interpreted by the model in unintended ways. This is the AI-era version of stored XSS: an attacker writes input that gets reflected back into a trusted consumer, the consumer renders or interprets it without validation, and the attacker's intent executes inside the consumer's trust boundary.
The only difference between CVE-2026-41713 and CWE-79 (stored XSS) is that the consumer is an LLM rather than a browser. Same input. Same lifecycle. Same defense.
What this is, in pattern terms
Read the five CVEs as one composite picture.
eval()on attacker input. CWE-95.- Sandbox-to-host file write via over-exposed tool. CWE-20-adjacent (improper input validation), CWE-22-adjacent in spirit.
- Cross-user data leak via insecure default. CWE-668.
- Filter-expression injection. CWE-89 in spirit.
- Stored prompt injection. CWE-79 in spirit.
Five bug classes. All recognizable from any pre-AI security textbook. Shipping in May 2026, in reference implementations from Microsoft and the Spring ecosystem.
The reason this keeps happening is that AI agent frameworks are being built at velocity by teams whose threat model is "prompt injection," and prompt injection is one CWE among many. The frameworks treat the LLM as the boundary — "if we can stop the LLM from being tricked, we're safe" — and the bugs are in the layer underneath the LLM, in the tools, plugins, defaults, and tool-schema parsers that the model talks to.
Microsoft said it cleanly: Your LLM is not a security boundary. The tools you expose define your attacker's affected scope.
The same shape recurs in every recent AI-security disclosure:
- The OpenAI Codex command-injection bug from March was CWE-78 in a flagship coding agent.
- The OX Security MCP STDIO design flaw from April was a config-becomes-command default in Anthropic's reference SDKs.
- The Meta autonomous-agent Sev 1 in March was over-scoped tool authorization triggering internal data exposure.
- The PyTorch Lightning Mini Shai-Hulud compromise on April 30 was import-time execution, an old npm primitive applied to a Python AI training library.
Every one of them traces back to a bug class older than the AI product surface that surfaced it.
What to do
If you use Semantic Kernel
- Upgrade today.
semantic-kernel >= 1.39.4for Python,>= 1.71.0for .NET. - Audit plugin exposure. The CVE-2026-25592 chain works because a tool that could write to the host filesystem was reachable from inside a sandbox that was assumed to contain the agent. Inventory which plugins your agent can invoke, and what each tool inside those plugins can do.
- Remove high-power tools.
DownloadFileAsync,ExecuteCode, file-system writers, network egress functions — none of these should be exposed to an LLM-controlled tool-use loop without per-call human confirmation. The architectural finding behind CVE-2026-25592 is "the LLM is allowed to invoke a tool that writes to the host." The answer is to not allow that.
If you use Spring AI
- Upgrade to 1.0.7 or 1.1.6.
- Explicitly pass
ChatMemory.CONVERSATION_IDto every memory advisor in everyChatClientcall. The new code throws an exception if you don't; treat that as the framework helping you, not as a regression. - Sanitize vector-store input. Document IDs flowing into
MilvusVectorStore(or any vector store with a filter-expression API) should be validated before they reach a delete or filter expression. Treat document IDs as untrusted input, just like SQL queries.
If you build an AI agent framework or product
- Re-read the Microsoft post quote. "The tools you expose define your attacker's affected scope." Use it as a design-review question. Every tool the agent can invoke is a permission boundary.
- Map each tool to a CWE before you ship it. If your framework exposes a tool that interpolates input into
eval(), you have CWE-95. If your framework exposes a tool that writes to the host filesystem from inside a sandbox, you have a sandbox escape. If your framework persists user input that flows back into the model, you have CWE-79 in spirit. - Inventory defaults. The Spring AI
DEFAULT_CONVERSATION_IDbug is a default that wasn't, in fact, safe. Every implicit default in your framework is a contract with the developer about what the framework will do when they don't override it. If the contract leaks data, the default is broken.
How Rafter helps
Rafter's Code Analysis Engine looks for command-injection patterns, eval()-on-input patterns, missing-auth-on-destructive-tool patterns, and unsanitized-flow-into-query patterns on every push. The diff that introduces an unsafe tool exposure, an unsanitized vector-store filter, an insecure default in a stateful component, or an eval() call against attacker-reachable input is exactly where the warning is most useful — before the framework reaches version 1.0 and the bug class is shipped to every team that depends on it.
What scanning does not do is rewrite a framework's architectural decisions. "Don't expose DownloadFileAsync to the model" is a design choice; the scanner can help once the choice has been encoded as a rule. The defenses that work are split between code (which scanners can address) and architecture (which they can't).
Closing on the bug-class clock
The newest part of AI agent security is the agent. The bugs are old.
eval() on attacker input is older than Spring. Cross-user data leak via shared default is older than the JVM. Stored injection rendered into a trusted consumer is older than the consumer.
What is genuinely new is the rate at which AI frameworks ship without re-validating the older threat models, and the rate at which the disclosures are now landing — five HIGH-severity CVEs across two flagship frameworks in a single week. Microsoft's research series, which it has telegraphed will extend to LangChain, CrewAI, and other frameworks, will surface more.
The defensive answer is the same answer that has worked on every other framework category for two decades. Treat untrusted input as untrusted. Don't eval() it. Don't share state by default. Sanitize on the way in and on the way out. Map every tool to a CWE before you ship it. Scan the framework code on every push.
The LLM is not the security boundary. The bugs already aren't in the LLM.
Further reading
- The MCP Protocol Has a Design Flaw, and Anthropic Says That Is Expected — the protocol-level analog of the same thesis: unsafe defaults in the layer underneath the model.
- A Branch Name as RCE: OpenAI Codex and the GitHub Token It Held — CWE-78 in a flagship coding agent, exact same shape.
- Confidently Wrong, Autonomously Enacted: Meta's Sev 1 Is the Future of AI-Agent Incidents — the over-scoped-tool-authorization version of the same architectural failure.
- PyTorch Lightning, Mini Shai-Hulud, and Malware That Signs Commits as Claude Code — old primitive (install-time execution), new AI-product surface.
Sources
- Microsoft Security Blog — When prompts become shells: RCE vulnerabilities in AI agent frameworks: https://www.microsoft.com/en-us/security/blog/2026/05/07/prompts-become-shells-rce-vulnerabilities-ai-agent-frameworks/
- Spring Security Advisory — CVE-2026-41712 (chat memory cross-user leak): https://spring.io/security/cve-2026-41712/
- Spring Security Advisory — CVE-2026-41705 (MilvusVectorStore filter injection): https://spring.io/security/cve-2026-41705/
- Spring AI 1.0.7, 1.1.6, 2.0.0-M6 release announcement: https://spring.io/blog/2026/05/08/spring-ai-1-0-7-1-1-6-2-0-0-M6-available-now/
- AI security beyond prompt injection
- A year of AI developer-tool supply-chain attacks