Model Security May 7, 2026 Microsoft Security Blog

When Prompts Become Shells: Two Critical Flaws Turn Microsoft Semantic Kernel Agents Into Remote Code Execution

By the AuthorityGate Architect Team

The Problem: A Sentence Typed Into a Chatbox Opened a Program on the Server

Imagine you hired a research assistant and gave them a strict rule: "You may only read documents from the filing cabinet and tell me what they say. You are never allowed to do anything else." That seems safe. The assistant is just reading and summarizing. But then imagine that someone slipped a document into your filing cabinet that contained a hidden instruction, and the moment your assistant read that document, it stopped summarizing and instead walked over to your computer, opened a terminal, and started running programs of the stranger's choosing. The assistant did this without alarm, without asking, and without any sense that it had done anything wrong. It was, after all, just "processing a document."

That is, in plain terms, what Microsoft's own security researchers demonstrated in May 2026 against Semantic Kernel, one of Microsoft's flagship frameworks for building AI agents. In a research post titled "When prompts become shells," published on the Microsoft Security Blog on May 7, 2026, the team showed that a single piece of attacker-supplied text, processed by an AI agent, could be escalated from a "content problem" into full control of the machine the agent was running on. They proved it in the most unambiguous way possible: they made a vulnerable agent launch calc.exe, the Windows calculator, on the host computer, using nothing but a malicious prompt. No browser exploit. No malicious email attachment. No memory-corruption bug. Just words.

The calculator is harmless. The point is not the calculator. The point is that if an attacker can make the host launch the calculator, they can make it launch anything: ransomware, a data-exfiltration tool, a remote-access backdoor. The calculator is simply the universally understood proof that the attacker reached the level of "I can now run code of my choosing on your server." In security terms, that is called Remote Code Execution, or RCE, and it is the single most severe category of vulnerability there is. Microsoft assigned the two underlying flaws its highest severity rating: Critical.

What makes this incident so important for executives is not that one specific software library had a bug. Bugs get patched, and these were patched the same day they were disclosed. What matters is the pattern it reveals. As organizations rush to wire AI agents into their real systems, giving them tools, database access, and the ability to take actions, they are quietly building a new and poorly understood pathway: untrusted text flowing straight into trusted machinery. This article explains, in plain language, exactly how that happened here, why it will keep happening across the industry, and what your organization should be doing about it before an agent of yours is the one launching the calculator.

What Is an AI Agent Framework?

A regular AI chatbot is a closed conversation. You type a question, it generates an answer, and that is the end of it. It cannot reach outside the chat window. An AI agent is fundamentally different: it is a chatbot that has been given hands. It can search your company's documents, query a database, call an external service, write a file, or run a calculation, all on its own, in order to complete a task you gave it. The pieces of software that grant the AI these "hands" (the database lookups, the file operations, the web calls) are called tools.

An AI agent framework is the toolkit developers use to build these agents. Semantic Kernel is exactly that: an open-source framework from Microsoft that lets developers connect a large language model to a set of tools, wrap it in a few lines of code, and ship a working agent. It is popular precisely because it makes a hard thing easy. A developer can stand up an agent that "answers questions about our product catalog by searching our internal knowledge base" in an afternoon. That convenience is also why a vulnerability in the framework matters so much: every agent built on top of it inherits the framework's assumptions, including its mistakes.

Think of the framework as the wiring and plumbing of a building. An individual developer building one agent is like a tenant decorating an apartment. If the building's wiring has a hidden fault, it does not matter how carefully each tenant decorates; the fault is in the shared infrastructure that every unit depends on. A flaw in Semantic Kernel is a flaw in the wiring, and it can light up in any apartment built on top of it.

What "Remote Code Execution" Really Means

Remote Code Execution is the worst phrase in a security report. It means that an attacker who is not physically at your computer, and who has no account, no password, and no authorized access, can nonetheless make your computer run instructions of their choosing. Once an attacker can run code on a machine, the game is largely over for that machine. They can read every file it can read, use every credential it has stored, reach every other system it can reach on the internal network, install permanent backdoors, encrypt the data and demand ransom, or quietly sit and watch. RCE is not "a vulnerability"; it is the master key that opens all the other doors.

This is why the security industry treats RCE as the top of the severity scale. Microsoft and the broader vulnerability-scoring system (CVSS) rate flaws from 0 to 10, and both of the Semantic Kernel flaws covered here were rated Critical, in the 9.8 to 10.0 range. To put that in perspective: a Critical, network-reachable RCE that requires no authentication is the kind of finding that triggers emergency, drop-everything patching cycles in large enterprises. It is the most dangerous classification a vulnerability can receive.
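To show where "Critical" sits on that scale, here is a minimal sketch of the qualitative rating bands defined by CVSS v3.x. The function name `cvss_band` is our own, chosen for illustration. The thresholds follow the published CVSS v3.x qualitative severity scale.

```python
def cvss_band(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS base scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score <= 3.9:
        return "Low"
    if score <= 6.9:
        return "Medium"
    if score <= 8.9:
        return "High"
    return "Critical"


# Scores in the range reported for the two flaws all land in the top band.
for example in (5.3, 8.8, 9.8, 10.0):
    print(example, cvss_band(example))
```

Anything from 9.0 upward is Critical, so both flaws sit near the very top of the highest band.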

Why Prompt Injection Reaching Code Execution Is the Worst Case

Prompt injection is the term for tricking an AI by feeding it text that it mistakes for legitimate instructions. The AI cannot reliably tell the difference between "content it is supposed to read" and "commands it is supposed to obey," because to the model, both are just text arriving in the same stream. Most discussions of prompt injection treat it as a content problem: the AI says something it should not, recommends the wrong product, or leaks a snippet of information. Embarrassing, but survivable.

What Microsoft's researchers demonstrated is the nightmare upgrade of that scenario: prompt injection that does not stop at corrupting the AI's words, but reaches all the way down into the host computer and runs code. The text the attacker supplies stops being "a thing the AI says" and becomes "a thing the server does." This is the difference between a con artist who convinces your assistant to give a bad recommendation, and a con artist who convinces your assistant to hand over the keys to the building. The first is a credibility problem. The second is a breach.

The researchers summarized the lesson in a single, blunt sentence that every executive deploying agents should internalize: "Your LLM is not a security boundary. The tools you expose define your attacker's affected scope." In other words, do not assume the AI model will refuse to do something dangerous. Assume that anything the agent is technically capable of doing, an attacker can eventually make it do. The only real limit is the set of tools and permissions you handed the agent in the first place.

Figure: Prompt injection escalating into remote code execution on an AI agent host. When an AI agent is given real tools, attacker-supplied text can flow through those tools and reach the underlying host. The model itself does not act as a wall between untrusted input and trusted execution.

Why This Matters to You

If your organization is building or piloting AI agents, especially ones that search internal documents, look things up in a knowledge base, or call any tools, you are potentially exposed to exactly this class of flaw. Semantic Kernel is one of the most widely used frameworks for this, which means a single library mistake had the reach to affect a large number of in-development agent applications. The vulnerable behavior was in default functionality, not an exotic configuration, so developers could have been exposed without ever doing anything unusual.

The exposure does not require a sophisticated attacker. The malicious input can arrive inside an ordinary-looking document that your agent retrieves and reads as part of its normal job. A single poisoned record in a knowledge base the agent searches was enough to trigger code execution on the host. If you are relying on the AI model to "know better" than to act on a malicious instruction, this incident is the proof that you cannot.

What Happened: Two Critical Flaws, One Devastating Pattern

On May 7, 2026, Microsoft's Security Response Center published advisories for two Critical vulnerabilities in Semantic Kernel, accompanied by the research write-up "When prompts become shells: RCE vulnerabilities in AI agent frameworks." Both flaws share the same fundamental story: attacker-controlled text, processed by the agent, ends up driving a real action on the host. They differ in the exact mechanism, and together they illustrate two distinct ways the same mistake shows up. Microsoft patched both on the same day the advisories were published.

Figure: How attacker-controlled input flows through an AI agent's tools into host code execution. The two flaws attack the same seam from different angles: one through a search filter that gets evaluated as live code, the other through a file-write tool that should never have been exposed to the model. Both turn the agent's own tools into the attacker's weapons.

1. CVE-2026-26030: The Search Filter That Ran the Attacker's Code

The everyday analogy: Imagine a librarian who, to find your book, does not just look it up in a card catalog but instead reads your request out loud as a set of commands and does whatever it says. If you ask for "books about Paris," fine. But if a mischievous request is worded as "books about Paris, and also unlock the back door and let me in," the librarian dutifully unlocks the back door, because they were never taught the difference between a search term and a command.

This flaw lived in the Python version of Semantic Kernel (releases before 1.39.4), in a component called the InMemoryVectorStore. A vector store is the part of an AI system that holds the documents the agent can search through: the agent's searchable memory of, say, your product catalog or knowledge base. When the agent wants to narrow a search ("only show me documents about the city of Paris"), the framework builds a filter to apply. The fatal design choice was how that filter was built and run: it was assembled as a small piece of Python code, a lambda, and then executed using Python's eval() function, which takes a string of text and runs it as live program code.

The problem is that part of that filter string came from values the AI model controlled, and those values were not sanitized before being placed into code that gets executed. As the researchers put it: "The vulnerability is that kwargs[param.name] is AI model-controlled and not sanitized. This acts as a classic injection sink." A "sink" is a security term for a spot where untrusted input flows into something dangerous; in this case, a function that runs code. Because the AI's output can in turn be steered by attacker-supplied text in a retrieved document, the chain is complete: a poisoned document influences the model, the model's output gets baked into a filter, and the filter is run as code.

A normal filter looked like lambda x: x.city == 'Paris'. An attacker crafted input shaped like ' or MALICIOUS_CODE or ', which slots into the filter string and transforms a harmless comparison into executable malicious code. The researchers noted that the payload could exploit Python's type system to climb through the language's internal class hierarchy and reach the os module, the part of Python that talks to the operating system, without using any of the obviously blocked keywords like a direct import. That is what let the attack reach all the way down to launching a program on the host.
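The pattern described above can be sketched in a few lines of Python. This is a hypothetical reconstruction for illustration only, not Semantic Kernel's actual source: the names `Doc`, `build_filter_unsafe`, `build_filter_safe`, and `MARKER` are assumptions. The payload is deliberately harmless (it only appends to a list), which is enough to show that attacker-shaped text gets executed.

```python
from dataclasses import dataclass
from typing import Callable, List

# Stand-in for "something happened on the host". Hypothetical, for illustration.
MARKER: List[str] = []


@dataclass
class Doc:
    city: str


def build_filter_unsafe(city_value: str) -> Callable[[Doc], object]:
    # VULNERABLE (illustrative): the value is pasted into source text and evaluated.
    source = f"lambda x: x.city == '{city_value}'"
    return eval(source, {"MARKER": MARKER})


def build_filter_safe(city_value: str) -> Callable[[Doc], bool]:
    # The value stays data: it is compared against, never parsed as code.
    return lambda x: x.city == city_value


docs = [Doc("Paris"), Doc("Rome")]
payload = "' or MARKER.append('attacker code ran') or '"

unsafe_filter = build_filter_unsafe(payload)
unsafe_hits = [d for d in docs if unsafe_filter(d)]
unsafe_runs = len(MARKER)  # the injected expression ran once per document checked
print("unsafe filter executed injected code", unsafe_runs, "times")

MARKER.clear()
safe_filter = build_filter_safe(payload)
safe_hits = [d for d in docs if safe_filter(d)]
print("safe filter executed injected code", len(MARKER), "times")
```

In the unsafe version the closing quote in the payload ends the string literal early, so the rest of the payload is parsed as Python. In the safe version the same payload is just an odd city name that matches nothing.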

Impact: Critical (CVSS reported at approximately 9.8). A single retrieved document containing a crafted string was enough to run arbitrary code on the machine hosting the agent. Affected: Python semantic-kernel before 1.39.4. Patched in 1.39.4.

2. CVE-2026-25592: The Helper Function That Was Accidentally Handed to the AI

The everyday analogy: Imagine a company gives its new intern a ring of keys for the supply closet, but by mistake the building's master key is on the same ring. The intern was only ever supposed to fetch staplers. Now anyone who can persuade the intern to "just grab that thing from the back" can, through the intern, reach rooms they were never meant to touch, because the intern is unknowingly carrying a key to everything.

This second flaw was in the .NET version of Semantic Kernel (releases before 1.71.0), specifically in the SessionsPythonPlugin. This plugin runs AI-generated Python code inside an isolated, sandboxed container (an Azure Container Apps session), precisely so that if the code does something bad, the damage is contained inside the box and cannot reach the real host. That isolation is the safety design. The flaw broke it.

The mistake was that an internal helper function called DownloadFileAsync was tagged with [KernelFunction]. That tag is what tells Semantic Kernel "expose this capability to the AI model as a tool the agent is allowed to call." It appears the function was never intended to be one of the agent's tools, but the tag made it one anyway. Worse, the destination path for the download, the localFilePath parameter, was entirely AI-controlled and had no validation, meaning the model could be steered into writing a file to any location it chose.

The researchers documented a clean three-step sandbox escape. First, create a malicious payload inside the isolated container (allowed, because that is what the sandbox is for). Second, abuse the exposed download function to write that payload out of the box and into the host's Windows Startup folder, the special folder whose contents run automatically every time a user logs in. Third, wait for the next login, at which point the payload executes on the host with the user's privileges. The sandbox that was supposed to be the safety boundary was bypassed entirely, because a single mis-tagged function gave the model a way to write to the host.

Impact: Critical (CVSS reported in the 9.9-10.0 range). A prompt-injected agent could escape its sandbox and achieve full host compromise via an arbitrary file write to the Startup folder. Affected: .NET semantic-kernel before 1.71.0. Patched in 1.71.0.

How It Works: What the Prompt Looks Like vs. What the Host Executes

The unsettling elegance of these attacks is the gap between how trivial they look on the surface and how catastrophic they are underneath. On one side, the attacker provides what appears to be an ordinary piece of text (a search term, a phrase inside a document), the kind of thing an agent reads thousands of times a day without incident. On the other side, that text crosses an invisible line and becomes an instruction the computer carries out. The agent never announces that anything unusual happened. From the operator's chair, the agent looks like it is doing its job.

The root cause in the first flaw is a programming sin that predates AI by decades: taking text and running it as code. Developers have known for thirty years that you should never feed untrusted input into functions like eval(). The novelty here is the delivery route. In the past, the untrusted input came from a web form or a URL. Now it arrives through the AI model, dressed up as the model's own helpful output, which makes it feel trustworthy when it is anything but. The lesson, stated by the researchers, bears repeating: never run code built from input the model can influence.

The comparison below makes the gap concrete. The first panel shows what the attacker supplies and what a casual observer would see. The second shows what the host machine actually ends up doing once that input flows through the vulnerable filter. The two panels describe the same event, but they live in completely different worlds of consequence.

Same Input, Two Very Different Outcomes

What the Prompt Looks Like

A search filter, or a phrase inside a retrieved document, that appears to be ordinary data:

city == 'Paris'

Looks like a harmless search term. The agent is "just filtering documents." Nothing in the operator's view suggests danger.

What the Host Executes

The attacker's crafted value breaks out of the filter and becomes live code:

' or [reach the os module and run calc.exe] or '

Run through eval(), the string is executed as a program. It traverses Python's type system to reach the OS and launches a process on the host. Proven with calc.exe; could be anything.

The agent cannot distinguish "a search term" from "a command," because the framework converts the model-influenced value into code and runs it. There is no malware, no exploit chain, and no alert. The host simply does what the text told it to do.

By The Numbers

9.9: Critical CVSS severity (top of the scale)
2: Critical RCE CVEs disclosed
1: Poisoned document needed to trigger RCE
0: Days between disclosure and patch

CVSS scores reported by Microsoft and independent trackers place both flaws in the Critical band (approximately 9.8 to 10.0). "9.9" is used here as a representative headline figure for the pair; consult the official CVE advisories for the exact per-CVE base scores.

Financial Impact

A single poisoned document can run arbitrary code on the agent host, exposing every file, credential, and internal system the agent can reach, enabling ransomware, data exfiltration, and persistent backdoors, with no malware, exploit chain, or security alert.

Risk Severity Analysis

The two disclosed flaws are only the visible surface of a broader risk class. The table below maps the specific risks raised by this incident to their severity and the business exposure each one creates for an organization deploying agents built on frameworks like Semantic Kernel.

Risk | Severity | Business Exposure
Prompt-injection to host RCE (eval on model input) | Critical | A single poisoned document or search term runs arbitrary code on the host. The attacker inherits everything the agent process can reach: files, credentials, internal networks.
Sandbox escape via over-exposed tool | Critical | An accidentally exposed file-write function lets the model break out of an isolation boundary that was the entire safety design, achieving persistence on the host via the Startup folder.
Over-broad tool registry / excessive agent permissions | Critical | Every tool exposed to the model expands the attacker's reach. "The tools you expose define your attacker's affected scope." Most teams have no inventory of what their agents can actually do.
Unpatched SDK / dependency drift | High | Patches shipped same-day, but they only protect organizations that actually update. Agents pinned to old framework versions remain exploitable indefinitely.
No egress control on agent hosts | High | Once code runs on the host, the absence of outbound network restrictions lets the attacker exfiltrate data and pull down additional payloads unchecked.
Treating the LLM as a security boundary | High | Architectures that assume the model will refuse dangerous actions have no real boundary at all. The model cannot be trusted to police the inputs flowing through it.

Why This Keeps Happening: Old Sins Meet New Speed

The first reason this keeps happening is the oldest mistake in software: running text as code. The use of eval() on input that an attacker can influence has been a top-tier vulnerability for as long as the industry has tracked vulnerabilities. Developers are explicitly taught not to do it. Yet it reappears here, because the AI model laundered the input. When a value comes from "the model," it feels like it came from your own trusted system rather than from a stranger's document. That false sense of trust is exactly how decades-old mistakes sneak back in through a new door. The model is not the author of that text; it is a conduit for whatever text it was fed.

The second reason is the one Microsoft's researchers put front and center: the tool registry is the real attack surface. Every capability a developer registers as a tool, every function tagged so the agent can call it, is a door the attacker can try to walk through. The second flaw existed purely because a function that should never have been a tool was accidentally marked as one. In a hand-built application, exposing a dangerous function takes deliberate effort. In an agent framework, exposing it can be as casual as leaving one annotation on one helper. The convenience that makes frameworks productive is the same convenience that makes over-exposure effortless and invisible.

The third reason is speed. Agent frameworks are being adopted at a pace that far outstrips the maturing of security practices around them. Teams that would never deploy a public-facing web application without a security review are spinning up agents, wiring them to internal data and tools, and pushing them into pilots in days. The frameworks are new, the threat models are unfamiliar, and the people building these agents are often application developers, not security engineers. The result is that fundamentally dangerous patterns, such as untrusted input reaching execution, get baked into production before anyone with a security lens has looked at the design.

The encouraging part of this particular story is that Microsoft found and fixed these flaws responsibly, and shipped patches the same day. That is the model working as intended. But the organizations that built agents on the vulnerable versions are only protected if they actually update, and many will not, because they do not even have an inventory of which agents they are running or which framework versions those agents depend on. The vulnerability gets patched in a day; the patch reaching every deployed agent can take months, or never happen at all.

What You Can Do: Six Practical Steps to Contain Agent RCE Risk

The good news mirrors the bad news: because these attacks exploit architecture rather than exotic technology, the defenses are also architectural. None of them require buying a new product. They require treating your AI agents as untrusted systems that process untrusted input, and applying the security fundamentals you already know to this new context. Here are six concrete measures, in priority order.

Figure: Defense-in-depth architecture for AI agent frameworks: sandboxing, least privilege, and egress control. Effective defense assumes the model will be compromised and limits the blast radius: never run model-influenced input as code, expose the fewest tools possible, sandbox execution, and restrict what a compromised host can reach.

1. Never run code built from model-influenced input

This is the single most important rule, and it is the one the first flaw violated. Any place where your application takes a value the model produced, or that the model could have been steered to produce by a document it read, and turns that value into something that gets executed, is a critical vulnerability waiting to happen. The functions eval(), exec(), and their equivalents in every language must never receive model-derived strings.

The fix is structural: filters, queries, and commands should be built from parameterized, validated components, never by string-concatenating model output into code. Treat model output exactly as you would treat input from an anonymous user on the internet, because functionally that is what it is. Audit your agent code, and your framework's defaults, for any path where model-controlled values flow into a code-execution or query-construction sink.
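One way to follow that advice is to have the model emit a structured filter (field, operator, value) and to validate each part against an allow-list before applying it. The sketch below is an assumption about how such a layer might look, not the fix Microsoft shipped. `ALLOWED_FIELDS`, `ALLOWED_OPS`, and `apply_filter` are hypothetical names.

```python
import operator
from typing import Any, Callable, Dict, Iterable, List

# Hypothetical allow-lists: only these fields and comparisons are ever permitted.
ALLOWED_FIELDS = {"city", "country"}
ALLOWED_OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "eq": operator.eq,
    "ne": operator.ne,
}


def apply_filter(
    records: Iterable[Dict[str, Any]], field: str, op: str, value: Any
) -> List[Dict[str, Any]]:
    """Filter records using validated components; no string is ever evaluated."""
    if field not in ALLOWED_FIELDS:
        raise ValueError(f"field not allowed: {field!r}")
    if op not in ALLOWED_OPS:
        raise ValueError(f"operator not allowed: {op!r}")
    if not isinstance(value, (str, int, float)):
        raise ValueError("value must be a plain scalar")
    compare = ALLOWED_OPS[op]
    return [r for r in records if compare(r.get(field), value)]


records = [{"city": "Paris"}, {"city": "Rome"}]
print(apply_filter(records, "city", "eq", "Paris"))

# A model-supplied value shaped like code is only an unusual string to compare against.
print(apply_filter(records, "city", "eq", "' or __import__('os') or '"))
```

Because the operator is looked up in a fixed table and the value is only ever compared, the model can choose what to search for but cannot introduce new behavior.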

2. Treat the tool registry as your real attack surface and minimize it

The researchers' core lesson, "the tools you expose define your attacker's affected scope," should become a design principle in your organization. Maintain an explicit, reviewed inventory of every tool each agent can call. For each one, ask: does the agent genuinely need this capability to do its job? If not, remove it. The second flaw existed only because a function was exposed that never should have been.

Make tool exposure a deliberate, audited act rather than a side effect of an annotation. Require security review before any function that writes files, makes network calls, runs commands, or touches sensitive data is registered as an agent-callable tool. Scan your codebase for the framework's "expose this as a tool" markers (such as [KernelFunction]) and verify every single one was intentional.
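A first pass at that scan can be as simple as a text search. The sketch below walks a source tree and lists every line carrying a `[KernelFunction` attribute so a reviewer can confirm each one was intentional. It is a rough illustration under stated assumptions: it only reads `.cs` files, it uses a plain regular expression rather than a C# parser (so it can miss unusual formatting), and `find_exposed_tools` and the sample plugin are hypothetical.

```python
import re
import tempfile
from pathlib import Path
from typing import List, Tuple

# Matches the attribute at the start of an attribute list, with or without arguments.
MARKER_PATTERN = re.compile(r"\[\s*KernelFunction\b")


def find_exposed_tools(root: Path) -> List[Tuple[Path, int, str]]:
    """Return (file, line number, line text) for every marker found under root."""
    findings = []
    for path in sorted(root.rglob("*.cs")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if MARKER_PATTERN.search(line):
                findings.append((path, number, line.strip()))
    return findings


# Hypothetical sample plugin, written to a temporary folder for the demonstration.
SAMPLE = """public class SamplePlugin
{
    [KernelFunction]
    public string Search(string query) => query;

    [KernelFunction]
    public void DownloadFileAsync(string localFilePath) { }

    private void Helper() { }
}
"""

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "SamplePlugin.cs").write_text(SAMPLE, encoding="utf-8")
    for path, number, line in find_exposed_tools(Path(tmp)):
        print(f"{path.name}:{number}: {line}")
```

The output is a review list, not a verdict: each hit still needs a human to decide whether the agent genuinely needs that capability.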

3. Sandbox agent execution, and assume the sandbox can be escaped

Run agents and any code they generate inside genuinely isolated environments, separate containers or virtual machines with no standing access to your sensitive systems, so that even successful code execution is contained. The second flaw is a reminder that sandboxing is necessary but not sufficient: the isolation was real, but a single over-exposed tool let the model write across the boundary.

Harden the sandbox accordingly. The container running an agent should have no write access to host directories that auto-execute (such as Startup folders), no host filesystem mounts it does not strictly need, and no path back to the host other than the narrow, validated channel you explicitly designed. Validate every parameter, especially file paths and destinations, that crosses an isolation boundary. Design as if the sandbox will be breached, and limit what a breach can reach.
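As one illustration of validating a path that crosses a boundary, the sketch below resolves a requested destination and rejects it unless it lands inside a single approved directory. It is a minimal example of the idea, not the patch applied to SessionsPythonPlugin. `resolve_inside` and the directory names are hypothetical.

```python
import tempfile
from pathlib import Path


def resolve_inside(base_dir: Path, requested: str) -> Path:
    """Return the resolved target path, or raise if it falls outside base_dir."""
    base = base_dir.resolve()
    # Joining with an absolute path discards base, so absolute requests are caught below.
    target = (base / requested).resolve()
    if base not in target.parents:
        raise PermissionError(f"path escapes the allowed directory: {requested!r}")
    return target


with tempfile.TemporaryDirectory() as tmp:
    downloads = Path(tmp) / "downloads"
    downloads.mkdir()

    print("accepted:", resolve_inside(downloads, "report.csv").name)

    outside = str(Path(tmp) / "elsewhere.txt")
    for bad in ("../startup/payload.bat", "a/../../payload.bat", outside):
        try:
            resolve_inside(downloads, bad)
        except PermissionError as err:
            print("rejected:", err)
```

Resolving before checking matters: a naive string prefix test would accept a request such as `downloads/../startup/payload.bat`, which starts inside the allowed folder and ends outside it.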

4. Patch your AI SDKs fast, and know which versions you run

Microsoft shipped the fixes the same day it disclosed the flaws: semantic-kernel 1.39.4 for Python and 1.71.0 for .NET. A same-day patch only protects you if you apply it. The first step is knowing what you are running. Maintain a software bill of materials for every AI agent: which framework, which exact version, which plugins. Without that inventory, you cannot answer the most basic question after a disclosure, "are we affected?", and you cannot prioritize the fix.

Treat AI framework dependencies as high-priority for patching, on par with your operating systems and web servers, not as an afterthought. Subscribe to the security advisories for every AI framework you use. Build automated dependency scanning into your pipelines so that a known-vulnerable version of an agent framework fails the build rather than reaching production. The window between disclosure and exploitation is shrinking; your patch window must shrink with it.
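A build gate of that kind can start very small. The sketch below compares the versions recorded in an inventory against the first patched releases named in this article and reports anything older. The inventory contents, the labels used as keys, and the function names are hypothetical, and the parser only handles plain dotted numeric versions. A real pipeline would more likely rely on a dedicated dependency scanner.

```python
import re
from typing import Dict, Tuple

# First patched releases, as stated in this article.
PATCHED: Dict[str, str] = {
    "semantic-kernel (Python)": "1.39.4",
    "semantic-kernel (.NET)": "1.71.0",
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a plain dotted numeric version such as '1.39.4'."""
    if not re.fullmatch(r"\d+(\.\d+)*", text):
        raise ValueError(f"unsupported version format: {text!r}")
    return tuple(int(part) for part in text.split("."))


def is_vulnerable(installed: str, patched: str) -> bool:
    """True when the installed version is older than the first patched release."""
    return parse_version(installed) < parse_version(patched)


def find_vulnerable(inventory: Dict[str, str]) -> Dict[str, str]:
    """Return the inventory entries that are below their patched version."""
    return {
        name: version
        for name, version in inventory.items()
        if name in PATCHED and is_vulnerable(version, PATCHED[name])
    }


# Hypothetical inventory for one agent.
inventory = {"semantic-kernel (Python)": "1.39.3", "semantic-kernel (.NET)": "1.71.0"}
flagged = find_vulnerable(inventory)
print(flagged)
```

A CI job could fail the build whenever `find_vulnerable` returns a non-empty result, which turns "are we affected?" into an automatic check rather than a scramble.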

5. Run agents under least privilege

When an attacker achieves code execution, they inherit exactly the permissions of the process they compromised, no more, no less. An agent running with broad administrative rights hands the attacker the keys to everything. An agent running as a tightly scoped, low-privilege identity hands them very little. The blast radius of a successful RCE is determined almost entirely by how much power the agent had in the first place.

Give each agent its own dedicated, minimal identity with access only to the specific data and systems it needs for its specific task, and nothing else. Never run agents as administrators or with shared, broadly privileged service accounts. Apply the same scrutiny to an agent's permissions that you would apply to a new external contractor: grant the least access that lets the job get done, and review it regularly. This single discipline turns many would-be catastrophes into contained, survivable incidents.

6. Control egress so a compromised host cannot phone home

Code execution on a host is dangerous in large part because of what the attacker can do next: send your data out, and pull their tools in. Both of those require the compromised host to make outbound network connections. If you tightly restrict where an agent host is allowed to connect, an attacker who achieves RCE finds themselves on a machine that cannot reach their servers, dramatically limiting the damage.

Place agent hosts behind strict egress controls: an explicit allow-list of the handful of destinations the agent legitimately needs, with everything else blocked by default. Monitor outbound traffic for anomalies: a sudden connection to an unknown external address right after the agent processed a document is a red flag. Combined with least privilege, egress control is what turns "the attacker ran code" into "the attacker got nothing useful out of it."
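The default-deny logic behind an allow-list is simple enough to sketch. The example below checks a destination URL against a fixed set of hostnames and refuses everything else. It only illustrates the decision rule: real egress control belongs at the network layer (a firewall or outbound proxy), because code running on a compromised host can bypass any check made inside the application. The hostnames and the function name `egress_allowed` are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list; a real deployment would list only what the agent needs.
ALLOWED_HOSTS = {"api.example.com", "search.example.com"}


def egress_allowed(url: str) -> bool:
    """Allow only HTTPS requests to exact hostnames on the allow-list."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URLs are denied rather than guessed at
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS


for candidate in (
    "https://api.example.com/v1/search",
    "https://api.example.com.evil.test/v1/search",
    "http://api.example.com/v1/search",
    "https://203.0.113.7/upload",
):
    print(candidate, "->", "allow" if egress_allowed(candidate) else "deny")
```

Matching the exact hostname, rather than checking whether the URL merely contains an approved name, is what stops look-alike destinations from slipping through.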

Governance Checklist

Does your AI agent deployment include these critical controls?

A hard rule that no model-influenced input ever reaches eval(), exec(), or query construction
A reviewed inventory of every tool each agent is allowed to call, with justification
Genuine sandbox isolation for agent code execution, hardened against escape
A software bill of materials tracking the exact framework version of every agent
Egress controls and outbound traffic monitoring on every agent host
Least-privilege identity for each agent, scoped to its specific task
Security review before any new tool is exposed to an agent
Automated dependency scanning that fails the build on known-vulnerable SDK versions

Most organizations currently lack several of these controls. Implementing even two or three of them significantly reduces exposure to agent-based code execution.

AuthorityGate Governance Framework

AuthorityGate's 8-gate model is built for exactly this failure mode. Gate 1 (Pre-Validation) flags dangerous patterns such as model-influenced input reaching a code-execution sink before an agent ships. Gate 4 (Security Scan) inventories the tool registry and catches over-exposed functions and known-vulnerable SDK versions. Gate 6 (Operational Resilience) enforces sandboxing, least privilege, and egress control so a successful injection is contained. Gate 7 (SME Approval) requires human security sign-off before any high-risk tool is exposed to an agent. Gate 8 (Recovery Plan) ensures a compromised agent can be isolated and rolled back.

The framework treats the AI model as a conduit for untrusted input, not as a security boundary, applying the same governance rigor to agent tools and permissions that organizations already apply to external API integrations and third-party access.

The Bottom Line

Microsoft's "When prompts become shells" research is a watershed because it ends the comfortable assumption that prompt injection is merely a content problem. Two Critical flaws in a mainstream agent framework showed that a single sentence of attacker-supplied text can become a program running on your server. The proof was a calculator; the implication is ransomware, data theft, and persistent backdoors. The model did not refuse, because the model was never the boundary in the first place.

The two specific bugs are patched. The pattern is not. As long as organizations wire AI agents to real tools without treating model output as untrusted input, without inventorying and minimizing the tools they expose, without sandboxing, least privilege, and egress control, this same story will recur under different CVE numbers. The defenses are not exotic. They are the security fundamentals you already know, applied honestly to a technology that feels too helpful to be dangerous.

The organizations that internalize one sentence, "your LLM is not a security boundary; the tools you expose define your attacker's affected scope", and act on it now will be the ones that capture the genuine value of AI agents without handing attackers a shell on their servers. The ones that keep trusting the model to know better are building on a foundation that a single poisoned document can collapse.

This article is part of our incident analysis newsletter series. Subscribe to receive complete analyses with risk matrices, governance checklists, and actionable recommendations.
