The GitHub Issue Is Now an Attack Surface: Hidden-Prompt Bugs in AI-Assisted Repos

When a developer files a bug, they write text. When an AI coding agent reads that bug to triage it, the same text becomes input to a model that has tools — file system access, shell, repo write, network. The trust boundary between "user-reported issue" and "agent prompt" is now a single function call away. The CamoLeak disclosure made this concrete. Most AI-assisted developer workflows still treat issues as inert content.

If your team uses any AI coding agent (Copilot, Cursor, Claude Code, Codex, Antigravity) to triage, summarize, or act on issues from a public repository, you have already exposed your agent to attacker-controlled text. Treat every issue body, PR description, and review comment as untrusted prompt input. Strip or sandbox the content before it reaches a tool-using model.

The shape of the bug

The class is simple. A repository accepts text from anyone with a GitHub account. An AI agent — installed by the maintainer or invoked by a downstream developer — eventually reads that text. The agent has tools. The text contains an instruction the agent will follow. The instruction does something the maintainer would not have approved.

The CamoLeak case was the first widely disclosed example of the shape working at scale: invisible Unicode in a GitHub-rendered field, an agent reading the rendered content, the agent following the hidden instruction, exfiltration via a side channel the maintainer had not modeled. Subsequent disclosures show the same shape with different payloads, including Codex branch-injection, where the attacker controlled a branch name rather than an issue body, and the sandworm-mode MCP worm, where the attacker controlled a tool's response.

These look like different bugs because the input vector is different. They are the same bug. The model is trusting attacker-controlled text and then taking action with tools.

Why this is hard to fix

The naive defense is "filter the input." That does not work for two reasons.

First, the attacker writes natural-language instructions that are indistinguishable from the language a real developer would use to ask the agent for help. There is no regex for "this paragraph is malicious." Any filter strict enough to catch a hidden instruction will also block legitimate bug reports that happen to phrase a request the same way.

Second, the input surface is not just the issue body. It is the title, the labels, the linked commits, the linked branches, the rendered Markdown, the embedded images (which can carry text in alt attributes or steganographic payloads), the issue author's display name, the assignee list, the comments thread, and every linked or transcluded reference. A defense that filters only the body leaves the rest of the surface unguarded.
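One way to avoid filtering only the body is to treat the whole issue payload as untrusted and walk every string in it, rather than picking fields by name. The sketch below assumes the issue has already been fetched as a JSON-like dict. The function name iter_text_fields and the example payload are hypothetical, not part of any GitHub client library.

```python
from typing import Any, Iterator


def iter_text_fields(payload: Any, path: str = "") -> Iterator[tuple[str, str]]:
    """Yield (path, text) for every string anywhere in a JSON-like payload.

    Hypothetical helper: nothing is assumed about which fields exist, so a
    newly added field (display name, label, branch ref) is covered by default.
    """
    if isinstance(payload, str):
        yield path, payload
    elif isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}" if path else str(key)
            yield from iter_text_fields(value, child)
    elif isinstance(payload, (list, tuple)):
        for index, value in enumerate(payload):
            yield from iter_text_fields(value, f"{path}[{index}]")
    # Numbers, booleans and None carry no prompt text and are skipped.


# Illustrative payload; the field names are assumptions for the example.
issue = {
    "title": "Crash on startup",
    "labels": [{"name": "bug"}],
    "user": {"login": "mallory"},
    "number": 7,
}
fields = list(iter_text_fields(issue))
```

Every pair this yields is a candidate for whatever sanitizing or sandboxing step runs next. Content that has to be fetched separately, such as comment threads or linked references, still has to be fetched and passed through the same walk.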

The structural fix is "do not give the model tools while it is reading attacker-controlled text." That is incompatible with most agent designs, which are valuable precisely because they read issues and take action.

What actually works

The defenses that hold up in practice all reduce to one of three patterns.

Pattern 1: Phase the agent. Have one model, with no tools, read the issue and produce a structured summary. Have a second model, also without tools, evaluate whether the summary contains an instruction. Only then does the third stage, the one with tools, run, and it runs against the structured summary, not the original text. This is operationally expensive, but it reduces the attack surface because the tool-using stage never sees attacker-written text directly.
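The phasing can be sketched as a small pipeline in which the tool-using stage only ever receives the structured summary. This is a minimal sketch under stated assumptions: IssueSummary, triage, and the three callables are hypothetical names, and the model calls themselves are left as injected functions, not any particular vendor's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class IssueSummary:
    """Fixed-shape summary; the fields are illustrative assumptions."""
    component: str
    symptom: str
    repro_steps: tuple[str, ...]


class InstructionDetected(Exception):
    """Raised when stage 2 thinks the summary carries an instruction."""


def triage(
    raw_issue: str,
    summarize: Callable[[str], IssueSummary],            # stage 1: no tools
    looks_like_instruction: Callable[[IssueSummary], bool],  # stage 2: no tools
    act: Callable[[IssueSummary], str],                  # stage 3: has tools
) -> str:
    summary = summarize(raw_issue)
    if looks_like_instruction(summary):
        # Stop before any tool-using code runs.
        raise InstructionDetected(summary.symptom)
    # The tool-using stage is never handed raw_issue, only the summary.
    return act(summary)
```

The point of the design is the signature of act: it takes an IssueSummary and has no parameter through which the original text could arrive. The detector can still be fooled, so this narrows the channel instead of closing it.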

Pattern 2: Strip and sandbox. Before passing issue text to a model with tools, normalize Unicode, strip invisible characters, render Markdown to plain text, and remove all links. The defender's checklist for this is long, and any miss reintroduces the bug. Most teams underestimate the amount of work in "strip the markup."
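A first pass at the stripping step might look like the following. It is a sketch and not a complete checklist: the function name strip_untrusted is hypothetical, the regular expressions only approximate Markdown and HTML, and a real Markdown parser would be a safer way to render to plain text.

```python
import re
import unicodedata

# HTML comments render as nothing on GitHub but remain in the raw text.
_HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
# Markdown links and images: [text](url) and ![alt](url). Keep the visible text.
_MD_LINK = re.compile(r"!?\[([^\]]*)\]\([^)]*\)")
# Remaining HTML tags (must start with a letter, so "a < b" is left alone).
_HTML_TAG = re.compile(r"</?[A-Za-z][^>]*>")
# Bare URLs.
_BARE_URL = re.compile(r"https?://\S+")

# Unicode categories dropped: format, control, private use, unassigned.
_DROPPED = {"Cf", "Cc", "Co", "Cn"}


def strip_untrusted(text: str) -> str:
    # NFKC folds compatibility forms (fullwidth letters, ligatures) to plain ones.
    text = unicodedata.normalize("NFKC", text)
    # Remove invisible characters first, so they cannot be used to split
    # a "<!--" or "](" and dodge the patterns below.
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) not in _DROPPED
    )
    text = _HTML_COMMENT.sub("", text)
    text = _MD_LINK.sub(r"\1", text)
    text = _HTML_TAG.sub("", text)
    text = _BARE_URL.sub("[link removed]", text)
    return text.strip()
```

Dropping every format character also removes zero-width joiners, which breaks some emoji sequences and some scripts. That trade-off is acceptable for text headed to a tool-using model, but it is a choice, and it does nothing about instructions written in plain visible prose.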

Pattern 3: Constrain the tools. The agent can read whatever it likes, but the tools it can call from inside an issue-reading context are restricted: read-only, scoped to a specific path, or routed through a confirmation step. This is the easiest defense to implement in principle, yet it is the one most agent frameworks support poorly.
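Where a framework offers no built-in scoping, the restriction can be approximated with a thin gate in front of the tool registry. The sketch below is an assumption-laden illustration: Context, ALLOWED, ToolGate, and the tool names are all hypothetical and do not correspond to any specific agent framework.

```python
from enum import Enum
from typing import Callable


class Context(Enum):
    TRUSTED = "trusted"                # e.g. a maintainer's own prompt
    UNTRUSTED_TEXT = "untrusted_text"  # e.g. while reading an issue or PR


# Per-context allowlist; anything not listed is denied.
ALLOWED: dict[Context, set[str]] = {
    Context.TRUSTED: {"read_file", "write_file", "run_shell", "add_label"},
    Context.UNTRUSTED_TEXT: {"read_file", "add_label"},
}


class ToolDenied(PermissionError):
    pass


class ToolGate:
    """Routes every tool call through the allowlist for the current context."""

    def __init__(self, tools: dict[str, Callable[..., object]]):
        self._tools = tools

    def call(self, context: Context, name: str, *args, **kwargs):
        if name not in ALLOWED[context]:
            raise ToolDenied(f"{name!r} is not allowed in {context.value} context")
        return self._tools[name](*args, **kwargs)
```

The gate only helps if the context is set by the surrounding code and not by the model, and if it switches to the untrusted setting as soon as any external text enters the conversation. A confirmation step or path scoping could be added inside call using the same structure.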

A robust posture combines all three. A posture that picks one and treats the problem as solved will be wrong the next time someone files a creatively-formatted issue.

The Rafter angle

Rafter (rafter.so) runs at PR-time. It does not stop a prompt-injection attack against an agent reading an issue — that bug is in the agent's runtime, not in your code. What Rafter does do is catch the precondition: a tool definition that exposes shell, file write, or network from inside a context that processes external text. rafter run flags overprivileged tool definitions in your agent code; --mode plus adds agentic deep-dives that trace data flow from external input to tool invocation. The bug class to scan for is "untrusted input reaches privileged tool" — the same class that has been on every taint-analysis tool's list for two decades, now applied to LLM tools instead of SQL queries.

The lesson from CamoLeak and the issue-class bugs that follow it is that AI security is still AppSec. The vectors are new. The bug classes are not.
