TrapDoor: The Supply-Chain Attack That Poisons Your AI Assistant's Instructions

Written by the Rafter Team

A developer clones a repository, opens it in their editor, and asks their AI coding assistant for help. In the scenario the researchers warned about, the assistant reads the project, runs what looks like a routine security scan, and the developer's crypto wallets, SSH keys, and cloud credentials are on their way to an attacker. The developer did nothing wrong by their own lights — they trusted their tools, and their tools trusted the repo.
This is TrapDoor, a supply-chain campaign reported by Socket.dev. The malware is competent but ordinary. The delivery method is what makes it worth your attention: TrapDoor weaponizes the configuration files that tell AI coding assistants how to behave, in an attempt to make the agent you invited into the project the thing that robs you.
This post walks the attack, explains why the agent-instruction file became an attack surface, and lays out where the defense actually belongs.
TrapDoor planted 34+ malicious packages across 384+ versions on npm, PyPI, and Crates.io. Some npm payloads and pull requests also planted hidden instructions encoded in zero-width Unicode characters inside .cursorrules and CLAUDE.md. Those instructions appear designed to trick a compatible AI assistant into running a fake "security scan" that exfiltrates secrets. The file looks blank to a human.
The ordinary half: a multi-ecosystem package campaign
Strip away the novel part and TrapDoor is a textbook supply-chain operation, executed with care across three package ecosystems.
Socket.dev counted more than 34 malicious packages spanning over 384 versions, distributed across npm, PyPI, and Crates.io. One early PyPI upload was eth-security-auditor@0.1.0, a name engineered to look like legitimate Ethereum security tooling, published on May 22, 2026 at 20:20:18 UTC. Later IOC timelines place campaign activity as early as May 19.
Each ecosystem gets the execution primitive native to it. On npm, the payload fires from a postinstall hook — a script that runs automatically the moment the package is installed. On Rust, it rides in through build.rs, the build script that Cargo executes during compilation. On Python, it triggers at import time, running as soon as the package is loaded. Three ecosystems, three well-worn execution paths, one campaign.
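To make the npm case concrete, here is a minimal Python sketch that lists the install-time lifecycle scripts a package.json declares. It is an illustration, not the detection logic Socket.dev or any scanner uses; the function name install_hooks and the sample manifest are hypothetical.

```python
import json

# npm lifecycle scripts that run automatically during `npm install`.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def install_hooks(package_json_text: str) -> dict[str, str]:
    """Return the install-time lifecycle scripts declared in a package.json."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts") or {}
    return {name: scripts[name] for name in INSTALL_HOOKS if name in scripts}


# Hypothetical manifest for illustration only; not a real TrapDoor sample.
example = (
    '{"name": "demo-pkg", '
    '"scripts": {"postinstall": "node setup.js", "test": "jest"}}'
)
print(install_hooks(example))
```

A declared hook is not malicious by itself, since many legitimate packages compile native code this way. The point is only that anything returned here runs without a further prompt, which is why some teams install with npm's --ignore-scripts flag and allow hooks case by case.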
The payload is a credential stealer. It harvests crypto wallets, SSH keys, cloud credentials, browser profiles, environment variables, and AI-assistant configuration, and plants persistence through Git hooks and similar mechanisms. None of this is new. Wallet-stealer payloads that walk a hardcoded list of sensitive paths are a recurring pattern, the same one behind the Axios-class compromises and the broader bug class of postinstall hooks.
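Because the payload is reported to persist through Git hooks, one cheap check on a suspect working copy is to list the hooks that are not Git's shipped templates. The sketch below assumes the default hooks location; the function name active_git_hooks is hypothetical, and nothing here reflects the specific hook names TrapDoor used, which the reporting summarized above does not give.

```python
from pathlib import Path


def active_git_hooks(repo_root: str) -> list[str]:
    """List files in .git/hooks that are not Git's shipped *.sample templates."""
    hooks_dir = Path(repo_root) / ".git" / "hooks"
    if not hooks_dir.is_dir():
        return []
    return sorted(
        p.name
        for p in hooks_dir.iterdir()
        if p.is_file() and p.suffix != ".sample"
    )


if __name__ == "__main__":
    # Any name printed here deserves a look if you did not install it yourself.
    for name in active_git_hooks("."):
        print(name)
```

This only covers the default directory. A repository or user configuration that sets core.hooksPath moves hooks elsewhere, so an empty result is not proof that no hook is installed.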
If TrapDoor stopped here, it would be one more entry in a crowded year. It does not stop here.
The novel half: instructions your editor can't render
The move that sets TrapDoor apart targets the developer's AI coding assistant directly. The attackers planted hidden instructions inside .cursorrules and CLAUDE.md — the per-project configuration files that tools like Cursor and Claude Code read to learn how to behave in a given repository.
Those instructions are written in zero-width Unicode characters. These are real Unicode code points that carry no visible glyph — the zero-width space (U+200B), the zero-width joiner (U+200D), and related characters. A block of them renders as nothing on screen. The file looks empty, or looks like a short and unremarkable ruleset, while in fact carrying a full paragraph of attacker-authored text.
A human opening that file sees nothing suspicious. The AI assistant, which reads the raw bytes, can pull that hidden content into its parsed context as a paragraph of instructions.
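The gap between what renders and what is stored is easy to demonstrate. The Python sketch below appends three zero-width characters to a visible string and then locates them. The poisoned string is a hypothetical stand-in: it carries no decoded message, and the encoding scheme TrapDoor used is not described in the reporting summarized here.

```python
import unicodedata

# Code points named in the text above, plus two closely related ones.
ZERO_WIDTH = {
    "\u200b",  # ZERO WIDTH SPACE
    "\u200c",  # ZERO WIDTH NON-JOINER
    "\u200d",  # ZERO WIDTH JOINER
    "\ufeff",  # ZERO WIDTH NO-BREAK SPACE (byte order mark)
}


def find_zero_width(text: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for every zero-width character in text."""
    return [
        (i, unicodedata.name(ch, f"U+{ord(ch):04X}"))
        for i, ch in enumerate(text)
        if ch in ZERO_WIDTH
    ]


visible = "# Project rules"
poisoned = visible + "\u200b\u200d\u200b"  # hypothetical; renders like `visible`

print(len(visible), len(poisoned))  # the lengths differ although both look alike
print(find_zero_width(poisoned))
```

The two strings look identical in most editors and terminals, but the second is three characters longer, and those extra characters are exactly what a program reading the raw file receives.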
What the agent is told to do
When the coding assistant loads the project, it ingests the configuration exactly as designed — that is the entire point of CLAUDE.md and .cursorrules. If the invisible text lands in its context, it can be treated as a legitimate directive from the repository owner.
The hidden directive is written to make the agent run a "security scan." If the agent complies, the "scan" walks the developer's machine for secrets and exfiltrates whatever it finds. The developer asked for help; the repository carried hidden instructions intended to make the agent run attacker-chosen commands, in characters the developer could not see. Socket cautions that this technique may not work consistently, so treat it as an attempted poisoning path rather than a guaranteed one.
This is the inversion that matters. A classic postinstall attack runs on your machine without involving your AI at all. TrapDoor is designed to recruit the AI as the executor. The agent's access, its willingness to run shell commands on your behalf, and its trust in the repo's own config are all meant to become the attack's tooling.
They tested it against real targets
The attackers also probed the propagation path directly. Beyond publishing packages, they opened malicious pull requests against real, high-profile repositories, including langchain-ai/langchain, langflow-ai/langflow, and browser-use/browser-use.
A merged pull request would be the ideal vector here. It would place both the poisoned dependency and the poisoned config into a project that many developers clone, and a developer who then pointed a compatible assistant at that repo could be exposed. Public reporting does not show that this vector succeeded — CSA reports the PRs were closed without being merged — so treat it as an attempted secondary distribution vector rather than a proven distribution mechanism.
Why this works: we taught agents to trust the repo
AI coding assistants read project configuration on purpose. A repo uses CLAUDE.md or .cursorrules to declare its conventions, its build commands, and its do's and don'ts, so the agent can adopt that context without the developer re-explaining it every session. The feature is that the agent trusts the file.
That trust relationship is the vulnerability. The model cannot distinguish instructions a maintainer wrote for legitimate reasons from instructions an attacker smuggled in through a malicious dependency or a poisoned pull request. Both are simply text in a file the agent is designed to obey.
Zero-width encoding removes the last line of defense. Even a careful developer reviewing the config sees a blank or innocuous file, because the malicious payload has no visible representation. The one check that might have caught this, a human reading the rendered file, is defeated by construction.
This is the same root cause behind the config-as-execution incidents we documented in malicious repos exploiting AI coding tools: project files that used to be inert metadata quietly gained execution semantics. TrapDoor extends that pattern from "config that runs a shell command" to "config that runs your AI." It is also a cousin of MCP tool-description injection, where the text an agent reads to understand a tool becomes the channel for hijacking it. The trust boundary is the same one, moved up a layer.
What this changes for the industry
TrapDoor marks the point where the agent-instruction file becomes a first-class attack surface, and the implications run past this one campaign.
The first implication is that rendered human review is no longer enough on its own. Reviewing a repo's config used to mean a human could spot something wrong. Zero-width payloads weaken that assumption, because the bytes the agent reads and the glyphs the human sees can be two different documents. GitHub does flag some hidden or bidirectional Unicode, and scanners can detect zero-width characters, so the answer is to scan AI config files for hidden Unicode rather than rely on a human glancing at the rendered file.
The second is that AI assistants have inherited the full supply-chain threat model without inheriting its defenses. A package manager has npm audit and lockfiles and a decade of tooling. There is no equivalent maturity for "the instructions my coding agent absorbs from a cloned repo." An agent may follow a poisoned CLAUDE.md much as it would a legitimate one.
The third is that the attack composes. TrapDoor stacks a poisoned dependency, a poisoned config file, and a malicious pull request into a single chain, and any one of those is enough to start it. Defending the agent's reasoning is necessary but late; the chain has many earlier links where it can be cut.
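The scan called for in the first implication can be sketched in a few lines of Python. This version walks a checkout, picks out the two file names discussed in this post, and counts characters in Unicode general category Cf, the "format" category that covers zero-width and bidirectional control characters. The function names and the choice of Cf as the rule are assumptions made for illustration; a production scanner would likely be more selective.

```python
import unicodedata
from pathlib import Path

# File names taken from this post; extend for other assistants as needed.
AI_CONFIG_NAMES = {".cursorrules", "CLAUDE.md"}


def hidden_format_chars(text: str) -> int:
    """Count characters in Unicode category Cf (invisible format characters)."""
    return sum(1 for ch in text if unicodedata.category(ch) == "Cf")


def scan_repo(root: str) -> dict[str, int]:
    """Map each AI config file under root to its Cf character count, if nonzero."""
    findings = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.name in AI_CONFIG_NAMES:
            text = path.read_text(encoding="utf-8", errors="replace")
            count = hidden_format_chars(text)
            if count:
                findings[str(path.relative_to(root))] = count
    return findings


if __name__ == "__main__":
    results = scan_repo(".")
    for name, count in results.items():
        print(f"{name}: {count} hidden format character(s)")
    raise SystemExit(1 if results else 0)  # nonzero exit fails a CI step
```

Expect some false positives: a leading byte order mark and the joiners inside some emoji sequences are also Cf. For files whose whole job is to instruct an agent, failing closed and reviewing the hit by hand is usually the safer trade.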
The defense: catch the package before the agent reads it
The instinct is to harden the agent — strip zero-width characters, sanitize what it ingests, sandbox its tool calls. Those are worthwhile, and you should want your AI vendors building them. But they are a defense at the last possible moment, after the malicious package is already in your dependency tree and the poisoned file is already in the agent's context.
The cheaper and more reliable place to stop TrapDoor is earlier, in CI, before the agent ever opens the repo.
Catch the malicious package at dependency-scan time
TrapDoor's payloads ride in on packages with names like eth-security-auditor: typosquats and lookalikes of real security tooling. Software composition analysis backed by malicious-package intelligence or behavioral analysis can catch known TrapDoor IOCs before install. Note the limit: CVE-only SCA can miss malicious packages by design, since a freshly published credential stealer has no CVE to match. With up-to-date intelligence, though, the scan can stop the package before it installs. If it never installs, the poisoned CLAUDE.md that shipped alongside it never enters your working copy, and the agent never reads it.
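At its simplest, an IOC check is a name comparison done before anything installs. The sketch below reads a requirements.txt-style file and flags names on a deny list. The deny list holds only the one package name given in this post; in practice it would be fed from a maintained malicious-package feed, which is assumed here and not shown. The function names are hypothetical.

```python
import re

# The one package name given in this post. A real deny list would come from
# a maintained malicious-package intelligence feed (assumed, not shown).
KNOWN_BAD = {"eth-security-auditor"}


def normalize(name: str) -> str:
    """Normalize a Python package name the way PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def flagged_requirements(requirements_text: str) -> list[str]:
    """Return requirement names that appear on the deny list."""
    hits = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # skip blanks, comments, and pip options such as -r
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match and normalize(match.group()) in KNOWN_BAD:
            hits.append(match.group())
    return hits


sample = "requests==2.31.0\nEth_Security.Auditor==0.1.0  # pinned\n"
print(flagged_requirements(sample))
```

Normalizing first matters because PyPI treats hyphens, underscores, dots, and case as equivalent, so a spelling variant would otherwise slip past an exact-match list. This sketch looks only at direct requirements; transitive dependencies need a resolved lockfile.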
Catch the theft at secrets-scan time
The endgame is exfiltrating SSH keys, cloud credentials, and environment variables. Secrets scanning reduces one class of exposure: it surfaces the credentials you didn't realize were sitting in your repo history, so you can rotate them before they leak. Be clear about what it does not do, though. It does not neutralize local credential-stealing malware, which goes after browser profiles, wallet keystores, SSH keys, env vars, and the contents of ~/.aws on the developer's own machine. An affected workstation still needs its credentials rotated. Treat any secret that has ever lived in a commit as compromised, and rotate it.
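For a sense of what secrets scanning looks for, here is a minimal Python sketch with two well-known, high-signal patterns: the AKIA prefix of AWS access key IDs and PEM-style private key headers. It scans a block of text rather than repository history, and it is not how Rafter or any particular scanner works; the rule names and the function find_secrets are hypothetical.

```python
import re

# Two well-known, high-signal formats; real scanners carry many more rules.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:[A-Z]+ )*PRIVATE KEY-----"),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line number, rule name) for each line matching a secret pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, rule))
    return findings
```

Running find_secrets over each changed file in a pull request gives a rough merge-time check. Covering history means feeding it the content of past commits as well, since a secret deleted in a later commit is still retrievable from the earlier one.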
Run both on every pull request
TrapDoor's attempted PR vector is exactly the kind of merge-time entry point this control is built for. A scan that runs quarterly is checking for the threat in the wrong place at the wrong time. Dependency and secrets scanning gated on the merge — running automatically when a PR opens — puts the check where an attack like this would enter.
This is the work Rafter does: it runs in CI for secret scanning and remote SAST and SCA, and it installs as a one-click GitHub Marketplace Action. A CI scanner with up-to-date malicious-package intelligence could flag TrapDoor IOCs before merge, before any agent opens the repo. The goal isn't to make your assistant smarter about bad instructions. It's to make sure the bad instructions never arrive in the first place.
Takeaways
- The agent-instruction file can drive tool execution. CLAUDE.md and .cursorrules are no longer inert metadata. Treat them as high-risk, untrusted input that your agent may obey.
- Invisible does not mean absent. Zero-width Unicode lets an attacker hide a full instruction set in a file that renders as blank. Rendered human review is not a reliable check on its own; scan AI config files for hidden Unicode.
- The PR was an attempted vector. TrapDoor propagated through package registries and also opened pull requests against high-profile repos; public reporting does not show those PRs were merged. The merge gate is still where you want a scan, because it is where an attack like this would enter.
- Defend early, not late. Catch the poisoned package at dependency-scan time with malicious-package intelligence, and reduce repo-exposed secrets at secrets-scan time, both in CI, before the agent ever reads the repo — while remembering that affected workstations still need credential rotation.
Your AI assistant will believe whatever the repository tells it. The repository is the thing you have to check, and you have to check it before the agent does.