When LLMs Write Code: Trusting Untrusted Outputs

Written by Rafter Team
January 30, 2026

Imagine this: you ask an LLM to write a quick SQL query for a new feature. It generates something that looks right, you paste it into your backend, and ship.
Two days later, someone discovers they can run:
' OR '1'='1
…inside your login form and bypass authentication entirely. Classic SQL injection — but this time, it wasn't written by a careless junior dev. It was written by your AI assistant.
This is the reality of LLM code generation security: the code looks polished, but the vulnerabilities are real.
Introduction
LLMs are writing more of our code than ever. GitHub Copilot, ChatGPT, Cursor, and countless in-editor assistants are generating millions of lines daily. This is a massive productivity unlock — but it comes with a hidden cost: LLM-generated code can be insecure.
SQL injection, RCE, XSS, insecure deserialization, hardcoded secrets — models generate these patterns all the time.
Developers often trust these outputs implicitly, assuming "the AI knows best." That's a problem:
- Vulnerabilities slip through because AI-generated code often bypasses normal review pipelines
- Models don't understand security — they autocomplete patterns, good and bad
- They've ingested public GitHub repos full of vulnerable code
Key idea: You wouldn't trust user input blindly — don't trust model output blindly either.
In this post, we'll look at how LLMs introduce vulnerabilities, why developers are prone to trust them too much, and how to build a secure pipeline that treats AI-generated code as untrusted input.
How LLM Code Generation Introduces Vulnerabilities
1. AI-Generated SQL Injection
One of the most common mistakes is SQL built through naïve string interpolation.
Example:
const query = `SELECT * FROM users WHERE email = '${userInput}'`;
This is exactly the kind of query an LLM might produce in response to "Write a SQL query to look up users by email." It looks plausible — until someone injects:
' OR '1'='1
…and gains access to every user.
NYU's "Asleep at the Keyboard" study found that roughly 40% of Copilot-generated programs in security-relevant scenarios contained vulnerabilities, including SQL injection. These issues aren't subtle; they're textbook.
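The safer pattern sends the query and the user-supplied value to the database separately. Below is a minimal sketch using the node-postgres (pg) driver as one example of a parameterized query; most drivers and ORMs expose an equivalent placeholder API.
const { Pool } = require('pg');
const pool = new Pool();

// The driver transmits the email value as data, never as SQL,
// even if it contains quotes or keywords.
async function findUserByEmail(email) {
  const result = await pool.query(
    'SELECT * FROM users WHERE email = $1',
    [email]
  );
  return result.rows[0];
}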
2. Insecure API Endpoints
LLMs often generate "happy path" code — working, but insecure.
Example:
app.get('/files/:path', (req, res) => {
  const filePath = `/data/${req.params.path}`;
  res.sendFile(filePath);
});
Looks fine — until someone requests ../../etc/passwd.
This is directory traversal, and in some setups can lead to remote code execution or sensitive file disclosure.
The model doesn't know your security posture. It just autocompletes the pattern.
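A hardened version of the same route resolves the requested path and rejects anything that escapes the base directory. This is a minimal sketch reusing the /data directory and Express app from the example above.
const path = require('path');
const BASE_DIR = '/data';

app.get('/files/:path', (req, res) => {
  // Resolve against the base directory, then confirm the result is
  // still inside it; '../' sequences fail this check.
  const resolved = path.resolve(BASE_DIR, req.params.path);
  if (!resolved.startsWith(BASE_DIR + path.sep)) {
    return res.status(400).send('Invalid path');
  }
  res.sendFile(resolved);
});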
3. Command Injection / Shell Expansion
Another classic: shell commands built through unescaped string concatenation.
Example:
const { exec } = require('child_process');
exec(`convert ${inputFile} ${outputFile}`, (err) => {
  if (err) throw err;
});
If inputFile is attacker-controlled, a payload like:
image.png; rm -rf /
…can execute arbitrary commands.
Models often use exec or os.system because it's the fastest way to "get things done" — not the safest.
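A safer equivalent skips the shell entirely and passes arguments as an array. Here's a minimal sketch using Node's built-in execFile with the same convert command; if you genuinely need shell features, validate inputs against a whitelist first.
const { execFile } = require('child_process');

// No shell is spawned, so metacharacters like ';' in inputFile are
// passed to convert as literal characters rather than executed.
execFile('convert', [inputFile, outputFile], (err) => {
  if (err) throw err;
});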
4. Insecure Defaults and Hardcoded Secrets
Sometimes models "fill in" examples with fake or placeholder credentials:
API_KEY = "sk-test-1234"
Developers copy/paste, forget to swap it out, or accidentally commit it to GitHub.
Other times, the model chooses insecure defaults — e.g., disabling SSL verification, allowing anonymous access, or setting weak passwords in generated config files.
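The fix is simple but easy to skip: keep secrets out of the source tree and fail fast when they're missing. A minimal Node.js sketch, assuming the key arrives via an environment variable:
// Read the key from the environment instead of hardcoding it.
const apiKey = process.env.API_KEY;
if (!apiKey) {
  throw new Error('API_KEY environment variable is not set');
}
Pair this with a secret scanner in CI so anything that does get hardcoded is caught before it lands in your repository's history.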
Why Developers Trust LLM Outputs Too Much
LLMs sound authoritative. When they generate code, it's syntactically clean, commented, and looks like something a senior engineer might write. That creates false confidence.
- Authority bias: The model "sounds right."
- Speed pressure: Shipping fast often wins over reviewing carefully.
- Coverage fallacy: Devs assume the model has been trained on best practices, so its code must be secure.
But models don't understand security. They autocomplete patterns — good and bad. They've ingested public GitHub repos full of vulnerable code. What you get is often a statistical average, not a vetted solution.
In one Veracode analysis, 45% of AI-generated code contained vulnerabilities, including high-severity ones.
The Security Mental Model: Treat Outputs as Untrusted Inputs
Here's the shift:
LLM outputs are untrusted inputs.
Just like you'd never run raw user input into a database or shell, you shouldn't deploy model-generated code without validation, scanning, and review.
This doesn't mean avoiding LLMs — it means treating them as part of your secure pipeline, not a shortcut around it.
Practical Defenses
1. Always Review AI-Generated Code
Treat AI output like a junior developer's PR: promising, but untrusted.
- Check for obvious security flaws
- Run through your normal code review checklist
- Don't merge blindly just because it compiles
2. Run Static and Dynamic Scanners
Static analysis tools catch many of the issues LLMs introduce.
- Static: ESLint (with security plugins), Bandit, Checkov, Semgrep
- Dynamic: DAST tools, fuzzers, penetration testing
Rafter integrates traditional static scanners with AI-aware security checks — flagging vulnerable patterns introduced by LLMs (e.g., SQLi, unsafe exec, insecure config).
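As a concrete starting point for JavaScript projects, here's a minimal ESLint flat-config sketch. It assumes eslint-plugin-security is installed; exact rule names and config shape can vary by plugin and ESLint version.
// eslint.config.js (sketch; assumes: npm i -D eslint eslint-plugin-security)
const security = require('eslint-plugin-security');

module.exports = [
  {
    plugins: { security },
    rules: {
      // Flag patterns LLMs commonly reach for.
      'security/detect-child-process': 'error',
      'security/detect-eval-with-expression': 'error',
      'security/detect-non-literal-fs-filename': 'warn',
    },
  },
];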
3. Parameterize and Sanitize Inputs
The model might generate unsafe patterns, but you can fix them:
- SQL: use prepared statements, not string interpolation
- Shell: whitelist allowed commands and pass arguments as an array (e.g., execFile) rather than building shell strings
- Paths: normalize and sanitize input before using it in filesystem operations
You can also prompt the model to use safer patterns ("use prepared statements"), but prompts aren't security controls — they're hints.
4. Integrate AI Code Review
Make AI code generation part of your secure pipeline, not a shortcut around it.
Recommended flow:
- Generate code with LLM
- Run it through static analysis (e.g., Rafter + linters)
- Manual review
- Deploy
This catches most vulnerabilities introduced by LLMs before they hit prod.
5. Monitor for Vulnerable Patterns Over Time
As teams rely more on LLMs, vulnerabilities may creep in gradually.
Regular scanning of your entire repo can catch:
- Hardcoded secrets (a naive detection sketch follows this list)
- Newly introduced SQL injection
- Insecure library calls
- Pattern regressions after refactors
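Dedicated secret scanners (and Rafter) do this far more thoroughly, but even a naive sketch shows the idea: walk the repo and flag strings that look like credentials. The patterns below are simplistic illustrations, not a real rule set.
// scan-secrets.js: a toy illustration, not a production secret scanner.
const fs = require('fs');
const path = require('path');

const PATTERNS = [
  /api[_-]?key\s*[:=]\s*['"][A-Za-z0-9_-]{12,}['"]/i, // key = "..."
  /sk-[A-Za-z0-9]{20,}/,                              // OpenAI-style keys
];

function scan(dir) {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    if (entry.name === 'node_modules' || entry.name === '.git') continue;
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      scan(full);
    } else if (entry.isFile()) {
      const text = fs.readFileSync(full, 'utf8');
      if (PATTERNS.some((p) => p.test(text))) {
        console.log(`Possible hardcoded secret in ${full}`);
      }
    }
  }
}

scan(process.cwd());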
Rafter continuously scans for these vulnerabilities, combining static analysis and AI-specific checks.
Conclusion
LLMs are incredible accelerators — but they're not infallible. They write convincing code, but they don't understand security. If you trust their output blindly, you'll eventually ship a vulnerability.
Treat LLM output like untrusted input.
- Review it like a PR
- Scan it like user input
- Validate it before it reaches production
By integrating scanning and review into your workflow, you can move fast without leaving SQL injection, RCE, or hardcoded secrets lurking in your codebase.
Start by scanning your repo with Rafter to catch AI-generated vulnerabilities early.
Treat model outputs as inputs, not gospel.
Related Resources
- AI Agent Supply Chain Security
- Prompt Injection 101: How Attackers Hijack Your LLM
- Silent Exfiltration: How Secrets Leak Through Model Output
- Real-World AI Jailbreaks: How Innocent Prompts Become Exploits
- AI Builder Security: 7 New Attack Surfaces You Need to Know
- Securing AI-Generated Code: Best Practices
- Injection Attacks: OWASP Top 10 Explained
- API Keys Explained: Secure Usage for Developers
- Security Tool Comparisons: Choosing the Right Scanner