When LLMs Write Code: Trusting Untrusted Outputs

Written by Rafter Team
January 30, 2026

Imagine this: you ask an LLM to write a quick SQL query for a new feature. It generates something that looks right, you paste it into your backend, and ship.
Two days later, someone discovers they can run:
' OR '1'='1
…inside your login form and bypass authentication entirely. Classic SQL injection — but this time, it wasn't written by a careless junior dev. It was written by your AI assistant.
This is the reality of LLM code generation security: the code looks polished, but the vulnerabilities are real.
Introduction
LLMs are writing more of our code than ever. GitHub Copilot, ChatGPT, Cursor, and countless in-editor assistants are generating millions of lines daily. This is a massive productivity unlock — but it comes with a hidden cost: LLM-generated code can be insecure.
SQL injection, RCE, XSS, insecure deserialization, hardcoded secrets — models generate these patterns all the time.
Developers often trust these outputs implicitly, assuming "the AI knows best." That's a problem:
- Vulnerabilities slip through because AI-generated code often bypasses normal review pipelines
- Models don't understand security — they autocomplete patterns, good and bad
- They've ingested public GitHub repos full of vulnerable code
Key idea: You wouldn't trust user input blindly — don't trust model output blindly either.
In this post, we'll look at how LLMs introduce vulnerabilities, why developers are prone to trust them too much, and how to build a secure pipeline that treats AI-generated code as untrusted input.
How LLM Code Generation Introduces Vulnerabilities
1. AI-Generated SQL Injection
One of the most common mistakes is SQL built through naïve string interpolation.
Example:
const query = `SELECT * FROM users WHERE email = '${userInput}'`;
This is exactly the kind of query an LLM might produce in response to "Write a SQL query to look up users by email." It looks plausible — until someone injects:
' OR '1'='1
…and gains access to every user.
NYU's "Asleep at the Keyboard" study found that roughly 40% of Copilot-generated programs in security-relevant scenarios contained vulnerabilities, including SQL injection. These issues aren't subtle; they're textbook.
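The safer pattern sends the query and the user-supplied value to the database separately. Below is a minimal sketch using the node-postgres (pg) driver as one example of a parameterized query; most drivers and ORMs expose an equivalent placeholder API.
const { Pool } = require('pg');
const pool = new Pool();

// The driver transmits the email value as data, never as SQL,
// even if it contains quotes or keywords.
async function findUserByEmail(email) {
  const result = await pool.query(
    'SELECT * FROM users WHERE email = $1',
    [email]
  );
  return result.rows[0];
}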
2. Insecure API Endpoints
LLMs often generate "happy path" code — working, but insecure.
Example:
app.get('/files/:path', (req, res) => {
  const filePath = `/data/${req.params.path}`;
  res.sendFile(filePath);
});
Looks fine — until someone requests ../../etc/passwd.
This is directory traversal, and in some setups can lead to remote code execution or sensitive file disclosure.
The model doesn't know your security posture. It just autocompletes the pattern.
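A hardened version of the same route resolves the requested path and rejects anything that escapes the base directory. This is a minimal sketch reusing the /data directory and Express app from the example above.
const path = require('path');
const BASE_DIR = '/data';

app.get('/files/:path', (req, res) => {
  // Resolve against the base directory, then confirm the result is
  // still inside it; '../' sequences fail this check.
  const resolved = path.resolve(BASE_DIR, req.params.path);
  if (!resolved.startsWith(BASE_DIR + path.sep)) {
    return res.status(400).send('Invalid path');
  }
  res.sendFile(resolved);
});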
3. Command Injection / Shell Expansion
Another classic: shell commands built through unescaped string concatenation.
Example:
const { exec } = require('child_process');
exec(`convert ${inputFile} ${outputFile}`, (err) => {
  if (err) throw err;
});
If inputFile is attacker-controlled, a payload like:
image.png; rm -rf /
…can execute arbitrary commands.
Models often use exec or os.system because it's the fastest way to "get things done" — not the safest.
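A safer equivalent skips the shell entirely and passes arguments as an array. Here's a minimal sketch using Node's built-in execFile with the same convert command; if you genuinely need shell features, validate inputs against a whitelist first.
const { execFile } = require('child_process');

// No shell is spawned, so metacharacters like ';' in inputFile are
// passed to convert as literal characters rather than executed.
execFile('convert', [inputFile, outputFile], (err) => {
  if (err) throw err;
});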
4. Insecure Defaults and Hardcoded Secrets
Sometimes models "fill in" examples with fake or placeholder credentials:
API_KEY = "sk-test-1234"
Developers copy/paste, forget to swap it out, or accidentally commit it to GitHub.
Other times, the model chooses insecure defaults — e.g., disabling SSL verification, allowing anonymous access, or setting weak passwords in generated config files.
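The fix is simple but easy to skip: keep secrets out of the source tree and fail fast when they're missing. A minimal Node.js sketch, assuming the key arrives via an environment variable:
// Read the key from the environment instead of hardcoding it.
const apiKey = process.env.API_KEY;
if (!apiKey) {
  throw new Error('API_KEY environment variable is not set');
}
Pair this with a secret scanner in CI so anything that does get hardcoded is caught before it lands in your repository's history.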
Why Developers Trust LLM Outputs Too Much
LLMs sound authoritative. When they generate code, it's syntactically clean, commented, and looks like something a senior engineer might write. That creates false confidence.
- Authority bias: The model "sounds right."
- Speed pressure: Shipping fast often wins over reviewing carefully.
- Coverage fallacy: Devs assume the model has been trained on best practices, so its code must be secure.
But models don't understand security. They autocomplete patterns — good and bad. They've ingested public GitHub repos full of vulnerable code. What you get is often a statistical average, not a vetted solution.
In one Veracode analysis, 45% of AI-generated code contained vulnerabilities, including high-severity ones.
The Security Mental Model: Treat Outputs as Untrusted Inputs
Here's the shift:
LLM outputs are untrusted inputs.
Just like you'd never run raw user input into a database or shell, you shouldn't deploy model-generated code without validation, scanning, and review.
This doesn't mean avoiding LLMs — it means treating them as part of your secure pipeline, not a shortcut around it.
Practical Defenses
1. Always Review AI-Generated Code
Treat AI output like a junior developer's PR: promising, but untrusted.
- Check for obvious security flaws
- Run through your normal code review checklist
- Don't merge blindly just because it compiles
2. Run Static and Dynamic Scanners
Static analysis tools catch many of the issues LLMs introduce.
- Static: ESLint (with security plugins), Bandit, Checkov, Semgrep
- Dynamic: DAST tools, fuzzers, penetration testing
Rafter integrates traditional static scanners with AI-aware security checks — flagging vulnerable patterns introduced by LLMs (e.g., SQLi, unsafe exec, insecure config).
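As a concrete starting point for JavaScript projects, here's a minimal ESLint flat-config sketch. It assumes eslint-plugin-security is installed; exact rule names and config shape can vary by plugin and ESLint version.
// eslint.config.js (sketch; assumes: npm i -D eslint eslint-plugin-security)
const security = require('eslint-plugin-security');

module.exports = [
  {
    plugins: { security },
    rules: {
      // Flag patterns LLMs commonly reach for.
      'security/detect-child-process': 'error',
      'security/detect-eval-with-expression': 'error',
      'security/detect-non-literal-fs-filename': 'warn',
    },
  },
];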
3. Parameterize and Sanitize Inputs
The model might generate unsafe patterns, but you can fix them:
- SQL: use prepared statements, not string interpolation
- Shell: whitelist allowed commands and pass arguments as an array (e.g., execFile) rather than building shell strings
- Paths: normalize and sanitize input before using it in filesystem operations
You can also prompt the model to use safer patterns ("use prepared statements"), but prompts aren't security controls — they're hints.
4. Integrate AI Code Review
Make AI code generation part of your secure pipeline, not a shortcut around it.
Recommended flow:
- Generate code with LLM
- Run it through static analysis (e.g., Rafter + linters)
- Manual review
- Deploy
This catches most vulnerabilities introduced by LLMs before they hit prod.
5. Monitor for Vulnerable Patterns Over Time
As teams rely more on LLMs, vulnerabilities may creep in gradually.
Regular scanning of your entire repo can catch:
- Hardcoded secrets (a naive detection sketch follows this list)
- Newly introduced SQL injection
- Insecure library calls
- Pattern regressions after refactors
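Dedicated secret scanners (and Rafter) do this far more thoroughly, but even a naive sketch shows the idea: walk the repo and flag strings that look like credentials. The patterns below are simplistic illustrations, not a real rule set.
// scan-secrets.js: a toy illustration, not a production secret scanner.
const fs = require('fs');
const path = require('path');

const PATTERNS = [
  /api[_-]?key\s*[:=]\s*['"][A-Za-z0-9_-]{12,}['"]/i, // key = "..."
  /sk-[A-Za-z0-9]{20,}/,                              // OpenAI-style keys
];

function scan(dir) {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    if (entry.name === 'node_modules' || entry.name === '.git') continue;
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      scan(full);
    } else if (entry.isFile()) {
      const text = fs.readFileSync(full, 'utf8');
      if (PATTERNS.some((p) => p.test(text))) {
        console.log(`Possible hardcoded secret in ${full}`);
      }
    }
  }
}

scan(process.cwd());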
Rafter continuously scans for these vulnerabilities, combining static analysis and AI-specific checks.
Conclusion
LLMs are incredible accelerators — but they're not infallible. They write convincing code, but they don't understand security. If you trust their output blindly, you'll eventually ship a vulnerability.
Treat LLM output like untrusted input.
- Review it like a PR
- Scan it like user input
- Validate it before it reaches production
By integrating scanning and review into your workflow, you can move fast without leaving SQL injection, RCE, or hardcoded secrets lurking in your codebase.
Start by scanning your repo with Rafter to catch AI-generated vulnerabilities early.
Treat model outputs as inputs, not gospel.
Related Resources
- AI Agent Supply Chain Security
- Prompt Injection 101: How Attackers Hijack Your LLM
- Silent Exfiltration: How Secrets Leak Through Model Output
- Real-World AI Jailbreaks: How Innocent Prompts Become Exploits
- AI Builder Security: 7 New Attack Surfaces You Need to Know
- Securing AI-Generated Code: Best Practices
- Injection Attacks: OWASP Top 10 Explained
- API Keys Explained: Secure Usage for Developers
- Security Tool Comparisons: Choosing the Right Scanner