How Rafter Scans AI-Generated Code: Under the Hood

Written by the Rafter Team

Rafter scans AI-generated code differently than traditional SAST tools. Traditional scanners were built for human-written code—consistent style, deliberate architecture, gradual evolution over months or years. AI-generated code breaks those assumptions: it arrives in bulk, mixes frameworks freely, handles errors inconsistently, and introduces vulnerability patterns that a human developer would catch mid-keystroke but that LLMs produce without hesitation. Rafter's scanning pipeline addresses these differences by combining battle-tested open-source static analyzers with a proprietary AI review layer, then consolidating findings into deduplicated, severity-ranked results with copy-paste fix prompts you can drop directly into your AI coding tool.
This post explains exactly how that pipeline works. We built Rafter to scan the code that AI tools generate, so we're going to be transparent about what we catch, what we miss, and how we're improving.
Why AI-Generated Code Needs Different Scanning
Human developers build mental models as they write. They remember that the auth middleware on line 12 protects the route on line 87. They notice when error handling in one file contradicts the pattern in another. They hesitate before hardcoding a default password, even in a prototype.
AI coding assistants don't do any of that. They generate code token by token, optimizing for local coherence without maintaining a global security model. The result is a distinct set of vulnerability patterns that traditional scanners weren't designed to catch at volume.
| Pattern | Human-Written Code | AI-Generated Code |
|---|---|---|
| Error handling | Consistent within a project—teams adopt conventions | Inconsistent across files—one route validates input, the next doesn't |
| Authentication | Centralized middleware or decorator pattern | Mixed approaches—JWT in one endpoint, session cookies in another, nothing in a third |
| Hardcoded values | Developers know to use env vars (usually) | Default credentials, API keys, and connection strings appear in generated code frequently |
| Framework usage | Follows framework conventions and security defaults | Mixes framework patterns—React with raw DOM manipulation, Express with manual header setting |
| Dependency choices | Teams vet and standardize dependencies | Suggests whatever was common in training data, including deprecated or vulnerable packages |
| Permission defaults | Teams review IAM, CORS, and access policies | Over-permissive by default—Access-Control-Allow-Origin: *, public S3 buckets, disabled RLS |
The Veracode State of AI-Generated Code Security report (July 2025) found that 45% of AI-generated code contains at least one security vulnerability. Pearce et al. (2021) showed that 40% of Copilot suggestions in security-sensitive contexts were insecure, and BaxBench (February 2025) found 49% of AI-generated outputs were vulnerable or incorrect. The problem isn't that AI writes worse code than humans—it's that AI writes code faster than humans can review it, and the vulnerability patterns are different enough that scanners tuned for human-written code miss them.
AI-generated code isn't inherently less secure than human-written code. But it's produced at a volume and speed that makes manual review impossible. Automated scanning tuned for AI output patterns is the only way to maintain security velocity.
Rafter's Scanning Pipeline
When you connect a repository to Rafter—through the GitHub app, CLI, or API—the scanning pipeline runs through five stages:
1. Repository ingestion. Rafter clones the repo at the specified branch or commit, identifies the languages and frameworks in use, and routes the codebase to the appropriate scanner configurations.
2. Static analysis layer. Multiple open-source and proprietary analyzers run in parallel against the codebase. Each scanner targets different vulnerability classes—secrets detection, dependency vulnerabilities, SAST pattern matching, infrastructure misconfigurations, and code quality issues.
3. AI-powered contextual review. Rafter's proprietary AI scanner (rf) analyzes the code with context that static rules can't capture—cross-file data flow, intent inference, and AI-specific vulnerability patterns like inconsistent auth approaches or missing input validation in generated routes.
4. Finding consolidation. Results from all scanners are merged, deduplicated by fingerprint, classified against the OWASP Top 10:2025, and ranked by severity.
5. Fix generation. Every finding gets a plain-English explanation and a structured fix prompt designed to paste directly into ChatGPT, Claude, Cursor, or any AI coding assistant.
A typical scan completes in 30 seconds to 2 minutes. Results appear in your dashboard, CLI output, or API response in SARIF-compatible format for integration with existing toolchains.
Static Analysis Layer
Rafter doesn't rely on a single scanner. The static analysis layer runs multiple specialized tools in parallel, each targeting vulnerability classes the others miss:
| Scanner | Category | What It Finds |
|---|---|---|
| Gitleaks | Secret Detection | Hardcoded API keys, tokens, passwords, and credentials in code and git history |
| Trivy + Bandit + OpenGrep | SAST + SCA | Known CVEs in dependencies, SQL injection, XSS, command injection, insecure crypto, deserialization flaws |
| Checkov | Infrastructure as Code | Terraform, CloudFormation, and Kubernetes misconfigurations—public buckets, open security groups, missing encryption |
| Rafter AI (rf) | AI-Specific | Proprietary rules tuned for AI-generated code patterns—inconsistent auth, over-permissive defaults, framework misuse |
| ESLint Security Rules | Code Quality | JavaScript/TypeScript anti-patterns, prototype pollution vectors, unsafe regex |
These aren't run sequentially—they execute in parallel on GCP Cloud Run workers, so adding scanners doesn't linearly increase scan time. Each scanner produces findings in a normalized format that feeds into the consolidation stage.
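As a rough illustration of that fan-out pattern (this is not Rafter's actual code; the `Finding` shape and the stub scanner adapters are invented for the sketch), parallel execution might look like this:

```typescript
// Illustrative sketch of parallel scanner fan-out. The Finding shape and
// the stub adapters below are invented for this example.
interface Finding {
  scanner: string;
  ruleId: string;
  file: string;
  severity: 'error' | 'warning' | 'note';
}

type Scanner = (repoPath: string) => Promise<Finding[]>;

// Stubs standing in for real tool adapters (gitleaks, trivy, ...)
const gitleaks: Scanner = async (_repo) => [
  { scanner: 'gitleaks', ruleId: 'aws-access-key', file: 'config.ts', severity: 'error' },
];
const trivy: Scanner = async (_repo) => [
  { scanner: 'trivy', ruleId: 'CVE-2024-12345', file: 'package-lock.json', severity: 'warning' },
];

async function runScanners(repoPath: string, scanners: Scanner[]): Promise<Finding[]> {
  // Every scanner starts immediately, so wall time tracks the slowest
  // scanner rather than the sum of all scanners.
  const perScanner = await Promise.all(scanners.map((scan) => scan(repoPath)));
  return perScanner.flat();
}
```

Because each adapter emits the same normalized shape, the consolidation stage downstream never needs to know which tool a finding came from.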
Why Multiple Scanners Matter
No single scanner catches everything. Gitleaks finds the AWS key hardcoded in a config file. Trivy finds the critical CVE in an outdated dependency. Bandit catches the eval() call with user input. Checkov catches the Terraform resource with a public IP and no security group. The Rafter AI scanner catches the endpoint that has authentication in development but not in production because the AI tool generated the route handler without the auth middleware that exists on every other route.
Running all of these together, then deduplicating the overlapping results, gives you broader coverage than any single tool while keeping noise manageable.
AI-Powered Contextual Review
The Rafter AI scanner (rf) is where things get interesting—and where we should be honest about both strengths and limitations.
Traditional static analysis works by matching patterns: "if a user-controlled value reaches an SQL query without parameterization, flag it." This works well for known vulnerability patterns but fails when the vulnerability is contextual—when the issue isn't in any single line but in how multiple files interact.
Rafter's AI layer adds three capabilities that static rules can't provide:
Cross-File Context
AI-generated codebases frequently have inconsistent security patterns across files. One API route validates input, sanitizes output, and checks authentication. The next route—generated in a different session or by a different prompt—does none of that. Static analysis checks each file independently. The AI layer reviews routes in context, flagging when a route deviates from the security patterns established elsewhere in the project.
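Conceptually, the check resembles flagging routes that deviate from the project's dominant auth pattern. This is a toy model (the real AI layer reasons over the code itself; the route table and guard name here are invented):

```typescript
// Toy model of the cross-file consistency check: if most routes use an
// auth guard, flag the ones that don't. Route and guard names are invented.
interface Route {
  path: string;
  middleware: string[];
}

function routesMissingAuth(routes: Route[], guard = 'requireAuth'): string[] {
  const guarded = routes.filter((r) => r.middleware.includes(guard)).length;
  // Only treat the guard as the established pattern if most routes use it
  if (guarded <= routes.length / 2) return [];
  return routes
    .filter((r) => !r.middleware.includes(guard))
    .map((r) => r.path);
}
```

The key idea is that the baseline is learned from the project, not hardcoded: a route is only suspicious relative to what the rest of the codebase does.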
Intent Inference
When an AI tool generates a file upload handler without file type validation, a static analyzer sees valid code. The AI layer recognizes the intent—"this is a file upload endpoint"—and checks whether the implementation matches security expectations for that intent: file type validation, size limits, storage path sanitization, and malware scanning hooks.
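The kind of checks expected of a file upload endpoint can be sketched like this (the allow-list, size limit, and `validateUpload` helper are illustrative, not a Rafter API):

```typescript
// Sketch of the checks the AI layer expects a file upload handler to have.
// The MIME allow-list and size limit are illustrative values.
const ALLOWED_TYPES = new Set(['image/png', 'image/jpeg', 'application/pdf']);
const MAX_BYTES = 5 * 1024 * 1024; // 5 MB

interface Upload {
  filename: string;
  mimeType: string;
  size: number;
}

function validateUpload(file: Upload): string | null {
  if (!ALLOWED_TYPES.has(file.mimeType)) return 'unsupported file type';
  if (file.size > MAX_BYTES) return 'file too large';
  // Strip directory components so "../../etc/passwd" can't escape the upload dir
  const safeName = file.filename.replace(/^.*[\\/]/, '');
  if (safeName !== file.filename) return 'invalid filename';
  return null; // passed all checks
}
```

A static rule sees nothing wrong with a handler that omits all of this; intent inference flags the absence.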
AI-Specific Pattern Detection
Some vulnerability patterns are almost exclusive to AI-generated code:
```javascript
// ✗ Vulnerable: AI-generated Supabase client with RLS bypassed
import { createClient } from '@supabase/supabase-js'

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL,
  process.env.SUPABASE_SERVICE_ROLE_KEY // Service role key bypasses RLS
)

// This endpoint lets any authenticated user read ANY user's data
export async function GET(req, { params }) {
  const { data } = await supabase
    .from('profiles')
    .select('*')
    .eq('id', params.userId)
  return Response.json(data)
}
```

```javascript
// ✓ Secure: Using anon key with RLS enforced
import { createClient } from '@supabase/supabase-js'

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL,
  process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY // Anon key respects RLS
)

export async function GET(req, { params }) {
  // RLS policy ensures users can only read their own profile
  const { data } = await supabase
    .from('profiles')
    .select('*')
    .eq('id', params.userId)
  return Response.json(data)
}
```
This pattern—using the Supabase service role key in a client-facing endpoint—is rare in human-written code because developers learn about RLS before building with Supabase. AI coding assistants generate it constantly because the service role key "works" and produces fewer errors during generation. Rafter's AI scanner flags this pattern specifically, along with similar issues in Firebase (missing security rules), Appwrite (exposed endpoints), and other backend-as-a-service platforms that AI tools frequently suggest.
Finding Consolidation and Deduplication
Five scanners running in parallel produce overlapping findings. The same hardcoded secret might be flagged by Gitleaks (as a credential leak), by the SAST scanner (as a hardcoded string in a sensitive context), and by the AI scanner (as a service key used in a client-facing endpoint). Showing three findings for one issue creates noise that makes developers ignore results.
Rafter's consolidation pipeline handles this in three steps:
1. Fingerprint generation. Each finding gets a SHA-256 fingerprint derived from the finding type and location. The same vulnerability in the same file produces the same fingerprint across runs, which enables tracking over time and prevents duplicate alerts on unchanged code.
2. Deduplication. Findings with matching fingerprints are merged. The first occurrence wins, but metadata from overlapping findings enriches the primary finding—if Gitleaks identifies the secret type and the AI scanner identifies the security impact of where it's used, both contribute to the final finding.
3. OWASP classification and severity ranking. Every deduplicated finding is mapped to the OWASP Top 10:2025 categories using a three-tier classification system:
- Priority 1: CWE-to-OWASP mapping (most precise—if the finding has a CWE identifier, it maps directly)
- Priority 2: Keyword-based classification (regex matching on finding descriptions)
- Priority 3: Tool-based heuristic (Gitleaks findings default to Cryptographic Failures, Checkov findings default to Security Misconfiguration)
Severity levels are normalized across all scanners: critical, high, severe, and blocker all map to error. warning and medium map to warning. note, info, and low map to note. This gives you a single, consistent severity scale regardless of which scanner produced the finding.
The result is a clean, ranked list of findings—not a noisy dump of raw scanner output. Each finding appears once, with a clear severity level, OWASP category, and file location.
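As an illustration, the fingerprinting, first-wins dedup, and severity normalization steps might look like this sketch (the field names and the exact fields Rafter hashes are assumptions, not its actual internals):

```typescript
import { createHash } from 'node:crypto';

interface Finding {
  ruleId: string;
  file: string;
  line: number;
  severity: string; // raw scanner severity, e.g. "critical", "medium", "info"
}

// Stable fingerprint from finding type + location: the same issue in the
// same place hashes identically across runs. (Exact hashed fields assumed.)
function fingerprint(f: Finding): string {
  return createHash('sha256').update(`${f.ruleId}:${f.file}:${f.line}`).digest('hex');
}

// First occurrence wins (the real pipeline also merges metadata
// from the duplicates into the surviving finding).
function dedupe(findings: Finding[]): Finding[] {
  const seen = new Map<string, Finding>();
  for (const f of findings) {
    const fp = fingerprint(f);
    if (!seen.has(fp)) seen.set(fp, f);
  }
  return [...seen.values()];
}

// Collapse every scanner's severity scale onto error/warning/note,
// following the mapping described above.
function normalizeSeverity(raw: string): 'error' | 'warning' | 'note' {
  if (['critical', 'high', 'severe', 'blocker'].includes(raw)) return 'error';
  if (['warning', 'medium'].includes(raw)) return 'warning';
  return 'note'; // note, info, low, and anything unrecognized
}
```

Because the fingerprint is content-addressed rather than run-specific, an unchanged vulnerability keeps the same identity across scans, which is what makes tracking over time possible.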
Fix Generation: From Finding to Prompt
Finding vulnerabilities is the easy part. The hard part is fixing them—especially when the developer who needs to fix the code is using an AI tool to write it. Rafter closes this loop by generating structured fix prompts for every finding.
Here's how it works:
1. Finding context assembly. For each vulnerability, Rafter assembles the rule ID, severity level, file path, line number, and a plain-English description of what's wrong and why it matters.
2. Single-vulnerability prompts. Each finding produces a prompt structured for AI coding assistants:
```text
You are a senior application-security engineer. Never mock data,
suppress linter security rules, or shortcut the fix. Think step-by-step.

VULNERABILITY: hardcoded-credentials
SEVERITY: error
FILE: src/lib/db.ts:14
DESCRIPTION: Database connection string with credentials hardcoded in source code.

Provide:
1. Explanation of the vulnerability and its impact
2. Step-by-step remediation with code examples
3. Prevention strategies to avoid reintroduction
```
3. Bulk remediation prompts. When a scan produces multiple findings, Rafter groups them by rule ID and generates a consolidated prompt that asks for prioritized remediation across all issues—so you can paste one prompt into Claude or ChatGPT and get a comprehensive fix plan.
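Assembling the single-finding prompt shown above is essentially string templating over the finding context; a sketch (the `Finding` field names are assumptions about Rafter's internal shape):

```typescript
// Illustrative assembly of a single-finding fix prompt.
// Field names are assumed, not Rafter's actual schema.
interface Finding {
  ruleId: string;
  severity: string;
  file: string;
  line: number;
  description: string;
}

function buildFixPrompt(f: Finding): string {
  return [
    'You are a senior application-security engineer. Never mock data,',
    'suppress linter security rules, or shortcut the fix. Think step-by-step.',
    `VULNERABILITY: ${f.ruleId}`,
    `SEVERITY: ${f.severity}`,
    `FILE: ${f.file}:${f.line}`,
    `DESCRIPTION: ${f.description}`,
    'Provide:',
    '1. Explanation of the vulnerability and its impact',
    '2. Step-by-step remediation with code examples',
    '3. Prevention strategies to avoid reintroduction',
  ].join('\n');
}
```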
The fix prompt workflow looks like this in practice:
- Rafter scans your repo and finds 8 vulnerabilities
- You open the findings in the dashboard or CLI
- Each finding has a "Copy Fix Prompt" button
- You paste the prompt into Cursor, ChatGPT, Claude, or Lovable
- The AI tool generates the fix using the structured context
- You rescan to verify the fix resolved the issue
This scan-find-fix-rescan loop is deliberate. AI tools generate vulnerable code; Rafter finds the vulnerabilities; AI tools fix the vulnerabilities using Rafter's structured prompts; Rafter verifies the fixes. The AI is both the source of the problem and—with proper guidance—the solution.
What Rafter Doesn't Catch (Honest Assessment)
No scanner catches everything, and we'd rather be transparent about our limits than have you discover them in production.
Business logic vulnerabilities. Rafter can't determine that your e-commerce checkout allows negative quantities, that your access control lets users escalate their own permissions through a multi-step workflow, or that your rate limiter has a race condition in the token bucket implementation. Business logic requires understanding what the application should do, which is beyond any automated scanner.
Runtime-only vulnerabilities. Issues that only manifest at runtime—SSRF through DNS rebinding, timing-based side channels, race conditions in concurrent request handling—require dynamic testing (DAST) or targeted penetration testing. Rafter is a static analysis platform.
Novel vulnerability classes. Rafter's static rules and AI patterns detect known vulnerability classes and their variations. A genuinely novel attack technique—something nobody has seen before—won't match existing patterns until we update the scanner. We release rule updates continuously, but there's always a lag between discovery and detection.
Obfuscated or intentionally deceptive code. If malicious code is deliberately obfuscated to evade scanning—encoded payloads, indirect evaluation through computed property access, multi-stage deobfuscation—static analysis has fundamental limits. This is a constraint shared by all SAST tools.
Deep dependency chain analysis. Rafter scans your direct and transitive dependencies for known CVEs via Trivy, but it doesn't trace data flow through third-party library code. If a vulnerability exists in how your code interacts with a library's internal behavior, that requires deeper program analysis than our current pipeline provides.
We're actively working on expanding coverage in these areas—particularly runtime-aware analysis and deeper cross-file data flow tracking. But today, these are real gaps, and combining Rafter with DAST tools and manual penetration testing is the right approach for comprehensive coverage.
Integration Points
Rafter is designed to fit into the workflow you already have, not replace it.
GitHub App
Connect your GitHub account, select repositories, and Rafter scans automatically on push or pull request. The GitHub app handles authentication, branch selection, and org-level configuration—including per-repo scan triggers, branch filters, scan modes (fast or plus), cooldown periods, and monthly scan caps for cost control.
CLI
For local development and CI/CD integration:
```shell
# Install and scan
npx @rafter/cli scan                    # Node projects
pip install rafter-cli && rafter scan   # Python projects

# Scan with options
rafter scan --mode plus --branch main
```
REST API
For programmatic access and custom integrations:
```shell
# Trigger a scan
curl -X POST https://api.rafter.so/api/static/scan \
  -H "X-API-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"repository_name": "owner/repo", "scan_mode": "fast"}'

# Check results (URL quoted so the shell doesn't treat & as a job separator)
curl "https://api.rafter.so/api/static/scan?scan_id=xxx&format=json" \
  -H "X-API-Key: your-api-key"
```
Results come back in SARIF-compatible JSON or formatted Markdown, so you can feed them into existing security dashboards, Slack alerts, or issue trackers.
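For example, feeding results into a Slack alert or custom dashboard is a matter of walking the standard SARIF structure. This sketch covers only the SARIF fields it uses, and the sample data in the test is made up:

```typescript
// Minimal sketch of summarizing findings from a SARIF 2.1.0 payload.
// Only the fields this function reads are typed here.
interface SarifLog {
  runs: {
    results: {
      ruleId: string;
      level: 'error' | 'warning' | 'note';
      message: { text: string };
      locations: { physicalLocation: { artifactLocation: { uri: string } } }[];
    }[];
  }[];
}

// One human-readable line per finding, e.g. for a chat alert
function summarize(log: SarifLog): string[] {
  return log.runs.flatMap((run) =>
    run.results.map(
      (r) =>
        `[${r.level}] ${r.ruleId} in ` +
        `${r.locations[0].physicalLocation.artifactLocation.uri}: ${r.message.text}`
    )
  );
}
```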
Setting Up Rafter on Your AI-Built Project
Whether you're building with Cursor, Claude, Lovable, Replit, or any other AI coding tool, here's how to get scanning running in five minutes:
- Sign in at rafter.so/dashboard with your GitHub account
- Connect your repo—select the repository and branch you want to scan
- Run your first scan—choose `fast` for quick results or `plus` for deep analysis
- Review findings—each vulnerability includes severity, OWASP category, file location, and a plain-English explanation
- Fix with AI—copy the fix prompt, paste it into your AI coding tool, apply the fix, and rescan to verify
For CI/CD integration, add the Rafter CLI to your GitHub Actions workflow to scan every pull request automatically.
First scans are free. No credit card required. Most AI-built projects complete scanning in 30 seconds to 2 minutes.
Conclusion
Rafter's scanning pipeline is built for the specific reality of AI-generated code: high volume, inconsistent patterns, and vulnerability classes that traditional scanners weren't designed to catch. The combination of open-source static analyzers, a proprietary AI review layer, and structured fix prompts creates a scan-find-fix loop that works with AI tools rather than against them.
We're transparent about what we catch and what we miss because trust matters more than marketing. No scanner catches everything. Rafter catches a meaningful set of vulnerabilities that would otherwise ship to production in AI-generated code—and it does it in 30 seconds to 2 minutes.
Next steps:
- Run your first scan—sign in with GitHub, select a repo, get results in 30 seconds
- Review the security tool comparison guide to understand where Rafter fits in a comprehensive security strategy
- Set up automated scanning in your CI/CD pipeline for continuous protection
- Read about vibe coding security to understand the broader security landscape for AI-generated applications