False Positives in Security Scanning: A Triage Guide for Developers

Written by the Rafter Team

The average SAST tool produces between 30% and 70% false positives, depending on the codebase and configuration. A 2025 study by Ghost Security scanning public GitHub repositories found that over 91% of flagged vulnerabilities across Go, Python, and PHP projects were false positives. That means for every 10 findings your scanner reports, anywhere from 3 to 9 of them are wrong.

This isn't a bug in your tooling. It's a fundamental tradeoff baked into how static analysis works: scanners tuned to report fewer false positives also catch fewer true positives. The real skill isn't finding a scanner with zero false positives, because no such scanner exists. It's building a triage workflow that separates signal from noise without burning out your team.
Alert fatigue is a security risk. Research shows that 70% of security team time is spent investigating false positives, and 33% of organizations have been late responding to real attacks because teams were occupied with phantom threats. A triage workflow isn't optional---it's a security control.
Why False Positives Happen
False positives aren't scanner failures. They're the predictable result of tools that must make decisions without complete information. Understanding the mechanics helps you predict which findings to trust and which to scrutinize.
Over-Approximation in Static Analysis
Static analysis tools don't run your code. They reason about every possible execution path, including paths that can never actually execute. When a SAST tool encounters a branch like if (userIsAdmin && debugMode), it analyzes both sides---even if debugMode is always false in production. This is called over-approximation: the tool assumes more paths are reachable than actually are, which inflates the finding count.
The alternative---under-approximation---would miss real bugs. Scanner designers choose to over-report rather than under-report because a missed vulnerability (false negative) can lead to a breach, while a false positive only wastes investigation time. It's a deliberate engineering decision.
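A minimal sketch of over-approximation in practice. The `debugMode` constant and `runDiagnostics` function are hypothetical; the point is that a static analyzer treats both branches as reachable, even though the guard makes one of them dead code:

```javascript
// debugMode is a compile-time constant: the eval() branch can never run.
const debugMode = false;

function runDiagnostics(userIsAdmin, expression) {
  if (userIsAdmin && debugMode) {
    // A scanner still flags this eval() as code injection, because it
    // analyzes every syntactic path rather than evaluating the condition.
    return eval(expression);
  }
  return null;
}
```

At runtime the flagged path is unreachable, but proving that in general requires evaluating the program, which is exactly what static analysis avoids.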
Missing Runtime Context
Static analysis sees source code, not running applications. It can't evaluate environment variables, database state, feature flags, or request context. Consider this Express.js route:
```javascript
// ✗ SAST flags this as SQL injection
app.get('/users', async (req, res) => {
  const role = req.query.role;
  const users = await db.query(`SELECT * FROM users WHERE role = '${role}'`);
  res.json(users);
});
```
A SAST tool correctly flags this as SQL injection---user input flows directly into a query string. But now consider this version:
```javascript
// ✓ SAST may still flag this (false positive)
app.get('/users', async (req, res) => {
  const role = req.query.role;
  if (!['admin', 'user', 'viewer'].includes(role)) {
    return res.status(400).json({ error: 'Invalid role' });
  }
  const users = await db.query(`SELECT * FROM users WHERE role = '${role}'`);
  res.json(users);
});
```
The allowlist validation makes injection impossible. But many SAST tools still flag it because they don't track the constraint through the conditional logic. The taint source (req.query.role) still reaches the sink (db.query), and the tool can't prove the allowlist is sufficient.
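One way to resolve this class of false positive is to remove the taint path entirely with a parameterized query, which most scanners recognize as safe. A sketch with the query construction pulled into a hypothetical `buildUsersQuery` helper; the `$1` placeholder syntax assumes a node-postgres-style driver (MySQL drivers use `?`):

```javascript
const ALLOWED_ROLES = ['admin', 'user', 'viewer'];

// Returns a query descriptor, or null if the role fails validation
// (the route handler would respond 400 in that case). The value travels
// in `values`, never in the SQL text, so no tainted string reaches the sink.
function buildUsersQuery(role) {
  if (!ALLOWED_ROLES.includes(role)) {
    return null;
  }
  return { text: 'SELECT * FROM users WHERE role = $1', values: [role] };
}
```

The allowlist and the placeholder are each sufficient on their own; keeping both is cheap defense in depth, and the parameterization is what silences the scanner.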
Generic Rules Applied to Specific Frameworks
Scanner rule libraries are designed to work across many codebases. A rule that flags eval() in JavaScript is correct in general---but inside a build tool like Webpack or a template engine's internals, eval usage is intentional and sandboxed. Framework-specific context that would immediately tell a human "this is fine" is invisible to a generic rule.
AI Scanner Hallucinations
AI-powered scanners introduce a new class of false positives: hallucinated vulnerabilities. An LLM reviewing code might "see" a vulnerability pattern that doesn't actually exist, confidently explaining a SQL injection in code that uses parameterized queries throughout. Unlike traditional SAST false positives (which follow deterministic rules), AI false positives are non-deterministic---the same code might produce different findings on repeated scans.
The Cost of False Positives
False positives aren't free. They extract a measurable cost from your engineering organization.
Developer Trust Erosion
When developers encounter five false positives in a row, they stop trusting the scanner. The sixth finding---which might be a real SQL injection---gets the same dismissive treatment. This is the core danger: false positives don't just waste time, they actively undermine the scanner's ability to catch real vulnerabilities.
Alert Fatigue by the Numbers
The numbers paint a stark picture:
| Metric | Value | Source |
|---|---|---|
| Security team time on false positives | 70% | Contrast Security, 2024 |
| Organizations late to real attacks due to false positive overload | 33% | Contrast Security, 2024 |
| SAST false positive rate (public repos, multi-language) | 91%+ | Ghost Security, 2025 |
| NIST finding: Java SAST false positive rate | Up to 78% | NIST SATE Reports |
Slowed CI/CD Pipelines
Teams that gate deployments on zero scanner findings (without triage) either slow their release cycle to a crawl or eventually disable the gate entirely. Both outcomes are worse than having a triage process that acknowledges false positives exist and handles them systematically.
False Positives vs False Negatives: The Tradeoff
Every scanner makes a tradeoff between precision (what percentage of reported findings are real) and recall (what percentage of real vulnerabilities are found). You can't maximize both.
| Scanner Tuning | Precision | Recall | Risk |
|---|---|---|---|
| High sensitivity (aggressive) | Low---many false positives | High---catches most real bugs | Alert fatigue, wasted developer time |
| Low sensitivity (conservative) | High---most findings are real | Low---misses real vulnerabilities | False sense of security |
| Balanced (default) | Medium | Medium | Best starting point for most teams |
What's worse---a false positive or a false negative? It depends on context. For a banking application handling wire transfers, a false negative (missed vulnerability) could mean millions in losses. A few extra false positives are worth the tradeoff. For an internal documentation site, the calculus flips---false positive noise has more practical impact than the low-severity findings you might miss.
The right question isn't "which scanner has zero false positives?" It's "what false positive rate is acceptable for this codebase, and how do I triage the rest efficiently?"
Triage Workflow: A Step-by-Step Guide
A repeatable triage process turns scanner output from noise into actionable intelligence. Here's a decision framework you can adapt to your team.
Step 1: Severity Assessment
Start with the scanner's severity rating. Critical and high findings get immediate attention. Medium and low findings go into a batch review queue---don't let them interrupt flow.
Step 2: Reachability Analysis
Can the flagged code actually execute? Dead code, test fixtures, and unreachable branches produce findings that are technically correct but practically irrelevant. If the vulnerable function is only called from a test file, it's not a production risk.
Step 3: Context Evaluation
This is where human judgment matters most. Ask:
- Does the code have input validation upstream that the scanner didn't track?
- Is the flagged API endpoint behind authentication?
- Does the framework provide built-in protections (e.g., Django's ORM prevents SQL injection by default)?
Step 4: Decide and Document
Every finding gets one of three outcomes:
- Fix: Real vulnerability, fix it now or schedule it by severity
- Suppress with justification: False positive, add a suppression comment explaining why
- Escalate: Ambiguous, needs a second opinion or deeper investigation
The documentation step is critical. A suppression without justification becomes a mystery for the next developer. Always explain why a finding is false.
```javascript
// nosemgrep: javascript.express.security.injection.sql-injection
// Justification: role is validated against allowlist on line 3.
// Only 'admin', 'user', 'viewer' can reach this query.
const users = await db.query(`SELECT * FROM users WHERE role = '${role}'`);
```
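The four steps above can be condensed into a small decision function. This is an illustrative sketch, not a prescribed implementation; the `reachable` and `contextSafe` fields stand in for the reviewer's own judgments from Steps 2 and 3:

```javascript
// Maps a reviewed finding to one of the three Step 4 outcomes.
// `contextSafe` is tri-state: true (protections confirmed),
// false (exploitable), undefined (still ambiguous).
function triageFinding(finding) {
  if (!finding.reachable) {
    return { outcome: 'suppress', reason: 'code path unreachable in production' };
  }
  if (finding.contextSafe === true) {
    return { outcome: 'suppress', reason: finding.contextNote };
  }
  if (finding.contextSafe === false) {
    return { outcome: 'fix', priority: finding.severity };
  }
  return { outcome: 'escalate', reason: 'exploitability unclear' };
}
```

Note that the default path is escalation: when a reviewer can't decide, the finding goes to a second opinion rather than a silent suppression.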
Reducing False Positives at the Source
Triage handles false positives after they appear. These strategies reduce how many appear in the first place.
Custom Rules and Framework-Aware Configuration
Most SAST tools ship with generic rules. Tailoring them to your stack eliminates entire categories of noise:
- Disable irrelevant rules: If you don't use XML parsing, disable XXE rules
- Add framework-specific context: Tell the scanner your ORM handles parameterized queries
- Write custom rules: Encode your team's security patterns so the scanner recognizes them as safe
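As one illustration, Semgrep's taint mode lets you declare a sanitizer so that values passing through your validation helper no longer count as tainted. This is a sketch under assumptions: `validateRole` is a hypothetical stand-in for your own validation function, and rule IDs and patterns will differ per codebase:

```yaml
rules:
  - id: sql-injection-unless-validated
    mode: taint
    languages: [javascript]
    severity: ERROR
    message: User input reaches db.query() without validation
    pattern-sources:
      - pattern: req.query
    pattern-sinks:
      - pattern: db.query(...)
    # Anything returned by the team's validation helper is treated as clean,
    # eliminating the allowlist false positive from the earlier example.
    pattern-sanitizers:
      - pattern: validateRole(...)
```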
Baseline Scans
Run an initial scan, triage everything, and mark existing findings as your baseline. Future scans only surface new findings. This prevents inheriting hundreds of legacy findings that demoralize the team before they start.
Suppression Comments with Justification
Inline suppressions (// nosemgrep, // nolint, # nosec) are powerful but dangerous. A suppression without context is a time bomb---the next developer won't know if it's hiding a real vulnerability or a confirmed false positive.
Rule: every suppression comment must include a justification. Enforce this in code review.
Tuning Sensitivity per Directory
Not all code deserves the same scrutiny. Scan your authentication module at maximum sensitivity. Scan your admin dashboard scripts at a lower threshold. Most tools support per-directory or per-file configuration.
AI-Powered Triage
LLM-based tools are increasingly used to assess whether a scanner finding is real. The idea is simple: feed the finding and surrounding code to an AI model, ask it to evaluate exploitability, and use its assessment to prioritize.
Where AI Triage Works
AI excels at the context evaluation step that traditional scanners lack. It can read the surrounding code, understand that a validation function sanitizes input before it reaches the flagged sink, and correctly classify the finding as a false positive. For straightforward cases---allowlist validation, framework protections, dead code---AI triage is fast and accurate.
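A sketch of the mechanics, with the network call to the model omitted; `promptFor` and `parseVerdict` are hypothetical helpers, not any particular tool's API. The useful pattern is constraining the model to a fixed verdict vocabulary and routing anything unexpected to a human:

```javascript
// Packages a scanner finding plus surrounding code into a triage prompt.
function promptFor(finding, snippet) {
  return [
    `A scanner reported: ${finding.ruleId} (${finding.severity})`,
    `at ${finding.file}:${finding.line}.`,
    'Given the surrounding code, answer TRUE_POSITIVE,',
    'FALSE_POSITIVE, or UNSURE on the first line, then explain.',
    '',
    snippet,
  ].join('\n');
}

// Maps the model's first word to a triage action. Anything outside the
// expected vocabulary (including UNSURE) escalates to a human reviewer.
function parseVerdict(modelReply) {
  const head = modelReply.trim().split(/\s+/)[0].replace(/[^A-Z_]/g, '');
  if (head === 'FALSE_POSITIVE') return 'deprioritize';
  if (head === 'TRUE_POSITIVE') return 'fix';
  return 'escalate';
}
```

Defaulting to escalation matters: a free-text or hallucinated reply should cost a human review, never a silent suppression.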
The AI-on-AI False Positive Chain
There's a subtle risk when using AI to triage findings from AI-generated code. The AI that wrote the code might have the same blind spots as the AI evaluating it. If both models share a misconception about how a particular API works, the triage model might confidently mark a real vulnerability as a false positive. This is why AI triage should augment human review, not replace it entirely.
How Rafter Approaches This
Rafter combines traditional static analysis with an AI-powered contextual review layer. When a SAST rule flags a finding, Rafter's AI layer evaluates the surrounding code context---checking for upstream validation, framework protections, and reachability---before surfacing the finding to you. Findings that the AI layer confirms as likely false positives are deprioritized rather than hidden, so you can still review them if needed. The goal is reducing the 70% investigation time waste without introducing the risk of suppressing real vulnerabilities.
For AI-generated code specifically, this matters more. AI coding tools produce code faster than humans can review it, and that code has documented vulnerability rates that make scanning essential. But scanning AI-generated code with traditional tools produces even more false positives than scanning human-written code, because AI output often uses patterns that trigger generic rules. Rafter's AI triage layer is tuned for these patterns, reducing noise while preserving signal.
Start scanning your AI-generated code at rafter.so.
Building a Sustainable Triage Culture
Tooling solves half the problem. The other half is process.
False Positive Triage SOP
Adopt this seven-step standard operating procedure:
- Set severity SLAs: Critical findings triaged within 24 hours, high within 3 days, medium/low in weekly batch review
- Rotate triage duty: Don't assign one person to review all findings. Rotate weekly across the team
- Track false positive rates: Measure your scanner's false positive rate monthly. If it exceeds 50%, your rules need tuning
- Require suppression justifications: Every inline suppression must explain why the finding is false
- Review suppressions quarterly: Old suppressions may become stale if the code changes
- Feed back to scanner config: Every false positive pattern you identify should become a rule tuning or custom rule
- Celebrate true positives: When a scanner catches a real vulnerability, share it with the team. It reinforces why the triage work matters
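The tracking step in the SOP above is easy to automate. A sketch, assuming each triaged finding is recorded with its Step 4 outcome and that suppressions approximate confirmed false positives (escalations are excluded as undecided):

```javascript
// False positive rate over decided findings: suppressions / (fixes + suppressions).
function falsePositiveRate(outcomes) {
  const decided = outcomes.filter(o => o === 'fix' || o === 'suppress');
  if (decided.length === 0) return 0;
  const fps = decided.filter(o => o === 'suppress').length;
  return fps / decided.length;
}

// Flags the monthly review when the rate crosses the SOP's tuning threshold.
function needsTuning(outcomes, threshold = 0.5) {
  return falsePositiveRate(outcomes) > threshold;
}
```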
Metrics That Matter
Track these to measure your triage health:
| Metric | Target | Why It Matters |
|---|---|---|
| False positive rate | < 40% | Below this, developer trust stays intact |
| Mean time to triage | < 48 hours for critical | Ensures real vulnerabilities get fast attention |
| Suppression coverage | 100% justified | Prevents hidden real vulnerabilities |
| Finding-to-fix rate | > 80% of true positives | Scanning without fixing is security theater |
Conclusion
False positives are the tax you pay for automated security scanning. You can't eliminate them, but you can build systems that minimize their cost. The scanner that flags too much is still more valuable than no scanner at all---if you have a triage process that extracts signal from noise.
Your next steps:
- Run your current SAST tool and measure your false positive rate---if you don't know the number, you can't improve it
- Implement suppression comments with mandatory justifications in your codebase
- Set up a baseline scan so you only triage new findings going forward
- Establish severity-based SLAs for triage (24 hours for critical, 3 days for high, weekly for the rest)
- Evaluate Rafter for AI-powered triage that reduces noise on AI-generated code
False positives aren't going away. But with the right workflow, they become a manageable cost of doing security well---not the reason your team stops doing it.
Related Resources
- SAST vs DAST vs SCA: Which Security Scanning Approach Do You Actually Need?
- SAST vs DAST vs SCA: What Each Scanner Catches
- How Static Analysis Finds Vulnerabilities
- Security Scanning Limits: What No Tool Can Catch
- From Scan to Fix: Closing the Remediation Loop
- Automated Security Scanning: Set Up CI/CD Protection
- Securing AI-Generated Code: Best Practices