False Positives in Security Scanning: A Triage Guide for Developers

Written by the Rafter Team

The average SAST tool produces between 30% and 70% false positives, depending on the codebase and configuration. A 2025 study by Ghost Security scanning public GitHub repositories found that over 91% of flagged vulnerabilities across Go, Python, and PHP projects were false positives. That means for every 10 findings your scanner reports, anywhere from 3 to 9 of them are wrong.

This isn't a bug in your tooling. It's a fundamental tradeoff baked into how static analysis works: scanners tuned to report fewer false positives also catch fewer true positives. The real skill isn't finding a scanner with zero false positives, because no such scanner exists. It's building a triage workflow that separates signal from noise without burning out your team.
Alert fatigue is a security risk. Research shows that 70% of security team time is spent investigating false positives, and 33% of organizations have been late responding to real attacks because teams were occupied with phantom threats. A triage workflow isn't optional---it's a security control.
Why False Positives Happen
False positives aren't scanner failures. They're the predictable result of tools that must make decisions without complete information. Understanding the mechanics helps you predict which findings to trust and which to scrutinize.
Over-Approximation in Static Analysis
Static analysis tools don't run your code. They reason about every possible execution path, including paths that can never actually execute. When a SAST tool encounters a branch like if (userIsAdmin && debugMode), it analyzes both sides---even if debugMode is always false in production. This is called over-approximation: the tool assumes more paths are reachable than actually are, which inflates the finding count.
The alternative---under-approximation---would miss real bugs. Scanner designers choose to over-report rather than under-report because a missed vulnerability (false negative) can lead to a breach, while a false positive only wastes investigation time. It's a deliberate engineering decision.
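A minimal sketch of over-approximation in practice. The `debugMode` constant and `runDiagnostics` function are hypothetical; the point is that a static analyzer treats both branches as reachable, even though the guard makes one of them dead code:

```javascript
// debugMode is a compile-time constant: the eval() branch can never run.
const debugMode = false;

function runDiagnostics(userIsAdmin, expression) {
  if (userIsAdmin && debugMode) {
    // A scanner still flags this eval() as code injection, because it
    // analyzes every syntactic path rather than evaluating the condition.
    return eval(expression);
  }
  return null;
}
```

At runtime the flagged path is unreachable, but proving that in general requires evaluating the program, which is exactly what static analysis avoids.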
Missing Runtime Context
Static analysis sees source code, not running applications. It can't evaluate environment variables, database state, feature flags, or request context. Consider this Express.js route:
```javascript
// ✗ SAST flags this as SQL injection
app.get('/users', async (req, res) => {
  const role = req.query.role;
  const users = await db.query(`SELECT * FROM users WHERE role = '${role}'`);
  res.json(users);
});
```
A SAST tool correctly flags this as SQL injection---user input flows directly into a query string. But now consider this version:
```javascript
// ✓ SAST may still flag this (false positive)
app.get('/users', async (req, res) => {
  const role = req.query.role;
  if (!['admin', 'user', 'viewer'].includes(role)) {
    return res.status(400).json({ error: 'Invalid role' });
  }
  const users = await db.query(`SELECT * FROM users WHERE role = '${role}'`);
  res.json(users);
});
```
The allowlist validation makes injection impossible. But many SAST tools still flag it because they don't track the constraint through the conditional logic. The taint source (req.query.role) still reaches the sink (db.query), and the tool can't prove the allowlist is sufficient.
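One way to resolve this class of false positive is to remove the taint path entirely with a parameterized query, which most scanners recognize as safe. A sketch with the query construction pulled into a hypothetical `buildUsersQuery` helper; the `$1` placeholder syntax assumes a node-postgres-style driver (MySQL drivers use `?`):

```javascript
const ALLOWED_ROLES = ['admin', 'user', 'viewer'];

// Returns a query descriptor, or null if the role fails validation
// (the route handler would respond 400 in that case). The value travels
// in `values`, never in the SQL text, so no tainted string reaches the sink.
function buildUsersQuery(role) {
  if (!ALLOWED_ROLES.includes(role)) {
    return null;
  }
  return { text: 'SELECT * FROM users WHERE role = $1', values: [role] };
}
```

The allowlist and the placeholder are each sufficient on their own; keeping both is cheap defense in depth, and the parameterization is what silences the scanner.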
Generic Rules Applied to Specific Frameworks
Scanner rule libraries are designed to work across many codebases. A rule that flags eval() in JavaScript is correct in general---but inside a build tool like Webpack or a template engine's internals, eval usage is intentional and sandboxed. Framework-specific context that would immediately tell a human "this is fine" is invisible to a generic rule.
AI Scanner Hallucinations
AI-powered scanners introduce a new class of false positives: hallucinated vulnerabilities. An LLM reviewing code might "see" a vulnerability pattern that doesn't actually exist, confidently explaining a SQL injection in code that uses parameterized queries throughout. Unlike traditional SAST false positives (which follow deterministic rules), AI false positives are non-deterministic---the same code might produce different findings on repeated scans.
The Cost of False Positives
False positives aren't free. They extract a measurable cost from your engineering organization.
Developer Trust Erosion
When developers encounter five false positives in a row, they stop trusting the scanner. The sixth finding---which might be a real SQL injection---gets the same dismissive treatment. This is the core danger: false positives don't just waste time, they actively undermine the scanner's ability to catch real vulnerabilities.
Alert Fatigue by the Numbers
The numbers paint a stark picture:
| Metric | Value | Source |
|---|---|---|
| Security team time on false positives | 70% | Contrast Security, 2024 |
| Organizations late to real attacks due to false positive overload | 33% | Contrast Security, 2024 |
| SAST false positive rate (public repos, multi-language) | 91%+ | Ghost Security, 2025 |
| NIST finding: Java SAST false positive rate | Up to 78% | NIST SATE Reports |
Slowed CI/CD Pipelines
Teams that gate deployments on zero scanner findings (without triage) either slow their release cycle to a crawl or eventually disable the gate entirely. Both outcomes are worse than having a triage process that acknowledges false positives exist and handles them systematically.
False Positives vs False Negatives: The Tradeoff
Every scanner makes a tradeoff between precision (what percentage of reported findings are real) and recall (what percentage of real vulnerabilities are found). You can't maximize both.
| Scanner Tuning | Precision | Recall | Risk |
|---|---|---|---|
| High sensitivity (aggressive) | Low---many false positives | High---catches most real bugs | Alert fatigue, wasted developer time |
| Low sensitivity (conservative) | High---most findings are real | Low---misses real vulnerabilities | False sense of security |
| Balanced (default) | Medium | Medium | Best starting point for most teams |
What's worse---a false positive or a false negative? It depends on context. For a banking application handling wire transfers, a false negative (missed vulnerability) could mean millions in losses. A few extra false positives are worth the tradeoff. For an internal documentation site, the calculus flips---false positive noise has more practical impact than the low-severity findings you might miss.
The right question isn't "which scanner has zero false positives?" It's "what false positive rate is acceptable for this codebase, and how do I triage the rest efficiently?"
Triage Workflow: A Step-by-Step Guide
A repeatable triage process turns scanner output from noise into actionable intelligence. Here's a decision framework you can adapt to your team.
Step 1: Severity Assessment
Start with the scanner's severity rating. Critical and high findings get immediate attention. Medium and low findings go into a batch review queue---don't let them interrupt flow.
Step 2: Reachability Analysis
Can the flagged code actually execute? Dead code, test fixtures, and unreachable branches produce findings that are technically correct but practically irrelevant. If the vulnerable function is only called from a test file, it's not a production risk.
Step 3: Context Evaluation
This is where human judgment matters most. Ask:
- Does the code have input validation upstream that the scanner didn't track?
- Is the flagged API endpoint behind authentication?
- Does the framework provide built-in protections (e.g., Django's ORM prevents SQL injection by default)?
Step 4: Decide and Document
Every finding gets one of three outcomes:
- Fix: Real vulnerability, fix it now or schedule it by severity
- Suppress with justification: False positive, add a suppression comment explaining why
- Escalate: Ambiguous, needs a second opinion or deeper investigation
The documentation step is critical. A suppression without justification becomes a mystery for the next developer. Always explain why a finding is false.
```javascript
// nosemgrep: javascript.express.security.injection.sql-injection
// Justification: role is validated against allowlist on line 3.
// Only 'admin', 'user', 'viewer' can reach this query.
const users = await db.query(`SELECT * FROM users WHERE role = '${role}'`);
```
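The four steps above can be condensed into a small decision function. This is an illustrative sketch, not a prescribed implementation; the `reachable` and `contextSafe` fields stand in for the reviewer's own judgments from Steps 2 and 3:

```javascript
// Maps a reviewed finding to one of the three Step 4 outcomes.
// `contextSafe` is tri-state: true (protections confirmed),
// false (exploitable), undefined (still ambiguous).
function triageFinding(finding) {
  if (!finding.reachable) {
    return { outcome: 'suppress', reason: 'code path unreachable in production' };
  }
  if (finding.contextSafe === true) {
    return { outcome: 'suppress', reason: finding.contextNote };
  }
  if (finding.contextSafe === false) {
    return { outcome: 'fix', priority: finding.severity };
  }
  return { outcome: 'escalate', reason: 'exploitability unclear' };
}
```

Note that the default path is escalation: when a reviewer can't decide, the finding goes to a second opinion rather than a silent suppression.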
Reducing False Positives at the Source
Triage handles false positives after they appear. These strategies reduce how many appear in the first place.
Custom Rules and Framework-Aware Configuration
Most SAST tools ship with generic rules. Tailoring them to your stack eliminates entire categories of noise:
- Disable irrelevant rules: If you don't use XML parsing, disable XXE rules
- Add framework-specific context: Tell the scanner your ORM handles parameterized queries
- Write custom rules: Encode your team's security patterns so the scanner recognizes them as safe
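As one illustration, Semgrep's taint mode lets you declare a sanitizer so that values passing through your validation helper no longer count as tainted. This is a sketch under assumptions: `validateRole` is a hypothetical stand-in for your own validation function, and rule IDs and patterns will differ per codebase:

```yaml
rules:
  - id: sql-injection-unless-validated
    mode: taint
    languages: [javascript]
    severity: ERROR
    message: User input reaches db.query() without validation
    pattern-sources:
      - pattern: req.query
    pattern-sinks:
      - pattern: db.query(...)
    # Anything returned by the team's validation helper is treated as clean,
    # eliminating the allowlist false positive from the earlier example.
    pattern-sanitizers:
      - pattern: validateRole(...)
```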
Baseline Scans
Run an initial scan, triage everything, and mark existing findings as your baseline. Future scans only surface new findings. This prevents inheriting hundreds of legacy findings that demoralize the team before they start.
Suppression Comments with Justification
Inline suppressions (// nosemgrep, // nolint, # nosec) are powerful but dangerous. A suppression without context is a time bomb---the next developer won't know if it's hiding a real vulnerability or a confirmed false positive.
Rule: every suppression comment must include a justification. Enforce this in code review.
Tuning Sensitivity per Directory
Not all code deserves the same scrutiny. Scan your authentication module at maximum sensitivity. Scan your admin dashboard scripts at a lower threshold. Most tools support per-directory or per-file configuration.
AI-Powered Triage
LLM-based tools are increasingly used to assess whether a scanner finding is real. The idea is simple: feed the finding and surrounding code to an AI model, ask it to evaluate exploitability, and use its assessment to prioritize.
Where AI Triage Works
AI excels at the context evaluation step that traditional scanners lack. It can read the surrounding code, understand that a validation function sanitizes input before it reaches the flagged sink, and correctly classify the finding as a false positive. For straightforward cases---allowlist validation, framework protections, dead code---AI triage is fast and accurate.
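A sketch of the mechanics, with the network call to the model omitted; `promptFor` and `parseVerdict` are hypothetical helpers, not any particular tool's API. The useful pattern is constraining the model to a fixed verdict vocabulary and routing anything unexpected to a human:

```javascript
// Packages a scanner finding plus surrounding code into a triage prompt.
function promptFor(finding, snippet) {
  return [
    `A scanner reported: ${finding.ruleId} (${finding.severity})`,
    `at ${finding.file}:${finding.line}.`,
    'Given the surrounding code, answer TRUE_POSITIVE,',
    'FALSE_POSITIVE, or UNSURE on the first line, then explain.',
    '',
    snippet,
  ].join('\n');
}

// Maps the model's first word to a triage action. Anything outside the
// expected vocabulary (including UNSURE) escalates to a human reviewer.
function parseVerdict(modelReply) {
  const head = modelReply.trim().split(/\s+/)[0].replace(/[^A-Z_]/g, '');
  if (head === 'FALSE_POSITIVE') return 'deprioritize';
  if (head === 'TRUE_POSITIVE') return 'fix';
  return 'escalate';
}
```

Defaulting to escalation matters: a free-text or hallucinated reply should cost a human review, never a silent suppression.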
The AI-on-AI False Positive Chain
There's a subtle risk when using AI to triage findings from AI-generated code. The AI that wrote the code might have the same blind spots as the AI evaluating it. If both models share a misconception about how a particular API works, the triage model might confidently mark a real vulnerability as a false positive. This is why AI triage should augment human review, not replace it entirely.
How Rafter Approaches This
Rafter combines traditional static analysis with an AI-powered contextual review layer. When a SAST rule flags a finding, Rafter's AI layer evaluates the surrounding code context---checking for upstream validation, framework protections, and reachability---before surfacing the finding to you. Findings that the AI layer confirms as likely false positives are deprioritized rather than hidden, so you can still review them if needed. The goal is reducing the 70% investigation time waste without introducing the risk of suppressing real vulnerabilities.
For AI-generated code specifically, this matters more. AI coding tools produce code faster than humans can review it, and that code has documented vulnerability rates that make scanning essential. But scanning AI-generated code with traditional tools produces even more false positives than scanning human-written code, because AI output often uses patterns that trigger generic rules. Rafter's AI triage layer is tuned for these patterns, reducing noise while preserving signal.
Start scanning your AI-generated code at rafter.so.
Building a Sustainable Triage Culture
Tooling solves half the problem. The other half is process.
False Positive Triage SOP
Adopt this seven-step standard operating procedure:
- Set severity SLAs: Critical findings triaged within 24 hours, high within 3 days, medium/low in weekly batch review
- Rotate triage duty: Don't assign one person to review all findings. Rotate weekly across the team
- Track false positive rates: Measure your scanner's false positive rate monthly. If it exceeds 50%, your rules need tuning
- Require suppression justifications: Every inline suppression must explain why the finding is false
- Review suppressions quarterly: Old suppressions may become stale if the code changes
- Feed back to scanner config: Every false positive pattern you identify should become a rule tuning or custom rule
- Celebrate true positives: When a scanner catches a real vulnerability, share it with the team. It reinforces why the triage work matters
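The tracking step in the SOP above is easy to automate. A sketch, assuming each triaged finding is recorded with its Step 4 outcome and that suppressions approximate confirmed false positives (escalations are excluded as undecided):

```javascript
// False positive rate over decided findings: suppressions / (fixes + suppressions).
function falsePositiveRate(outcomes) {
  const decided = outcomes.filter(o => o === 'fix' || o === 'suppress');
  if (decided.length === 0) return 0;
  const fps = decided.filter(o => o === 'suppress').length;
  return fps / decided.length;
}

// Flags the monthly review when the rate crosses the SOP's tuning threshold.
function needsTuning(outcomes, threshold = 0.5) {
  return falsePositiveRate(outcomes) > threshold;
}
```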
Metrics That Matter
Track these to measure your triage health:
| Metric | Target | Why It Matters |
|---|---|---|
| False positive rate | < 40% | Below this, developer trust stays intact |
| Mean time to triage | < 48 hours for critical | Ensures real vulnerabilities get fast attention |
| Suppression coverage | 100% justified | Prevents hidden real vulnerabilities |
| Finding-to-fix rate | > 80% of true positives | Scanning without fixing is security theater |
Conclusion
False positives are the tax you pay for automated security scanning. You can't eliminate them, but you can build systems that minimize their cost. The scanner that flags too much is still more valuable than no scanner at all---if you have a triage process that extracts signal from noise.
Your next steps:
- Run your current SAST tool and measure your false positive rate---if you don't know the number, you can't improve it
- Implement suppression comments with mandatory justifications in your codebase
- Set up a baseline scan so you only triage new findings going forward
- Establish severity-based SLAs for triage (24 hours for critical, 3 days for high, weekly for the rest)
- Evaluate Rafter for AI-powered triage that reduces noise on AI-generated code
False positives aren't going away. But with the right workflow, they become a manageable cost of doing security well---not the reason your team stops doing it.
Related Resources
- SAST vs DAST vs SCA: Which Security Scanning Approach Do You Actually Need?
- SAST vs DAST vs SCA: What Each Scanner Catches
- How Static Analysis Finds Vulnerabilities
- Security Scanning Limits: What No Tool Can Catch
- From Scan to Fix: Closing the Remediation Loop
- Automated Security Scanning: Set Up CI/CD Protection
- Securing AI-Generated Code: Best Practices