# Security Scanning at Scale: Performance and Architecture

Written by the Rafter Team

Running a security scan on a 50-file prototype takes seconds. Running the same scan on a monorepo with 10,000 files, 200 dependencies, and a 15-minute CI/CD time budget is a fundamentally different engineering problem. At scale, scanning stops being a configuration step and becomes an architecture decision: incremental analysis to avoid rescanning unchanged code, parallelization to use available compute, caching to eliminate redundant work, and prioritization to surface critical findings before the pipeline times out.
If your security scan takes longer than your CI/CD time budget, developers will disable it. A scan that never runs provides zero security value regardless of its detection quality.
## The Scale Problem
Scanning time grows with codebase size, but not linearly. Cross-file data flow analysis—the technique that catches the most dangerous vulnerabilities like SQL injection through multiple function calls—scales quadratically or worse with the number of files. A SAST tool that scans 100 files in 30 seconds might take 20 minutes on 5,000 files and over an hour on 20,000.
The constraint isn't theoretical. CI/CD pipelines typically run with time budgets between 5 and 15 minutes. Developers expect PR feedback within that window. When a security scan pushes the pipeline past 20 minutes, teams respond predictably: they move the scan to a nightly job, then start ignoring nightly results, then disable it entirely. Google's static analysis team documented this dynamic directly—their tools succeed at scale only because they enforce strict latency budgets, accepting reduced detection depth over losing developer engagement (Sadowski et al., "Lessons from Building Static Analysis Tools at Google," CACM, 2018).
The numbers are stark. A full CodeQL analysis on a large Java codebase can take 30-60 minutes. Semgrep's own benchmarks show sub-minute performance on most repositories, but their inter-file analysis mode increases scan time by 3-10x depending on codebase complexity. For teams shipping 50+ PRs per day, even a 10-minute scan creates a 500-minute daily compute burden—and that's before accounting for retry runs.
The engineering challenge is clear: you need the detection depth of full-codebase analysis within the time budget of a PR check.
## Incremental Scanning
The single most effective optimization is not scanning code that hasn't changed. Incremental scanning analyzes only the files modified in a PR plus the files that depend on them, skipping everything else.
The mechanics vary by tool:
- Semgrep supports `--baseline-commit` to compare against a target branch. It runs rules only on changed files and reports only new findings, suppressing anything that existed before the PR.
- CodeQL builds a full database but supports diff-aware queries that filter results to changed lines. The database build itself is the bottleneck—CodeQL doesn't skip compilation of unchanged code.
- ESLint security plugins naturally operate per-file, making them inherently incremental when your CI only lints changed files.
The tradeoff is real: incremental scanning can miss vulnerabilities that span the boundary between changed and unchanged code. If a PR changes a function's behavior in a way that makes an existing call site in an unchanged file insecure, the incremental scan won't flag it. Teams mitigate this with periodic full scans (nightly or weekly) that catch cross-boundary issues the incremental pass missed.
```python
# Example: determining scan scope from git diff
import subprocess

def get_changed_files(base_branch="origin/main"):
    result = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=ACMR", base_branch],
        capture_output=True, text=True
    )
    return [f for f in result.stdout.strip().split("\n") if f]

def get_dependent_files(changed_files, dependency_graph):
    """Expand scan scope to include files that import changed modules."""
    scan_scope = set(changed_files)
    for f in changed_files:
        scan_scope.update(dependency_graph.get(f, []))
    return scan_scope
```
This approach cuts scan time by 80-95% on typical PRs, where developers touch 5-20 files out of thousands.
## Parallelization Strategies
Modern scanners exploit parallelism at three levels:
File-level parallelism is the simplest. Each file gets analyzed independently on a separate core. Semgrep uses this by default, distributing files across available CPUs with a Rust-based core that runs rules in parallel. For intra-file analysis (single-function bugs, pattern matches), this scales nearly linearly with core count.
Rule-level parallelism runs multiple detection rules against the same file simultaneously. A codebase might need 500+ rules covering SQL injection, XSS, hardcoded secrets, and insecure configurations. Running these serially means scanning the AST 500 times; running them in parallel means one AST parse with concurrent rule evaluation.
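The parse-once pattern can be sketched in a few lines. This is a toy illustration with two hypothetical rules, not any real scanner's API; note that in CPython the GIL limits the speedup for pure-Python rules, which is why production scanners implement concurrent rule evaluation in native code (as Semgrep does with its Rust core).

```python
# Sketch: parse the AST once, evaluate rules concurrently.
# Both rule functions below are illustrative assumptions.
import ast
from concurrent.futures import ThreadPoolExecutor

def rule_hardcoded_secret(tree):
    """Flag string constants assigned to names containing 'password'."""
    findings = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if (isinstance(target, ast.Name)
                        and "password" in target.id.lower()
                        and isinstance(node.value, ast.Constant)
                        and isinstance(node.value.value, str)):
                    findings.append(("hardcoded-secret", node.lineno))
    return findings

def rule_eval_call(tree):
    """Flag calls to eval()."""
    return [("eval-call", node.lineno) for node in ast.walk(tree)
            if isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "eval"]

def scan_source(source, rules):
    tree = ast.parse(source)             # one parse, shared by all rules
    with ThreadPoolExecutor() as pool:   # rules evaluated concurrently
        results = pool.map(lambda rule: rule(tree), rules)
    return [f for findings in results for f in findings]

findings = scan_source('password = "hunter2"\neval("1+1")',
                       [rule_hardcoded_secret, rule_eval_call])
# findings: [('hardcoded-secret', 1), ('eval-call', 2)]
```

The key design point: the expensive step (parsing) happens once, and each rule is a pure function of the shared tree, so rules can run in any order or all at once.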
Distributed scanning splits work across multiple CI workers. For monorepos, this means partitioning the codebase by language, service boundary, or directory and scanning each partition on a separate machine. GitHub Actions supports this natively with matrix strategies:
```yaml
# GitHub Actions: parallel scanning by language partition
name: Security Scan
on: [pull_request]
jobs:
  scan:
    strategy:
      matrix:
        partition: [frontend, backend, shared-libs]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Cache scan results
        uses: actions/cache@v4
        with:
          path: ~/.semgrep/cache
          key: semgrep-${{ matrix.partition }}-${{ hashFiles('**/*.lock') }}
          restore-keys: semgrep-${{ matrix.partition }}-
      - name: Run incremental SAST
        run: |
          semgrep ci \
            --baseline-commit ${{ github.event.pull_request.base.sha }} \
            --include="${{ matrix.partition }}/**" \
            --jobs $(nproc) \
            --timeout 300
      - name: Run SCA on changed lockfiles
        if: matrix.partition == 'backend'
        run: |
          semgrep ci --config=r/supply-chain \
            --baseline-commit ${{ github.event.pull_request.base.sha }}
```
Facebook's static analysis team reported that distributed scanning across their monorepo reduced wall-clock analysis time from hours to minutes, making it feasible to run on every diff (Calcagno et al., "Moving Fast with Software Verification," NFM 2015).
## Caching and Baseline Management
Caching eliminates redundant computation between scans. Three layers matter:
AST/parse caching stores the parsed representation of files. If a file hasn't changed since the last scan, reuse the cached AST instead of re-parsing. This saves 20-40% of scan time on large codebases where most files are stable between runs.
Analysis result caching stores the findings for unchanged files. Combined with dependency tracking, this means only files with changed inputs (source or dependencies) get re-analyzed. Semgrep's CI mode uses this approach, caching results keyed by file hash and rule version.
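A minimal sketch of that keying scheme, assuming a local on-disk cache directory and a caller-supplied `scan_file` function (both illustrative, not a specific tool's layout):

```python
# Sketch: per-file result cache keyed by content hash + rule version.
# CACHE_DIR and scan_file() are assumptions for illustration.
import hashlib
import json
from pathlib import Path

CACHE_DIR = Path(".scan-cache")

def cache_key(path, rules_version):
    # Content hash, not mtime: a rebuilt but identical file still hits.
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    return f"{digest}-{rules_version}"

def scan_with_cache(path, rules_version, scan_file):
    """Return cached findings when neither source nor rules changed."""
    CACHE_DIR.mkdir(exist_ok=True)
    entry = CACHE_DIR / (cache_key(path, rules_version) + ".json")
    if entry.exists():                      # cache hit: skip analysis
        return json.loads(entry.read_text())
    findings = scan_file(path)              # cache miss: run the scanner
    entry.write_text(json.dumps(findings))
    return findings
```

Bumping `rules_version` changes every key and so invalidates the whole cache, which is exactly the "rule updates require cache bust" staleness risk noted in the table below.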
Baseline management suppresses known findings so PR scans only show new issues. Without baselines, a legacy codebase with 500 existing findings buries the 2 new findings introduced by the current PR. Developers stop reading scan results entirely.
The baseline workflow:
- Run a full scan on the main branch and store findings as the baseline
- On each PR, run incrementally and compare against the baseline
- Only report findings that are new (not present in the baseline)
- Update the baseline after each merge to main
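The comparison step above hinges on stable finding identity. A common approach is to fingerprint each finding by rule, path, and matched code snippet rather than by line number, so unrelated edits that shift lines don't resurface known findings. A sketch, with illustrative field names rather than a specific tool's schema:

```python
# Sketch: baseline suppression via content-based fingerprints.
# The finding dict fields are illustrative assumptions.
import hashlib

def fingerprint(finding):
    # Line numbers are deliberately excluded: they shift on every edit.
    raw = f"{finding['rule_id']}:{finding['path']}:{finding['snippet']}"
    return hashlib.sha256(raw.encode()).hexdigest()

def new_findings(pr_findings, baseline_findings):
    """Report only findings absent from the stored baseline."""
    known = {fingerprint(f) for f in baseline_findings}
    return [f for f in pr_findings if fingerprint(f) not in known]

baseline = [{"rule_id": "sqli", "path": "app/db.py",
             "snippet": "cur.execute(q)"}]
pr = baseline + [{"rule_id": "xss", "path": "app/views.py",
                  "snippet": "render(html)"}]
assert new_findings(pr, baseline) == [pr[1]]  # only the XSS is new
```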
| Caching Layer | What It Stores | Time Saved | Staleness Risk |
|---|---|---|---|
| AST/parse cache | Parsed file representations | 20-40% | Low (invalidated by file changes) |
| Analysis results | Per-file findings | 50-80% | Medium (rule updates require cache bust) |
| Baseline suppression | Known findings on main | N/A (reduces noise) | High (stale baselines hide regressions) |
The staleness risk for baselines deserves attention. If the baseline isn't updated after every merge, findings fixed on main will keep being suppressed on future PRs. Automate baseline updates as a post-merge CI step.
## Monorepo Considerations
Monorepos amplify every scaling problem. A single repository with 50 services, 3 languages, and shared libraries creates unique challenges for security scanning.
Path-based scanning lets you scope rules to specific directories. Security rules for your payment service (strict PCI-DSS compliance checks) don't need to run against your marketing site. Configure scan profiles per directory to reduce both scan time and false positive noise.
Service-boundary awareness matters for cross-service vulnerabilities. A shared authentication library change affects every service that imports it. Your scan configuration needs a dependency graph that expands the scan scope when shared code changes—miss this, and you'll ship authentication bugs to 50 services simultaneously.
Ownership routing directs findings to the right team. In a monorepo with 20 teams, a finding in services/payments/ should create a ticket for the payments team, not the platform team. Tools like Semgrep support CODEOWNERS-based routing, matching findings to the team that owns the affected code path.
Shared library scanning requires special treatment. Changes to libs/common/auth should trigger security scans across every service that depends on it—not just the library itself. Without this expansion, a vulnerability in shared code slips through because no individual service's diff looks dangerous.
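The scope expansion above can be sketched with a per-service dependency map. The service names, paths, and map structure here are hypothetical; in practice this map would be derived from import graphs or build metadata:

```python
# Sketch: expand a changed-file list to every affected service,
# including services that only import a changed shared library.
# service_deps maps service name -> shared library paths it imports.
def affected_services(changed_files, service_deps):
    """Return services whose own files or imported shared libs changed."""
    affected = set()
    for service, deps in service_deps.items():
        for f in changed_files:
            if f.startswith(f"services/{service}/") or any(
                f.startswith(dep) for dep in deps
            ):
                affected.add(service)
    return affected

service_deps = {
    "payments": ["libs/common/auth", "libs/common/billing"],
    "marketing": [],
}
# A change to the shared auth lib pulls payments into scope even
# though no file under services/payments/ was touched.
assert affected_services(["libs/common/auth/session.py"],
                         service_deps) == {"payments"}
```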
## Prioritization Under Time Pressure
When you can't scan everything within the time budget, scan the right things first. Not all code carries equal risk.
Prioritize by file risk. Authentication, payment processing, data access layers, and API endpoint handlers are higher-value targets than utility functions or configuration files. Weight scan time toward code that handles sensitive data or enforces security boundaries.
Prioritize by change recency. Files modified in the current PR are more likely to contain newly introduced vulnerabilities than files that haven't changed in months. Scan changed files first, then expand scope if time permits.
Prioritize by historical vulnerability density. Files that have had security findings before are statistically more likely to have them again. Track finding history per file and use it to rank scan priority. Google's Tricorder system uses exactly this heuristic—files with prior bug density get more analysis depth (Sadowski et al., CACM 2018).
Prioritize by rule severity. If you have 10 minutes, run critical-severity rules (injection, auth bypass, secrets exposure) before informational rules (code style, minor hardening). Most tools support severity-based filtering to enable this.
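The first three heuristics can be combined into a simple composite score (severity filtering is usually handled by the scanner's own flags). The weights and path hints below are illustrative assumptions, not tuned values:

```python
# Sketch: rank files for scanning under a time budget by combining
# path risk, change recency, and historical finding density.
# Weights and RISKY_PATH_HINTS are illustrative, not tuned.
RISKY_PATH_HINTS = ("auth", "payment", "api", "db")

def risk_score(path, changed_in_pr, past_finding_count):
    score = 0.0
    if any(hint in path.lower() for hint in RISKY_PATH_HINTS):
        score += 2.0                      # sensitive code path
    if changed_in_pr:
        score += 3.0                      # change recency dominates
    score += min(past_finding_count, 5)   # historical density, capped
    return score

def scan_order(files):
    """files: list of (path, changed_in_pr, past_finding_count)."""
    return sorted(files, key=lambda f: risk_score(*f), reverse=True)

ranked = scan_order([
    ("utils/strings.py", False, 0),
    ("services/payments/charge.py", True, 2),
    ("services/auth/login.py", False, 4),
])
# charge.py ranks first: risky path + changed in this PR + history
```

With a ranking like this, the scanner works down the list and simply stops when the time budget runs out, having spent its minutes on the highest-risk files.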
## Practical Recommendations
Different pipeline stages serve different scanning purposes. Match the tool, scope, and time budget to each stage:
| Pipeline Stage | Time Budget | Scope | Tools | What It Catches |
|---|---|---|---|---|
| Pre-commit | 1-5 seconds | Staged files only | Secret scanners (gitleaks), lint rules | Hardcoded secrets, obvious anti-patterns |
| PR check | 5-15 minutes | Changed files + deps | Incremental SAST, SCA | New vulnerabilities, insecure dependencies |
| Nightly full scan | 30-60 minutes | Entire codebase | Full SAST, comprehensive SCA | Cross-file issues, baseline drift |
| Quarterly | Hours-days | Running application | DAST, penetration testing | Runtime vulnerabilities, config issues |
### Scaling Your Scanning Without Breaking CI/CD
- Enable incremental scanning on PR checks—scan only changed files and their dependents
- Cache ASTs and analysis results between runs to eliminate redundant parsing
- Establish a baseline on main and suppress known findings in PR scans
- Parallelize scans across available cores and CI workers using matrix strategies
- Route critical-severity rules first; defer informational rules to nightly scans
- Configure path-based scan profiles for monorepos to reduce scope per service
- Update baselines automatically after every merge to prevent suppression staleness
## How Rafter Approaches Scale
Rafter is built for codebases where AI-generated code is the norm—high commit velocity, frequent dependency churn, and large volumes of generated files that need security review. Rafter combines Semgrep-based SAST with AI-powered code review and Trivy dependency scanning to catch injection vulnerabilities, exposed secrets, insecure dependencies, and the patterns that AI coding tools commonly introduce.
Rafter currently offers two scan modes: Fast scans for quick rule-based analysis, and Plus scans that add deeper AI-powered review. Scans run automatically on push or pull request via the GitHub App, or on-demand via CLI. Per-repo configuration lets teams control scan triggers, branch filters, and monthly scan caps for cost management.
For teams scanning at volume, the strategies in this post—incremental analysis, caching, and baseline management—represent where scanning tools need to go. Rafter is actively working toward incremental scanning, result caching, and baseline deduplication to keep scan times fast as codebases grow.
Try Rafter on your repos — connect GitHub, select repositories, and get results in under a minute.
## Conclusion
Security scanning at scale is an engineering problem, not a configuration toggle. Full-codebase scans that take an hour provide strong detection but zero value if developers skip them. The solution is architectural: incremental analysis for PR-speed feedback, parallelization to use available compute, caching to eliminate redundant work, and prioritization to surface critical findings first.
Next steps:
- Measure your current scan time and compare it against your CI/CD time budget
- Enable incremental scanning for PR checks—most tools support diff-based mode
- Set up baseline management to suppress known findings and surface only new ones
- Distribute scans across CI workers for monorepos or multi-language codebases
- Run full scans nightly to catch what incremental analysis misses