# Security Scanning at Scale: Performance and Architecture

Written by the Rafter Team

Running a security scan on a 50-file prototype takes seconds. Running the same scan on a monorepo with 10,000 files, 200 dependencies, and a 15-minute CI/CD time budget is a fundamentally different engineering problem. At scale, scanning stops being a configuration step and becomes an architecture decision: incremental analysis to avoid rescanning unchanged code, parallelization to use available compute, caching to eliminate redundant work, and prioritization to surface critical findings before the pipeline times out.
If your security scan takes longer than your CI/CD time budget, developers will disable it. A scan that never runs provides zero security value regardless of its detection quality.
## The Scale Problem
Scanning time grows with codebase size, but not linearly. Cross-file data flow analysis—the technique that catches the most dangerous vulnerabilities like SQL injection through multiple function calls—scales quadratically or worse with the number of files. A SAST tool that scans 100 files in 30 seconds might take 20 minutes on 5,000 files and over an hour on 20,000.
The constraint isn't theoretical. CI/CD pipelines typically run with time budgets between 5 and 15 minutes. Developers expect PR feedback within that window. When a security scan pushes the pipeline past 20 minutes, teams respond predictably: they move the scan to a nightly job, then start ignoring nightly results, then disable it entirely. Google's static analysis team documented this dynamic directly—their tools succeed at scale only because they enforce strict latency budgets, accepting reduced detection depth over losing developer engagement (Sadowski et al., "Lessons from Building Static Analysis Tools at Google," CACM, 2018).
The numbers are stark. A full CodeQL analysis on a large Java codebase can take 30-60 minutes. Semgrep's own benchmarks show sub-minute performance on most repositories, but their inter-file analysis mode increases scan time by 3-10x depending on codebase complexity. For teams shipping 50+ PRs per day, even a 10-minute scan creates a 500-minute daily compute burden—and that's before accounting for retry runs.
The engineering challenge is clear: you need the detection depth of full-codebase analysis within the time budget of a PR check.
## Incremental Scanning
The single most effective optimization is not scanning code that hasn't changed. Incremental scanning analyzes only the files modified in a PR plus the files that depend on them, skipping everything else.
The mechanics vary by tool:
- Semgrep supports `--baseline-commit` to compare against a target branch. It runs rules only on changed files and reports only new findings, suppressing anything that existed before the PR.
- CodeQL builds a full database but supports diff-aware queries that filter results to changed lines. The database build itself is the bottleneck—CodeQL doesn't skip compilation of unchanged code.
- ESLint security plugins naturally operate per-file, making them inherently incremental when your CI only lints changed files.
The tradeoff is real: incremental scanning can miss vulnerabilities that span the boundary between changed and unchanged code. If a PR changes a function's behavior in a way that makes an existing call site in an unchanged file insecure, the incremental scan won't flag it. Teams mitigate this with periodic full scans (nightly or weekly) that catch cross-boundary issues the incremental pass missed.
```python
# Example: determining scan scope from git diff
import subprocess

def get_changed_files(base_branch="origin/main"):
    result = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=ACMR", base_branch],
        capture_output=True, text=True
    )
    return [f for f in result.stdout.strip().split("\n") if f]

def get_dependent_files(changed_files, dependency_graph):
    """Expand scan scope to include files that import changed modules."""
    scan_scope = set(changed_files)
    for f in changed_files:
        scan_scope.update(dependency_graph.get(f, []))
    return scan_scope
```
This approach cuts scan time by 80-95% on typical PRs, where developers touch 5-20 files out of thousands.
## Parallelization Strategies
Modern scanners exploit parallelism at three levels:
File-level parallelism is the simplest. Each file gets analyzed independently on a separate core. Semgrep uses this by default, distributing files across available CPUs with a Rust-based core that runs rules in parallel. For intra-file analysis (single-function bugs, pattern matches), this scales nearly linearly with core count.
Rule-level parallelism runs multiple detection rules against the same file simultaneously. A codebase might need 500+ rules covering SQL injection, XSS, hardcoded secrets, and insecure configurations. Running these serially means scanning the AST 500 times; running them in parallel means one AST parse with concurrent rule evaluation.
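The parse-once pattern can be sketched in a few lines. This is a toy illustration with two hypothetical rules, not any real scanner's API; note that in CPython the GIL limits the speedup for pure-Python rules, which is why production scanners implement concurrent rule evaluation in native code (as Semgrep does with its Rust core).

```python
# Sketch: parse the AST once, evaluate rules concurrently.
# Both rule functions below are illustrative assumptions.
import ast
from concurrent.futures import ThreadPoolExecutor

def rule_hardcoded_secret(tree):
    """Flag string constants assigned to names containing 'password'."""
    findings = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if (isinstance(target, ast.Name)
                        and "password" in target.id.lower()
                        and isinstance(node.value, ast.Constant)
                        and isinstance(node.value.value, str)):
                    findings.append(("hardcoded-secret", node.lineno))
    return findings

def rule_eval_call(tree):
    """Flag calls to eval()."""
    return [("eval-call", node.lineno) for node in ast.walk(tree)
            if isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "eval"]

def scan_source(source, rules):
    tree = ast.parse(source)             # one parse, shared by all rules
    with ThreadPoolExecutor() as pool:   # rules evaluated concurrently
        results = pool.map(lambda rule: rule(tree), rules)
    return [f for findings in results for f in findings]

findings = scan_source('password = "hunter2"\neval("1+1")',
                       [rule_hardcoded_secret, rule_eval_call])
# findings: [('hardcoded-secret', 1), ('eval-call', 2)]
```

The key design point: the expensive step (parsing) happens once, and each rule is a pure function of the shared tree, so rules can run in any order or all at once.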
Distributed scanning splits work across multiple CI workers. For monorepos, this means partitioning the codebase by language, service boundary, or directory and scanning each partition on a separate machine. GitHub Actions supports this natively with matrix strategies:
```yaml
# GitHub Actions: parallel scanning by language partition
name: Security Scan
on: [pull_request]
jobs:
  scan:
    strategy:
      matrix:
        partition: [frontend, backend, shared-libs]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Cache scan results
        uses: actions/cache@v4
        with:
          path: ~/.semgrep/cache
          key: semgrep-${{ matrix.partition }}-${{ hashFiles('**/*.lock') }}
          restore-keys: semgrep-${{ matrix.partition }}-
      - name: Run incremental SAST
        run: |
          semgrep ci \
            --baseline-commit ${{ github.event.pull_request.base.sha }} \
            --include="${{ matrix.partition }}/**" \
            --jobs $(nproc) \
            --timeout 300
      - name: Run SCA on changed lockfiles
        if: matrix.partition == 'backend'
        run: |
          semgrep ci --config=r/supply-chain \
            --baseline-commit ${{ github.event.pull_request.base.sha }}
```
Facebook's static analysis team reported that distributed scanning across their monorepo reduced wall-clock analysis time from hours to minutes, making it feasible to run on every diff (Calcagno et al., "Moving Fast with Software Verification," NFM 2015).
## Caching and Baseline Management
Caching eliminates redundant computation between scans. Three layers matter:
AST/parse caching stores the parsed representation of files. If a file hasn't changed since the last scan, reuse the cached AST instead of re-parsing. This saves 20-40% of scan time on large codebases where most files are stable between runs.
Analysis result caching stores the findings for unchanged files. Combined with dependency tracking, this means only files with changed inputs (source or dependencies) get re-analyzed. Semgrep's CI mode uses this approach, caching results keyed by file hash and rule version.
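A minimal sketch of that keying scheme, assuming a local on-disk cache directory and a caller-supplied `scan_file` function (both illustrative, not a specific tool's layout):

```python
# Sketch: per-file result cache keyed by content hash + rule version.
# CACHE_DIR and scan_file() are assumptions for illustration.
import hashlib
import json
from pathlib import Path

CACHE_DIR = Path(".scan-cache")

def cache_key(path, rules_version):
    # Content hash, not mtime: a rebuilt but identical file still hits.
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    return f"{digest}-{rules_version}"

def scan_with_cache(path, rules_version, scan_file):
    """Return cached findings when neither source nor rules changed."""
    CACHE_DIR.mkdir(exist_ok=True)
    entry = CACHE_DIR / (cache_key(path, rules_version) + ".json")
    if entry.exists():                      # cache hit: skip analysis
        return json.loads(entry.read_text())
    findings = scan_file(path)              # cache miss: run the scanner
    entry.write_text(json.dumps(findings))
    return findings
```

Bumping `rules_version` changes every key and so invalidates the whole cache, which is exactly the "rule updates require cache bust" staleness risk noted in the table below.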
Baseline management suppresses known findings so PR scans only show new issues. Without baselines, a legacy codebase with 500 existing findings buries the 2 new findings introduced by the current PR. Developers stop reading scan results entirely.
The baseline workflow:
- Run a full scan on the main branch and store findings as the baseline
- On each PR, run incrementally and compare against the baseline
- Only report findings that are new (not present in the baseline)
- Update the baseline after each merge to main
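The comparison step above hinges on stable finding identity. A common approach is to fingerprint each finding by rule, path, and matched code snippet rather than by line number, so unrelated edits that shift lines don't resurface known findings. A sketch, with illustrative field names rather than a specific tool's schema:

```python
# Sketch: baseline suppression via content-based fingerprints.
# The finding dict fields are illustrative assumptions.
import hashlib

def fingerprint(finding):
    # Line numbers are deliberately excluded: they shift on every edit.
    raw = f"{finding['rule_id']}:{finding['path']}:{finding['snippet']}"
    return hashlib.sha256(raw.encode()).hexdigest()

def new_findings(pr_findings, baseline_findings):
    """Report only findings absent from the stored baseline."""
    known = {fingerprint(f) for f in baseline_findings}
    return [f for f in pr_findings if fingerprint(f) not in known]

baseline = [{"rule_id": "sqli", "path": "app/db.py",
             "snippet": "cur.execute(q)"}]
pr = baseline + [{"rule_id": "xss", "path": "app/views.py",
                  "snippet": "render(html)"}]
assert new_findings(pr, baseline) == [pr[1]]  # only the XSS is new
```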
| Caching Layer | What It Stores | Time Saved | Staleness Risk |
|---|---|---|---|
| AST/parse cache | Parsed file representations | 20-40% | Low (invalidated by file changes) |
| Analysis results | Per-file findings | 50-80% | Medium (rule updates require cache bust) |
| Baseline suppression | Known findings on main | N/A (reduces noise) | High (stale baselines hide regressions) |
The staleness risk for baselines deserves attention. If the baseline isn't updated after every merge, findings fixed on main will keep being suppressed on future PRs. Automate baseline updates as a post-merge CI step.
## Monorepo Considerations
Monorepos amplify every scaling problem. A single repository with 50 services, 3 languages, and shared libraries creates unique challenges for security scanning.
Path-based scanning lets you scope rules to specific directories. Security rules for your payment service (strict PCI-DSS compliance checks) don't need to run against your marketing site. Configure scan profiles per directory to reduce both scan time and false positive noise.
Service-boundary awareness matters for cross-service vulnerabilities. A shared authentication library change affects every service that imports it. Your scan configuration needs a dependency graph that expands the scan scope when shared code changes—miss this, and you'll ship authentication bugs to 50 services simultaneously.
Ownership routing directs findings to the right team. In a monorepo with 20 teams, a finding in services/payments/ should create a ticket for the payments team, not the platform team. Tools like Semgrep support CODEOWNERS-based routing, matching findings to the team that owns the affected code path.
Shared library scanning requires special treatment. Changes to libs/common/auth should trigger security scans across every service that depends on it—not just the library itself. Without this expansion, a vulnerability in shared code slips through because no individual service's diff looks dangerous.
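The scope expansion above can be sketched with a per-service dependency map. The service names, paths, and map structure here are hypothetical; in practice this map would be derived from import graphs or build metadata:

```python
# Sketch: expand a changed-file list to every affected service,
# including services that only import a changed shared library.
# service_deps maps service name -> shared library paths it imports.
def affected_services(changed_files, service_deps):
    """Return services whose own files or imported shared libs changed."""
    affected = set()
    for service, deps in service_deps.items():
        for f in changed_files:
            if f.startswith(f"services/{service}/") or any(
                f.startswith(dep) for dep in deps
            ):
                affected.add(service)
    return affected

service_deps = {
    "payments": ["libs/common/auth", "libs/common/billing"],
    "marketing": [],
}
# A change to the shared auth lib pulls payments into scope even
# though no file under services/payments/ was touched.
assert affected_services(["libs/common/auth/session.py"],
                         service_deps) == {"payments"}
```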
## Prioritization Under Time Pressure
When you can't scan everything within the time budget, scan the right things first. Not all code carries equal risk.
Prioritize by file risk. Authentication, payment processing, data access layers, and API endpoint handlers are higher-value targets than utility functions or configuration files. Weight scan time toward code that handles sensitive data or enforces security boundaries.
Prioritize by change recency. Files modified in the current PR are more likely to contain newly introduced vulnerabilities than files that haven't changed in months. Scan changed files first, then expand scope if time permits.
Prioritize by historical vulnerability density. Files that have had security findings before are statistically more likely to have them again. Track finding history per file and use it to rank scan priority. Google's Tricorder system uses exactly this heuristic—files with prior bug density get more analysis depth (Sadowski et al., CACM 2018).
Prioritize by rule severity. If you have 10 minutes, run critical-severity rules (injection, auth bypass, secrets exposure) before informational rules (code style, minor hardening). Most tools support severity-based filtering to enable this.
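The first three heuristics can be combined into a simple composite score (severity filtering is usually handled by the scanner's own flags). The weights and path hints below are illustrative assumptions, not tuned values:

```python
# Sketch: rank files for scanning under a time budget by combining
# path risk, change recency, and historical finding density.
# Weights and RISKY_PATH_HINTS are illustrative, not tuned.
RISKY_PATH_HINTS = ("auth", "payment", "api", "db")

def risk_score(path, changed_in_pr, past_finding_count):
    score = 0.0
    if any(hint in path.lower() for hint in RISKY_PATH_HINTS):
        score += 2.0                      # sensitive code path
    if changed_in_pr:
        score += 3.0                      # change recency dominates
    score += min(past_finding_count, 5)   # historical density, capped
    return score

def scan_order(files):
    """files: list of (path, changed_in_pr, past_finding_count)."""
    return sorted(files, key=lambda f: risk_score(*f), reverse=True)

ranked = scan_order([
    ("utils/strings.py", False, 0),
    ("services/payments/charge.py", True, 2),
    ("services/auth/login.py", False, 4),
])
# charge.py ranks first: risky path + changed in this PR + history
```

With a ranking like this, the scanner works down the list and simply stops when the time budget runs out, having spent its minutes on the highest-risk files.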
## Practical Recommendations
Different pipeline stages serve different scanning purposes. Match the tool, scope, and time budget to each stage:
| Pipeline Stage | Time Budget | Scope | Tools | What It Catches |
|---|---|---|---|---|
| Pre-commit | 1-5 seconds | Staged files only | Secret scanners (gitleaks), lint rules | Hardcoded secrets, obvious anti-patterns |
| PR check | 5-15 minutes | Changed files + deps | Incremental SAST, SCA | New vulnerabilities, insecure dependencies |
| Nightly full scan | 30-60 minutes | Entire codebase | Full SAST, comprehensive SCA | Cross-file issues, baseline drift |
| Quarterly | Hours-days | Running application | DAST, penetration testing | Runtime vulnerabilities, config issues |
### Scaling Your Scanning Without Breaking CI/CD
- Enable incremental scanning on PR checks—scan only changed files and their dependents
- Cache ASTs and analysis results between runs to eliminate redundant parsing
- Establish a baseline on main and suppress known findings in PR scans
- Parallelize scans across available cores and CI workers using matrix strategies
- Route critical-severity rules first; defer informational rules to nightly scans
- Configure path-based scan profiles for monorepos to reduce scope per service
- Update baselines automatically after every merge to prevent suppression staleness
## How Rafter Approaches Scale
Rafter is built for codebases where AI-generated code is the norm—high commit velocity, frequent dependency churn, and large volumes of generated files that need security review. Rafter combines Semgrep-based SAST with AI-powered code review and Trivy dependency scanning to catch injection vulnerabilities, exposed secrets, insecure dependencies, and the patterns that AI coding tools commonly introduce.
Rafter currently offers two scan modes: Fast scans for quick rule-based analysis, and Plus scans that add deeper AI-powered review. Scans run automatically on push or pull request via the GitHub App, or on-demand via CLI. Per-repo configuration lets teams control scan triggers, branch filters, and monthly scan caps for cost management.
For teams scanning at volume, the strategies in this post—incremental analysis, caching, and baseline management—represent where scanning tools need to go. Rafter is actively working toward incremental scanning, result caching, and baseline deduplication to keep scan times fast as codebases grow.
Try Rafter on your repos — connect GitHub, select repositories, and get results in under a minute.
## Conclusion
Security scanning at scale is an engineering problem, not a configuration toggle. Full-codebase scans that take an hour provide strong detection but zero value if developers skip them. The solution is architectural: incremental analysis for PR-speed feedback, parallelization to use available compute, caching to eliminate redundant work, and prioritization to surface critical findings first.
Next steps:
- Measure your current scan time and compare it against your CI/CD time budget
- Enable incremental scanning for PR checks—most tools support diff-based mode
- Set up baseline management to suppress known findings and surface only new ones
- Distribute scans across CI workers for monorepos or multi-language codebases
- Run full scans nightly to catch what incremental analysis misses