Why Codex Security Skips SAST — And Why That's the Right Call

By AI Guide News·Monday, March 16, 2026

OpenAI's Codex Security deliberately avoids starting with a SAST report. The reason reveals something important about how AI agents reason about code — and why the hardest vulnerabilities were never a dataflow problem.

The Design Choice That Defines Codex Security

For decades, Static Application Security Testing (SAST) has been one of the most effective ways security teams scale code review. But when OpenAI built Codex Security, they made a deliberate decision: don't start by importing a static analysis report and asking the agent to triage it. Start from the repository itself — its architecture, trust boundaries, and intended behavior — and validate what you find before asking a human to spend time on it.

That choice isn't arbitrary. It reflects a deeper insight about where the hardest security bugs actually live.

Why SAST Alone Isn't Enough

SAST tools are excellent at what they're designed for: enforcing secure coding standards, catching straightforward source-to-sink issues, and detecting known patterns at scale. But they have a structural limitation. In practice, SAST has to make approximations to stay tractable at scale — especially in real codebases with indirection, dynamic dispatch, callbacks, reflection, and framework-heavy control flow.

The deeper issue is what happens after you successfully trace a source to a sink. Even when static analysis correctly traces input across multiple functions and layers, it still has to answer the hardest question: does the security check in the code actually guarantee the property the system relies on?

Take a common pattern: code calls sanitize_html() before rendering untrusted content. A static analyzer can see that the sanitizer ran. What it usually can't determine is whether that sanitizer is actually sufficient for the specific rendering context, template engine, encoding behavior, and downstream transformations involved. That gap is where real vulnerabilities hide.
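To make that gap concrete, here is a minimal Python sketch. The article does not say what sanitize_html() does, so the version below is an assumption: it simply wraps the standard library's html.escape. The two render helpers are hypothetical as well. The point is only that the same sanitizer call can be adequate in one rendering context and insufficient in another.

```python
import html


def sanitize_html(value: str) -> str:
    """Hypothetical sanitizer: escapes &, <, > and quote characters."""
    return html.escape(value, quote=True)


def render_comment(untrusted: str) -> str:
    # HTML body context: escaping angle brackets blocks tag injection.
    return f"<p>{sanitize_html(untrusted)}</p>"


def render_link(untrusted_url: str) -> str:
    # URL attribute context: the sanitizer still runs, but it never
    # inspects the URL scheme, so a javascript: URL passes through intact.
    return f'<a href="{sanitize_html(untrusted_url)}">profile</a>'


body = render_comment("<script>alert(1)</script>")
link = render_link("javascript:alert(document.cookie)")

print(body)  # <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
print(link)  # <a href="javascript:alert(document.cookie)">profile</a>
```

In both helpers a tool can observe that the sanitizer ran. Only the second one leaves a usable payload in the output, and seeing that requires reasoning about the context the value lands in.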

Three Reasons Codex Security Doesn't Start With SAST

Anchoring bias: If the agent starts from a SAST list, it inherits the list's blind spots. Vulnerabilities SAST structurally can't find — authorization gaps, workflow bypasses, state-related issues — simply won't appear on the starting list.
False positive inheritance: SAST tools are notorious for noisy output. Starting with their findings means starting with their noise. OpenAI reports that false positive rates fell by more than 50% during the beta and that noise dropped by 84% in one repository, results that rest on validating findings independently rather than inheriting another tool's output.
Measurement integrity: If the pipeline starts with SAST output, it becomes difficult to separate what the agent discovered through its own analysis from what it inherited from another tool. That separation matters for the system to improve over time.
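As an illustration of the first point, the sketch below shows an authorization gap of the kind the article describes. Everything in it (the Invoice type, the in-memory store, both lookup functions) is hypothetical and not taken from OpenAI's write-up. The unchecked version contains no tainted data reaching a dangerous sink, so a source-to-sink rule typically has nothing to match. The flaw is a check that is missing entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    invoice_id: int
    owner: str
    amount_cents: int


# Hypothetical in-memory store standing in for a database.
INVOICES = {
    1: Invoice(1, "alice", 12_500),
    2: Invoice(2, "bob", 99_000),
}


def get_invoice_unchecked(current_user: str, invoice_id: int) -> Invoice:
    # Looks harmless: a typed lookup by integer key. The bug is that
    # current_user is never compared with the invoice owner.
    return INVOICES[invoice_id]


def get_invoice_checked(current_user: str, invoice_id: int) -> Invoice:
    invoice = INVOICES[invoice_id]
    if invoice.owner != current_user:
        raise PermissionError("invoice belongs to another user")
    return invoice


leaked = get_invoice_unchecked("alice", 2)
print(leaked.owner)  # bob
```

Finding this requires knowing the intended rule (users may read only their own invoices), which is the kind of system intent a findings list from another tool would not carry.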

What Codex Security Does Instead

Codex Security begins where security research begins: from the code and the system's intent. The workflow follows three stages:

Build system context: It analyzes the repository to understand architecture, trust boundaries, attacker entry points, sensitive data flows, and high-impact code paths — generating an editable, project-specific threat model.
Discover and reason: Using that threat model, it explores realistic attack paths. When it encounters a boundary that looks like "validation" or "sanitization," it doesn't treat that as a checkbox — it tries to falsify the guarantee the code is trying to make. Notably, it doesn't automatically trust code comments, so adding // this is not a bug above vulnerable code won't fool it.
Validate in isolation: Before surfacing a finding, Codex Security attempts to reproduce it in an isolated sandbox — capturing execution details and proof-of-concept artifacts. Only validated findings reach the developer.
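The second and third stages can be sketched in miniature. This is a toy stand-in under stated assumptions, not Codex Security's implementation: is_path_allowed, ALLOWED_ROOT, Finding, and try_to_falsify are all hypothetical names, and normalizing a path with posixpath.normpath stands in for reproducing the issue in a real sandbox. The idea it shows is to treat a validation check as a claim to be falsified, and to report only inputs that demonstrably break it.

```python
import posixpath
from dataclasses import dataclass

ALLOWED_ROOT = "/srv/uploads"  # hypothetical directory the check should protect


def is_path_allowed(requested: str) -> bool:
    """Hypothetical check under test: intends to confine access to ALLOWED_ROOT."""
    return requested.startswith(ALLOWED_ROOT + "/")


@dataclass(frozen=True)
class Finding:
    payload: str
    resolved: str


def try_to_falsify(candidates: list[str]) -> list[Finding]:
    """Keep only inputs that pass the check yet resolve outside the root."""
    validated = []
    for payload in candidates:
        if not is_path_allowed(payload):
            continue  # the check rejected this input, nothing to report
        resolved = posixpath.normpath(payload)
        if not resolved.startswith(ALLOWED_ROOT + "/"):
            # Reproduced: the check accepted it, but the guarantee failed.
            validated.append(Finding(payload, resolved))
    return validated


candidates = [
    "/srv/uploads/report.pdf",
    "/etc/passwd",
    "/srv/uploads/../../etc/passwd",
]
findings = try_to_falsify(candidates)
for finding in findings:
    print(finding.payload, "->", finding.resolved)
# /srv/uploads/../../etc/passwd -> /etc/passwd
```

Of the three candidates, only the traversal payload is surfaced. The first input is legitimate and the second is correctly rejected, so neither turns into a report someone has to triage.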

The Numbers Speak For Themselves

During its private beta, Codex Security scanned over 1.2 million commits across external repositories, surfacing 792 critical findings and 10,561 high-severity findings — including vulnerabilities in OpenSSH, GnuTLS, GOGS, Thorium, libssh, PHP, and Chromium, resulting in 14 assigned CVEs. False positive rates fell by more than 50%. Over-reported severity dropped by more than 90%. In one repository, noise was cut by 84% since initial rollout.

These aren't marginal improvements. They're the difference between a tool security teams actually use and one they learn to ignore.

The Bigger Picture

This isn't OpenAI dismissing SAST — it's a precise argument about where to start an agentic security workflow. SAST tools remain valuable for enforcing coding standards and catching well-known patterns at scale. Codex Security is designed for something different: finding the bugs that cost security teams the most time precisely because they look safe until someone actually tries to break them.

The implication is significant. As AI agents get better at security research, the question shifts from "can AI find vulnerabilities?" to "what kind of reasoning does it take to find the ones that matter?" Codex Security's answer — start from intent, validate before surfacing, never trust appearances — is a template for how that reasoning should work.

Source: openai.com — Why Codex Security Doesn't Include a SAST Report
