GPT-5.4 — OpenAI's Most Capable Frontier Model for Professional Work

OpenAI has released GPT-5.4, its most capable and efficient reasoning model yet. It combines coding power, native computer use, a 1M-token context window, and the first active cybersecurity mitigations in a general-purpose OpenAI model.
A New Frontier Model for Professional Work
On March 5, 2026, OpenAI released GPT-5.4 — described as "our most capable and efficient frontier model for professional work." It rolls out across ChatGPT, Codex, and the API simultaneously, with three variants: the standard GPT-5.4, a reasoning-focused GPT-5.4 Thinking, and a high-performance GPT-5.4 Pro for maximum output on complex tasks.
The model brings together OpenAI's recent advances in reasoning, coding, and agentic workflows in a single system. It incorporates the industry-leading coding capabilities of GPT-5.3 Codex and improves how the model works across tools, software environments, and professional tasks involving spreadsheets, presentations, and documents.
What's Actually New
Several capabilities stand out as genuine advances over previous models:
- Native computer use in the API: GPT-5.4 is the first general-purpose OpenAI model released with native, state-of-the-art computer-use capabilities, enabling agents to operate computers and carry out complex workflows across applications. In testing across ~30K property and tax portals, GPT-5.4 achieved a 95% success rate on the first attempt and 100% within three attempts — compared to 73–79% with prior models — while completing sessions 3x faster and using 70% fewer tokens.
- 1M token context window: The API version supports up to 1 million tokens, the largest context window available from OpenAI, allowing agents to plan, execute, and verify tasks across long horizons.
- Tool search: A new mechanism that gives the model a lightweight list of available tools rather than loading all definitions upfront; the model looks up a tool's full definition only when it needs that tool. For systems with hundreds of tools, this can eliminate tens of thousands of unnecessary tokens per request, reducing cost and improving response speed.
- Transparent thinking with GPT-5.4 Thinking: The model can now provide an upfront plan of its reasoning before executing, so users can adjust course mid-response. It can also think longer on difficult tasks while maintaining stronger awareness of earlier conversation steps.
- Enhanced visual understanding: A new "original" image input detail level supports full-fidelity perception up to 10.24M total pixels, with strong gains in localization ability and click accuracy — directly relevant to computer use.
- Token efficiency: GPT-5.4 uses significantly fewer tokens than GPT-5.2 to solve the same problems, translating directly into lower API costs for developers.
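The tool search idea in the list above can be illustrated with a small sketch. This is not OpenAI's implementation: the `Tool` and `ToolRegistry` names, the keyword matching, and the four-characters-per-token estimate are all assumptions made for illustration. The point is only that a short listing goes out with every request, while full definitions are fetched when a tool is actually needed.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Tool:
    name: str
    summary: str  # one-line description sent with every request
    schema: dict = field(default_factory=dict)  # full definition, sent on demand


class ToolRegistry:
    """Hypothetical registry for illustration; not part of any OpenAI SDK."""

    def __init__(self, tools):
        self._tools = {tool.name: tool for tool in tools}

    def lightweight_listing(self):
        """Names and summaries only: the cheap list shown upfront."""
        return [{"name": t.name, "summary": t.summary} for t in self._tools.values()]

    def full_listing(self):
        """Every full definition: what loading all tools upfront would send."""
        return [
            {"name": t.name, "summary": t.summary, "schema": t.schema}
            for t in self._tools.values()
        ]

    def search(self, query, limit=3):
        """Return full definitions for tools whose name or summary contains a query word."""
        words = query.lower().split()
        hits = [
            t for t in self._tools.values()
            if any(w in f"{t.name} {t.summary}".lower() for w in words)
        ]
        return [
            {"name": t.name, "summary": t.summary, "schema": t.schema}
            for t in hits[:limit]
        ]


def rough_tokens(obj):
    """Crude size estimate, assuming roughly 4 characters per token."""
    return len(json.dumps(obj)) // 4


def make_demo_registry(n=200):
    """Build n placeholder tools plus one realistic-looking tool to search for."""
    params = {
        f"arg_{j}": {"type": "string", "description": "An illustrative parameter"}
        for j in range(10)
    }
    tools = [
        Tool(f"tool_{i}", f"Hypothetical tool number {i}",
             {"type": "object", "properties": params})
        for i in range(n)
    ]
    tools.append(Tool("create_invoice", "Create an invoice for a customer",
                      {"type": "object", "properties": params}))
    return ToolRegistry(tools)


if __name__ == "__main__":
    registry = make_demo_registry()
    print("upfront, all definitions:", rough_tokens(registry.full_listing()))
    print("upfront, lightweight list:", rough_tokens(registry.lightweight_listing()))
    print("fetched on demand:", [t["name"] for t in registry.search("invoice")])
```

Running the script prints the estimated size of both listings, which makes the trade-off visible: the upfront cost grows with the number of summaries rather than the number of full schemas.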
Benchmark Results That Matter
GPT-5.4 sets new records on several benchmarks that reflect real professional work:
- OSWorld-Verified: 75%, a record score for computer use tasks and above human performance
- WebArena Verified: Record score for web navigation agents
- GDPval: 83% — a test for knowledge work tasks
- APEX-Agents: Top performance on professional skills benchmarks in law and finance
- SWE-Bench: 80.8% — measuring real-world software engineering capability
The Safety Story: A First for Cybersecurity
Alongside the product launch, OpenAI published the GPT-5.4 Thinking System Card — and the headline finding is significant: GPT-5.4 Thinking is the first general-purpose OpenAI model to have implemented active mitigations for High capability in Cybersecurity under OpenAI's Preparedness Framework.
For previous models, OpenAI flagged cybersecurity capability as a risk area. GPT-5.4 Thinking is the first general-purpose model where OpenAI actually built the guardrails in. These include an expanded cyber safety stack: monitoring systems, trusted access controls, and asynchronous blocking for higher-risk requests on Zero Data Retention (ZDR) surfaces.
Why does this matter now? Because GPT-5.4 scores 75% on OSWorld-Verified, above human performance, and 80.8% on SWE-Bench. A model that can autonomously operate computers and write production-quality code has genuine offensive cyber potential. Acknowledging this openly and deploying active mitigations is a meaningful shift from the "we assessed the risk" approach of prior model releases.
Chain-of-Thought Transparency
The system card also introduces a new open-source evaluation: CoT controllability — measuring whether models can deliberately obfuscate their reasoning to evade monitoring. The finding is reassuring: GPT-5.4 Thinking's ability to control its chain-of-thought is low, meaning the model cannot effectively hide its reasoning. OpenAI frames this as a positive safety property: CoT monitoring remains a reliable oversight tool.
The model also shows statistically significant improvements on evaluations for illicit non-violent activity and self-harm content, and outperforms all previous GPT-5 series models on mental health evaluations.
Availability
GPT-5.4 Thinking is available starting today to ChatGPT Plus, Team, and Pro users, replacing GPT-5.2 Thinking as the default reasoning model. GPT-5.2 Thinking will remain available in the model picker under Legacy Models for three months, before being retired on June 5, 2026. In the API, GPT-5.4 is available now as gpt-5.4, with GPT-5.4 Pro available as gpt-5.4-pro.
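For developers, switching between the two API model names mentioned above might look like the following sketch. Only the identifiers gpt-5.4 and gpt-5.4-pro come from the announcement; the endpoint URL, the payload shape, and the `build_request` helper are assumptions based on OpenAI's Responses API and should be checked against the current API reference. The code only builds the request and does not send it.

```python
import json
import os
import urllib.request

# Assumed endpoint (OpenAI Responses API); verify against the official API reference.
API_URL = "https://api.openai.com/v1/responses"


def build_request(prompt, pro=False, api_key=None):
    """Build (but do not send) a request for gpt-5.4 or gpt-5.4-pro.

    Hypothetical helper for illustration; the payload fields are assumptions.
    """
    model = "gpt-5.4-pro" if pro else "gpt-5.4"
    payload = {"model": model, "input": prompt}
    key = api_key or os.environ.get("OPENAI_API_KEY", "")
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_request("Summarize this quarter's filings.", pro=True)
    # Inspect the body locally; sending it would be urllib.request.urlopen(request).
    print(json.loads(request.data)["model"])
```

Keeping the model name behind a single flag makes it easy to route only the hardest tasks to the Pro variant.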
The Bigger Picture
GPT-5.4 is the model where OpenAI stops talking about agents as a future capability and ships them as a present one. Native computer use, 1M token context, tool search, and transparent reasoning together form the infrastructure for real agentic workflows — not demos, but production systems. The safety story is equally important: as these models become genuinely capable of operating autonomously, the seriousness of the safety investment has to scale with the capability. GPT-5.4 represents OpenAI's clearest signal yet that it understands this equation.
Sources:
Introducing GPT-5.4 — OpenAI
GPT-5.4 Thinking System Card — OpenAI