GPT-5.4 Mini & Nano — Near-Flagship Performance for Everyone

OpenAI released GPT-5.4 mini and nano on March 17, 2026. They are the company's most capable small models so far, with mini running more than 2x faster than GPT-5 mini, and they bring near-flagship AI to free users and high-volume developer workloads.
The Right Tool for the Right Job
GPT-5.4 mini and GPT-5.4 nano are purpose-built for a specific reality: in agentic AI systems, not every task needs the full weight of a flagship model. Some tasks need to be fast, cheap, and parallel, and that is what mini and nano are for.
What's New With Each Model
GPT-5.4 mini significantly improves over GPT-5 mini across coding, reasoning, multimodal understanding, and tool use, while running more than 2x faster. It approaches the performance of the full GPT-5.4 on several key evaluations, including SWE-Bench Pro and OSWorld-Verified. On the GPQA Diamond benchmark it scored 88.01%, about five percentage points behind the flagship's 93%.
GPT-5.4 nano is the smallest, cheapest version, meant for tasks where speed and cost matter most. OpenAI recommends it for classification, data extraction, ranking, and coding subagents handling simpler supporting tasks. At $0.20 per million input tokens, versus $2.50 for the full GPT-5.4, its input tokens cost 12.5x less.
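One way to act on this guidance is a small routing table that maps task types to model tiers. The sketch below is illustrative only: the task labels, the `pick_model` helper, and the model identifier strings are assumptions made for this example, not names confirmed by OpenAI's documentation.

```python
# Hypothetical model identifiers; check the real names before using them.
NANO = "gpt-5.4-nano"
MINI = "gpt-5.4-mini"
FLAGSHIP = "gpt-5.4"

# Task types the article says OpenAI recommends nano for.
NANO_TASKS = {
    "classification",
    "data_extraction",
    "ranking",
    "simple_coding_subtask",
}

# Task types the article associates with mini (labels invented for this sketch).
MINI_TASKS = {
    "codebase_search",
    "file_review",
    "screenshot_interpretation",
}


def pick_model(task_type: str) -> str:
    """Return the cheapest tier listed for the task, defaulting to the flagship."""
    if task_type in NANO_TASKS:
        return NANO
    if task_type in MINI_TASKS:
        return MINI
    # Unknown or open-ended work goes to the largest model.
    return FLAGSHIP
```

Defaulting to the flagship for unrecognized task types is a conservative choice: a misrouted task costs more rather than producing a weaker answer.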
The Subagent Architecture Shift
The most significant conceptual shift is not a benchmark number but the explicit framing around multi-model composition. In Codex, a larger model such as GPT-5.4 handles planning, coordination, and final judgment, and delegates narrower subtasks to GPT-5.4 mini subagents that run in parallel: searching a codebase, reviewing a large file, or processing supporting documents.
Instead of using one model for everything, developers can now compose systems where larger models decide what to do and smaller models execute quickly at scale. The principle resembles microservices in software engineering: small, specialized components coordinated by a higher-level service, here applied to AI.
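The planner-and-subagent pattern described above can be sketched as a fan-out and fan-in with `asyncio`. This is a minimal illustration under stated assumptions, not how Codex is implemented: `call_model` is a stand-in stub rather than a real API client, the model identifier strings are assumed, and the subtasks are hard-coded from the article's examples instead of being produced by a planning model.

```python
import asyncio

PLANNER_MODEL = "gpt-5.4"        # assumed identifier
SUBAGENT_MODEL = "gpt-5.4-mini"  # assumed identifier


async def call_model(model: str, prompt: str) -> str:
    """Stand-in for a real API call; swap in an actual client here."""
    await asyncio.sleep(0)  # yield control, as a network request would
    return f"[{model}] {prompt}"


async def plan(goal: str) -> list[str]:
    """Planner step. A real system would ask the larger model for subtasks."""
    return [
        f"search the codebase for: {goal}",
        f"review the large file related to: {goal}",
        f"process supporting documents for: {goal}",
    ]


async def run(goal: str) -> str:
    subtasks = await plan(goal)
    # Fan out: each subtask goes to a smaller-model subagent, run concurrently.
    results = await asyncio.gather(
        *(call_model(SUBAGENT_MODEL, task) for task in subtasks)
    )
    # Fan in: the larger model makes the final judgment over all findings.
    findings = "\n".join(results)
    return await call_model(PLANNER_MODEL, f"Combine findings:\n{findings}")


if __name__ == "__main__":
    print(asyncio.run(run("retry logic")))
```

Because the subagent calls are independent, total latency is bounded by the slowest subtask rather than the sum of all of them, which is where a faster small model pays off.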
Built for Latency-Sensitive Workflows
GPT-5.4 mini and nano excel in environments where response speed directly shapes the product experience:
- Coding assistants that need to feel instantaneous during edits and debugging loops
- Computer use systems that interpret screenshots of dense UIs in real time
- Agentic pipelines running many parallel subtasks simultaneously
- Front-end generation and codebase navigation at high throughput
Availability & Pricing
GPT-5.4 mini is available to Free and Go tier users in ChatGPT via the "Thinking" feature. For paid users, it serves as a high-speed fallback when they reach GPT-5.4 Thinking rate limits. GPT-5.4 nano is available exclusively via the OpenAI API.
- GPT-5.4 mini: $0.80 / 1M input tokens, $3.20 / 1M output tokens
- GPT-5.4 nano: $0.20 / 1M input tokens, $0.80 / 1M output tokens
By comparison, the full GPT-5.4 costs $2.50 / 1M input tokens, which makes nano 12.5x cheaper on input for the right use cases.
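The arithmetic behind these comparisons can be checked with a short calculator. The prices are copied from the list above; the function names are invented for this sketch. Only the flagship's input price is given in this article, so the comparison with GPT-5.4 is limited to input tokens.

```python
# USD per 1M tokens, as listed in this article.
PRICES = {
    "mini": {"input": 0.80, "output": 3.20},
    "nano": {"input": 0.20, "output": 0.80},
}
# The article gives only the input price for the full GPT-5.4.
FLAGSHIP_INPUT_PRICE = 2.50


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Total cost in USD for a workload on the given small model."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def input_price_ratio(model: str) -> float:
    """How many times cheaper the model's input tokens are than the flagship's."""
    return FLAGSHIP_INPUT_PRICE / PRICES[model]["input"]
```

For a hypothetical workload of 10M input tokens and 2M output tokens, `cost_usd` gives about $3.60 on nano and $14.40 on mini, and `input_price_ratio("nano")` returns 12.5.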
What This Means for the Ecosystem
The release signals OpenAI's clear intent: the future of AI products isn't a single powerful model doing everything — it's a layered architecture where models of different sizes collaborate. Flagship models think. Mini models execute. Nano models process at scale. For developers, the question is no longer just "which model is smartest?" but "which combination delivers the best outcome per dollar per millisecond?"