GPT-5.3-Codex-Spark — Speed Is Now a Feature, Not a Trade-off

OpenAI has launched GPT-5.3-Codex-Spark, a lightweight agentic coding model running at over 1,000 tokens/second on Cerebras wafer-scale chips. It is OpenAI's first model built around real-time responsiveness, and it changes what interactive development can feel like.
Two Speeds, One Codex
Until now, agentic coding tools were built around one operating mode: you give the AI a task, step back, and wait — sometimes for minutes, sometimes longer. OpenAI's Codex was already exceptional at that kind of deep, autonomous work. But a different type of programming need was going unaddressed: the moment-to-moment, interactive cycle of writing, seeing, adjusting, and iterating that defines how developers actually think through a problem.
GPT-5.3-Codex-Spark is OpenAI's answer to that gap. It is a lighter, faster variant of GPT-5.3-Codex, designed specifically for real-time interaction within the Codex environment. With Codex-Spark, Codex now supports both long-running ambitious tasks and getting work done in the moment — two speeds within a single tool.
The Hardware Behind the Speed
To power Codex-Spark's rapid inference, OpenAI is running the model on dedicated chips from its hardware partner Cerebras, a new level of integration between its models and its physical infrastructure. The partnership was announced last month, when OpenAI said it had reached a multi-year agreement with Cerebras worth over $10 billion.
Powered by the Cerebras Wafer-Scale Engine, Codex-Spark runs at over 1,000 tokens per second, enabling near-instant feedback in live coding environments. To put that in context: most frontier models today operate at a fraction of that speed. The difference isn't incremental. At 1,000+ tokens/second, a response of a few hundred tokens streams in well under a second, so generation time largely drops out of the user experience.
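As a back-of-the-envelope check on what that throughput means in practice, the sketch below computes pure generation time for a few response sizes. The 1,000 tokens/second figure comes from the text above. The 100 tokens/second baseline and the response sizes are illustrative assumptions, not measurements, and the calculation ignores network latency, queueing, and time to first token.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to stream num_tokens at a steady decode rate.

    Ignores network latency, queueing, and time to first token.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


SPARK_TPS = 1_000.0    # from the article: "over 1,000 tokens per second"
BASELINE_TPS = 100.0   # assumption, for comparison only

# Hypothetical response sizes: a one-line fix, a function rewrite, a new file.
for n in (50, 400, 2_000):
    fast = generation_seconds(n, SPARK_TPS)
    slow = generation_seconds(n, BASELINE_TPS)
    print(f"{n:>5} tokens: {fast:.2f}s at 1,000 tok/s vs {slow:.2f}s at 100 tok/s")
```

Under these assumptions, a 400-token rewrite takes about 0.4 seconds of generation time instead of about 4 seconds. That is roughly the gap between an edit that feels immediate and one that breaks concentration.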
The Cerebras Wafer-Scale Engine features the largest on-chip memory of any AI processor, enabling high-speed inference at thousands of tokens per second per user. The architecture scales out to thousands of systems, extending fast memory capacity into the multi-terabyte domain to support trillion-parameter models. OpenAI expects to bring this ultra-fast inference capability to its largest frontier models in 2026.
What It's Actually Built For
Codex-Spark is optimized for interactive work where latency matters as much as intelligence. You can collaborate with the model in real time, interrupting or redirecting it as it works, and rapidly iterate with near-instant responses. This is a fundamentally different interaction model from traditional AI coding assistants — less like delegating to an autonomous agent, more like pairing with someone who thinks at the speed of a conversation.
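To make the interrupt-and-redirect loop concrete, here is a minimal, purely simulated sketch. It does not call any OpenAI or Codex API. `fake_token_stream` and `run_with_interrupt` are hypothetical names, and the "model" is just a generator that yields words from a canned reply. It shows only the control flow: consume tokens as they arrive, stop as soon as the user wants to redirect, and keep what was already produced.

```python
from typing import Callable, Iterator, List, Tuple


def fake_token_stream(reply: str) -> Iterator[str]:
    """Stand-in for a streaming model response: yields one word at a time."""
    for word in reply.split():
        yield word


def run_with_interrupt(
    stream: Iterator[str],
    should_interrupt: Callable[[List[str]], bool],
) -> Tuple[List[str], bool]:
    """Consume tokens until the stream ends or the caller asks to stop.

    Returns the tokens received so far and whether the run was interrupted.
    """
    received: List[str] = []
    for token in stream:
        received.append(token)
        if should_interrupt(received):
            return received, True  # stop reading; the partial output is kept
    return received, False


# Simulated session: the user spots the wrong operator and cuts in immediately.
tokens, interrupted = run_with_interrupt(
    fake_token_stream("def add(a, b): return a - b"),
    should_interrupt=lambda seen: seen[-1] == "-",
)
print(" ".join(tokens), "| interrupted:", interrupted)
```

In a real client the interrupt signal would come from user input rather than a predicate on the tokens. The faster tokens arrive, the less work is thrown away each time the user changes direction.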
GPT-5.3-Codex-Spark is designed to be a "daily productivity driver", aimed primarily at rapid prototyping but able to handle more than basic coding tasks. Making targeted edits, reshaping logic, refining interfaces, and seeing results immediately: these are the use cases Spark was built around.
Specs at Launch
At its research preview launch, Codex-Spark ships with:
- 128k context window: enough for substantial project context, though very large codebases will not fit in full
- Text-only input: multimodal support is not included at this stage
- Separate rate limits: usage does not count against standard Codex rate limits during the preview
- Research preview status: access may be limited or queued during high-demand periods as OpenAI balances reliability across users
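For a rough sense of what a 128k-token window holds, the sketch below estimates token counts from character counts. The 4-characters-per-token ratio is a common rule of thumb, not the model's actual tokenizer. The exact window size of 128,000, the reserved output budget, and the names `estimate_tokens` and `fits_in_context` are assumptions chosen for illustration.

```python
from typing import Dict, Tuple

CONTEXT_WINDOW_TOKENS = 128_000  # "128k" from the launch specs; exact value assumed
CHARS_PER_TOKEN = 4              # rule-of-thumb heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count via ceiling division of the character count."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(
    files: Dict[str, str], reserved_for_output: int = 8_000
) -> Tuple[bool, int]:
    """Return (fits, estimated_input_tokens) for a set of file contents.

    reserved_for_output is a hypothetical budget left free for the reply.
    """
    total = sum(estimate_tokens(body) for body in files.values())
    return total + reserved_for_output <= CONTEXT_WINDOW_TOKENS, total


# Synthetic inputs: a 40 KB module on its own, then alongside a 600 KB file.
small_project = {"app.py": "x" * 40_000}
large_project = {"app.py": "x" * 40_000, "generated.py": "y" * 600_000}

print(fits_in_context(small_project))
print(fits_in_context(large_project))
```

By this estimate the window covers a few hundred kilobytes of source text, so larger repositories still need some selection of which files to include.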
The Bigger Picture: A Compute Strategy, Not Just a Model
Codex-Spark is notable not just as a model release but as a signal of where OpenAI's hardware strategy is heading. Rather than relying entirely on general-purpose GPUs, OpenAI is beginning to integrate purpose-built silicon — in this case, Cerebras wafer-scale chips — directly into its inference stack for specific use cases where latency is the primary constraint.
"Codex-Spark is just a taste of what's possible on Cerebras hardware. Our broader goal is to accelerate a wide spectrum of AI workloads across both real-time and asynchronous use cases," said Sachin Katti, Head of Industrial Compute at OpenAI. That framing — real-time and asynchronous as complementary modes rather than competing ones — is exactly how the Spark/Codex pairing is being positioned.
If the 2025 coding revolution was about AI that could work autonomously for hours, the next chapter may be about AI that can also think with you in the moment — at a speed that finally feels like a genuine conversation.