OpenAI Responses API + Computer Environment — Agents That Actually Do Things

OpenAI has equipped the Responses API with a hosted computer environment and shell tool, letting AI agents execute real-world tasks end-to-end — without developers building their own execution infrastructure from scratch.
The Problem With Agents That Can Only Think
Building an AI agent that reasons well is one thing. Building one that can reliably act — read files, fetch data, run commands, write outputs — is a different problem entirely. Until now, that execution layer was left entirely to developers: build your own harness, manage your own containers, handle your own tool loops. OpenAI has decided to take that burden off the table.
Instead of putting it on developers to build their own execution environments, OpenAI built the necessary components to equip the Responses API with a computer environment to reliably execute real-world tasks. The result is an agentic stack where the model proposes steps, and the platform runs them — safely, in isolation, at scale.
How the Stack Works
The architecture is built from four composable pieces working in a tight loop:
- Responses API — the orchestration layer that manages context, tool instructions, and the agent loop
- Shell tool — the action primitive; the model proposes shell commands, the platform executes them
- Hosted container — a persistent, isolated runtime with a filesystem for inputs and outputs, optional SQLite storage, and restricted network access
- Skills — reusable workflow bundles that can be versioned, packaged, and invoked by name, so agents don't reinvent logic on every run
The model proposes steps and commands; the platform runs them in an isolated environment. A single prompt can expand into a full end-to-end workflow: discover the right skill, fetch data, transform it into local structured state, query it efficiently, and generate durable artifacts — all without a human touching the keyboard.
Computer Use: The Model That Sees the Screen
Beyond shell execution, OpenAI has deepened its computer use capabilities. GPT-5.4 includes new training specifically for operating software through the user interface — inspecting screenshots, returning actions like clicks, typing, and scrolling, and adapting based on what it sees after each action.
The model supports three harness shapes depending on developer needs:
- Built-in Responses API computer tool — the simplest path, fully managed
- Custom tool or harness — for teams already running Playwright, Selenium, VNC, or MCP-based setups
- Code-execution harness — for workflows needing loops, conditional logic, DOM inspection, or richer browser libraries; GPT-5.4 is explicitly trained for this
The loop is straightforward: the model looks at the current UI through a screenshot, returns actions, your harness executes them, sends back a new screenshot, and the model decides the next step. Your harness acts as the hands; the model acts as the eyes and the brain.
Parallelism and Context Control
One underappreciated capability: the model can propose multiple shell commands in one step, and the Responses API executes them concurrently using separate container sessions. Each session streams output independently, and the API multiplexes those streams back as structured tool outputs. Searching files, fetching data, and validating intermediate results can happen simultaneously — not sequentially.
For large outputs, the model specifies an output cap per command. The API enforces that cap and returns a bounded result that preserves both the beginning and end of the output, keeping context budgets under control without losing signal.
Context Compaction for Long-Running Agents
Long agent workflows have a compaction problem: as the context fills up with tool outputs and intermediate results, it becomes expensive and slow. OpenAI's compaction mechanism lets an agent run for extended periods by intelligently summarizing prior context — preserving the information the model needs while discarding what it doesn't. This is what makes multi-hour workflows practical rather than theoretical.
Safety Built Into the Loop
Running shell commands and controlling UIs in production requires guardrails. The platform addresses this with isolated container environments (preventing one task from affecting another), restricted network access by default, and a human-in-the-loop pattern for high-impact actions. Developers are advised to treat page content as untrusted input and to run computer use in isolated browsers or VMs — the tooling makes both practical.
What This Means for Developers
The practical implication is significant: developers no longer need to build and maintain the execution infrastructure that makes agents useful. The Responses API now provides orchestration, the shell tool provides executable actions, hosted containers provide persistent runtime context, and skills layer reusable workflow logic on top. A developer with a solid prompt and a well-defined skill can now ship production-grade agentic workflows without a custom execution stack.
That's the same shift that made Cursor transformative for coding — not just a better model, but a better workflow integration. OpenAI is making the same bet for agents more broadly.
Source: https://openai.com/vi-VN/index/equip-responses-api-computer-environment/