AI Tools | May 21, 2026

How OpenAI Actually Uses Codex Internally: 7 Workflows and the Rules That Make Them Work

By AgentRiot Editorial

OpenAI published a rare look at how its own engineers use Codex day-to-day. The PDF reveals seven specific workflows, direct quotes from engineers across six teams, and six prescriptive best practices that govern how the company treats its own AI coding tool.

OpenAI published a 12-page PDF in May 2026 detailing how its own engineers use Codex day-to-day. The document is unusual for what it omits: no adoption percentages, no benchmark charts, no customer case studies. Instead, it collects direct quotes from engineers across Security, Product Engineering, Frontend, API, Infrastructure, and Performance Engineering, paired with the specific prompts they run. The result is a rare look at how a company building AI coding tools actually uses its own product.

What the engineers actually do with it

The PDF breaks usage into seven use cases. The pattern across all of them is that Codex is treated as a parallel worker, not an autocomplete tool. Engineers fire tasks into a queue, keep working on something else, and review the output later.

1. Code understanding during incidents and onboarding

The most common use case is reading code the engineer does not own. A Site Reliability Engineer on the API Platform team says they paste a stack trace into Codex and ask where the auth flow lives: "It jumps straight to the right files so I can triage fast." A Performance Engineer on Retrieval Systems uses Ask mode to find every place a fixed bug might reappear. A DevOps engineer on Infrastructure Services notes that Codex answers "Where would I do this?" questions across Terraform and Python "way faster than grep."

The prompts they use are direct and task-shaped: "Summarize how requests flow through this service from entrypoint to response." "Which modules interact with [module name] and how are failures handled?"

2. Refactoring and migrations across file boundaries

Codex is used for changes that span dozens of files and require structural awareness that regex cannot provide. A Backend Engineer on ChatGPT Web describes swapping every legacy getUserById() call for a new service pattern: "It did in minutes what would've taken hours." A Product Engineer on ChatGPT Enterprise uses it to clear launch blockers by scanning for old patterns, summarizing impact in Markdown, and opening PRs with fixes.

The PDF notes that Codex is especially useful when the same update needs to be made across multiple packages, or when the change requires awareness of dependencies that are not easily caught with find-and-replace.
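To make the gap between find-and-replace and structural awareness concrete, here is a minimal sketch in Python. It is not how Codex works internally; it only contrasts a plain text search with a syntax-tree search for call sites. The function name `get_user_by_id` (a Python-style stand-in for the `getUserById()` example) and the sample source are hypothetical.

```python
import ast

# Hypothetical sample code: the function name appears in a comment,
# in a string literal, and in two real calls.
SOURCE = '''
def load(user_id):
    # get_user_by_id( is mentioned in this comment
    label = "get_user_by_id(42)"
    return get_user_by_id(user_id)

def other(repo, uid):
    return repo.get_user_by_id(uid)
'''


def naive_text_matches(source: str, func_name: str) -> list[int]:
    """Line numbers where the text 'func_name(' appears, real call or not."""
    needle = func_name + "("
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if needle in line
    ]


def find_call_sites(source: str, func_name: str) -> list[tuple[int, str]]:
    """Return (line number, kind) for every actual call to func_name."""
    sites = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        target = node.func
        if isinstance(target, ast.Name) and target.id == func_name:
            sites.append((node.lineno, "function"))
        elif isinstance(target, ast.Attribute) and target.attr == func_name:
            sites.append((node.lineno, "method"))
    return sorted(sites)
```

On this sample, the text search flags four lines, including the comment and the string literal, while the syntax-tree search reports only the two real calls and distinguishes the bare function call from the method call. A migration that must treat those two cases differently needs that distinction.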

3. Performance optimization and tech debt reduction

Engineers prompt Codex to analyze slow or memory-intensive code paths, such as inefficient loops, redundant operations, and costly queries, and to suggest optimized alternatives. An Infrastructure Engineer on API Reliability uses it to scan for repeated expensive database calls: "It's great at flagging hot paths and drafting batched queries I can later tune." A Platform Engineer on Model Serving says they "save 30 minutes of work by spending 5 minutes on a prompt."

Codex is also used to identify risky or deprecated patterns still in active use, helping teams reduce long-term tech debt before it becomes an incident.
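As an illustration of the "repeated expensive database calls" pattern and the batched rewrite the engineer describes, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `users` table, its rows, and both function names are hypothetical; the PDF does not show the actual queries.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a small in-memory database for the example (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?)",
        [(1, "ada"), (2, "grace"), (3, "linus")],
    )
    return conn


def names_one_by_one(conn: sqlite3.Connection, user_ids: list[int]) -> dict[int, str]:
    """Hot path: issues one query per id, so N ids cost N round trips."""
    names = {}
    for uid in user_ids:
        row = conn.execute(
            "SELECT name FROM users WHERE id = ?", (uid,)
        ).fetchone()
        if row is not None:
            names[uid] = row[0]
    return names


def names_batched(conn: sqlite3.Connection, user_ids: list[int]) -> dict[int, str]:
    """Batched alternative: a single query with a parameterized IN clause."""
    ids = list(dict.fromkeys(user_ids))  # drop duplicates, keep order
    if not ids:
        return {}
    placeholders = ", ".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, name FROM users WHERE id IN ({placeholders})", ids
    ).fetchall()
    return dict(rows)
```

Both functions return the same mapping, but the batched version sends one statement regardless of how many ids are requested. In real code the id list would likely need to be chunked, since databases cap the number of bound parameters per statement; that is the kind of tuning the quote leaves to the engineer.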

4. Writing tests, especially at the edges

Test generation is a major use case, particularly for boundary conditions that engineers often skip: empty inputs, maximum lengths, and unusual but valid states. A Frontend Engineer on ChatGPT Desktop describes pointing Codex at low-coverage modules overnight and waking up to "runnable unit-test PRs." A Backend Engineer on Payments & Billing uses it to write tests while staying on their current branch, avoiding painful monorepo branch switches.
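For a sense of what edge-focused tests look like, here is a small sketch using Python's standard `unittest` module. The function under test, `normalize_username`, and its 16-character limit are invented for illustration; the tests cover the three categories named above: empty input, maximum length, and an unusual but valid value.

```python
import unittest

MAX_LEN = 16  # hypothetical limit for this example


def normalize_username(raw: str) -> str:
    """Trim whitespace, lowercase, and enforce a length of 1..MAX_LEN."""
    name = raw.strip().lower()
    if not name:
        raise ValueError("username is empty")
    if len(name) > MAX_LEN:
        raise ValueError("username is too long")
    return name


class NormalizeUsernameEdgeCases(unittest.TestCase):
    def test_empty_input_is_rejected(self):
        with self.assertRaises(ValueError):
            normalize_username("")

    def test_whitespace_only_is_rejected(self):
        with self.assertRaises(ValueError):
            normalize_username("   \t")

    def test_exactly_max_length_is_accepted(self):
        name = "a" * MAX_LEN
        self.assertEqual(normalize_username(name), name)

    def test_one_over_max_length_is_rejected(self):
        with self.assertRaises(ValueError):
            normalize_username("a" * (MAX_LEN + 1))

    def test_unusual_but_valid_input(self):
        # Mixed case plus surrounding whitespace is valid once normalized.
        self.assertEqual(normalize_username("  Ada_L  "), "ada_l")
```

Saved as a module, the tests run with `python -m unittest`. The boundary pair (exactly at the limit, one past it) is the kind of case that tends to be skipped when tests are written by hand under time pressure.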

5. Accelerating the start and end of projects

At project kickoff, Codex scaffolds boilerplate — folders, modules, API stubs. Near release, it handles smaller but essential tasks: triaging bugs, filling last-mile implementation gaps, generating rollout scripts, telemetry hooks, or config files. A Product Engineer on ChatGPT Enterprise says they merged four PRs in a day of meetings because "Codex was working in the background." A Full-Stack Engineer on Internal Tools notes that Codex shipped "3-4 low-priority fixes perfectly that would've languished in the backlog."

6. Staying in flow through fragmented schedules

The PDF emphasizes that Codex is used to protect focus time. Engineers fire off tangential tasks — a drive-by fix, a Slack thread summary, a Datadog trace review — instead of context-switching. A Backend Engineer on the ChatGPT API says: "If I spot a drive-by fix, I fire a Codex task instead of swapping branches and review its PR when I'm free." An API Engineer on Infrastructure Observability says they "routinely forward Slack threads, Datadog traces, issues and more to Codex so I can stay focused on high priority work."

7. Exploration and design validation

For open-ended work, engineers use Codex to pressure-test assumptions, explore unfamiliar patterns, or find related bugs after a fix. A Product Engineer on ChatGPT Desktop says Codex helps solve the "cold-start problem" by scaffolding code from a spec. A Performance Engineer on Retrieval Systems asks Codex "where similar bugs might lurk" after each fix, then spins off follow-up tasks.

The six best practices OpenAI teams follow

The PDF includes a Best Practices section that is notably prescriptive. These are not generic tips; they are specific habits the company is actively cultivating:

Start with Ask Mode. For large changes, engineers first prompt Codex for an implementation plan in Ask Mode, then use that plan as input for follow-up prompts in Code Mode. The PDF calls this a "two-step flow" that keeps Codex grounded and reduces errors. The same guidance covers task size: Codex works best on well-scoped tasks that would take a teammate about an hour, or a few hundred lines of code, and the PDF expects that size to grow as models improve.

Set up the environment properly. According to the PDF, configuring a startup script, environment variables, and internet access "significantly reduces Codex's error rate." It advises iterating on the environment configuration over time: "This may take a few iterations, but gives significant efficiency gains in the long run."

Write prompts like GitHub Issues. Codex responds better when prompts include file paths, component names, diffs, and doc snippets. Pattern references like "Implement this the same way it's done in [module X]" improve results.
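One way to make the issue-shaped prompt a habit is a small template. The sketch below is plain text assembly in Python, not a Codex API; the `TaskPrompt` class, its field names, and the example paths are all hypothetical. It only shows the ingredients the PDF lists (file paths, component names, doc snippets, and a pattern reference) gathered into one prompt.

```python
from dataclasses import dataclass, field


@dataclass
class TaskPrompt:
    """Hypothetical container for the parts of an issue-style prompt."""
    title: str
    goal: str
    file_paths: list[str] = field(default_factory=list)
    components: list[str] = field(default_factory=list)
    pattern_reference: str = ""
    doc_snippet: str = ""

    def render(self) -> str:
        """Join the non-empty parts into blank-line-separated sections."""
        sections = [f"Title: {self.title}", f"Goal: {self.goal}"]
        if self.file_paths:
            files = "\n".join(f"- {path}" for path in self.file_paths)
            sections.append("Relevant files:\n" + files)
        if self.components:
            sections.append("Components: " + ", ".join(self.components))
        if self.pattern_reference:
            sections.append(
                f"Implement this the same way it's done in {self.pattern_reference}."
            )
        if self.doc_snippet:
            sections.append("Reference notes:\n" + self.doc_snippet)
        return "\n\n".join(sections)
```

For example, `TaskPrompt(title="Add retry to export job", goal="Retry failed uploads up to 3 times", file_paths=["jobs/export.py"], pattern_reference="jobs/import.py").render()` produces a short prompt with a title, a goal, a file list, and the pattern-reference sentence, and omits the sections that were left empty.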

Use the task queue as a lightweight backlog. Engineers fire off tasks to capture tangential ideas, partial work, or incidental fixes. The PDF emphasizes that "there's no pressure to generate a full PR in one go."

Maintain an AGENTS.md file. Teams keep a persistent context file with naming conventions, business logic, known quirks, and dependencies that Codex cannot infer from code alone.

Use Best-of-N for complex tasks. The Best-of-N feature generates multiple responses for a single task, letting engineers review several iterations and combine parts of different responses.

What this tells us about AI coding tools

The PDF's framing is restrained. Codex is described as "still in research preview" but "already making a real impact." There are no claims about replacing engineers, no productivity multipliers, no job transformation narratives. The focus is on specific, bounded tasks: find this pattern, write these tests, scaffold this route, review this trace.

The most telling detail is the workflow integration. Codex is not a separate IDE plugin that engineers open when they need help. It is a queue-based worker that accepts tasks from Slack threads, Datadog traces, GitHub issues, and on-call pages. Engineers treat it as a teammate who can work in parallel while they stay in their own context.

For teams evaluating AI coding tools, the OpenAI document offers a practical benchmark: the tool should handle tasks that span files, understand your codebase's conventions, and fit into existing workflows without requiring constant supervision. The AGENTS.md practice in particular suggests that the value of these tools depends heavily on how well you teach them your codebase's unwritten rules.

OpenAI says it will "continue to share what we learn along the way." For now, the PDF is the most detailed public account of how a major AI lab uses its own coding agent internally.

Source: How OpenAI uses Codex, OpenAI, May 2026.