Claude Code vs. OpenAI Codex


For the past few days, there has been a lot of hype around OpenAI's Codex. At the same time, Claude Code has been evolving day by day into a polished AI agent, with features like subagents, slash commands, MCP support, and much more. While I still prefer Claude Code, I thought it would be interesting to see how both perform on the same tasks. People say Codex + GPT-5 produces code closer to what a human would write, so let's test them out.

Before we begin: Codex has introduced support for stdio-based MCPs but still lacks direct support for HTTP MCP endpoints. So, to make sure our MCPs work, I've written a simple proxy layer over the stdio transport so that Codex can use MCPs like Figma, Jira, GitHub, and more. You can find the code here: rube-mcp-adapter-auth.js
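To give a sense of how such a proxy works, here's a minimal sketch of the idea: read newline-delimited JSON-RPC from stdin, forward each message to the HTTP MCP endpoint, and relay responses back to stdout. The endpoint URL and env var names below are illustrative assumptions, not the repo's actual values, and a production adapter would also need to handle SSE-formatted responses and session headers.

// stdio-to-http-proxy.ts: minimal sketch of a stdio-to-HTTP MCP bridge.
// MCP_HTTP_URL and MCP_AUTH_TOKEN are assumed env vars, not the repo's real config.
import * as readline from "node:readline";

const ENDPOINT = process.env.MCP_HTTP_URL ?? "https://rube.app/mcp"; // assumed endpoint
const TOKEN = process.env.MCP_AUTH_TOKEN ?? "";

const rl = readline.createInterface({ input: process.stdin });

rl.on("line", async (line) => {
  if (!line.trim()) return;
  // Forward the JSON-RPC message from the stdio client to the HTTP endpoint.
  const res = await fetch(ENDPOINT, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Accept: "application/json, text/event-stream",
      Authorization: `Bearer ${TOKEN}`,
    },
    body: line,
  });
  // Notifications can return an empty body; only relay real responses.
  const text = await res.text();
  if (text.trim()) process.stdout.write(text.trim() + "\n");
});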
So I ran a real build using the Figma MCP for UI cloning, plus a separate coding challenge. Both agents got identical prompts and the same setup.
All the code from this comparison can be found here: github.com/rohittcodes/claude-vs-codex.
TL;DR
Don't have time? Here's what happened:
Figma cloning: Claude Code captured the design better but missed the yellow theme and a few details; Codex created its own version but was faster and cheaper
Job scheduler: Claude Code provided more reasoning steps and structured code; Codex was concise and faster
Overall: Claude Code is better for complex, detailed tasks with multiple steps. Codex is more efficient for straightforward code generation, with its own way of writing code.
UX/DX: Codex felt simpler to set up and use (HTTP-based MCPs aside); Claude's developer experience felt deeper once you get used to it.
Cost: Claude Code used more tokens overall (Figma: 6,232,242; Scheduler: 234,772) vs Codex (Figma: 1,499,455; Scheduler: 72,579)
Introduction
Claude Code comes with native MCP support and a large context window. Codex recently added stdio-based MCP support (still no direct support for HTTP MCP endpoints), while Claude Code supports MCPs out of the box. By the way, if you don't know what MCPs are, you can read about them here.
Instead of benchmarks, I wanted a practical comparison: build something devs can recognize. So, the tasks I picked were:
Figma UI cloned into a working frontend
A lightweight job scheduler with timezone handling
All within one day, with me just prompting.
How I tested them
I ran both agents through identical challenges:
Tools: Rube MCP + Figma
Languages: TypeScript
Measure: Token usage, time, code quality, dev experience
Both agents got the same prompts to keep it fair.
Rube MCP - Universal MCP Server
Rube MCP (by Composio) is the universal connection layer for MCP toolkits like Figma, Jira, GitHub, and more. Explore toolkits: docs.composio.dev/toolkits/introduction.
How to connect:
Visit the Rube page: https://rube.app
Click the installation button and select Claude Code

Copy the installation command and run it in your terminal (make sure Claude Code is already installed)
And done! You can just run Claude and ask the Rube MCP to do things for you. Run the /mcp command to make sure you are connected to the MCP server. If not, click on the server and authenticate yourself with Rube using the generated link.
For Codex, we'll reuse the same auth token via the proxy layer: set up the rube-mcp-adapter-auth.js file from the repo. See the Codex config docs here if you want more control over the Codex setup. For now, your config.toml should contain:
[mcp_servers.rube]
command = "node"
args = ["your-path-to/rube-mcp-adapter-auth.js"]
Coding Comparison
Round 1: Figma design cloning
I picked a complex landing page from Figma Community and asked both agents to recreate it using Next.js and TypeScript. You can find the Figma design here.

Prompt:
Recreate the Figma landing page at [FIGMA_URL] in Next.js + TypeScript using TailwindCSS v4 only (no config file). Follow a modular structure (components/layout/*, components/ui/*).
I wasn’t building the full developer platform here, just cloning a large landing page to see how close each agent could get.
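For reference, the modular structure the prompt asks for looks roughly like this. The Button component below is a hypothetical illustration of the target structure, not output from either agent:

// components/ui/Button.tsx: hypothetical example of the prompt's modular structure.
import type { ReactNode } from "react";

type ButtonProps = {
  children: ReactNode;
  variant?: "primary" | "secondary";
};

// Tailwind v4 utility classes only, per the prompt's "no config file" constraint.
export function Button({ children, variant = "primary" }: ButtonProps) {
  const base = "rounded-lg px-4 py-2 font-medium transition-colors";
  const styles =
    variant === "primary"
      ? "bg-yellow-400 text-black hover:bg-yellow-300" // the yellow theme both agents had to match
      : "bg-zinc-800 text-white hover:bg-zinc-700";
  return <button className={`${base} ${styles}`}>{children}</button>;
}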
Claude Code results
Claude Code (Sonnet 4) delivered a working Next.js app but missed the yellow theme entirely. It captured the design structure to some extent and even exported images from the Figma design, but the visual accuracy was disappointing. The layout was there, but colors, spacing, and typography were noticeably different from the original.

Tokens: far more than Codex (6,232,242 to be exact)
Time: Longer due to more iterations
Design fidelity: Partial - missed key theme elements
Codex results
Codex (GPT-5 Medium) created its own version of the landing page. It didn't replicate the theme, layout, or components from the original design. Instead, it built a decent-looking landing page from scratch with no image exports. The result was functional but completely different from the Figma design.

Tokens: far fewer than Claude Code (1,499,455)
Time: ~10 minutes
Design fidelity: None - created an original design instead
Claude Code captured more of the original design but missed critical elements. Codex was faster and cheaper but ignored the design brief entirely.
Round 2: Job scheduler challenge
For the second task, it took me a while to decide on a challenge. It may not be the best pick, but it's what I have for now. (PS: feel free to suggest ideas for future blogs.)
I threw a complex TypeScript challenge at both agents: build a timezone-aware cron scheduler with persistence and catch-up execution. This tests system design, timezone handling, and production-ready code structure.
Prompt (paraphrased): build a timezone-aware cron scheduler in TypeScript with JSON persistence and catch-up execution for missed runs.
You can run both projects by cloning the repo here
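To make the challenge concrete, here's a minimal sketch of the core idea. It uses fixed intervals instead of real cron expressions, and the state file name is a placeholder; both agents' actual solutions in the repo are far more complete.

// scheduler-sketch.ts: minimal sketch of the challenge, not either agent's solution.
import * as fs from "node:fs";

type Job = {
  id: string;
  intervalMs: number; // real solutions parse cron expressions instead
  timezone: string;   // e.g. "Asia/Kolkata", used when formatting run times
  run: () => void;
};

const STATE_FILE = "state.json"; // hypothetical persistence file

// Load last-run timestamps so restarts don't lose track of schedules.
function loadState(): Record<string, number> {
  try { return JSON.parse(fs.readFileSync(STATE_FILE, "utf8")); }
  catch { return {}; }
}

function saveState(state: Record<string, number>): void {
  fs.writeFileSync(STATE_FILE, JSON.stringify(state));
}

export function start(jobs: Job[]): void {
  const state = loadState();
  for (const job of jobs) {
    const last = state[job.id] ?? Date.now();
    // Catch-up execution: if runs were missed while the process was down, fire once now.
    if (Date.now() - last >= job.intervalMs) {
      job.run();
      state[job.id] = Date.now();
    }
    setInterval(() => {
      job.run();
      state[job.id] = Date.now();
      saveState(state);
    }, job.intervalMs);
  }
  saveState(state);
}

// Usage: log each run in the job's own timezone.
start([{
  id: "report",
  intervalMs: 60_000,
  timezone: "Asia/Kolkata",
  run: () => console.log(new Date().toLocaleString("en-US", { timeZone: "Asia/Kolkata" })),
}]);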
Claude Code results
Claude Code delivered a comprehensive solution with extensive documentation and reasoning steps. It provided detailed explanations, helpful comments throughout the code, and built-in test cases. The implementation was thorough, with proper error handling, graceful shutdown, and a production-ready structure.
Tokens: 234,772. Higher token usage due to detailed explanations
Time: Longer due to a comprehensive approach
Code quality: Production-ready with extensive documentation
Codex results
Codex was more concise and direct. It built a modular, timezone-aware cron scheduler with JSON persistence and catch-up functionality. The solution was clean and functional but with less verbose explanations. It focused on getting the job done efficiently.
Tokens: 72,579. Lower usage thanks to its concise style
Time: ~15 minutes
Code quality: Clean and functional
Both delivered working solutions. Claude Code provided more educational value and comprehensive documentation, while Codex was more efficient and direct.
What it cost (tokens + time)
Numbers vary by task complexity, but relative behaviour was consistent:
Figma task: Claude Code used significantly more tokens due to detailed reasoning and image exports; Codex was more efficient
Scheduler task: Claude Code provided comprehensive documentation but higher token usage; Codex was concise and faster
Overall: Claude Code (Sonnet 4) used roughly 3-4× the tokens of Codex (GPT-5 Medium)

Exact usage: Figma - Claude Code 6,232,242 vs Codex 1,499,455 (~4.2×); Scheduler - Claude Code 234,772 vs Codex 72,579 (~3.2×).
Conclusion
Both can build apps with MCPs in a single day, but they approach tasks differently:
Claude Code strengths
Better design fidelity with Figma (when it follows instructions)
More comprehensive documentation and reasoning
Production-ready code structure
Educational value with detailed explanations
Codex strengths
Faster raw generation
More cost-effective token usage
Direct, concise solutions
Good for "get something running" quickly
As for my take: use Codex if you want a prototype fast and cheap, or when design fidelity isn't critical. Use Claude Code when you care about maintainability, documentation, and production readiness. For design-heavy tasks, Claude Code is the better choice, though it can still miss key elements (like the yellow theme); then again, that may have been down to the recent performance issues with Claude.