Claude Code vs. OpenAI Codex

Sep 11, 2025

8 mins

Over the past few days there has been a lot of hype around OpenAI's Codex. At the same time, Claude Code has been evolving day by day into a very capable AI agent, with features like subagents, slash commands, MCP support, and much more. While I still prefer Claude Code, I thought it would be interesting to see how both of them perform on the same tasks. People say Codex + GPT-5 produces code closer to what a human would write, so let's test them out.

Before we begin: Codex has introduced support for stdio-based MCP servers, but it still lacks direct support for HTTP MCP endpoints. To make our MCPs work, I wrote a simple proxy layer over the stdio support so that Codex can use MCPs like Figma, Jira, GitHub, and more. You can find the code here: rube-mcp-adapter-auth.js

I ran a real build using the Figma MCP for UI cloning, plus a separate coding challenge. As always, both agents got identical prompts and the same setup.
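To give a rough idea of what such a proxy does per message, here is a minimal sketch. It is not the code in rube-mcp-adapter-auth.js. It assumes the stdio side exchanges one JSON-RPC message per line, and it takes a hypothetical `post` function that stands in for the authenticated HTTP call to the remote MCP server. The names `PostFn`, `rpcError`, and `bridgeLine` are made up for illustration.

```typescript
// Hypothetical transport hook: sends one JSON-RPC message to the remote MCP
// server and resolves with the response body, or null if there is none.
type PostFn = (body: string) => Promise<string | null>;

// Build a single-line JSON-RPC error so the stdio client always gets a reply.
function rpcError(id: unknown, code: number, message: string): string {
  return JSON.stringify({ jsonrpc: "2.0", id, error: { code, message } });
}

// Forward one stdin line to the remote server and return the line to print on
// stdout, or null when nothing should be written (blank input, notifications).
async function bridgeLine(line: string, post: PostFn): Promise<string | null> {
  const text = line.trim();
  if (text === "") return null;

  let id: unknown;
  try {
    id = (JSON.parse(text) as { id?: unknown }).id;
  } catch {
    // -32700 is the standard JSON-RPC "Parse error" code.
    return rpcError(null, -32700, "Parse error");
  }

  try {
    const reply = await post(text);
    if (reply === null || reply.trim() === "") return null;
    // Re-serialize so the reply is guaranteed to fit on a single line.
    return JSON.stringify(JSON.parse(reply));
  } catch (err) {
    // Requests carry an id and need an answer; notifications do not.
    if (id === undefined) return null;
    const reason = err instanceof Error ? err.message : String(err);
    // -32603 is the standard JSON-RPC "Internal error" code.
    return rpcError(id, -32603, `Bridge failure: ${reason}`);
  }
}
```

In a real adapter, `post` would wrap an HTTP request that carries the auth token, and the function would be called for every line read from stdin. Streamed responses and session handling are left out of this sketch.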

All the code from this comparison can be found here: github.com/rohittcodes/claude-vs-codex.

TL;DR

Don't have time? Here's what happened:

  • Figma cloning: Claude Code captured the design better but missed the yellow theme and a few details; Codex created its own version but was faster and cheaper

  • Job scheduler: Claude Code provided more reasoning steps and structured code; Codex was concise and faster

  • Overall: Claude Code is better for complex, detailed tasks with multiple steps. Codex is more efficient for straightforward code generation, with its own way of writing code.

  • UX/DX: Codex felt simpler to set up and use (except for HTTP-based MCPs, which needed the proxy); Claude Code's developer experience felt deeper once you get used to it.

  • Cost: Claude Code used more tokens overall (Figma: 6,232,242; Scheduler: 234,772) vs Codex (Figma: 1,499,455; Scheduler: 72,579)

Introduction

Claude Code comes with native MCP support out of the box and a large context window. Codex recently added stdio-based MCP support but still has no direct support for HTTP MCP endpoints. By the way, if you don't know what MCPs are, you can read about them here.

Instead of benchmarks, I wanted a practical comparison: build something devs can recognize. So, the tasks I picked were:

  • A Figma UI cloned into a working frontend

  • A lightweight job scheduler with timezone handling

All within one day, with me just prompting.

How I tested them

I ran both agents through identical challenges:

  • Tools: Rube MCP + Figma

  • Language: TypeScript

  • Measure: Token usage, time, code quality, dev experience

  • Both agents got the same prompts to keep it fair.

Rube MCP - Universal MCP Server

Rube MCP (by Composio) is the universal connection layer for MCP toolkits like Figma, Jira, GitHub, and more. Explore toolkits: docs.composio.dev/toolkits/introduction.

How to connect:

  • Visit the Rube page: https://rube.app

  • Click the installation button and select Claude Code

  • Copy the installation command and run it in your terminal (make sure Claude Code is already installed)

  • And done! You can just run Claude and ask the Rube MCP to do things for you. Run the /mcp command to make sure you are connected to the MCP server. If not, click on the server and authenticate yourself with Rube using the generated link.

For Codex, we'll reuse the same auth token via the proxy layer: set up the rube-mcp-adapter-auth.js file from the repo. See the Codex config docs here if you want more control over the Codex setup. For now, your config.toml should contain:

[mcp_servers.rube]
command = "node"
args = ["your-path-to/rube-mcp-adapter-auth.js"]

Coding Comparison

Round 1: Figma design cloning

I picked a complex landing page from Figma Community and asked both agents to recreate it using Next.js and TypeScript. You can find the Figma design here.

Prompt:

Recreate the Figma landing page at [FIGMA_URL] in Next.js + TypeScript using TailwindCSS v4 only (no config file).
Follow a modular structure (components/layout/*, components/ui/*

I wasn’t building the full developer platform here, just cloning a large landing page to see how close each agent could get.

Claude Code results

Claude Code (Sonnet 4) delivered a working Next.js app but missed the yellow theme entirely. It captured the design structure to some extent and even exported images from the Figma design, but the visual accuracy was disappointing. The layout was there but colors, spacing, and typography were noticeably different from the original.

  • Tokens: 6,232,242, far more than Codex

  • Time: Longer due to more iterations

  • Design fidelity: Partial - missed key theme elements

Codex results

Codex (GPT-5 Medium) created its own version of the landing page. It didn't replicate the theme, layout, or components from the original design. Instead, it built a decent-looking landing page from scratch with no image exports. The result was functional but completely different from the Figma design.

  • Tokens: 1,499,455, fewer than Claude Code

  • Time: ~10 minutes

  • Design fidelity: None - created original design

Claude Code captured more of the original design but missed critical elements. Codex was faster and cheaper but ignored the design brief entirely.

Round 2: Job scheduler challenge

It took me a while to decide on the second task. It may not be the best choice, but it's what I have for now. PS: Send me ideas for future posts.

I threw a complex TypeScript challenge at both agents: build a timezone-aware cron scheduler with persistence and catch-up execution. This tests system design, timezone handling, and production-ready code structure.
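To make "timezone-aware" and "catch-up execution" concrete, here is a minimal sketch of the idea. It is not the code either agent produced. It assumes a simplified daily schedule instead of full cron syntax, and the `DailyJob` shape and `missedRuns` helper are hypothetical names. It walks minute by minute from the last recorded run to now and collects every instant whose wall-clock time in the job's timezone matches the schedule.

```typescript
// Hypothetical shape for a job that fires once a day at a wall-clock time.
interface DailyJob {
  name: string;
  hour: number;     // 0-23, in the job's own timezone
  minute: number;   // 0-59
  timeZone: string; // IANA name, e.g. "America/New_York"
}

const MINUTE_MS = 60 * 1000;

// Return every fire time in (lastRun, now], oldest first.
// Cost grows with the number of minutes in the gap, fine for a sketch.
function missedRuns(job: DailyJob, lastRun: Date, now: Date): Date[] {
  // One formatter per call; it converts an instant to wall-clock parts.
  const fmt = new Intl.DateTimeFormat("en-US", {
    timeZone: job.timeZone,
    hour: "2-digit",
    minute: "2-digit",
    hour12: false,
  });
  const read = (parts: Intl.DateTimeFormatPart[], type: string): number =>
    Number(parts.find((p) => p.type === type)?.value ?? "0");

  const missed: Date[] = [];
  // Start at the first whole minute strictly after lastRun.
  let t = Math.floor(lastRun.getTime() / MINUTE_MS) * MINUTE_MS + MINUTE_MS;
  for (; t <= now.getTime(); t += MINUTE_MS) {
    const parts = fmt.formatToParts(new Date(t));
    // Some runtimes print midnight as "24" with hour12: false, so wrap it.
    const hour = read(parts, "hour") % 24;
    const minute = read(parts, "minute");
    if (hour === job.hour && minute === job.minute) missed.push(new Date(t));
  }
  return missed;
}

// Usage: the process was down from Sep 10 00:00 UTC to Sep 11 14:00 UTC.
const report: DailyJob = { name: "report", hour: 9, minute: 0, timeZone: "America/New_York" };
const runs = missedRuns(
  report,
  new Date("2025-09-10T00:00:00Z"),
  new Date("2025-09-11T14:00:00Z"),
);
console.log(runs.map((d) => d.toISOString()));
// → ["2025-09-10T13:00:00.000Z", "2025-09-11T13:00:00.000Z"]
```

A scheduler with persistence would store the last run time (for example in a JSON file), call something like `missedRuns` on startup, and then decide whether to execute all missed runs or only the most recent one. Note that a wall-clock time that falls inside a daylight saving transition can be skipped or matched twice with this approach.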

Prompt:


You can run both projects by cloning the repo here

Claude Code results

Claude Code delivered a comprehensive solution with extensive documentation and reasoning steps. It provided detailed explanations, good comments on the key parts of the code, and built-in test cases. The implementation was thorough, with proper error handling, graceful shutdown, and a production-ready structure.

  • Tokens: 234,772. Higher token usage due to detailed explanations

  • Time: Longer due to a comprehensive approach

  • Code quality: Production-ready with extensive documentation

Codex results

Codex was more concise and direct. It built a modular, timezone-aware cron scheduler with JSON persistence and catch-up functionality. The solution was clean and functional but with less verbose explanations. It focused on getting the job done efficiently.

  • Tokens: 72,579. Lower token usage, with more concise output

  • Time: ~15 minutes

  • Code quality: Clean and functional

Both delivered working solutions. Claude Code provided more educational value and comprehensive documentation, while Codex was more efficient and direct.

What it cost (tokens + time)

Numbers vary by task complexity, but relative behaviour was consistent:

  • Figma task: Claude Code used significantly more tokens due to detailed reasoning and image exports; Codex was more efficient

  • Scheduler task: Claude Code provided comprehensive documentation but higher token usage; Codex was concise and faster

  • Overall: Claude Code (Sonnet 4) used roughly 3-4× the tokens of Codex (GPT-5 Medium)

Exact usage so far. Figma: Claude Code 6,232,242; Codex 1,499,455. Scheduler: Claude Code 234,772; Codex 72,579.
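For anyone who wants to check the multiplier, the ratios can be computed directly from the counts above. This is plain arithmetic on the reported numbers; the `ratio` helper and the variable names are mine, for illustration only.

```typescript
// Token counts as reported in this post.
const usage = {
  figma: { claude: 6232242, codex: 1499455 },
  scheduler: { claude: 234772, codex: 72579 },
};

// Claude-to-Codex token ratio, rounded to two decimals.
function ratio(claude: number, codex: number): number {
  return Math.round((claude / codex) * 100) / 100;
}

const figmaRatio = ratio(usage.figma.claude, usage.figma.codex);
const schedulerRatio = ratio(usage.scheduler.claude, usage.scheduler.codex);
const totalRatio = ratio(
  usage.figma.claude + usage.scheduler.claude, // 6,467,014
  usage.figma.codex + usage.scheduler.codex,   // 1,572,034
);

console.log({ figmaRatio, schedulerRatio, totalRatio });
// → { figmaRatio: 4.16, schedulerRatio: 3.23, totalRatio: 4.11 }
```

The combined figure is dominated by the Figma task, which accounts for most of the tokens on both sides.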

Conclusion

Both can build apps with MCPs in a single day, but they approach tasks differently:

Claude Code strengths

  • Better design fidelity with Figma (when it follows instructions)

  • More comprehensive documentation and reasoning

  • Production-ready code structure

  • Educational value with detailed explanations

Codex strengths

  • Faster raw generation

  • More cost-effective token usage

  • Direct, concise solutions

  • Good for "get something running" quickly

My take: use Codex if you want a fast, cheap prototype, or when design fidelity isn't critical. Use Claude Code if you care about maintainability, documentation, and production readiness. For design-heavy tasks Claude Code is also the better pick, though it can miss key elements (like the yellow theme); that may have been down to the recent performance issues with Claude.

See how Claude powers real workflows with UI tools like Figma.
