Gemini 2.0 Flash Thinking vs. OpenAI o3-mini vs. DeepSeek R1
Big AI models are powerful but expensive. Smaller Chain-of-Thought (CoT) models like Gemini 2.0 Flash Thinking, OpenAI’s O3-Mini, and DeepSeek R1 offer a cheaper way to handle reasoning tasks. The real question is whether they are just as good.
Each model has a different pricing style. Gemini 2.0 Flash Thinking keeps input costs low at $0.10 per million tokens. O3-Mini charges $1.10 per million tokens but focuses on structured and efficient responses. DeepSeek R1 changes pricing based on time, with standard rates at $0.55 per million tokens and off-peak discounts as low as $0.135 per million tokens.
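To put those rates in perspective, here is a rough back-of-the-envelope sketch of input-token costs at the prices above (input pricing only; output rates differ per model, so treat the totals as illustrative):

rates_per_million = {
    "Gemini 2.0 Flash Thinking": 0.10,
    "o3-mini": 1.10,
    "DeepSeek R1 (standard)": 0.55,
    "DeepSeek R1 (off-peak)": 0.135,
}
input_tokens = 5_000_000  # example workload: 5M input tokens
for model, rate in rates_per_million.items():
    print(f"{model}: ${input_tokens / 1_000_000 * rate:.2f}")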
A lower price sounds great, but performance matters too. A model that saves money is not useful if it struggles with accuracy or speed. These models were tested in reasoning, math, coding, and creative writing to see how well they performed for their price.
Table of Contents
Brief on Gemini 2.0 Flash Thinking
Complex Reasoning
Mathematics
Coding
Creative Writing
Final Verdict
TL;DR
No time for a full breakdown? Here’s a quick look at how Gemini 2.0 Flash Thinking, O3-Mini, and DeepSeek R1 performed under pressure.
Reasoning: Gemini and O3-Mini stayed close. Gemini explains every step like a patient teacher. O3-Mini drops the answer and moves on. DeepSeek R1 took its time, double-checking every detail before answering.
Math: Gemini is fast but not always perfect. O3-Mini is steady and reliable. DeepSeek R1 spent so long analyzing, it felt like it was solving an ancient mystery.
Coding: O3-Mini responded instantly and delivered optimized code. Gemini took a second but got everything right. DeepSeek R1 overanalyzed before finally producing a solution.
Creative Writing: DeepSeek R1 created an action-packed sci-fi epic. Gemini built a slow-burn mystery with vivid details. O3-Mini crafted poetic, introspective storytelling.
O3-Mini is the best for speed. Gemini 2.0 balances logic and depth. DeepSeek R1 is the most precise but takes its time. Each model has its strengths.
Still here? Let’s dive into the details.
Brief on Gemini 2.0 Flash Thinking
Gemini 2.0 Flash Thinking is built for speed. It works through problems instead of making random guesses. It does not hesitate. It analyzes, reasons, and delivers with precision.
The 1M token context window is no small feature. It can process entire research papers, break down massive codebases, and handle deep analysis with ease. Large? Yes. Slow? Not at all. This model moves. It does not just give answers. It figures things out like an expert solving a complex case.
One user summed it up best: cheaper, smarter, and ready to take over.

Developers will find plenty to like. It handles tools efficiently, adapts to different inputs, and offers an API that keeps costs reasonable. Gemini 2.0 Flash Thinking is not here to compete. It is here to lead.
Complex Reasoning
All AI models claim to be good, but are they really?
I put them through four tough challenges to test their reasoning, speed, and ability to avoid simple mistakes. Let’s see how they did.
#1. Basic Logical Progression
This one was simple. No tricks or complicated wording, just a straightforward reasoning check.
Prompt: If today is Wednesday, what day will it be in three days?
If an AI messes this up, I’d start questioning everything.
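For reference, the expected answer is Saturday. A two-line Python check with modular arithmetic:

days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
print(days[(days.index("Wednesday") + 3) % 7])  # -> Saturday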
Gemini 2.0 Flash Thinking:

Took about two to three seconds. Still quick but not as instant as O3-Mini.
O3-Mini:

O3-Mini wasted no time. One second.
DeepSeek R1:

DeepSeek R1 took a full 23 seconds. Not because it struggled but because it double, triple, and quadruple-checked before answering.
So, one model answered instantly. One took an extra second. One acted like it was preparing a research paper before committing.
That was just the first test. Time to step it up. Let’s see how they handle something more complex.
#2. Tracking Family Connections
This test checked how well the models could follow relationships in a family tree.
Prompt: If A is the brother of B, B is the son of C, C is the daughter of D, and D is the wife of E, how is E related to B?
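The expected answer: E is B's maternal grandfather (E's wife D is C's mother, and C is B's mother). A tiny sketch that encodes the given facts as lookups and walks the chain:

# Child -> parent, from "B is the son of C" and "C is the daughter of D".
parent_of = {"B": "C", "C": "D"}
# Spouse pairs, from "D is the wife of E".
spouse_of = {"D": "E"}
# B's parent's parent is D; D's husband is E, so E is B's grandfather.
print(spouse_of[parent_of[parent_of["B"]]])  # -> E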
Gemini 2.0 Flash Thinking:

It answered correctly with confidence. Almost the same speed as O3-Mini.
O3-Mini:

Took a few seconds. No hesitation, just the correct answer.
DeepSeek R1:

Gemini 2.0 Flash Thinking and O3-Mini were the best picks for quick answers.
#3. Solving Ciphers and Recognizing Patterns
This test checked how well the models could identify and apply a simple letter-based code.
Prompt: In a certain code, “PAPER” is written as “QBQFS”. How is “PENCIL” written in the same code?
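The code is a one-letter forward shift (P→Q, A→B, ...), so the expected answer is QFODJM. A quick Caesar-shift check:

def shift_word(word, shift=1):
    # Shift each letter one step forward in the alphabet, wrapping Z -> A.
    return "".join(chr((ord(c) - ord("A") + shift) % 26 + ord("A")) for c in word)

print(shift_word("PAPER"))   # -> QBQFS, matching the given code
print(shift_word("PENCIL"))  # -> QFODJM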
Gemini 2.0 Flash Thinking:

O3-Mini:

DeepSeek R1:

At this point, it’s evident that O3-Mini and Gemini 2.0 Flash Thinking were the top performers for reasoning tasks involving pattern recognition and coded language.
#4. Handling Trick Questions with Logic
It’s time to see if these models fall for the trap or think it through.
Prompt: A robot has 8 arms. There are 5 objects on the table: a knife, a fork, a spoon, a teddy bear, and a doll. The robot picks up each object with an arm. He then shakes hands with himself. How many arms does he have free?
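The intended catch: a self-handshake occupies two arms, not zero, so the expected answer is one free arm. The arithmetic as a sketch:

arms = 8
holding = 5    # one arm per object on the table
handshake = 2  # shaking hands with himself ties up two arms
print(arms - holding - handshake)  # -> 1 free arm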
Gemini 2.0 Flash Thinking:

O3-Mini:

DeepSeek R1:

Summary of Reasoning
O3-Mini and Gemini 2.0 Flash Thinking were evenly matched. Both answered correctly and kept up with every challenge.
The difference? O3-Mini fired out answers instantly. Gemini 2.0 took an extra moment to explain its reasoning. DeepSeek R1 was slow but never wrong. It checked, double-checked, and probably ran a full internal review before committing to an answer.
For fast and accurate reasoning, O3-Mini and Gemini 2.0 Flash Thinking are the best choices. If absolute precision matters more than speed, DeepSeek R1 is the safer bet.
Mathematics
Numbers tell the truth. Some AI models do not. Math is simple. Either the answer is correct or it is not. No extra words. Just problem-solving.
Let’s see which models deliver the correct answers and which get them wrong.
#1. Quick Calculation Challenge
Nothing fancy, just a straight multiplication problem:
Prompt: A school ordered 45 books for its library. Each book costs $12. What’s the total cost?
Gemini 2.0 Flash Thinking:

O3-Mini:

DeepSeek R1:

The differences were clear. O3-Mini was the fastest. Gemini 2.0 took a moment but landed on the right answer (45 × $12 = $540). DeepSeek R1 treated the problem like a courtroom case.
#2. The Great Car vs. Truck Showdown
A classic speed-distance-time problem. Nothing too tricky unless you overthink it.
Prompt: The distance between the two towns is 380 km. A car and a truck start driving toward each other and meet in 4 hours. The car is 5 km/h faster than the truck. What are their speeds?
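For reference: together the vehicles close 380 km in 4 hours, a combined 95 km/h, which splits into 45 km/h for the truck and 50 km/h for the car. A few lines verify it:

combined = 380 / 4          # 95 km/h closing speed
truck = (combined - 5) / 2  # truck: 45 km/h
car = truck + 5             # car: 50 km/h
print(truck, car)           # -> 45.0 50.0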
Gemini 2.0 Flash Thinking:

O3-Mini:

DeepSeek R1:

Same answer, different styles. O3-Mini is fast and direct. Gemini 2.0 takes a moment to explain. DeepSeek R1, as always, overthinks every step before finally answering.
#3. The Flipping Card Puzzle
This logic challenge required planning, not just fast calculation.
Prompt: Six cards are placed on a table, each with ‘1’ on one side and ‘2’ on the other. All six start with ‘1’ facing up. In one try, exactly four cards must be flipped. What’s the least number of tries needed to get all six showing ‘2’?
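For reference, the answer is 3 tries. A brute-force breadth-first search over all 64 card states confirms that no shorter sequence works:

from itertools import combinations
from collections import deque

# Represent the six cards as a 6-bit state; bit i set means card i shows '2'.
# One try XORs the state with any mask that has exactly four bits set.
moves = [sum(1 << i for i in combo) for combo in combinations(range(6), 4)]
dist = {0: 0}
queue = deque([0])
while queue:
    state = queue.popleft()
    for m in moves:
        nxt = state ^ m
        if nxt not in dist:
            dist[nxt] = dist[state] + 1
            queue.append(nxt)
print(dist[0b111111])  # -> 3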
Gemini 2.0 Flash Thinking:

O3-Mini:

DeepSeek R1:

This time, rushing didn’t pay off. Gemini 2.0 delivered both speed and accuracy, while DeepSeek played it safe and got there late. O3-Mini moved fast but missed the mark.
#4. The Parabola Challenge
This test mixed geometry and algebra. A tricky problem that needed clear thinking and precise calculations.
Prompt: Let P (4, 4√3) be a point on the parabola y² = 4ax, and let PQ be a focal chord. If M and N are the feet of the perpendiculars from P and Q to the directrix, what is the area of quadrilateral PQMN?
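For reference: since P lies on the parabola, 48 = 16a, so a = 3, the focus is (3, 0), and the directrix is x = -3; solving along the focal chord gives Q = (9/4, -3√3). Assuming the quadrilateral is the trapezoid traced P → M → N → Q (PM and QN are both perpendicular to the directrix), the shoelace formula gives an area of 343√3/8 ≈ 74.26. A numeric sanity check:

import math

a = 3  # from 48 = 16a
P = (4, 4 * math.sqrt(3))
Q = (9 / 4, -3 * math.sqrt(3))  # second intersection of the focal chord
M = (-a, P[1])  # foot of the perpendicular from P to the directrix x = -3
N = (-a, Q[1])  # foot of the perpendicular from Q to the directrix

def shoelace(pts):
    # Polygon area via the shoelace formula.
    pairs = zip(pts, pts[1:] + pts[:1])
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in pairs)) / 2

print(shoelace([P, M, N, Q]))  # ~74.2624
print(343 * math.sqrt(3) / 8)  # same value, in closed form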
Gemini 2.0 Flash Thinking:

O3-Mini:

DeepSeek R1:

Summary for Math
Gemini 2.0 Flash Thinking is fast and mostly accurate. O3-Mini stays close behind and makes fewer mistakes. DeepSeek R1 takes forever but never gets it wrong.
Coding
LeetCode-level coding problems separate true problem-solvers from those who just throw random syntax at the screen. Generation speed matters, and so does how fast the resulting code runs.
For this test, I picked a hard-level LeetCode problem, ‘Maximum Score Words Formed by Letters’, and watched how these models handled both code generation and execution speed.
O3-Mini:
It got straight to work. No delay. It generated the solution instantly, and the code ran in 2ms. Clean and efficient.
class Solution(object):
    def maxScoreWords(self, words, letters, score):
        from collections import Counter
        # Count available letters.
        letterCount = Counter(letters)
        # Precompute frequency count and score for each word.
        wordInfo = []
        for word in words:
            count = Counter(word)
            wordScore = sum(score[ord(c) - ord('a')] * count[c] for c in count)
            wordInfo.append((count, wordScore))
        self.maxScore = 0

        def backtrack(i, currScore, available):
            # Base case: processed all words.
            if i == len(words):
                self.maxScore = max(self.maxScore, currScore)
                return
            # Option 1: Skip current word.
            backtrack(i + 1, currScore, available)
            # Option 2: Use current word if possible.
            wordCount, wordScore = wordInfo[i]
            canUse = True
            for ch in wordCount:
                if available[ch] < wordCount[ch]:
                    canUse = False
                    break
            if canUse:
                # Update the available letters.
                newAvailable = available.copy()
                for ch in wordCount:
                    newAvailable[ch] -= wordCount[ch]
                backtrack(i + 1, currScore + wordScore, newAvailable)

        # Start the recursion from the first word.
        backtrack(0, 0, letterCount)
        return self.maxScore
Gemini 2.0 Flash Thinking:
Took a moment to think but delivered the right answer in 3ms. Not as fast as O3-Mini but still solid.
class Solution(object):
    def maxScoreWords(self, words, letters, score):
        letter_counts = {}
        for letter in letters:
            letter_counts[letter] = letter_counts.get(letter, 0) + 1
        word_counts_list = []
        for word in words:
            word_counts = {}
            word_score = 0
            for char in word:
                word_counts[char] = word_counts.get(char, 0) + 1
                word_score += score[ord(char) - ord('a')]
            word_counts_list.append((word_counts, word_score))
        max_score_holder = [0]  # Use a list to hold max_score

        def solve(index, current_letter_counts, current_score):
            # nonlocal max_score <- Remove nonlocal as it's Python 3 syntax and not needed with list
            if index == len(words):
                max_score_holder[0] = max(max_score_holder[0], current_score)  # Update using list
                return
            # Option 1: Don't include the current word
            solve(index + 1, current_letter_counts, current_score)
            # Option 2: Include the current word if possible
            word_counts, word_score = word_counts_list[index]
            possible = True
            next_letter_counts = current_letter_counts.copy()
            for char, count in word_counts.items():
                if next_letter_counts.get(char, 0) < count:
                    possible = False
                    break
                next_letter_counts[char] -= count
            if possible:
                solve(index + 1, next_letter_counts, current_score + word_score)

        solve(0, letter_counts, 0)
        return max_score_holder[0]  # Return value from the list
DeepSeek R1:
Took its time. But got it done.
class Solution(object):
    def maxScoreWords(self, words, letters, score):
        # Precompute the available letters count
        letters_count = [0] * 26
        for c in letters:
            letters_count[ord(c) - ord('a')] += 1
        # Precompute each word's letter count and its score
        words_cnt = []
        words_scr = []
        for word in words:
            cnt = [0] * 26
            s = 0
            for c in word:
                idx = ord(c) - ord('a')
                cnt[idx] += 1
                s += score[idx]
            words_cnt.append(cnt)
            words_scr.append(s)
        max_score = 0
        n = len(words)

        def backtrack(index, current_counts, current_score):
            nonlocal max_score
            if index == n:
                if current_score > max_score:
                    max_score = current_score
                return
            # Option 1: Skip the current word
            # Pass a copy of current_counts to avoid mutation
            backtrack(index + 1, current_counts.copy(), current_score)
            # Option 2: Take the current word if possible
            word_c = words_cnt[index]
            possible = True
            for i in range(26):
                if current_counts[i] < word_c[i]:
                    possible = False
                    break
            if possible:
                # Create a new counts array by subtracting the word's letters
                new_counts = current_counts.copy()
                for i in range(26):
                    new_counts[i] -= word_c[i]
                new_score = current_score + words_scr[index]
                backtrack(index + 1, new_counts, new_score)

        # Start the backtracking with a copy of the initial letters count
        initial_counts = letters_count.copy()
        backtrack(0, initial_counts, 0)
        return max_score
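All three solutions share the same backtracking idea: for each word, either skip it or take it if the remaining letters allow, and keep the best score. For anyone who wants to run them, here is a quick driver using the first example from the LeetCode problem statement (expected output: 23); it works with any of the three Solution classes above:

words = ["dog", "cat", "dad", "good"]
letters = ["a", "a", "c", "d", "d", "d", "g", "o", "o"]
score = [1, 0, 9, 5, 0, 0, 3, 0, 0, 0, 0, 0, 0,
         0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(Solution().maxScoreWords(words, letters, score))  # -> 23 ("dad" + "good")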
Summary for Coding
Speed was the key in this test. O3-Mini came out on top, generating and executing code the fastest. Gemini 2.0 Flash Thinking stayed close behind with a solid performance. DeepSeek R1 took so long, it seemed to forget it was supposed to be coding.
Creative Writing
Math and code test logic. Storytelling is a different challenge. Can these models create something engaging? Do they build suspense and keep you hooked? Or do they produce something that feels flat and artificial?
To find out, I gave them a sci-fi mystery.
Prompt: You wake up in a world where everyone speaks a language you’ve never heard before, yet they all seem to know you. A mysterious letter in your pocket, written in your own handwriting, warns you not to trust anyone. What will you do next?
Gemini 2.0 Flash Thinking
This model built a slow-burn mystery with rich details. It felt like a novel opening.
Response:
O3-Mini
It created a poetic, introspective story. Moody and surreal with plenty left unsaid.
Response:
DeepSeek R1
DeepSeek R1 jumped straight into action. Fast. Cinematic. Felt like a movie.
Response:
Summary of Creative Writing
DeepSeek R1 brought the most action. Gemini 2.0 Flash Thinking felt the most immersive. O3-Mini was the most artistic. They have different styles, but they are all capable storytellers.
Final Verdict
After testing Gemini 2.0 Flash Thinking, O3-Mini, and DeepSeek R1 across reasoning, math, coding, and creative writing, here’s how they compare:
Reasoning: Gemini 2.0 explains like a tutor. O3-Mini gives the answer and moves on. DeepSeek R1 gets everything right but takes its time overthinking.
Math: Gemini 2.0 is fast but not always perfect. O3-Mini stays steady with fewer mistakes. DeepSeek R1 double-checks like it's taking an exam.
Coding: O3-Mini delivers instantly. Gemini 2.0 takes a second but gets it right. DeepSeek R1 moves like it's writing a research paper.
Creative Writing: DeepSeek R1 creates a blockbuster-style story. Gemini 2.0 builds a slow-burn mystery. O3-Mini crafts poetic and introspective storytelling.
Each model excels in different areas. O3-Mini is the best for speed. Gemini 2.0 balances logic and depth. DeepSeek R1 is the most precise but takes its time. Whatever you need, one of them fits the job.