The latest version of ChatGPT, known as GPT-4o, has just been launched, and it promises to be a nice upgrade from its predecessor, ChatGPT-4 (GPT-4). While data and benchmarks suggest an improvement, we’re diving deeper to see if the new model truly lives up to the hype. So, we’re going to do what anyone would do in this predicament: create a battle…an AI war… welcome to GPT-4o vs. GPT-4.
Here’s a visual that compares this new model to all competing large language models. This is directly from OpenAI’s website. As you can see, OpenAI is claiming that GPT-4o is the best in four out of the five categories for text evaluation.
Here at Run The Prompts, we like to test things ourselves to make sure.
We will be testing GPT-4o across a bunch of scenarios and tasks, comparing it directly with GPT-4. Our goal is to determine if the enhancements make a tangible difference in real-world applications. Here are the key areas we’ll focus on: copywriting, creative problem-solving, storytelling, drafting work emails, creating blog content, outside-the-box thinking, and even speed.
A few things to note, though, before we get started. First, at the moment, I only have access to text-based GPT-4o, which makes me a little sad. I don’t have the official macOS desktop app, the new voice mode, upgraded DALL-E, or anything else. So, some might think this battle is a premature ejaculation, but trust me – it will be better than you think. There is so much to uncover that can be tested. Let’s party.
Oh, and one more thing that needs to be said before we get started: GPT-4o takes the crown from GPT-4 as the worst product name in history. So, it already has one win under its belt.
1. Storytelling – Can GPT-4o Tell a Better Tale?
Let’s see how well GPT-4o can craft compelling and creative short stories. This test will look at the model’s ability to generate engaging narratives, develop characters, and create immersive plots that suck people in.
I like to pick on Dick. Dick Smith is my Creative Director and a total loser. The good news is he knows this. Let’s see how well GPT-4o and GPT-4 can create a little story about Dick doing something he hasn’t done in over a decade: go on a date.
GPT-4o
GPT-4
Now that I have you on the edge of your seat and begging for more, read the full stories. Sorry, adding a million screenshots to this page just isn’t going to work out, okay? Learn to like it.
As you can see from the stories, GPT-4o did a much better job. It fleshed out Dick’s character more fully, wrote genuinely funny content, and produced text that sounds more human overall. I would rate GPT-4o’s output as at least 30% better.
I showed this to Dick, and he immediately challenged me to a sword fight. Let’s hope he’s doing okay, but I haven’t checked in days.
Score: GPT-4o: 1 and GPT-4: 0
2. Marketing Copywriting – GPT-4 Swings for The Fences
Next, we’ll put its copywriting skills to the test. If you’re not familiar with the term “copywriting,” it just means “writing that can be used in marketing material.”
This test is all about assessing how well it can produce persuasive and engaging writing for marketing purposes. In typical Run The Prompts fashion, I threw the machine a 90 MPH curveball. Take a look at this.
GPT-4o
GPT-4
I must say, the copy that GPT-4 produced in this example was near-perfect. It was creative, used metaphors, wasn’t cliché at all, and was just an overall great salesman.
GPT-4o, on the other hand, just wasn’t as smooth. It was more cliché, less imaginative, and more reminiscent of intern-level copy. This is surprising because I was expecting the opposite. It will take a lot more testing to see if this changes, but I’m shocked to the core at this result. Maybe GPT-4o will get tweaked/updated to correct this.
By the way, if you’re looking for the BEST possible copywriting solution, check out our GPT: Digital Marketing Copywriter Pro. With over 1,000 chats and 40+ ratings, along with a 4.7-star rating, it’s one of our most popular GPTs and will make all your copywriting dreams come true.
Score: GPT-4o: 1 and GPT-4: 1
3. Creative Problem Solving – Which GPT Has the Key?
One of the critical capabilities we’re interested in is ChatGPT’s ability to solve problems. Not just any problem, though… a problem that any Run The Prompts fan can relate to (you’ll see what I mean in a second).
We challenged GPT-4o and GPT-4 with a prompt that requires innovative and unconventional thinking.
Let’s see what happened!
GPT-4o
GPT-4
As you can see, GPT-4o won easily. Nearly all of GPT-4’s suggestions required getting up from the couch. Not cool!
GPT-4o listened to that subtlety in the question and mostly produced suggestions that did not require me to get up from the couch. Nice! Plus, it had better and more well-thought-out suggestions than GPT-4.
So, if you find yourself in a bit of a pickle like this example, be sure to use GPT-4o. It just might save your life.
Score: GPT-4o: 2 and GPT-4: 1
4. Creative Thinking – Life Hacks, Courtesy of GPT-4o
I know. “Life Hacks” is a bit of an outdated phrase at this point, but whatever. Fun fact: Lifehacker.com still exists! The nostalgia.
Anyway, life hacks are fun, often cheap, and creative ways to solve everyday problems. Let’s see if ChatGPT can think of some that don’t exist yet. Let’s take a look.
GPT-4o
GPT-4
Okay, so GPT-4 apparently doesn’t know what a life hack is. The ideas it gave are not life hacks; they’re product ideas.
As for GPT-4o, it pumped out some good life hacks. I mean, I’m not going to use any of them, but they are creative ideas. However, I don’t know if they already exist. I didn’t check. I don’t have time for that sh*t!
GPT-4o was the clear winner here in every way. It wasn’t even close.
Score: GPT-4o: 3 and GPT-4: 1
5. GPT-4 vs. GPT-4o: Who Writes Better Work Emails?
On occasion, people have to email their coworkers but don’t know the right words to use or maybe just want to save some precious time. ChatGPT has been helping with this issue since 2022, and it’s not exactly a creative or interesting use case anymore.
But what about an email to help you call into work? Wait. I mean help you email that you’re calling into work. Right.
Sometimes, you might need to call (email!) into work because you need a mental health day and want a good excuse as to why you can’t show up. I’m here to rescue you from that dilemma.
GPT-4o
GPT-4
As you can see, GPT-4o created the perfect email describing your new huge embarrassing failure. It’s literally the kind of thing you could copy, paste, and smash send on. On a serious note, it is perfect, but yeah, you should remove the obvious phrase at the end of the first paragraph unless you want your next workday to include an awkward date in the HR department.
GPT-4’s email was overly professional, even though I specifically told it not to be. It wasn’t terrible, but it wasn’t as smooth and natural-sounding as GPT-4o.
Are you starting to notice a pattern yet? I realize that it’s a repetitive beatdown at this point and GPT-4 is getting bloodied and slapped around, but just stick with me because I’m sure you’ll love the rest of the article.
Score: GPT-4o: 4 and GPT-4: 1
6. Blog Writing – Because Nobody Has Time for That Sh*t
Let’s examine its performance in generating blog content. This involves testing its ability to write informative, engaging, human-sounding, and well-structured content that could be used on a blog.
Don’t worry, though. Here at Run The Prompts, we use AI as a supplement, not a substitute. Rest assured that we will NOT spam our website with 100% (or even 50%) AI-generated content. Neither should you.
For the test, I just gave it marching orders to create an intro paragraph. I did that just for you because I know you don’t want to read more than one paragraph.
GPT-4o
GPT-4
They’re both good but for different reasons.
GPT-4o was a lot more human-sounding and concise. GPT-4 was more descriptive and catchy. I mean, did you see that line where it said, “A snowman in Compton”? That’s amazing.
Let’s give this one a tie. Both earn one point.
Score: GPT-4o: 5 and GPT-4: 2
7. Speed Test – Is GPT-4o Really 4X Faster?
OpenAI claims that GPT-4o is around four times faster than GPT-4. Is it really, though?
I can’t demonstrate my test through words and images, so you’ll just have to trust me. Here are the results.
I had ChatGPT write three paragraphs for me using both models. I used the Stopwatch app on my phone and got to work. This was tested multiple times, and I used the averages.
Time to create three paragraphs (same topic):
GPT-4o: 4.58 seconds.
GPT-4: 14.81 seconds.
GPT-4o was 3.2 times faster. It is blazing fast. Even faster than our Marketing Assistant/Diversity Hire, Bear.
GPT-4o wrote one paragraph every 1.53 seconds. Yikes. There goes the conspiracy theory that ChatGPT is just an overseas call center in India filled with people who can type fast. Because nobody is that fast.
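If you want to check my stopwatch math (or plug in your own times), here is a minimal Python sketch. It only does the division: the two averages and the paragraph count come from the test above, while the function and variable names are made up for illustration and are not part of any ChatGPT tooling.

```python
# Averaged stopwatch times from the speed test, in seconds.
GPT_4O_SECONDS = 4.58
GPT_4_SECONDS = 14.81
PARAGRAPHS = 3  # each model wrote three paragraphs on the same topic


def speedup(slow_seconds, fast_seconds):
    """Return how many times faster the quicker model finished the same job."""
    return slow_seconds / fast_seconds


def seconds_per_paragraph(total_seconds, paragraphs):
    """Return the average time spent on each paragraph."""
    return total_seconds / paragraphs


ratio = speedup(GPT_4_SECONDS, GPT_4O_SECONDS)
pace = seconds_per_paragraph(GPT_4O_SECONDS, PARAGRAPHS)

print(f"GPT-4o was {ratio:.1f} times faster than GPT-4.")
print(f"GPT-4o averaged {pace:.2f} seconds per paragraph.")
```

Swap in your own averages to rerun the comparison. A phone stopwatch is a blunt instrument, so treat anything past the first decimal place as vibes.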
Wrapping it up – GPT-4o vs. GPT-4
Our deep dive into GPT-4 vs. GPT-4o gave us some cool insights. While there were a few surprises, GPT-4o won five of the seven tests outright and tied a sixth, for a final score of 6 to 2.
Whether it was spinning stories, solving problems creatively, or even writing awkward work emails, GPT-4o showed it’s got the edge and muscles for a beatdown.
GPT-4 did shine in some areas like marketing copy, but overall, GPT-4o’s speed and its natural-sounding, well-thought-out answers were hard to beat.
So, if you’re a writer or marketer, or just need some help with everyday tasks, GPT-4o is the upgrade you’ve been waiting for.
Stay tuned as we keep exploring what GPT-4o can do! In the meantime, be sure to share this article with all of your current, past, and potential coworkers. You’ll thank me later.
Disclaimer: Dick Smith is not a real person. He is an AI-generated character that only exists in the warped imagination of his creator: Nick Smith.