Comparing the reasoning skills of GPT-4 to GPT-3.5

Tram Ho

I got my hands on the new model and did some experiments.

When I logged into ChatGPT Plus, I was welcomed with a friendly message: “Hey there! OpenAI’s new GPT-4 is so smart it can do all kinds of tricky thinking stuff!”

The greeting when I opened ChatGPT Plus today

When I clicked on the link, I saw a comparison chart of the three models on offer: Legacy, Default (formerly known as Turbo), and GPT-4.

Legacy: This is the original model released as ChatGPT.

Default: This was originally known as the “Turbo” model, but it became the default based on user feedback. It’s more concise and much faster than the original model.

GPT-4: This is the newest model. Here, speed is sacrificed for “advanced reasoning, complex instruction understanding, and more creativity.”

I was excited to compare the new GPT-4 model’s intelligence to the older default model and find out how much better it is!

I asked both models three questions to see how well they reason: the first was a classic river-crossing riddle, the second was a small traveling salesman problem, and the third was a tricky question about family relationships. Let’s see if they can figure them out!

Here are the results:

Question #1: Wolf, Chicken, and Feed Riddle

The default model got this hilariously wrong.

GPT-4 got it right.

This riddle is easy for most people to solve, but GPT-3.5 gave a confusing answer. GPT-4, on the other hand, solved it correctly, giving the right steps in the right order.
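
If you want to verify the steps yourself, here is a small Python sketch (my own illustration, not output from either model) that brute-force searches the puzzle’s states and prints a shortest sequence of crossings:

```python
from collections import deque

# Brute-force breadth-first search of the wolf/chicken/feed river-crossing puzzle.
# A state records which items are still on the starting (left) bank and which
# side of the river the farmer is on.
ITEMS = {"wolf", "chicken", "feed"}

def unsafe(bank):
    """A bank is unsafe without the farmer if the wolf is left with the chicken
    or the chicken is left with the feed."""
    return {"wolf", "chicken"} <= bank or {"chicken", "feed"} <= bank

def solve():
    start = (frozenset(ITEMS), "L")   # everything on the left, farmer on the left
    goal = (frozenset(), "R")         # everything (and the farmer) on the right
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (left, side), path = queue.popleft()
        if (left, side) == goal:
            return path
        here = left if side == "L" else ITEMS - left      # items on the farmer's bank
        for cargo in [None] + sorted(here):               # cross alone or with one item
            new_left = set(left)
            if cargo:
                if side == "L":
                    new_left.discard(cargo)               # item crosses to the right
                else:
                    new_left.add(cargo)                   # item comes back to the left
            # The bank the farmer leaves behind must stay safe.
            left_behind = new_left if side == "L" else ITEMS - new_left
            if unsafe(left_behind):
                continue
            new_state = (frozenset(new_left), "R" if side == "L" else "L")
            if new_state not in seen:
                seen.add(new_state)
                direction = "across" if side == "L" else "back"
                move = f"take the {cargo} {direction}" if cargo else f"row {direction} alone"
                queue.append((new_state, path + [move]))

for number, step in enumerate(solve(), 1):
    print(number, step)
```

It prints the familiar seven-crossing solution, which starts by taking the chicken across first.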

Question #2: Traveling Salesman

GPT-3.5 used the nearest-neighbor algorithm. It applied the heuristic correctly, but the route it produced is not actually the shortest path for the salesman.

I tried to force GPT-3.5 to brute-force the answer, but it still got it wrong.

GPT-4 successfully solved the traveling salesman problem for five cities.

Even with only five cities, there are already 4! = 24 possible routes to check, and that number grows factorially with the number of cities, which is why the general traveling salesman problem is NP-hard. GPT-3.5 used the nearest-neighbor algorithm, which gave a wrong answer because the greedy route is not the shortest possible path. I asked it to use the brute-force approach instead, but it still gave the wrong answer.

GPT-4, on the other hand, solved the problem by brute force: it checked all 24 possible routes and picked the shortest one.
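
To make the difference between the two approaches concrete, here is a short Python sketch with a made-up five-city distance matrix (not the distances from my actual prompt) that runs the nearest-neighbor heuristic and the brute-force search over all 24 orderings side by side:

```python
from itertools import permutations

# Made-up symmetric distances between five cities (0..4), for illustration only.
DIST = [
    [0, 2, 9, 10, 7],
    [2, 0, 6, 4, 3],
    [9, 6, 0, 8, 5],
    [10, 4, 8, 0, 6],
    [7, 3, 5, 6, 0],
]
N = len(DIST)

def tour_length(tour):
    """Total length of the closed tour that returns to its starting city."""
    return sum(DIST[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))

def nearest_neighbor(start=0):
    """Greedy heuristic: always go to the closest city not yet visited."""
    tour, unvisited = [start], set(range(N)) - {start}
    while unvisited:
        nxt = min(unvisited, key=lambda city: DIST[tour[-1]][city])
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def brute_force(start=0):
    """Exhaustive search: try all (N-1)! = 24 orderings of the other cities."""
    others = [c for c in range(N) if c != start]
    best = min(permutations(others), key=lambda rest: tour_length([start, *rest]))
    return [start, *best]

nn, opt = nearest_neighbor(), brute_force()
print("nearest neighbor:", nn, "length", tour_length(nn))
print("brute force:     ", opt, "length", tour_length(opt))
```

With this matrix, the greedy route comes out at length 28 while the optimal route found by brute force has length 26, which mirrors what happened in my chat: GPT-3.5’s heuristic answer looked plausible but wasn’t the shortest route.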

Question #3: Family Relationships

The default model got this wrong, and its answer is very confusing.

GPT-4 got it wrong too, but at least its reasoning was better.

The question is so confusing that even GPT-3.5 and GPT-4 couldn’t get it right. The correct answer is that my two friends are related as first cousins once removed, meaning one of them is the child of the other’s first cousin.

GPT-4 still makes mistakes, but they are much less glaring than the mistakes made by GPT-3.5. It is amazing that a model that fundamentally works by predicting the next most probable token can reason this well.

I will look at how well GPT-4 handles coding tasks once I have some good tasks for it, and I’ll share the results in a follow-up.

And Finally

As always, I hope you enjoyed this article and learned something new.
Thank you and see you in the next articles!

Source: Viblo