I put GPT-4’s “Advanced Reasoning” to the Test
Since its release in March 2023, OpenAI’s latest publicly available large language model, GPT-4, has been hailed as the more capable successor to their famed GPT-3.5 model that powers ChatGPT.
OpenAI and others tout GPT-4’s “advanced reasoning” and “creativity” capabilities as being levels beyond those of GPT-3 and GPT-3.5, going so far as to put GPT-4 behind a paywall, exclusive to ChatGPT Plus subscribers.
But is GPT-4 really that much better? I pitted it against GPT-3.5 in a simple logic test to see if the “advanced reasoning” claims hold up.
The Reasoning Test
I will ask GPT-3.5 and GPT-4 the following question:
Both Andrew and Sally are starting from the same position. Sally will travel north at a speed of 20 miles per hour. Andrew will travel east at a speed of 50 miles per hour. How much time must elapse before Sally and Andrew are exactly 25 miles apart?
I asked this question to GPT-3.5 and GPT-4 with the following specifications:
The Results
GPT-3.5 and GPT-4 gave completely different answers. The graphic above illustrates both responses, but I will provide them as text below too.
GPT-3.5 responded as follows:
And GPT-4 responded as follows:
In addition to being more verbose, providing more helpful context about the problem, and working through the solution more methodically, GPT-4 was correct while GPT-3.5 was not.
I validated GPT-4’s response on paper:
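For anyone who wants to check the arithmetic another way, here is a quick sketch of the same verification in Python. This is my own check of the math, not output from either model; the variable names are just for illustration.

```python
import math

# Sally travels north at 20 mph; Andrew travels east at 50 mph.
# Their paths are perpendicular, so their separation after t hours
# is the hypotenuse of a right triangle:
#   distance(t) = sqrt((20t)^2 + (50t)^2) = t * sqrt(2900)
# Setting distance(t) = 25 miles and solving for t:

sally_speed = 20       # miles per hour, heading north
andrew_speed = 50      # miles per hour, heading east
target_distance = 25   # miles apart

separation_rate = math.sqrt(sally_speed**2 + andrew_speed**2)  # ≈ 53.85 mph
t_hours = target_distance / separation_rate

print(f"t ≈ {t_hours:.4f} hours ≈ {t_hours * 60:.1f} minutes")
# t ≈ 0.4642 hours ≈ 27.9 minutes
```

The result, roughly 0.46 hours (about 28 minutes), matches the answer I worked out on paper.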
Summary
Judging by the results of this simple test, it appears GPT-4 does indeed possess “advanced reasoning” capabilities, at least compared to its predecessor, GPT-3.5.