I put GPT-4’s “Advanced Reasoning” to the Test

Tim Hanewich
2 min read · Sep 26, 2023


Since its release in March 2023, OpenAI’s latest publicly available large language model, GPT-4, has been hailed as the more capable successor to their famed GPT-3.5 model that powers ChatGPT.

OpenAI and others tout GPT-4’s “advanced reasoning” and “creativity” capabilities as being levels beyond those of GPT-3 or GPT-3.5, and OpenAI has even put GPT-4 behind a paywall, exclusive to Plus users.

But is GPT-4 really that much better? I pitted it against GPT-3.5 in a simple logic test to see if the “advanced reasoning” claims are true.

The Reasoning Test

I will ask GPT-3.5 and GPT-4 the following question:

Both Andrew and Sally are starting from the same position. Sally will travel north at a speed of 20 miles per hour. Andrew will travel east at a speed of 50 miles per hour. How much time must elapse before Sally and Andrew are exactly 25 miles apart?

I put this question to GPT-3.5 and GPT-4 with the following specifications:

The Results

GPT-3.5 and GPT-4 gave completely different answers. The graphic above illustrates both responses, but I will provide them as text below too.

GPT-3.5 responded as follows:

And GPT-4 responded as follows:

In addition to being more verbose, providing more helpful information about the problem, and solving more methodically, GPT-4 was absolutely correct while GPT-3.5 was incorrect.

I validated GPT-4’s response on paper:
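The same check can also be sketched in a few lines of Python. This is only an illustration of the geometry, not the working from my paper notes: because north and east are at right angles, the two paths form the legs of a right triangle, so the separation after t hours is t × √(20² + 50²). The function and constant names below are made up for this example.

```python
import math

# Speeds and target distance taken from the question above.
SALLY_MPH = 20.0     # Sally travels north
ANDREW_MPH = 50.0    # Andrew travels east
TARGET_MILES = 25.0  # required separation


def hours_until_apart(north_mph: float, east_mph: float, target_miles: float) -> float:
    """Return the hours needed for two perpendicular travelers to be target_miles apart.

    Hypothetical helper for illustration: the separation grows at a constant
    rate equal to the hypotenuse of the two speeds (Pythagorean theorem).
    """
    separation_rate_mph = math.hypot(north_mph, east_mph)
    return target_miles / separation_rate_mph


hours = hours_until_apart(SALLY_MPH, ANDREW_MPH, TARGET_MILES)
print(f"{hours:.4f} hours (~{hours * 60:.1f} minutes)")
# prints "0.4642 hours (~27.9 minutes)"
```

Plugging the result back in, Sally has covered about 9.3 miles and Andrew about 23.2 miles, and √(9.3² + 23.2²) comes out to roughly 25 miles, as required.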

Summary

Judging by the results of this simple test, it appears GPT-4 does indeed possess “advanced reasoning” capabilities, at least compared to its predecessor GPT-3.5.
