Apple just released a very interesting paper showing that large language models like ChatGPT can’t reason.
What they did was pretty funny.
LLMs have been getting better and better at passing basic math benchmarks like GSM8K (Grade School Math).
But Apple just showed that the AI isn’t actually “thinking”.
Instead, the LLMs are basically just pattern-matching.
They act like students who don't really understand the material but have memorized an enormous amount of it.
Teachers might think those students are smart at first because they seem to know so much, but in fact they are just regurgitating what they memorized.
I remember when I was in school, sometimes the math teacher would give us the previous year’s test to practice.
But they always said they would “change the numbers to protect the innocent” when the real test came.
Imagine that the teacher gave the exact same math test every year.
Then students could easily get 100% just by memorizing the answers from the year before.
That obviously wouldn't mean the student was good at math; it would just mean they memorized all the answers.
The funny thing is, this is essentially what Apple showed the AI is doing.
Imagine this simple math problem.
Bob has 5 apples and Sara has 8 apples. How many apples do they have in total?
Well, any LLM like ChatGPT could easily say 13 apples.
But what if I changed the names and the fruits?
Steve has 6 peaches and Cindy has 9 peaches. How many peaches do they have in total?
This question is exactly as difficult as the previous one, but Apple showed that simply changing the names of the people and the objects made the models perform worse on these questions.
Note: This question is too easy for ChatGPT to fail at, but I’m just showing a simple example to make you understand the point.
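If you're curious what this looks like in practice, here is a minimal Python sketch of the idea behind the templated questions in Apple's paper (they call the benchmark GSM-Symbolic): keep the structure of a grade-school problem fixed and randomize the surface details. The names, fruits, and the ask_llm() helper mentioned in the comments are my own placeholders, not code from the paper.

```python
import random

# A minimal sketch of the idea behind Apple's templated questions:
# keep the structure of a grade-school problem fixed, but randomize
# the surface details (names, objects, numbers). The names, fruits,
# and the ask_llm() helper mentioned below are illustrative placeholders.

TEMPLATE = ("{a} has {x} {fruit} and {b} has {y} {fruit}. "
            "How many {fruit} do they have in total?")

NAMES = ["Bob", "Sara", "Steve", "Cindy", "Maya", "Omar"]
FRUITS = ["apples", "peaches", "plums", "pears"]

def make_variant(rng: random.Random) -> tuple[str, int]:
    """Return one randomized question and its correct answer."""
    a, b = rng.sample(NAMES, 2)
    fruit = rng.choice(FRUITS)
    x, y = rng.randint(2, 20), rng.randint(2, 20)
    return TEMPLATE.format(a=a, b=b, x=x, y=y, fruit=fruit), x + y

if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        question, answer = make_variant(rng)
        print(question, "->", answer)
        # In a real test you would send `question` to the model, e.g.
        # model_answer = ask_llm(question)   # hypothetical helper
        # and check whether it matches `answer` across many variants.
```

A model that truly reasons should score about the same on every variant; a pattern matcher often won't.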
The other thing that Apple did was add extra and unnecessary information to the question.
For example, they would now ask ChatGPT this:
Steve has 6 peaches and Cindy has 9 peaches. Half of the peaches are small and half of the peaches are large. How many peaches do they have in total?
This new sentence, “Half of the peaches are small and half of the peaches are large,” doesn’t add any useful information to the question.
Who cares about the size of the peaches?
Well, Apple showed that adding this type of information made the LLMs perform far worse than before.
AI got confused by this extra information.
This is exactly the type of thing that would confuse a student who was memorizing their homework instead of understanding it.
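You can script this second perturbation the same way. Below is a small sketch of the idea (the paper calls this setup GSM-NoOp): append a sentence that sounds relevant but changes nothing about the answer. The distractor sentences are my own examples.

```python
# A sketch of the second perturbation (the paper calls this setup GSM-NoOp):
# append a clause that sounds relevant but changes nothing about the answer.
# The distractor sentences below are illustrative examples, not the paper's.

DISTRACTORS = [
    "Half of the {fruit} are small and half of the {fruit} are large.",
    "Some of the {fruit} were picked yesterday.",
    "The {fruit} are kept in two different baskets.",
]

def add_noop_clause(question: str, fruit: str, distractor: str) -> str:
    """Insert an irrelevant sentence right before the final question."""
    body, final_question = question.rsplit(". ", 1)
    return f"{body}. {distractor.format(fruit=fruit)} {final_question}"

base = ("Steve has 6 peaches and Cindy has 9 peaches. "
        "How many peaches do they have in total?")
print(add_noop_clause(base, "peaches", DISTRACTORS[0]))
# Steve has 6 peaches and Cindy has 9 peaches. Half of the peaches are small
# and half of the peaches are large. How many peaches do they have in total?
```

The correct answer is still 15 either way; the question is whether the model still gets it once the distractor is there.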
The thing is that LLMs can memorize A LOT more than any student can.
That makes it easy for them to fool us humans into believing they are thinking.
Apple has proven that they aren’t thinking, and this is interesting in and of itself.
However, it might not matter if they can’t “think”.
Is it possible that they get so good at pattern matching, that they still end up being “smarter” than humans at everything?
Maybe.
For example, AI can beat every human at chess and Go. It doesn’t matter that the AI isn’t “thinking” in the ways that we are.
AI is still better than us.
Or, will this lack of thinking put a ceiling on how good they can become?
Do we need another AI breakthrough to get AI to really “think”?
👉 Tell me what you think in the comments! 👈
Note:
If you want our team to create a custom AI chatbot for your business or website, you can contact me ✉️ here and I’ll get back to you quickly:
👉 Sign up to our free 5-Day email course to grow 🚀 and earn💲in the AI age
*** Follow me on LinkedIn
👉 Sign up for our newsletter to get the latest tips on how to use ChatGPT and boost your productivity.
Check out our new YouTube Channel
Follow us at our website: AI Growth Guys
The source of the Apple study mentioned above (preprint): http://arxiv.org/pdf/2410.05229v1