Current “thinking” AI models still can’t reason at the level expected of humanlike artificial general intelligence, the researchers found.
The race to develop artificial general intelligence (AGI) still has a long way to run, according to Apple researchers who found that leading AI models continue to struggle with reasoning.
Recent updates to leading large language models (LLMs) such as OpenAI’s ChatGPT and Anthropic’s Claude have included large reasoning models (LRMs), but their fundamental capabilities, scaling properties, and limitations “remain insufficiently understood,” the Apple researchers wrote in a June paper called “The Illusion of Thinking.”
They noted that current evaluations focus primarily on established mathematical and coding benchmarks, “emphasizing final answer accuracy.”