


In recent months, the artificial intelligence community has been gripped by a heated debate sparked by two influential research papers from Apple: "GSM-Symbolic" (October 2024) and "The Illusion of Thinking" (June 2025). Both question the alleged reasoning capabilities of Large Language Models, and both drew mixed reactions across the industry.
As we analyzed in our previous in-depth study, "The illusion of progress: simulating general artificial intelligence without achieving it," the question of artificial reasoning touches the very heart of what we consider intelligence in machines.
Apple's researchers conducted a systematic analysis of Large Reasoning Models (LRMs), the models that generate detailed reasoning traces before providing an answer. The results were surprising and, for many, alarming.
The study subjected the most advanced models to classic algorithmic puzzles, the four used in "The Illusion of Thinking": Tower of Hanoi, Checker Jumping, River Crossing, and Blocks World.

The results showed that even small changes in problem formulation led to significant swings in performance, suggesting a worrying fragility in the models' reasoning. As reported in AppleInsider's coverage, "performance of all models decreases when only numerical values in the GSM-Symbolic benchmark questions are altered."
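To make the idea concrete, here is a toy illustration of the GSM-Symbolic approach: one problem template instantiated with different numbers. This is a sketch, not Apple's actual template set; the template text, the name, and the value ranges are invented for the example.

```python
import random

# Toy GSM-Symbolic-style generator (illustrative, not Apple's templates):
# one template, many numeric instantiations. A model that genuinely
# reasons should answer every variant equally well.
TEMPLATE = ("{name} picks {a} apples on Monday and {b} apples on Tuesday. "
            "How many apples does {name} have in total?")

def make_variant(rng: random.Random) -> tuple[str, int]:
    a, b = rng.randint(2, 50), rng.randint(2, 50)
    question = TEMPLATE.format(name="Sofia", a=a, b=b)
    return question, a + b  # the ground truth shifts with the numbers

rng = random.Random(0)
for _ in range(3):
    question, answer = make_variant(rng)
    print(question, "->", answer)
```

Apple's finding, phrased in these terms, is that accuracy drops when only the numbers change, even though the reasoning required is identical across variants.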
The response from the AI community was not long in coming. Alex Lawsen of Open Philanthropy, in collaboration with Anthropic's Claude Opus, published a detailed rebuttal entitled "The Illusion of the Illusion of Thinking," challenging the methodology and conclusions of the Apple study.
When Lawsen repeated the tests with alternative methodologies, asking the models to generate recursive functions instead of listing every move one by one, the results were dramatically different. Models such as Claude, Gemini, and GPT correctly solved Tower of Hanoi problems with 15 disks, far beyond the complexity at which Apple had reported zero successes.
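For context, this is the kind of compact program the rebuttal asked models to produce instead of a move-by-move listing; a sketch using the standard recursive formulation of Tower of Hanoi:

```python
def hanoi(n: int, source: str, target: str, spare: str, moves: list) -> None:
    # Move the top n disks from source to target, using spare as a buffer.
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)  # clear the n-1 smaller disks
    moves.append((source, target))              # move the largest remaining disk
    hanoi(n - 1, spare, target, source, moves)  # restack the smaller disks on top

moves: list = []
hanoi(15, "A", "C", "B", moves)
print(len(moves))  # 32767, i.e. 2**15 - 1 moves
```

The contrast captures the core of the disagreement: the 15-disk solution is a few lines of code, but spelling out all 32,767 moves exceeds any practical output budget, which is exactly what Lawsen argued Apple's protocol was really measuring.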
Gary Marcus, a longtime critic of LLMs' reasoning abilities, embraced the Apple results as confirmation of a thesis he has defended for twenty years. According to Marcus, LLMs continue to struggle with "distribution shift" (generalizing beyond their training data) while remaining "good solvers of problems already solved."
The discussion has also spread to specialized communities such as LocalLlama on Reddit, where developers and researchers debate the practical implications for open-source models and local deployment.
This debate is not purely academic: it has direct implications for how AI systems are designed and deployed in production. As several technical analyses have pointed out, what is needed are hybrid approaches that combine language models for understanding and intent extraction with deterministic code for computation and verification.
A simple example: an AI assistant that helps with accounting. The language model understands the question "how much did I spend on travel this month?" and extracts the relevant parameters (category: travel; period: this month). But the SQL query that actually interrogates the database, computes the sum, and checks the fiscal constraints? That is handled by deterministic code, not by the neural model.
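A minimal sketch of that division of labor, assuming a hypothetical `expenses` table and a parameter dict standing in for the model's output (in a real system it would come from an LLM call with a structured-output schema):

```python
import sqlite3

# Stand-in for the language-model step: the LLM only parses the user's
# question into structured parameters; it never performs the arithmetic.
extracted = {"category": "travel", "start": "2025-06-01", "end": "2025-06-30"}

def sum_expenses(conn: sqlite3.Connection, category: str, start: str, end: str) -> float:
    """Deterministic aggregation: the database engine, not the model, does the math."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM expenses"
        " WHERE category = ? AND date BETWEEN ? AND ?",
        (category, start, end),
    ).fetchone()
    return row[0]

# In-memory fixture so the sketch runs end to end.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE expenses (date TEXT, category TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO expenses VALUES (?, ?, ?)",
    [("2025-06-03", "travel", 120.00),
     ("2025-06-17", "travel", 85.50),
     ("2025-06-20", "meals", 30.00)],
)

total = sum_expenses(conn, extracted["category"], extracted["start"], extracted["end"])
print(f"Travel expenses this month: {total:.2f}")  # -> 205.50
```

The model's only job is translating language into parameters; the correctness of the sum is guaranteed by the SQL engine rather than by next-token prediction.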
It has not escaped observers' notice that the Apple paper was released just before WWDC, raising questions about strategic motivations. As noted in 9to5Mac's analysis, "the timing of the Apple paper-just before WWDC-raised a few eyebrows. Was this a research milestone, or a strategic move to reposition Apple in the broader AI landscape?"
The debate arising from the Apple papers reminds us that we are still in the early stages of understanding artificial intelligence. As pointed out in our previous article, the distinction between simulation and authentic reasoning remains one of the most complex challenges of our time.
The real lesson is not whether or not LLMs can "reason" in the human sense of the term, but rather how we can build systems that exploit their strengths while compensating for their limitations. In a world where AI is already transforming entire sectors, the question is no longer whether these tools are "smart," but how to use them effectively and responsibly.
The future of enterprise AI will probably lie not in a single revolutionary approach, but in the intelligent orchestration of several complementary technologies. And in that scenario, the ability to evaluate the capabilities of our tools critically and honestly becomes a competitive advantage in itself.
For insights into your organization's AI strategy and implementation of robust solutions, our team of experts is available for customized consultations.