January 2026

Even GPT-5.2 Can’t Count to Five

In this post, based on Even GPT-5.2 Can’t Count to Five: The Case for Zero-Error Horizons in Trustworthy LLMs, we discuss how state-of-the-art large language models still make mistakes on extremely simple problems. As a concrete example: if you ask whether the number of 1s in 11000 is even or odd, gpt-5.2-2025-12-11 answers “odd”. If you […]
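For reference, the correct answer to the example above can be checked mechanically; a minimal sketch in Python:

```python
# Count the 1s in the bit string "11000" and report the parity.
s = "11000"
ones = s.count("1")
parity = "even" if ones % 2 == 0 else "odd"
print(ones, parity)  # 2 even — so the model's answer "odd" is wrong
```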

Even GPT-5.2 Can’t Count to Five Read Post »

How LLMs Really Do Arithmetic

LLMs can answer prompts like “226-68=” by outputting “158”, but it turns out that this computation is carried out in a much stranger way than one might imagine, as shown by [Nikankin+ ICLR 2025]. Let us first state the assumptions: we do not use chain-of-thought, and we consider the setting where the model directly outputs an
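As a sanity check on the example prompt, the reference answer is ordinary integer subtraction:

```python
# The prompt "226-68=" evaluated as plain integer arithmetic.
print(226 - 68)  # 158
```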

How LLMs Really Do Arithmetic Read Post »
