Sorting with LLMs

Sorting is a classic task in computer science, but it has recently intersected with state-of-the-art LLMs and sparked a new research trend. Sorting can be performed as long as a comparison function is defined. Traditional comparison functions assumed measurable numeric quantities such as height, price, or distance. However, if we call an LLM inside the […]
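The idea can be sketched with Python's standard sorting machinery. The `llm_compare` function below is a hypothetical stand-in: a real version would send the two items to a model API and parse its verdict, while this sketch fakes the judgment with string length so the example runs on its own.

```python
from functools import cmp_to_key

def llm_compare(a: str, b: str) -> int:
    """Stand-in for an LLM call such as 'Which of A and B is funnier?'.
    A real implementation would prompt a model and parse its answer;
    here we fake the judgment with string length so the code runs."""
    if len(a) < len(b):
        return -1
    if len(a) > len(b):
        return 1
    return 0

items = ["a pun", "an elaborate shaggy-dog story", "a knock-knock joke"]
# cmp_to_key turns any pairwise comparator -- even an LLM-backed one --
# into a key usable by Python's built-in sort
ranked = sorted(items, key=cmp_to_key(llm_compare))
```

Note that an LLM comparator, unlike a numeric one, is not guaranteed to be transitive or even deterministic, which is part of what makes this research direction interesting.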

Speculative Decoding Explained

“Generating…” We are all familiar with the blinking cursor that slowly reveals text word by word. Whether it is ChatGPT or a local LLM, the generation speed of Transformer-based autoregressive models is fundamentally bound by their sequential nature: the model computes, emits one token, computes again, emits the next token, and repeats. This strict dependency chain introduces
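The core loop can be illustrated with a toy greedy sketch, under loudly stated assumptions: `target` and `draft` below are hypothetical stand-ins for a large and a small model, each mapping a context to its next token, and the acceptance rule is the simplified greedy variant rather than the full rejection-sampling scheme.

```python
def target(ctx):
    # Toy "large" model: the next token continues the sequence mod 10
    return (ctx[-1] + 1) % 10

def draft(ctx):
    # Toy "small" model: usually agrees, but errs when the context ends in 7
    return 0 if ctx[-1] == 7 else (ctx[-1] + 1) % 10

def speculative_decode(prompt, k=4, max_new=8):
    out = list(prompt)
    while len(out) < len(prompt) + max_new:
        # 1) the draft model proposes k tokens cheaply, one by one
        ctx = out[:]
        proposed = []
        for _ in range(k):
            t = draft(ctx)
            proposed.append(t)
            ctx.append(t)
        # 2) the target model checks the proposals; in a real system
        #    all k checks happen in a single parallel forward pass
        ctx = out[:]
        for t in proposed:
            correct = target(ctx)
            if t == correct:
                ctx.append(t)        # accept the drafted token for free
            else:
                ctx.append(correct)  # fix the first mistake, discard the rest
                break
        out = ctx
    return out[:len(prompt) + max_new]
```

When the draft model agrees with the target, several tokens are committed per target pass, which is where the speedup comes from.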

Even GPT-5.2 Can’t Count to Five

In this post, we discuss how state-of-the-art large language models still make mistakes on extremely simple problems, based on “Even GPT-5.2 Can’t Count to Five: The Case for Zero-Error Horizons in Trustworthy LLMs”. For a concrete example: if you ask whether the number of 1s in 11000 is even or odd, gpt-5.2-2025-12-11 answers “odd”. If you
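The ground truth for this example is one line of Python:

```python
x = "11000"
ones = x.count("1")                      # the string contains two 1s
parity = "even" if ones % 2 == 0 else "odd"
```

So the correct answer is “even”, and an answer of “odd” is simply wrong, no matter how simple the question looks.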

How LLMs Really Do Arithmetic

LLMs can answer prompts like “226-68=” by outputting “158”, but it turns out that this computation is carried out in a much stranger way than we might imagine, as shown by [Nikankin+ ICLR 2025]. Let us first state the assumptions: we do not use chain-of-thought, and we consider the setting where the model directly outputs an

TwoNN Intrinsic Dimension Explained: Python and Visual Illustrations

Real-world data often live in a high-dimensional ambient space, but the points themselves concentrate near a much lower-dimensional manifold. The “visible” (ambient) dimension is easy to read off from the feature vector length, while the “intrinsic” dimension (the effective degrees of freedom) is much harder to estimate. Two Nearest Neighbors (TwoNN) is a simple yet
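A minimal sketch of the estimator, in its maximum-likelihood form: for each point take the ratio mu = r2 / r1 of its two nearest-neighbour distances, then estimate d_hat = N / sum(log mu). The brute-force neighbour search and the synthetic data below are illustrative choices, not the only way to do this.

```python
import math
import random

def twonn_dimension(points):
    """TwoNN intrinsic-dimension estimate (maximum-likelihood form):
    for each point, mu = r2 / r1 from its two nearest neighbours,
    then d_hat = N / sum(log mu)."""
    n = len(points)
    log_mu_sum = 0.0
    for i, p in enumerate(points):
        # brute-force nearest-neighbour search; fine for small N
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r1, r2 = dists[0], dists[1]
        log_mu_sum += math.log(r2 / r1)
    return n / log_mu_sum

# 500 points on a 2-D plane embedded in a 5-D ambient space:
rng = random.Random(0)
pts = [(rng.random(), rng.random(), 0.0, 0.0, 0.0) for _ in range(500)]
d_hat = twonn_dimension(pts)  # should land near 2, not near the ambient 5
```

The ambient dimension of `pts` is 5, but the estimator recovers a value close to the intrinsic dimension 2, which is exactly the gap the post is about.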

Fisher Information Explained: Python and Visual Illustrations

Definition of Fisher Information The Fisher information is defined as $$\mathrm{FisherInformation}(\theta_0)\stackrel{\text{def}}{=}-\mathbb{E}_{X\sim p(x\mid\theta_0)}\left[\frac{d^2}{d\theta^2}\log p(x\mid\theta)\bigg|_{\theta=\theta_0}\right].$$ Fisher information quantifies how precisely a model parameter can be estimated. A larger Fisher information means the parameter can be estimated more accurately, while a smaller Fisher information indicates that estimation is more difficult. Fisher information admits several equivalent interpretations. Equivalent Expressions
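As a worked instance of the definition, consider a Bernoulli($\theta$) model, whose Fisher information has the closed form $1/(\theta(1-\theta))$. The sketch below evaluates the definition directly: the second derivative of $\log p(x\mid\theta)$ is taken numerically, and the expectation over $X$ is computed exactly by summing over $x\in\{0,1\}$.

```python
import math

def fisher_info_bernoulli(theta, eps=1e-5):
    """Fisher information for Bernoulli(theta), straight from the
    definition -E_X[ d^2/dtheta^2 log p(X | theta) ]: numerical
    second derivative, exact expectation over X in {0, 1}."""
    def log_p(x, t):
        return x * math.log(t) + (1 - x) * math.log(1 - t)
    info = 0.0
    for x, p_x in ((1, theta), (0, 1 - theta)):
        # central-difference approximation of the second derivative
        d2 = (log_p(x, theta + eps) - 2 * log_p(x, theta)
              + log_p(x, theta - eps)) / eps ** 2
        info += -p_x * d2
    return info

theta = 0.3
closed_form = 1.0 / (theta * (1 - theta))  # known Bernoulli Fisher information
```

The numerical value agrees with the closed form, confirming that the definition above is what the formula computes.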

Attention in LLMs and Extrapolation

It is now understood that the attention mechanism in large language models (LLMs) serves multiple functions. By analyzing attention, we gain insight into why LLMs succeed at in-context learning and chain-of-thought, and consequently why they sometimes succeed at extrapolation. In this article, we aim to unpack these functions by observing various types of attention mechanisms. Basic

Interestingness First Classifiers

Most existing machine learning models aim to maximize predictive accuracy, but in this article, I will introduce classifiers that prioritize interestingness. What Does It Mean to Prioritize Interestingness? For example, let us consider the task of classifying whether a user is an adult based on their profile. If the profile contains an age feature, then the
