Transformers are RNNs: A Kernel Perspective
In this article, I will argue that transformers are RNNs. I will not merely point out formal containment; I will also explain how similar transformers and RNNs are, and what practical implications follow from that viewpoint.

Transformers are RNNs

First, I will show that transformers are RNNs. To put it simply, a transformer is an […]










