What is a transformer?

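A neural network architecture that uses self-attention to relate every token in a sequence to every other token, so the whole sequence can be processed in parallel. It underlies nearly all modern large language models (LLMs).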
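
As a rough illustration of the attention step, here is a minimal sketch of scaled dot-product self-attention in plain Python. It assumes the queries, keys, and values are the token vectors themselves; a real transformer layer would first apply learned projections and use several attention heads. The names `softmax`, `self_attention`, and `tokens` are made up for this example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x.

    x is a list of token vectors (lists of floats of equal length).
    Assumption: no learned projections, single head.
    """
    d = len(x[0])
    out = []
    for q in x:  # each iteration is independent of the others
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # Output is a weighted average of all value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(tokens)
```

No iteration of the outer loop depends on another, which is why the real computation can be done for all tokens at once as a matrix product.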

Why are transformers better than RNNs?

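Transformers process all tokens of a sequence in parallel during training, whereas an RNN must step through them one at a time, so training is much faster on parallel hardware. Attention also connects any two positions directly, so long-range dependencies are captured more reliably than in an RNN, where information must pass through every intermediate step. Together, these properties have let transformers scale to billions of parameters.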
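
To make the sequential bottleneck concrete, here is a toy sketch of a recurrence. It is a hypothetical linear recurrence with a fixed decay `a`, not a real RNN cell (there are no learned weights and no nonlinearity). It only shows that each step waits on the previous one, and that an early input can fade as it is carried forward.

```python
def recurrent_states(xs, a=0.5):
    """Toy linear recurrence h_t = a * h_{t-1} + x_t (illustrative only)."""
    h = 0.0
    states = []
    for x in xs:  # step t cannot start until step t-1 has finished
        h = a * h + x
        states.append(h)
    return states

# Only the first input is nonzero, so each state shows how much of it survives.
states = recurrent_states([1.0] + [0.0] * 9)
```

With `a=0.5`, the first input's contribution halves at every step, so little of it remains after ten steps. In attention, by contrast, the last position can attend to the first in a single step, with a weight set by content rather than by distance.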