What Is a Transformer?

Deep Learning

Transformers: a neural network architecture based on self-attention mechanisms. It is the foundation of nearly all modern LLMs (GPT, BERT, Llama, Mistral) and was introduced in the 2017 paper "Attention Is All You Need" (Vaswani et al.).

FAQ

What is a transformer?

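A neural network architecture that uses attention mechanisms to process all tokens of a sequence in parallel, letting each token weigh every other token when computing its own representation. It underlies nearly all modern LLMs.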
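To make "attention" a little more concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is an illustration under simplifying assumptions, not a production implementation: a real transformer first maps each token embedding through learned query, key, and value projections and uses several attention heads, whereas this sketch uses the raw embeddings directly. The function names (softmax, attention, self_attention) and the toy input are made up for the example.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs

def self_attention(x):
    """Self-attention: queries, keys, and values all come from x.

    Assumption for brevity: no learned projection matrices.
    """
    return attention(x, x, x)

# Toy "sequence" of three 2-dimensional token embeddings (made-up values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in self_attention(tokens):
    print([round(c, 3) for c in row])
```

Each output row depends only on the inputs, not on any other output row, so all rows can be computed at the same time. That independence is what the answer above means by processing a sequence in parallel.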

Why are transformers better than RNNs?

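Transformers process all tokens of a sequence in parallel during training, whereas an RNN must step through them one at a time, so training is much faster on parallel hardware such as GPUs. Attention also connects any two tokens directly, which makes long-range dependencies easier to capture, and the architecture scales to billions of parameters.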
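The long-range dependency point can be illustrated with a toy recurrent network. The sketch below is a hypothetical single-unit RNN with arbitrarily chosen weights (0.5 and 1.0), not a model from any paper or library. It measures how much the final hidden state changes when only the first input is nudged. Because each step has to wait for the previous one, the first token's influence passes through every intermediate step and tends to shrink as the sequence grows.

```python
import math

def rnn_final_state(inputs, w_h=0.5, w_x=1.0):
    """Run a one-unit tanh RNN and return its last hidden state.

    The loop is inherently sequential: step t needs the state from step t-1.
    The weights are arbitrary illustrative values.
    """
    h = 0.0
    for x in inputs:
        h = math.tanh(w_h * h + w_x * x)
    return h

def sensitivity_to_first_token(length, eps=1e-4):
    """Finite-difference estimate of how the first input affects the final state."""
    base = [1.0] + [0.0] * (length - 1)
    nudged = [1.0 + eps] + [0.0] * (length - 1)
    return abs(rnn_final_state(nudged) - rnn_final_state(base)) / eps

for n in (2, 5, 10, 20):
    print(n, sensitivity_to_first_token(n))
```

In this toy setup the printed sensitivity falls as the sequence gets longer. In self-attention, by contrast, the first token can contribute to the last token's representation in a single step, regardless of the distance between them.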
