What is a Transformer?
Deep Learning
Transformer: a neural network architecture built on self-attention mechanisms. It is the foundation of nearly all modern LLMs (GPT, BERT, Llama, Mistral) and was introduced in the 2017 paper "Attention Is All You Need" (Vaswani et al.).
FAQ
What is a transformer?
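A neural network architecture that uses attention mechanisms to relate every token in a sequence to every other token, so the whole sequence can be processed in parallel rather than step by step. It powers nearly all modern LLMs.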
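To make "attention" concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is an illustration under simplifying assumptions, not how a production model is built: the queries, keys, and values are all the raw token vectors (a real transformer applies learned projection matrices first), there is a single head, and the three 2-dimensional "token" vectors are made up for the example.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X):
    """Scaled dot-product self-attention with Q = K = V = X.

    Assumption for illustration: no learned projections, one head.
    X is a list of token vectors, all of the same dimension d.
    """
    d = len(X[0])
    outputs, weights = [], []
    for q in X:  # each token acts as a query...
        # ...and is scored against every token (the keys), scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X]
        w = softmax(scores)  # attention weights for this query sum to 1
        weights.append(w)
        # The output is the attention-weighted average of the value vectors.
        outputs.append([sum(wj * v[i] for wj, v in zip(w, X)) for i in range(d)])
    return outputs, weights

# Hypothetical toy embeddings for a 3-token sequence.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, attn = self_attention(tokens)
```

Each row of `attn` shows how much one token attends to every token in the sequence, and each row of `outputs` is the corresponding mix of token vectors. Note that no row depends on the result of another row.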
Why are transformers better than RNNs?
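RNNs read a sequence one token at a time, so each step has to wait for the previous one. Transformers process all tokens in parallel during training, which makes training much faster on modern hardware. Self-attention also connects any two positions directly, so long-range dependencies are captured better, and the architecture scales to billions of parameters.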
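The difference in data dependencies can be sketched in a few lines of Python. This is a toy comparison under stated assumptions: tokens are single numbers rather than vectors, the recurrent weights `w_h` and `w_x` are arbitrary illustrative values, and the attention function again skips learned projections. The point is only the shape of the computation, not realistic model behavior.

```python
import math

def rnn_states(xs, w_h=0.5, w_x=1.0):
    # Recurrent update: each state needs the previous state,
    # so the loop must run in sequence order.
    h = 0.0
    states = []
    for x in xs:
        h = math.tanh(w_h * h + w_x * x)
        states.append(h)
    return states

def attention_output(xs, i):
    # Output for position i depends only on the inputs, never on
    # other positions' outputs, so positions can be computed in
    # any order (or all at once on parallel hardware).
    scores = [xs[i] * x for x in xs]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return sum(e / total * x for e, x in zip(exps, xs))

xs = [0.5, -1.0, 2.0, 0.1]  # hypothetical 1-D "tokens"

states = rnn_states(xs)
in_order = [attention_output(xs, i) for i in range(len(xs))]
any_order = {i: attention_output(xs, i) for i in [3, 0, 2, 1]}
```

Computing the attention outputs in a shuffled order gives the same values as computing them in sequence order, which is what allows a transformer to handle all positions simultaneously. The recurrent loop has no such freedom, because `h` at step t is an input to step t+1.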