What is KV Cache?
Inference
KV Cache: An inference optimization for transformers that stores the key and value tensors already computed for previous tokens so they do not need to be recomputed for each new token. It trades extra memory for less computation and is critical for fast autoregressive generation.
FAQ
What is KV cache?
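The key and value tensors of previous tokens, stored during generation. At each step the model computes the key and value only for the new token, appends them to the cache, and attends over the cached entries instead of recomputing them for all prior tokens.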
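A minimal sketch of the idea in pure Python, assuming a single attention head with no batching, layers, or positional encoding. The names KVCache, step, and step_without_cache are hypothetical and chosen for illustration only. They do not come from any library.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    """Softmax attention of one query over lists of keys and values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]


class KVCache:
    """Hypothetical single-head cache: one key and one value per past token."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, x, Wq, Wk, Wv):
        # Project only the new token, then append its key and value.
        self.keys.append(matvec(Wk, x))
        self.values.append(matvec(Wv, x))
        # The new token's query attends over everything cached so far.
        return attend(matvec(Wq, x), self.keys, self.values)


def step_without_cache(xs, Wq, Wk, Wv):
    # Baseline: recompute keys and values for every token seen so far.
    keys = [matvec(Wk, x) for x in xs]
    values = [matvec(Wv, x) for x in xs]
    return attend(matvec(Wq, xs[-1]), keys, values)


def random_matrix(d):
    return [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]


random.seed(0)
d = 4  # assumed toy embedding size
Wq, Wk, Wv = random_matrix(d), random_matrix(d), random_matrix(d)
tokens = [[random.gauss(0, 1) for _ in range(d)] for _ in range(5)]

cache = KVCache()
cached_outputs = [cache.step(x, Wq, Wk, Wv) for x in tokens]
full_outputs = [step_without_cache(tokens[: t + 1], Wq, Wk, Wv)
                for t in range(len(tokens))]
```

Both paths should produce the same attention output at every step. The cached path does so while projecting each token only once.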
Why does KV cache matter?
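Without it, generating each token would require reprocessing the entire sequence. With a KV cache, each step computes keys and values only for the new token (O(1) projection work instead of O(n)), and attention over the n cached tokens costs O(n) per step instead of the O(n^2) needed to recompute attention for the full sequence.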
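The difference can be made concrete by counting operations. The sketch below is a rough illustration, not a benchmark: it assumes that the uncached path recomputes a full t-by-t attention matrix at step t, and it ignores constant factors, multiple layers, and multiple heads. The function names are hypothetical.

```python
def projection_counts(n):
    """Key/value projections needed to generate n tokens one at a time."""
    without_cache = sum(range(1, n + 1))  # step t reprojects all t tokens
    with_cache = n                        # each step projects one new token
    return without_cache, with_cache


def score_counts(n):
    """Query-key dot products needed to generate n tokens one at a time."""
    without_cache = sum(t * t for t in range(1, n + 1))  # full t x t matrix
    with_cache = sum(range(1, n + 1))                    # one query, t keys
    return without_cache, with_cache


proj_4 = projection_counts(4)  # (1 + 2 + 3 + 4, 4) = (10, 4)
scores_4 = score_counts(4)     # (1 + 4 + 9 + 16, 1 + 2 + 3 + 4) = (30, 10)
```

The gap widens as n grows, which is why the cache matters most for long sequences. The cost is the memory needed to hold one key and one value per token.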