What is KV Cache?

Inference

KV Cache: An inference optimization for transformers that stores the key and value tensors computed for earlier tokens so they do not need to be recomputed for each new token. It trades extra memory for less computation and is critical for fast autoregressive generation.
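Because the cache holds one key and one value vector per token, per layer, per head, its size grows linearly with sequence length. The sketch below estimates that footprint. The function name `kv_cache_bytes` and the model dimensions in the example are illustrative assumptions, not the figures of any specific model.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_element=2):
    """Estimate KV cache size in bytes (hypothetical helper).

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_element=2 assumes 16-bit floats.
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_element)


# Assumed example configuration: 32 layers, 32 KV heads, head size 128,
# a 4096-token sequence, batch size 1, 16-bit values.
size = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096)
print(size / 1024**3, "GiB")  # 2.0 GiB
```

Doubling the sequence length or the batch size doubles the result, which is why long contexts and large batches are often limited by cache memory rather than by compute.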

FAQ

What is KV cache?

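The key and value vectors computed for previous tokens during generation, kept in memory. At each step the model computes a key and value only for the newest token and attends over the stored ones, which avoids recomputing them for all prior tokens.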
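A minimal NumPy sketch of the idea, assuming a single attention head with no batching and random weights. The names `KVCache`, `decode_step`, and `full_causal_attention` are hypothetical and not taken from any library. It decodes token by token with a cache and checks that the result matches ordinary causal attention computed over the whole sequence at once.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


class KVCache:
    """Holds the key and value vector of every token seen so far."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def stacked(self):
        return np.stack(self.keys), np.stack(self.values)


def decode_step(x_t, Wq, Wk, Wv, cache):
    """Attention output for the newest token only, reusing cached K and V."""
    q = x_t @ Wq
    cache.append(x_t @ Wk, x_t @ Wv)      # project the new token once
    K, V = cache.stacked()                # shapes (t, d) and (t, d)
    scores = K @ q / np.sqrt(q.shape[-1])
    return softmax(scores) @ V


def full_causal_attention(X, Wq, Wk, Wv):
    """Reference: recompute everything for the whole sequence."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)  # mask later positions
    return softmax(scores) @ V


rng = np.random.default_rng(0)
d = 8
X = rng.normal(size=(5, d))               # 5 token embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

cache = KVCache()
cached_out = np.stack([decode_step(x, Wq, Wk, Wv, cache) for x in X])
reference = full_causal_attention(X, Wq, Wk, Wv)
print(np.allclose(cached_out, reference))  # True
```

Storing keys and values in Python lists keeps the sketch short. Production implementations typically preallocate tensors per layer and head instead.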

Why does KV cache matter?

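Without it, generating each token would require reprocessing the entire sequence. With a KV cache, each step computes the key and value for the new token only and attends over the n cached entries, so the attention cost per generated token is O(n) rather than the O(n^2) of recomputing attention over the full sequence.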
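Counting the work for a single attention head makes the saving concrete. This is a rough sketch that counts only key/value projections and query-key dot products, ignoring the feed-forward layers and constant factors. The helpers `step_work` and `total_work` are hypothetical names introduced for illustration.

```python
def step_work(n, use_cache):
    """Work to produce token number n (1-indexed) for one attention head.

    Returns (key/value projections, query-key dot products).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if use_cache:
        # Project only the new token, then score it against n stored keys.
        return 1, n
    # Re-project all n tokens and redo causal attention for every position:
    # position i attends to i keys, so 1 + 2 + ... + n dot products.
    return n, n * (n + 1) // 2


def total_work(length, use_cache):
    """Sum the per-step work over a generation of `length` tokens."""
    steps = [step_work(n, use_cache) for n in range(1, length + 1)]
    return sum(p for p, _ in steps), sum(s for _, s in steps)


print(step_work(1000, use_cache=True))    # (1, 1000)
print(step_work(1000, use_cache=False))   # (1000, 500500)
print(total_work(1000, use_cache=True))   # (1000, 500500)
print(total_work(1000, use_cache=False))  # (500500, 167167000)
```

Per step, the cached path grows linearly with the position and the uncached path grows quadratically. Over a whole generation that becomes quadratic versus cubic.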
