Stored key-value pairs from previous tokens during generation. Avoids recomputing attention for all prior tokens at each step.

Why does KV cache matter?

Without it, generating each token would require reprocessing the entire sequence. KV cache makes generation O(1) per token instead of O(n).

What is KV Cache? — KV Cache Explained