What is continuous batching?

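A scheduling technique for LLM inference in which new requests are inserted into the running batch as soon as earlier ones finish, typically at the granularity of a single decoding step, rather than waiting for the whole batch to complete. Because freed slots are refilled immediately, the GPU spends far less time idle on short requests that are stuck behind long ones. Used by serving systems such as vLLM and TGI (Text Generation Inference).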
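
To make the difference concrete, here is a minimal toy simulation that counts decoding steps under static batching and under continuous batching. It is only a sketch under simplifying assumptions: every request needs a known number of decoding steps, each step costs the same regardless of batch contents, and prefill cost is ignored. The function names and the example lengths are hypothetical and are not taken from vLLM or TGI.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Count steps when each batch runs until its longest request finishes."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        # The whole batch is held until the slowest request is done.
        steps += max(lengths[i:i + batch_size])
    return steps


def continuous_batching_steps(lengths, batch_size):
    """Count steps when a freed slot is refilled before the next step."""
    waiting = deque(lengths)
    running = []  # remaining decoding steps for each in-flight request
    steps = 0
    while waiting or running:
        # Admit waiting requests into any free slots.
        while waiting and len(running) < batch_size:
            running.append(waiting.popleft())
        # One decoding step: every running request produces one token.
        running = [r - 1 for r in running]
        # Finished requests leave the batch, freeing their slots.
        running = [r for r in running if r > 0]
        steps += 1
    return steps


# Hypothetical workload: decoding steps needed by six requests.
lengths = [2, 8, 3, 7, 1, 4]
static = static_batching_steps(lengths, batch_size=2)
continuous = continuous_batching_steps(lengths, batch_size=2)
print(static, continuous)  # prints "19 13"
```

With two slots and 25 tokens of total work, 13 steps is the minimum possible, so in this toy case continuous batching leaves almost no slot idle, while static batching wastes the slots held by requests that finished early.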
