What is vLLM used for?

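vLLM is used to serve LLMs in production with high throughput. It handles many concurrent requests efficiently by combining PagedAttention, which stores the attention key-value cache in fixed-size blocks to reduce wasted GPU memory, with continuous batching, which admits new requests into the running batch as soon as earlier ones finish.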
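
To make the throughput benefit of continuous batching concrete, here is a toy simulation. It is a sketch under simplifying assumptions, not vLLM's scheduler: each request is represented only by the number of tokens it still has to generate, every decode step produces one token per running request, and the function names are hypothetical.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Decode steps when each batch runs until its longest request finishes."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        # The whole batch occupies the device until its slowest member is done.
        steps += max(lengths[i:i + batch_size])
    return steps


def continuous_batching_steps(lengths, batch_size):
    """Decode steps when a finished request's slot is refilled immediately."""
    waiting = deque(lengths)
    running = []  # tokens still to generate for each in-flight request
    steps = 0
    while waiting or running:
        # Admit waiting requests into any free slots before the next step.
        while waiting and len(running) < batch_size:
            running.append(waiting.popleft())
        # One decode step yields one token per running request;
        # requests that reach zero remaining tokens leave the batch.
        running = [r - 1 for r in running if r - 1 > 0]
        steps += 1
    return steps


if __name__ == "__main__":
    output_lengths = [2, 10, 3, 9, 2, 4]  # made-up token counts per request
    print("static:    ", static_batching_steps(output_lengths, batch_size=2))
    print("continuous:", continuous_batching_steps(output_lengths, batch_size=2))
```

With these made-up lengths and two slots, static batching needs 23 decode steps and continuous batching needs 16, because short requests no longer wait for the longest request in their batch. A real server also has to account for prompt processing and memory limits, which this sketch ignores.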

Is vLLM open source?

Yes, vLLM is open source and released under the Apache 2.0 license.

What is the difference between vLLM and Ollama?

vLLM targets multi-user production serving and prioritizes throughput. Ollama targets single-user local inference and prioritizes ease of use.
