A lightweight C++ library for running LLMs locally on CPUs and Apple Silicon. The project introduced the GGUF model file format, and it powers Ollama.

It gives you maximum control over inference: custom quantization, a server mode, batch processing, and grammar-constrained generation.
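
As a rough illustration of how server mode and grammar-constrained generation can combine, the sketch below sends a prompt plus a small GBNF grammar to the server's `/completion` endpoint. It assumes a `llama-server` instance is already running locally on port 8080 with a model loaded. The helper names `build_completion_request` and `complete` and the yes/no grammar are made up for this example and are not part of llama.cpp.

```python
import json
import urllib.request

# GBNF grammar (illustrative) restricting the output to exactly "yes" or "no".
YES_NO_GRAMMAR = 'root ::= "yes" | "no"'


def build_completion_request(prompt, grammar, n_predict=8):
    """Build the JSON body for a grammar-constrained completion request.

    Hypothetical helper: it only assembles the fields sent to /completion.
    """
    return {"prompt": prompt, "n_predict": n_predict, "grammar": grammar}


def complete(prompt, grammar, base_url="http://localhost:8080"):
    """POST the request to a running llama-server and return the generated text.

    Assumes the server is reachable at base_url; raises URLError otherwise.
    """
    body = json.dumps(build_completion_request(prompt, grammar)).encode("utf-8")
    req = urllib.request.Request(
        base_url + "/completion",
        data=body,  # supplying data makes this a POST request
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]
```

With a server running, `complete("Is water wet? Answer: ", YES_NO_GRAMMAR)` should return only text the grammar permits. Keeping payload construction separate from the network call makes the request easy to inspect without a server.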

What is llama.cpp?