What is llama.cpp?

llama.cpp: an open-source C/C++ implementation of LLM inference for consumer hardware. Created by Georgi Gerganov, it runs quantized models on CPUs and Apple Silicon (with optional GPU backends such as CUDA and Vulkan) and has minimal dependencies.

FAQ

What is llama.cpp?

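A lightweight C/C++ library and set of command-line tools for running LLMs locally on CPUs and Apple Silicon. The project introduced the GGUF model file format and is used as an inference backend by tools such as Ollama.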

Why use llama.cpp?

It gives fine-grained control over inference: a choice of quantization levels, an HTTP server mode, batched processing, and grammar-constrained generation.
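As a rough illustration of server mode combined with grammar-constrained generation, the sketch below builds a request for the `/completion` endpoint of a locally running `llama-server` and restricts the output with a GBNF grammar. The helper names (`build_completion_request`, `complete`), the prompt, and the base URL `http://localhost:8080` are assumptions made for this example, and the request and response field names should be checked against the server documentation for your build.

```python
import json
import urllib.request

# GBNF grammar that allows exactly one of two literal answers.
YES_NO_GRAMMAR = 'root ::= "yes" | "no"'


def build_completion_request(prompt, grammar=YES_NO_GRAMMAR, n_predict=4):
    """Build the JSON body for the server's /completion endpoint.

    Hypothetical helper: the keys follow the llama.cpp server README
    (prompt, n_predict, grammar) as far as I know.
    """
    return {"prompt": prompt, "n_predict": n_predict, "grammar": grammar}


def complete(prompt, base_url="http://localhost:8080"):
    """POST the request to a running llama-server and return the text.

    Assumes a server is already listening at base_url and that the
    response JSON carries the generated text under "content".
    """
    body = json.dumps(build_completion_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/completion",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["content"]
```

With a server running and a model loaded, calling `complete("Is water wet? Answer: ")` should return text that matches the grammar. Keeping the payload builder separate from the network call makes it easy to inspect or log the request before sending it.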
