What is llama.cpp?
llama.cpp: an open-source C/C++ implementation of LLM inference for consumer hardware. Created by Georgi Gerganov, it runs models on CPUs and Apple Silicon with minimal dependencies.
FAQ
What is llama.cpp?
A lightweight C/C++ library for running LLMs on CPUs and Apple Silicon. The project introduced the GGUF model file format, and it powers Ollama.
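A GGUF file begins with a small fixed-size header. The sketch below reads that header using only Python's standard library.

It assumes GGUF version 2 or 3. In those versions the header is:
- the 4-byte magic "GGUF",
- a little-endian uint32 version,
- two little-endian uint64 counts: first the tensors, then the metadata key/value pairs.

Version 1 used 32-bit counts, and the sketch rejects it. The names parse_gguf_header and GGUFHeader are illustrative and are not part of llama.cpp. The sketch also does not parse the metadata or tensor data that follow the header.

```python
import struct
from typing import NamedTuple

# Assumed layout (GGUF v2/v3): magic, uint32 version, uint64 tensor count,
# uint64 metadata key/value count, all little-endian, no padding.
HEADER_FORMAT = "<4sIQQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 24 bytes


class GGUFHeader(NamedTuple):
    """Hypothetical container for the fixed header fields."""
    version: int
    tensor_count: int
    metadata_kv_count: int


def parse_gguf_header(data: bytes) -> GGUFHeader:
    """Parse the first HEADER_SIZE bytes of a GGUF file."""
    if len(data) < HEADER_SIZE:
        raise ValueError("truncated GGUF header")
    magic, version, tensor_count, kv_count = struct.unpack_from(HEADER_FORMAT, data)
    if magic != b"GGUF":
        raise ValueError(f"not a GGUF file (magic={magic!r})")
    if version < 2:
        # GGUF v1 stored the counts as uint32, so this layout does not apply.
        raise ValueError(f"GGUF version {version} is not handled by this sketch")
    return GGUFHeader(version, tensor_count, kv_count)
```

To use it on a file, pass the file's first 24 bytes. For example, with a placeholder path:

parse_gguf_header(open("model.gguf", "rb").read(24))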
Why use llama.cpp?
It gives fine-grained control over inference: custom quantization, a built-in HTTP server mode, batch processing, and grammar-constrained generation.
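Server mode and grammar-constrained generation can be combined. The sketch below sends a prompt to the llama.cpp server's /completion endpoint with a GBNF grammar that only accepts "yes" or "no".

It assumes the following:
- A server is already running locally, started with something like: llama-server -m model.gguf --port 8080 (model.gguf is a placeholder).
- The endpoint accepts the JSON fields prompt, n_predict, temperature and grammar.
- The endpoint returns the generated text in a content field.

The URL and the helper names are illustrative, and only the Python standard library is used.

```python
import json
import urllib.request

# Hypothetical local endpoint; adjust host and port to match your llama-server.
SERVER_URL = "http://127.0.0.1:8080/completion"

# GBNF grammar: the only strings the model may generate are "yes" and "no".
YES_NO_GRAMMAR = 'root ::= "yes" | "no"'


def build_completion_request(prompt: str, grammar: str, n_predict: int = 8,
                             url: str = SERVER_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /completion endpoint."""
    payload = {
        "prompt": prompt,
        "n_predict": n_predict,  # upper bound on generated tokens
        "temperature": 0.0,      # deterministic sampling for a yes/no answer
        "grammar": grammar,      # sampling is restricted to strings the grammar accepts
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, grammar: str) -> str:
    """Send the request and return the generated text (requires a running server)."""
    req = build_completion_request(prompt, grammar)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))["content"]
```

With a server running, complete("Is the sky blue? Answer:", YES_NO_GRAMMAR) should return either "yes" or "no", because the grammar rules out any other output.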