Ollama is a tool that lets you run large language models (LLMs) locally with a single command. It handles model downloading, quantization, and serving automatically.
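Once a model has been pulled, Ollama serves it over a local HTTP API. The sketch below is a minimal example, not a definitive client. It assumes the server is listening on its default address `http://localhost:11434` and calls the `/api/generate` endpoint with streaming turned off. The helper names `build_generate_request` and `generate` are hypothetical and not part of Ollama. It uses only the Python standard library.

```python
import json
import urllib.request

# Assumption: the Ollama server is running locally on its default port 11434
# (for example, started by the desktop app or by `ollama serve`).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint.

    build_generate_request is a hypothetical helper for illustration.
    """
    # "stream": False asks the server for a single JSON object instead of
    # a stream of partial responses.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text.

    generate is a hypothetical helper; it reads the "response" field
    of the non-streaming reply.
    """
    req = build_generate_request(model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["response"]
```

For example, after `ollama pull llama3.2`, calling `generate("llama3.2", "Why is the sky blue?")` should return the model's reply as a string. Here `llama3.2` is just an example model name. Building the request in a separate function keeps the payload easy to inspect without a running server.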

Ollama is easier to use: one command downloads and runs a model. llama.cpp gives more control, such as custom quantization and server options. Ollama itself uses llama.cpp under the hood.

Yes, Ollama supports GPU acceleration. It automatically uses Apple Metal on macOS and NVIDIA CUDA on Linux and Windows.