GGUF is a binary file format for quantized LLMs. Runtimes such as llama.cpp load it to run models like Llama and Mistral on a laptop without a GPU.
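To make "binary format" concrete, here is a minimal sketch that reads the fixed-size header at the start of a GGUF file. It assumes a little-endian file of version 2 or later, where the 4-byte magic `GGUF` is followed by a 32-bit version and two 64-bit counts. The names `GGUFHeader` and `read_gguf_header` are made up for this illustration and are not part of any library.

```python
import struct
from typing import BinaryIO, NamedTuple


class GGUFHeader(NamedTuple):
    """Fixed-size fields at the start of a GGUF file (hypothetical helper type)."""
    version: int
    tensor_count: int
    metadata_kv_count: int


def read_gguf_header(stream: BinaryIO) -> GGUFHeader:
    """Read the GGUF header from a binary stream positioned at offset 0.

    Assumption: little-endian layout, format version 2 or later
    (version 1 used 32-bit counts and is not handled here).
    """
    magic = stream.read(4)
    if magic != b"GGUF":
        raise ValueError(f"not a GGUF file, magic was {magic!r}")

    fixed = stream.read(20)  # uint32 version + uint64 + uint64
    if len(fixed) != 20:
        raise ValueError("file too short to contain a GGUF header")

    version, tensor_count, metadata_kv_count = struct.unpack("<IQQ", fixed)
    if version < 2:
        raise ValueError(f"unsupported GGUF version {version} in this sketch")
    return GGUFHeader(version, tensor_count, metadata_kv_count)
```

For a local file, `with open("model.gguf", "rb") as f: print(read_gguf_header(f))` would print the version and counts; `model.gguf` is a placeholder path. The metadata key/value pairs and tensor descriptions follow the header and are not parsed here.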

Which GGUF quantization is best?

Q4_K_M is the best default for most users: good quality at a small file size. Q5_K_M gives better quality in a somewhat larger file. Q8_0 is near-original quality but the largest of the three.
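The size trade-off can be roughed out with simple arithmetic: file size is about parameter count times bits per weight, divided by 8. The sketch below is an estimate only. The 8.5 bits for Q8_0 follows from its block layout (32 eight-bit weights plus one 16-bit scale), while the Q4_K_M and Q5_K_M figures are approximate averages assumed for illustration; real files vary by model and also include metadata. The function name `estimate_size_gb` is hypothetical.

```python
# Approximate bits per weight for each quantization type.
# Q8_0: (32 * 8 + 16) / 32 = 8.5 bits exactly, from its block layout.
# Q4_K_M and Q5_K_M: rough averages, ASSUMED here for illustration only.
APPROX_BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.7,
    "Q8_0": 8.5,
}


def estimate_size_gb(n_params: float, quant: str) -> float:
    """Estimate GGUF file size in GB (10^9 bytes) for a given parameter count."""
    bits = APPROX_BITS_PER_WEIGHT[quant]
    return n_params * bits / 8 / 1e9


if __name__ == "__main__":
    # Compare the three quantizations for a 7-billion-parameter model.
    for quant in APPROX_BITS_PER_WEIGHT:
        print(f"{quant}: ~{estimate_size_gb(7e9, quant):.1f} GB")
```

As a rule of thumb, the file needs to fit in available RAM with some headroom left for the context, so this estimate is a quick way to rule out quantizations that are too large for a given machine.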

Can I run GGUF on Mac?

Yes. GGUF models run well on Apple Silicon, with Metal GPU acceleration available through llama.cpp or Ollama.
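As one possible way to try this from Python, the sketch below uses the llama-cpp-python bindings for llama.cpp. It assumes the package is installed with Metal support enabled, and the helper name `load_gguf` and its default context size are choices made for this example. Setting `n_gpu_layers=-1` asks llama.cpp to offload all layers to the GPU.

```python
from pathlib import Path


def load_gguf(model_path: str, n_ctx: int = 2048):
    """Load a GGUF model with all layers offloaded to the GPU (Metal on a Mac).

    Hypothetical helper; assumes llama-cpp-python is installed with Metal support.
    """
    path = Path(model_path)
    if not path.is_file():
        raise FileNotFoundError(f"GGUF file not found: {path}")

    # Imported lazily so the path check above works without the package installed.
    from llama_cpp import Llama

    return Llama(
        model_path=str(path),
        n_gpu_layers=-1,  # -1 offloads every layer to the GPU
        n_ctx=n_ctx,      # context window size in tokens
    )
```

With a model on disk, `llm = load_gguf("model-Q4_K_M.gguf")` followed by `llm("Q: What is GGUF? A:", max_tokens=48)["choices"][0]["text"]` returns the generated text; the file name is a placeholder.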