What is Quantization?
Optimization
Quantization: the process of reducing the numerical precision of model weights (e.g., from 16-bit to 4-bit) to shrink the model and speed up inference, usually at the cost of only a small loss in quality.
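As a rough illustration of the definition above, here is a minimal sketch of round-to-nearest symmetric quantization with a single scale for the whole tensor, written with NumPy. The function names (`quantize_symmetric`, `dequantize`, `weight_bytes`), the sample weights, and the 7-billion-parameter count are all assumptions made for this example. Production formats generally use more elaborate schemes, so treat this as a way to see the idea rather than a description of any particular library.

```python
import numpy as np

def quantize_symmetric(weights, bits):
    """Map float weights to signed integers using one per-tensor scale."""
    qmax = 2 ** (bits - 1) - 1              # 7 for 4-bit, 127 for 8-bit
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and keep values in [-qmax, qmax].
    # int8 is used as the container here; real 4-bit formats pack two values per byte.
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the integer levels."""
    return q.astype(np.float32) * scale

def weight_bytes(num_params, bits):
    """Storage for the weights alone, ignoring scales and other metadata."""
    return num_params * bits // 8

# Hypothetical weights, chosen only to make the rounding easy to follow.
weights = np.array([0.7, -0.32, 0.12, 0.0], dtype=np.float32)
q, scale = quantize_symmetric(weights, bits=4)
restored = dequantize(q, scale)

print("integer levels:", q.tolist())
print("restored:", restored.tolist())

# Hypothetical 7-billion-parameter model: 16-bit versus 4-bit weights.
print("16-bit bytes:", weight_bytes(7_000_000_000, 16))
print("4-bit bytes: ", weight_bytes(7_000_000_000, 4))
```

The restored values differ slightly from the originals because each weight is snapped to one of a small number of levels. That rounding error is the source of the quality loss, and the drop from 16 bits to 4 bits per weight is the source of the roughly fourfold size reduction.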
FAQ
What is quantization?
Reducing the precision of model weights (for example, from 16-bit to 4-bit) so that models are smaller and faster, with minimal quality loss.
Does quantization hurt quality?
At 4-bit (for example, Q4_K_M, a quantization format used by llama.cpp), quality loss is typically 1-3% on benchmarks. At 8-bit, the loss is nearly imperceptible.
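To get a feel for why 8-bit is gentler than 4-bit, the sketch below measures how much rounding error each bit width introduces on synthetic weights. Several things here are assumptions: the helper name `relative_rms_error`, the Gaussian test data, and the simple per-tensor scheme, which is not how Q4_K_M works (formats of that kind use finer-grained, block-wise scaling). The number printed is weight reconstruction error, which is not the same thing as benchmark quality loss, so it should not be compared directly with the percentages above.

```python
import numpy as np

def relative_rms_error(weights, bits):
    """RMS error after a quantize/dequantize round trip, relative to the RMS of the weights."""
    qmax = 2 ** (bits - 1) - 1
    scale = float(np.max(np.abs(weights))) / qmax
    # Quantize to integer levels, then map straight back to floats.
    restored = np.clip(np.round(weights / scale), -qmax, qmax) * scale
    error_rms = np.sqrt(np.mean((weights - restored) ** 2))
    weight_rms = np.sqrt(np.mean(weights ** 2))
    return float(error_rms / weight_rms)

# Synthetic stand-in for a weight tensor (an assumption, not real model data).
rng = np.random.default_rng(0)
weights = rng.standard_normal(4096).astype(np.float32)

errors = {bits: relative_rms_error(weights, bits) for bits in (8, 4)}
for bits, err in errors.items():
    print(f"{bits}-bit relative RMS error: {err:.4f}")
```

With only 15 usable levels at 4-bit versus 255 at 8-bit, the 4-bit round trip leaves a much larger error, which matches the direction of the claim even though the exact figures depend on the data and the quantization scheme.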