What is Quantization?

Optimization

Quantization: the process of reducing the numerical precision of model weights (e.g., from 16-bit floating point to 4-bit integers) to shrink model size and memory use and often speed up inference, usually with minimal quality loss.
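To make the definition concrete, here is a minimal sketch that assumes symmetric per-tensor "absmax" quantization. This is one of several schemes; production formats such as Q4_K_M are more elaborate. The names `quantize_symmetric`, `dequantize`, and `model_size_gb` are hypothetical and exist only for illustration. Each float is divided by a shared scale and rounded to a small signed integer. Multiplying back by the scale recovers an approximation of the original value.

```python
def quantize_symmetric(weights: list[float], bits: int) -> tuple[list[int], float]:
    """Map floats to signed integers in [-qmax, qmax] using one shared scale (illustrative scheme)."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    absmax = max(abs(w) for w in weights)
    scale = absmax / qmax if absmax > 0 else 1.0  # the largest |weight| maps to qmax
    # Round to the nearest integer (Python's round() breaks ties to even), then clamp.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Approximately recover the original floats."""
    return [v * scale for v in q]


def model_size_gb(num_params: int, bits_per_weight: float) -> float:
    """Storage for the weights alone, ignoring scales and other metadata."""
    return num_params * bits_per_weight / 8 / 1e9


weights = [0.12, -0.5, 0.33, 0.0, 0.49]
q, scale = quantize_symmetric(weights, bits=4)
print(q, scale)
print(dequantize(q, scale))
print(model_size_gb(7_000_000_000, 16), model_size_gb(7_000_000_000, 4))
```

For a hypothetical 7-billion-parameter model, the size arithmetic gives 14 GB at 16 bits and 3.5 GB at 4 bits. Real quantized files are somewhat larger because they also store scale factors.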

FAQ

What is quantization?

Reducing the precision of model weights (e.g., from 16-bit to 4-bit) so that models are smaller and faster, with minimal quality loss.

Does quantization hurt quality?

At 4-bit (e.g., llama.cpp's Q4_K_M format), quality loss is typically 1-3% on benchmarks. At 8-bit, it is nearly imperceptible.
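Weight reconstruction error offers a rough illustration of why 8-bit loses less than 4-bit. The sketch below makes two assumptions: a simple block-wise absmax scheme, where each block of 32 weights gets its own scale (the block size is an illustrative choice), and synthetic Gaussian weights. It compares the mean absolute round-trip error at both widths. Treat it as a proxy only. Reconstruction error is not the same thing as benchmark quality, and the sketch does not reproduce Q4_K_M, which uses a more elaborate scheme. `fake_quantize_blockwise` and `mean_abs_error` are hypothetical helper names.

```python
import random


def fake_quantize_blockwise(weights: list[float], bits: int, block_size: int = 32) -> list[float]:
    """Quantize then dequantize weights block by block, each block with its own absmax scale."""
    qmax = 2 ** (bits - 1) - 1
    out: list[float] = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        absmax = max(abs(w) for w in block)
        scale = absmax / qmax if absmax > 0 else 1.0
        out.extend(max(-qmax, min(qmax, round(w / scale))) * scale for w in block)
    return out


def mean_abs_error(a: list[float], b: list[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


rng = random.Random(0)  # fixed seed so the run is reproducible
weights = [rng.gauss(0.0, 0.02) for _ in range(4096)]  # synthetic stand-in for one weight tensor
err4 = mean_abs_error(weights, fake_quantize_blockwise(weights, bits=4))
err8 = mean_abs_error(weights, fake_quantize_blockwise(weights, bits=8))
print(f"4-bit mean abs error: {err4:.2e}")
print(f"8-bit mean abs error: {err8:.2e}")
```

At 8 bits there are 127 integer levels on each side of zero, compared with 7 at 4 bits. Each block's step size is therefore roughly 18 times smaller, so the 8-bit reconstruction error comes out far lower.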

