What is GPTQ?

Quantization

GPTQ: a post-training quantization method for LLMs that quantizes weights one layer at a time, using approximate second-order (Hessian) information to minimize the resulting error in each layer's output. It is used primarily for 4-bit, weight-only GPU inference.
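To make the "second-order information" idea concrete, here is a minimal NumPy sketch of a GPTQ-style update for a single linear layer. It is an illustration under simplifying assumptions, not the reference implementation: the function names `quantize_rtn` and `gptq_quantize` are made up for this example, the grid is a symmetric per-row 4-bit grid, the calibration inputs are synthetic, and it leaves out the grouping, blocked updates, and weight packing that practical implementations use.

```python
import numpy as np

def quantize_rtn(w, scale, bits=4):
    """Round to the nearest level of a symmetric signed grid, then dequantize."""
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def gptq_quantize(W, X, bits=4, damp=0.01):
    """Quantize W (d_out x d_in) one input column at a time.

    After each column is rounded, its error is pushed into the columns that
    are not yet quantized, weighted by the inverse Hessian of the layer's
    squared output error. X holds calibration inputs, shape (d_in, n_samples).
    """
    W = W.astype(np.float64).copy()
    d_in = W.shape[1]
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(W).max(axis=1) / qmax      # per-row scale, fixed up front

    H = 2.0 * X @ X.T                         # Hessian of ||W X - Q X||^2 per row
    H += damp * np.mean(np.diag(H)) * np.eye(d_in)   # damping for stability
    Hinv = np.linalg.inv(H)
    U = np.linalg.cholesky(Hinv).T            # upper factor, Hinv = U.T @ U

    Q = np.zeros_like(W)
    for i in range(d_in):
        Q[:, i] = quantize_rtn(W[:, i], scale, bits)
        err = (W[:, i] - Q[:, i]) / U[i, i]
        # Compensate the rounding error in the remaining columns.
        W[:, i:] -= np.outer(err, U[i, i:])
    return Q, scale

# Synthetic layer and calibration data (assumed, for illustration only).
rng = np.random.default_rng(0)
d_out, d_in, n = 16, 32, 512
shared = rng.standard_normal((1, n))
X = rng.standard_normal((d_in, n)) + 2.0 * shared   # correlated input features
W = rng.standard_normal((d_out, d_in))

Q_gptq, scale = gptq_quantize(W, X)
Q_rtn = quantize_rtn(W, scale[:, None])             # plain rounding baseline

def output_error(Q):
    """Squared error of the layer output on the calibration inputs."""
    return np.linalg.norm(W @ X - Q @ X) ** 2

rtn_err = output_error(Q_rtn)
gptq_err = output_error(Q_gptq)
print(f"round-to-nearest error: {rtn_err:.1f}, GPTQ-style error: {gptq_err:.1f}")
```

Both results use the same 4-bit grid; the only difference is the error compensation step. The benefit should be largest when input features are correlated, as they are in this synthetic data.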

FAQ

What is GPTQ?

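A post-training, weight-only quantization method that compresses LLM weights to 4 bits with minimal quality loss. It uses approximate second-order information to compensate for rounding error, and it is mainly used for GPU inference.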
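As a rough sense of scale, the sketch below estimates weight storage before and after 4-bit quantization. Every number in it is an assumption chosen for illustration: a hypothetical 7-billion-parameter model, a group size of 128, and 20 bits of metadata per group (a 16-bit scale plus a 4-bit zero point). Real checkpoints vary by format and usually keep some layers at higher precision.

```python
def weight_memory_gb(n_params, bits, group_size=None, meta_bits=0):
    """Approximate weight storage in GB (10**9 bytes).

    meta_bits is the per-group metadata (scale, zero point) spread over
    group_size weights; both are assumptions about the storage layout.
    """
    bits_per_weight = bits
    if group_size:
        bits_per_weight += meta_bits / group_size
    return n_params * bits_per_weight / 8 / 1e9

fp16_gb = weight_memory_gb(7e9, 16)
int4_gb = weight_memory_gb(7e9, 4, group_size=128, meta_bits=20)
print(f"fp16: {fp16_gb:.2f} GB, 4-bit: {int4_gb:.2f} GB, "
      f"ratio: {fp16_gb / int4_gb:.2f}x")
```

Under these assumptions the weights shrink from about 14 GB to about 3.6 GB, slightly less than a 4x reduction because of the per-group metadata.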

GPTQ vs AWQ?

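AWQ (Activation-aware Weight Quantization) is often reported to be faster at inference and slightly better in quality, although results vary with the model, the inference kernels, and the calibration data. GPTQ was the earlier standard, and AWQ has replaced it in many deployments.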
