A GPU quantization method that compresses LLMs to 4-bit with minimal quality loss using second-order optimization.

AWQ is generally faster and slightly better quality. GPTQ was the earlier standard but AWQ has largely replaced it.

What is GPTQ? — GPTQ Explained