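QLoRA (Quantized LoRA) is LoRA fine-tuning applied to a 4-bit quantized base model: the base weights are frozen in 4-bit precision, and only the small low-rank adapter matrices are trained. This makes it possible to fine-tune a 65B-parameter model on a single GPU.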
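
As a rough illustration of the idea, the sketch below builds a toy linear layer whose base weights are stored as frozen 4-bit codes, with a trainable low-rank update added on top. It is a simplification under stated assumptions, not the actual QLoRA implementation: plain absmax rounding to integers in [-7, 7] stands in for the real 4-bit data type, and the names `quantize_4bit`, `dequantize_4bit`, and `QLoRALinear` are hypothetical.

```python
import random


def quantize_4bit(row):
    """Absmax-quantize a list of floats to integer codes in [-7, 7].

    Assumption: uniform absmax rounding is a stand-in for the real 4-bit format.
    """
    scale = max(abs(w) for w in row) / 7 or 1.0  # fall back to 1.0 for an all-zero row
    codes = [round(w / scale) for w in row]
    return codes, scale


def dequantize_4bit(codes, scale):
    """Map 4-bit codes back to approximate float weights."""
    return [c * scale for c in codes]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class QLoRALinear:
    """Toy layer: frozen 4-bit base weight plus a trainable low-rank update B @ A."""

    def __init__(self, weight, rank=2, alpha=4, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        # Frozen base: each row is stored as (codes, scale) and never updated.
        self.q_rows = [quantize_4bit(row) for row in weight]
        # Trainable adapters: A starts small and random, B starts at zero,
        # so the layer initially behaves exactly like the quantized base.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scaling = alpha / rank

    def base_weight(self):
        return [dequantize_4bit(codes, scale) for codes, scale in self.q_rows]

    def forward(self, x):
        base = matvec(self.base_weight(), x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scaling * u for b, u in zip(base, update)]


weight = [[0.7, -0.35, 0.0], [0.1, 0.2, -0.4]]
layer = QLoRALinear(weight)
print(layer.forward([1.0, 2.0, 3.0]))
```

In a real training loop only `A` and `B` would receive gradient updates; the quantized rows stay fixed, which is where the memory saving comes from.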

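Compared with standard LoRA, QLoRA uses much less memory because the base model is stored in 4-bit, and it reaches similar quality. Standard LoRA keeps the base model in 16-bit, which can give slightly better results at a higher memory cost.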
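
The memory difference can be estimated with simple arithmetic. The sketch below counts only the storage for the base weights of a 65B-parameter model at 16 and 4 bits per parameter; it deliberately ignores quantization constants, adapter weights, optimizer state, and activations, so real usage will be higher. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate storage for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


params_65b = 65e9

lora_base_gb = weight_memory_gb(params_65b, 16)   # 16-bit base, as in standard LoRA
qlora_base_gb = weight_memory_gb(params_65b, 4)   # 4-bit base, as in QLoRA

print(f"16-bit base weights: {lora_base_gb:.1f} GB")   # 130.0 GB
print(f"4-bit base weights:  {qlora_base_gb:.1f} GB")  # 32.5 GB
print(f"Reduction factor:    {lora_base_gb / qlora_base_gb:.0f}x")  # 4x
```

Under these assumptions the 16-bit base weights alone need about 130 GB, more than any single current GPU offers, while the 4-bit base needs about 32.5 GB.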
