A technique to fine-tune LLMs cheaply by only training small adapter matrices instead of all model weights.

LoRA vs full fine-tuning?

LoRA uses 10,000x fewer parameters, trains in hours instead of days, and produces models nearly as good as full fine-tuning.

LoRA applied to a 4-bit quantized base model. Even more memory efficient — fine-tune a 65B model on a single GPU.

What is LoRA? — LoRA (Low-Rank Adaptation) Explained