Supervised fine-tuning of a large language model using quantized low-rank adapters
Fine-tuning of a large language model (LLM) can be performed using QLoRA (Quantized Low-Rank Adapters) together with PEFT (Parameter-Efficient Fine-Tuning) techniques.
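As a concrete illustration, the quantization half of QLoRA can be set up with the Hugging Face transformers and bitsandbytes libraries. The following is a minimal sketch, not the linked code: the model name is a placeholder, and the 4-bit NF4 settings follow the defaults recommended in the QLoRA paper.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Placeholder checkpoint; substitute the base model you intend to fine-tune.
model_name = "meta-llama/Llama-2-7b-hf"

# 4-bit NF4 quantization with double quantization, per the QLoRA paper.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store the frozen base weights in 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # dequantize to bf16 for forward/backward passes
)

model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```

NF4 with double quantization is what lets the frozen base weights fit in a fraction of their fp16 footprint, while compute still happens in bfloat16.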
PEFT (Parameter-Efficient Fine-Tuning):
- PEFT refers to a family of techniques (and the Hugging Face library of the same name) for fine-tuning large language models by training a small number of additional parameters, known as adapters, while freezing the original model parameters.
- It allows for efficient fine-tuning of language models, reducing the memory footprint and computational requirements.
- Because the base weights stay frozen, PEFT enables niche expertise to be injected into a foundation model without catastrophic forgetting, preserving the original model’s performance (see the sketch after this list).
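With the peft library, the freezing and adapter injection look roughly like the sketch below. This assumes the quantized `model` from the earlier sketch; the rank and alpha values are illustrative, not prescriptive.

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Cast layer norms and enable input gradients so the quantized model can be trained.
model = prepare_model_for_kbit_training(model)

# Illustrative adapter settings; the LoRA section below explains each field.
peft_config = LoraConfig(
    r=16,                   # attention dimension (rank of the adapters)
    lora_alpha=32,          # scaling factor, applied as alpha / r
    lora_dropout=0.05,      # dropout on the adapter path
    task_type="CAUSAL_LM",  # task type for a decoder-only language model
)

# Freeze the base parameters and attach the trainable low-rank adapters.
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters
```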
LoRA (Low Rank Adapters):
- LoRA injects trainable low-rank adapter matrices alongside the frozen pretrained weights; in the QLoRA setting, gradients are backpropagated through a frozen, 4-bit quantized base model into these adapters.
- Fine-tuning with LoRA involves configuring parameters such as the attention dimension (the rank), an alpha parameter for scaling, the dropout probability, and the task type; these map onto the LoraConfig fields in the sketch above, and the sketch after this list shows the low-rank update they parameterize.
- LoRA aims to reduce memory usage and computational requirements during fine-tuning, making it possible to train large models on a single GPU while preserving performance.
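To make the low-rank idea concrete, here is a small standalone PyTorch sketch (illustrative only, with made-up dimensions) of the update LoRA learns: the frozen weight W is augmented by the product of two narrow matrices B and A, scaled by alpha / r.

```python
import torch

d, r, alpha = 4096, 16, 32      # hidden size, adapter rank, scaling (illustrative values)

W = torch.randn(d, d)           # frozen pretrained weight; never updated
A = torch.randn(r, d) * 0.01    # trainable down-projection, small random init
B = torch.zeros(d, r)           # trainable up-projection, zero init so training starts at W

# Effective weight during fine-tuning; only A and B receive gradients.
W_eff = W + (alpha / r) * (B @ A)

full_params = d * d             # a full update would train 16,777,216 parameters
lora_params = 2 * d * r         # the adapter trains only 131,072
print(f"trainable fraction: {lora_params / full_params:.4%}")
```

Zero-initializing B means the adapter contributes nothing at the start of training, so fine-tuning begins exactly at the pretrained model's behavior.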
These techniques, when combined, enable the efficient fine-tuning of large language models, making the process more accessible and resource-efficient for researchers and practitioners.
For more information on LoRA refer to: https://arxiv.org/abs/2106.09685 and on QLoRA to: https://arxiv.org/abs/2305.14314
For a code example refer to: https://github.com/jamesdhope/LLM-fine-tuning/blob/main/tuning.py
Code Attribution: Maxime Labonne