Local Fine Tuning

This notebook is part of the Fine Tuning in 5, 15, 50 minutes guide, published at https://ravinkumar.com/GenAiGuidebook/

This notebook was run on a local desktop with an RTX 4090 GPU.

import os
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
import torch
from peft import LoraConfig, get_peft_model
from datasets import load_dataset
from trl import SFTTrainer  # needed for the SFTTrainer call in the fine-tuning section

Download model

This grabs the weights from the Hugging Face server.

# HF_TOKEN must already be defined and hold your Hugging Face access token
os.environ['HF_TOKEN'] = HF_TOKEN
#model_id = "google/gemma-7b-it"
model_id = "mistralai/Mistral-7B-Instruct-v0.2"

Set quantization configuration

A 7B model does not fit in the 4090's memory for tuning at higher precision, so we reduce the model's memory footprint with 4-bit quantization.
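The memory argument can be checked with some back-of-the-envelope arithmetic. The sketch below counts only the bytes needed to store the weights, and assumes a round 7 billion parameters and the RTX 4090's nominal 24 GB of memory. Gradients, optimizer state, activations, and quantization overhead all come on top, so treat the numbers as lower bounds rather than measurements.

```python
GIB = 1024 ** 3
N_PARAMS = 7_000_000_000  # assumption: round figure for a "7B" model
GPU_MEMORY_GB = 24        # nominal RTX 4090 memory


def weight_memory_gib(n_params: int, bits_per_param: int) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / GIB


for label, bits in [("float32", 32), ("bfloat16", 16), ("4-bit", 4)]:
    size = weight_memory_gib(N_PARAMS, bits)
    print(f"{label:>8}: {size:5.1f} GiB of weights (GPU has ~{GPU_MEMORY_GB} GB)")
```

Under these assumptions the float32 weights alone exceed the card's memory, bfloat16 weights fit but leave little room for training state, and 4-bit weights leave most of the card free.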

# bnb_config = BitsAndBytesConfig(
#     load_in_4bit=False,
#     bnb_4bit_quant_type="nf4",
#     bnb_4bit_compute_dtype=torch.bfloat16
# )

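# Assumed values: the commented-out config above with 4-bit loading enabled,
# matching the 4-bit quantization described in the text
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)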

model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config,
                                             device_map={"":0}, token=os.environ['HF_TOKEN'])

Initial Generation

text = "What is a recipe for eggs?"
device = "cuda:0"
# Load the tokenizer before using it to encode the prompt
tokenizer = AutoTokenizer.from_pretrained(model_id, token=os.environ['HF_TOKEN'])
tokenizer.padding_side = 'right'
inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=1000)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
What is a recipe for eggs?

A basic recipe for cooking eggs could be:

1. Heat a small amount of oil or butter in a pan over medium heat.
2. Crack one or more eggs into the pan.
3. Cook until the whites are set but the yolks are still runny, or cook until the eggs are fully set to your liking.
4. Season with salt, pepper, or other desired spices.
5. Serve and enjoy!

There are many variations of this basic recipe, such as making an omelette, scrambling the eggs, or making a poached or fried egg. The cooking method and additional ingredients can be adjusted to suit personal preferences.

Load Training Data

These are the examples generated using the distillation notebook.

data = load_dataset("json", data_files="gemmaresponses_20.jsonl")
DatasetDict({
    train: Dataset({
        features: ['id', 'food', 'prompt', 'rewrite_prompt', 'completion'],
        num_rows: 20
    })
})
'What is a recipe for Apple Slices with Cinnamon?'

**Ingredients:**

{
  "apples": 2,
  "cinnamon": 1/2 teaspoon,
  "butter": 1/4 cup,
  "sugar": 1/2 cup,
  "lemon juice": 1 tablespoon
}

**Instructions:**

"Gather thy apples, fair and ripe,
And sprinkle cinnamon upon their slice,
With butter soft and golden spread,
And sugar sweet to sweeten the spread.
Add lemon juice, a touch of zest,
And bake in oven, till they rest
In golden glory, a delight,
To fill thy mouth with pure delight."
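The training file is in JSON Lines format: one JSON object per line, each with the five fields listed in the dataset summary. As a rough illustration of that layout, the sketch below writes and reads back a tiny file using only the standard library. The two records and the file name `example.jsonl` are made up for the example and are not taken from the real dataset.

```python
import json
import os
import tempfile

# Hypothetical records using the same field names as the training data
records = [
    {"id": 0, "food": "Toast", "prompt": "What is a recipe for Toast?",
     "rewrite_prompt": "What is a recipe for Toast?", "completion": "Toast the bread."},
    {"id": 1, "food": "Tea", "prompt": "What is a recipe for Tea?",
     "rewrite_prompt": "What is a recipe for Tea?", "completion": "Steep the leaves."},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.jsonl")

    # One JSON object per line
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")

    # Read the file back, skipping any blank lines
    with open(path, encoding="utf-8") as f:
        loaded = [json.loads(line) for line in f if line.strip()]

print(len(loaded), "records with fields", sorted(loaded[0]))
```

Inspecting a few raw lines this way is a quick check that every record carries the fields the formatting step expects.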

Perform Supervised Fine Tuning

We first need to format the distillation examples as chat-style training strings for the Mistral model. After that we can set the LoRA config and other parameters. In particular, the learning rate and max steps are critical.
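To give a sense of how small a LoRA adapter is next to the 7B base model, the sketch below counts the parameters added when adapting the seven projection matrices targeted in this notebook. For a weight of shape (out, in), a rank-r adapter adds r × (in + out) parameters. The layer dimensions are assumed from the published Mistral-7B configuration, and the rank of 8 is a hypothetical value chosen for illustration, since the rank used in this notebook is not shown.

```python
# Assumption: dimensions from the published Mistral-7B configuration
HIDDEN, INTERMEDIATE, KV_DIM, NUM_LAYERS = 4096, 14336, 1024, 32

# (out_features, in_features) for each targeted linear layer in one decoder block
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (KV_DIM, HIDDEN),
    "v_proj": (KV_DIM, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (INTERMEDIATE, HIDDEN),
    "up_proj": (INTERMEDIATE, HIDDEN),
    "down_proj": (HIDDEN, INTERMEDIATE),
}


def lora_param_count(rank, shapes=TARGET_SHAPES, num_layers=NUM_LAYERS):
    """Parameters added by LoRA: rank * (in + out) per adapted matrix."""
    per_layer = sum(rank * (out_f + in_f) for out_f, in_f in shapes.values())
    return per_layer * num_layers


RANK = 8  # hypothetical rank, not taken from the notebook
trainable = lora_param_count(RANK)
print(f"{trainable:,} trainable LoRA parameters at rank {RANK}")
```

At rank 8 this comes to roughly 21 million trainable parameters, well under one percent of the base model, which is why the adapter can be trained on a single consumer GPU.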

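def formatting_func(example):
    # Receives a batch as a dict of columns and returns one training string per row
    output_texts = []
    for i in range(len(example['prompt'])):
        text = f"<|system|></s><|user|>{example['rewrite_prompt'][i]}<|assistant|>{example['completion'][i]}"
        output_texts.append(text)
    return output_texts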

['<|system|></s><|user|>What is a recipe for Apple Slices with Cinnamon?<|assistant|>**Ingredients:**\n\n```json\n{\n  "apples": 2,\n  "cinnamon": 1/2 teaspoon,\n  "butter": 1/4 cup,\n  "sugar": 1/2 cup,\n  "lemon juice": 1 tablespoon\n}\n```\n\n**Instructions:**\n\n"Gather thy apples, fair and ripe,\nAnd sprinkle cinnamon upon their slice,\nWith butter soft and golden spread,\nAnd sugar sweet to sweeten the spread.\nAdd lemon juice, a touch of zest,\nAnd bake in oven, till they rest\nIn golden glory, a delight,\nTo fill thy mouth with pure delight."']
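The formatting function works on a batch, which is a dictionary mapping each column name to a list of values, and it returns one string per row. A minimal sketch of that contract follows, run on a made-up two-row batch so it needs no model or dataset. The batch contents are hypothetical.

```python
def formatting_func(example):
    """Turn a batch (dict of column lists) into one training string per row."""
    output_texts = []
    for i in range(len(example["prompt"])):
        text = (
            f"<|system|></s><|user|>{example['rewrite_prompt'][i]}"
            f"<|assistant|>{example['completion'][i]}"
        )
        output_texts.append(text)
    return output_texts


# Hypothetical batch with the same columns the function reads
batch = {
    "prompt": ["What is a recipe for toast?", "What is a recipe for tea?"],
    "rewrite_prompt": ["What is a recipe for toast?", "What is a recipe for tea?"],
    "completion": ["Toast the bread.", "Steep the leaves."],
}

formatted = formatting_func(batch)
for line in formatted:
    print(line)
```

Checking that the returned list has the same length as the batch is a cheap way to catch a formatting function that silently returns nothing.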
output_dir = "output_model"

lora_config = LoraConfig(
    target_modules=["q_proj", "o_proj", "k_proj", "v_proj", "gate_proj", "up_proj", "down_proj"],

trainer = SFTTrainer(
train_result = trainer.train()
[20/20 00:19, Epoch 4/4]
Step Training Loss
1 2.090600
2 1.885000
3 1.533900
4 1.368500
5 1.258400
6 0.903900
7 0.724800
8 0.856100
9 0.857100
10 0.746600
11 0.402100
12 0.668800
13 0.376500
14 0.455100
15 0.434900
16 0.225700
17 0.348400
18 0.263800
19 0.277600
20 0.285400
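The progress line and loss table above also reveal the shape of the run. With 20 examples, 20 steps, and 4 epochs, each epoch takes 5 optimizer steps, which implies about 4 examples per step. That inference assumes no sequence packing and no dropped examples, neither of which is confirmed by the log. The sketch below does the arithmetic and averages the logged loss per epoch.

```python
NUM_EXAMPLES = 20  # rows in the training set
TOTAL_STEPS = 20   # from the "[20/20 ...]" progress line
NUM_EPOCHS = 4     # from "Epoch 4/4"

# Training loss per step, copied from the log above
losses = [
    2.0906, 1.8850, 1.5339, 1.3685, 1.2584,
    0.9039, 0.7248, 0.8561, 0.8571, 0.7466,
    0.4021, 0.6688, 0.3765, 0.4551, 0.4349,
    0.2257, 0.3484, 0.2638, 0.2776, 0.2854,
]

steps_per_epoch = TOTAL_STEPS // NUM_EPOCHS
# Assumes every example is seen once per epoch, without packing
examples_per_step = NUM_EXAMPLES // steps_per_epoch

epoch_means = [
    sum(losses[start:start + steps_per_epoch]) / steps_per_epoch
    for start in range(0, TOTAL_STEPS, steps_per_epoch)
]

print(f"{steps_per_epoch} steps per epoch, ~{examples_per_step} examples per step")
for epoch, mean_loss in enumerate(epoch_means, start=1):
    print(f"epoch {epoch}: mean training loss {mean_loss:.3f}")
```

The per-epoch means fall steadily even though individual steps are noisy, which is easier to read than the raw step-by-step table.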

Final Generation

After fine-tuning, here is what the final generation looks like.

text = "What is a recipe for eggs?"
device = "cuda:0"
inputs = tokenizer(text, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=300)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
What is a recipe for eggs?


  "ingredients": [
    "2 eggs",
    "Salt to taste",
    "Pepper to taste",
    "Butter for frying"


To craft a dish of eggs,
Gather the ingredients, with care,
Two eggs, salt, pepper, and butter,
And a pan for frying, hot as a furnace.

Crack the eggs into a bowl,
And whisk them with a swift hand,
Add the salt and pepper,
To give them flavor, bold.

Heat the pan with the butter,
Until it sizzles and shines,
Pour in the eggs, and let them cook,
Until they are golden and bright.

Flip them over, with a gentle touch,
And cook the other side,
Until they are done to perfection,
A feast fit for a king.

Serve the eggs, hot and steaming,
With a slice of bread,
And enjoy the deliciousness,
Of this humble, yet divine, dish.

May your eggs be cooked with love,
And bring joy to your table,
For in their simplicity,
Lies the flavor of home.