- Platform: YouTube
- Channel/Creator: Tech With Tim
- Duration: 00:22:03
- Release Date: Jun 27, 2025
- Video Link: https://www.youtube.com/watch?v=pTaSDVz0gok
Disclaimer: This is a personal summary and interpretation based on a YouTube video. It is not official material and not endorsed by the original creator. All rights remain with the respective creators.
This document summarizes the key takeaways from the video. I highly recommend watching the full video for visual context and coding demonstrations.
- I summarize key points to help you learn and review quickly.
- Simply click on the "Ask AI" links to dive into any topic you want.
- Summary: Fine-tuning takes a pre-trained LLM like GPT or Claude and adapts it to excel at a specific task using your examples, such as customer service or legal docs. It's different from parameter tuning, which just tweaks settings like temperature. Use it for consistent output formats, domain-specific data the model hasn't seen, or to cut costs with smaller models. It requires less data and compute than training from scratch but can make the model worse at general tasks.
- Key Takeaway/Example: Think of it like training an experienced chef on your recipes—efficient but specialized. For instance, fine-tune for JSON outputs or handling medical records.
- Link for More Details: Ask AI: Fine-Tuning LLMs
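To ground the distinction drawn above: "parameter tuning" refers to inference-time settings such as temperature, which rescale the model's output distribution without touching its weights, whereas fine-tuning actually updates weights. A minimal sketch (illustrative logits only, not real model output) of what the temperature knob does:

```python
import math

# Temperature rescales logits before sampling; it changes no model weights.
def softmax_with_temperature(logits, temperature):
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
sharp = softmax_with_temperature(logits, 0.5)  # low temp: more deterministic
flat = softmax_with_temperature(logits, 2.0)   # high temp: closer to uniform
```

Lowering the temperature concentrates probability on the top token; fine-tuning, by contrast, changes which token is on top in the first place.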
- Summary: The dataset is crucial—bad data leads to poor results. Collect examples of inputs and desired outputs, like prompts and responses. The example uses AI-generated data for HTML extraction, with 500 JSON entries containing sample HTML inputs and formatted outputs (e.g., name, price, category, manufacturer).
- Key Takeaway/Example: Format as a list of dictionaries with "input" (prompt) and "output" (expected response as string). You can use real data like customer logs.
```json
{
  "input": "<div class='product'><h2>Product Name</h2><p class='price'>$99.99</p></div>",
  "output": "{\"name\": \"Product Name\", \"price\": \"$99.99\", \"category\": \"Electronics\", \"manufacturer\": \"BrandX\"}"
}
```

- Link for More Details: Ask AI: Gathering Datasets for Fine-Tuning
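Putting the format above together, here is a hedged sketch of building and validating such a dataset before uploading it to Colab (the file name `dataset.json` and the validation checks are my assumptions, not specifics from the video):

```python
import json

# A list of dicts, each with an "input" prompt and an "output" string,
# matching the format described above. One example shown; the video uses ~500.
examples = [
    {
        "input": "<div class='product'><h2>Product Name</h2><p class='price'>$99.99</p></div>",
        "output": "{\"name\": \"Product Name\", \"price\": \"$99.99\", \"category\": \"Electronics\", \"manufacturer\": \"BrandX\"}",
    },
]

# Save for upload to Colab (file name is an assumption).
with open("dataset.json", "w") as f:
    json.dump(examples, f, indent=2)

# Validation pass: every "output" string must itself parse as JSON,
# since the fine-tuned model is expected to emit well-formed JSON.
for ex in examples:
    parsed = json.loads(ex["output"])
    assert "name" in parsed and "price" in parsed
```

Catching malformed outputs here is much cheaper than discovering them after a training run.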
- Summary: Unsloth is a free, open-source tool that's fast for fine-tuning LLMs. Run it in Python, ideally in Google Colab for free high-end GPUs like T4, or locally with a powerful NVIDIA GPU and CUDA.
- Key Takeaway/Example: Download the provided notebook, upload your dataset JSON, and follow the cells. It's quicker in Colab than locally unless you have a 4080/4090 GPU.
- Link for More Details: Ask AI: Unsloth for LLM Fine-Tuning
- Summary: In Colab, install Unsloth and dependencies with pip, then restart the runtime. Check for CUDA and GPU availability to ensure fast training.
- Key Takeaway/Example: Run `!pip install unsloth`, restart the runtime, and verify with `torch.cuda.is_available()`; expect `True` and a GPU such as a Tesla T4 in Colab.
```python
import torch
print(torch.cuda.is_available())
print(torch.cuda.get_device_name())
```

- Link for More Details: Ask AI: Installing Dependencies for Fine-Tuning
- Summary: Choose an open-source model like Phi-3 Mini (small and fast) or others like Llama 3.1. Load it with Unsloth's FastLanguageModel, setting max sequence length and 4-bit quantization.
- Key Takeaway/Example: Use `model, tokenizer = FastLanguageModel.from_pretrained(model_name, max_seq_length=2048, load_in_4bit=True)`. Larger models take longer to load and train.
```python
from unsloth import FastLanguageModel

model_name = "unsloth/Phi-3-mini-4k-instruct"
max_seq_length = 2048
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name,
    max_seq_length=max_seq_length,
    load_in_4bit=True,
)
```

- Link for More Details: Ask AI: Loading Base Models in Unsloth
- Summary: Format your data into a single string per example with input, output, and an end-of-text token. Convert to a Hugging Face dataset for the trainer.
- Key Takeaway/Example: Define a prompt function to combine input and JSON-stringified output, then map over your data.
```python
import json
from datasets import Dataset

def format_prompt(example):
    # Combine input and JSON-stringified output into one training string,
    # ending with the end-of-text token so the model learns where to stop.
    return f"{example['input']}\n{json.dumps(example['output'])}\n<|endoftext|>"

formatted_data = [format_prompt(ex) for ex in data]
dataset = Dataset.from_dict({"text": formatted_data})
```

- Link for More Details: Ask AI: Preprocessing Data for Fine-Tuning
- Summary: Add LoRA (Low-Rank Adaptation) layers to the model for efficient fine-tuning without changing the whole model.
- Key Takeaway/Example: Use Unsloth's get_peft_model with parameters like rank (r=16) and target modules for layers like q_proj, v_proj.
```python
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)
```

- Link for More Details: Ask AI: LoRA Adapters in Fine-Tuning
- Summary: Set up the SFTTrainer with your model, tokenizer, dataset, and training args like batch size and epochs. Run trainer.train()—time varies by dataset size and model (e.g., 10 minutes for small setup).
- Key Takeaway/Example: More examples and epochs improve results, but start small to test.
```python
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        num_train_epochs=1,
        learning_rate=2e-4,
        # Use bf16 where the GPU supports it, otherwise fall back to fp16.
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)
trainer.train()
```

- Link for More Details: Ask AI: Training LLMs with Unsloth
- Summary: After training, set the model for inference and test with sample prompts to verify outputs match expectations.
- Key Takeaway/Example: Use messages in chat format; adjust for your data. Outputs may vary slightly due to small datasets.
```python
messages = [{"role": "user", "content": "Extract info from this HTML: <div class='product'>..."}]
inputs = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(inputs, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=200, use_cache=True)
print(tokenizer.batch_decode(outputs))
```

- Link for More Details: Ask AI: Testing Fine-Tuned Models
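Since the decoded generation typically echoes the prompt before the completion, a small post-processing step helps when verifying outputs. A sketch under the assumption that the completion is the final line of the generation (the sample text below is a stand-in, not real model output):

```python
import json

# Stand-in for a decoded generation: prompt echo, then the JSON completion.
generated = (
    "Extract info from this HTML: <div class='product'>...</div>\n"
    '{"name": "Product Name", "price": "$99.99", "category": "Electronics", "manufacturer": "BrandX"}'
)

# Take the last line and parse it as JSON; real outputs may need extra
# cleanup (stripping special tokens, trailing text, etc.).
candidate = generated.strip().splitlines()[-1]
record = json.loads(candidate)
print(record["name"])  # → Product Name
```

With small datasets, expect occasional malformed JSON; wrapping `json.loads` in a try/except and counting failures gives a quick quality signal.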
- Summary: Save the model in GGUF format for Ollama compatibility, then download from Colab (can take 10-25 minutes).
- Key Takeaway/Example: Use `model.save_pretrained_gguf("unsloth.Q4_K_M.gguf", tokenizer, quantization_method="q4_k_m")`. Download via Colab's `files.download()`.
- Link for More Details: Ask AI: Saving Models for Ollama
- Summary: Create a Modelfile pointing to your GGUF file, set parameters like temperature, and define a template. Use `ollama create` to add it, then `ollama run` to interact locally.
- Key Takeaway/Example: Modelfile example:
```
FROM ./unsloth.Q4_K_M.gguf
PARAMETER top_p 0.7
PARAMETER temperature 0.7
PARAMETER stop "User:"
PARAMETER stop "<|endoftext|>"
TEMPLATE "{{ .Prompt }}"
SYSTEM "You are a helpful AI assistant."
```

Run `ollama create html-model -f Modelfile` and test prompts in the Ollama CLI.
- Link for More Details: Ask AI: Running Fine-Tuned Models in Ollama
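Beyond the CLI, Ollama also serves a local REST API on port 11434, which is handy for scripted testing of the fine-tuned model. A hedged sketch (the model name `html-model` matches the `ollama create` step above; the API call is guarded so the snippet only sends a request when an Ollama server is actually running):

```python
import json
import urllib.request

# Request body for Ollama's /api/generate endpoint.
payload = {
    "model": "html-model",
    "prompt": "<div class='product'><h2>Product Name</h2><p class='price'>$99.99</p></div>",
    "stream": False,  # return one complete response instead of a token stream
}
body = json.dumps(payload).encode("utf-8")

send = False  # flip to True with `ollama serve` running locally
if send:
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])
```

This makes it easy to loop over held-out examples and check how often the model returns valid JSON.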
About the summarizer
I'm Ali Sol, a Backend Developer. Learn more:
- Website: alisol.ir
- LinkedIn: linkedin.com/in/alisolphp