Fine-tune Gemma 4 E4B-it on NCERT Class 11-12 Science content to make it an expert tutor that:
- Explains concepts step-by-step from the textbook
- Answers in Gujarati/Hindi/English
- References specific NCERT chapters and page numbers
- Generates NCERT-aligned quiz questions
| Item | Status |
|---|---|
| Gemma 4 E4B-it model (HF format) | ✅ In models/ folder |
| NCERT PDFs (20 books, EN + GJ) | ✅ In Data/NCERT Books/ |
| JEE Papers (30 PDFs) | ✅ In Data/JEE Previous Year Papers/ |
| NEET Papers (6 PDFs) | ✅ In Data/NEET Previous Year Papers/ |
| Unsloth Studio | ✅ Installed locally |
| Kaggle/Colab GPU | Required for training |
Run this Python script to extract content from your PDFs:
# File: Data/extract_ncert.py
import os
import json

# pip install pymupdf
import fitz  # PyMuPDF


def extract_pdf_text(pdf_path):
    """Extract text from a PDF file, page by page."""
    doc = fitz.open(pdf_path)
    pages = []
    for page_num in range(len(doc)):
        page = doc[page_num]
        text = page.get_text("text").strip()
        if text and len(text) > 50:  # Skip mostly-empty pages
            pages.append({
                "page_number": page_num + 1,
                "text": text,
            })
    doc.close()
    return pages


def main():
    books_dir = "NCERT Books"  # Relative path: run this script from the Data/ folder
    output = []
    pdf_count = 0  # Count only the PDFs actually processed
    for filename in sorted(os.listdir(books_dir)):
        if not filename.endswith(".pdf"):
            continue
        pdf_count += 1
        filepath = os.path.join(books_dir, filename)
        print(f"Extracting: {filename}")
        # Parse metadata from filename
        parts = filename.replace(".pdf", "").split(" - ")
        language = parts[-1] if len(parts) > 1 else "Eng"
        pages = extract_pdf_text(filepath)
        for page in pages:
            output.append({
                "source": filename,
                "language": language,
                "page": page["page_number"],
                "content": page["text"],
            })
    with open("ncert_extracted.json", "w", encoding="utf-8") as f:
        json.dump(output, f, ensure_ascii=False, indent=2)
    print(f"\n✅ Extracted {len(output)} pages from {pdf_count} PDFs")


if __name__ == "__main__":
    main()

Convert extracted text into the Gemma 4 chat format:
# File: Data/create_dataset.py
import json
import random


def load_extracted():
    with open("ncert_extracted.json", "r", encoding="utf-8") as f:
        return json.load(f)


def create_training_examples(pages):
    """Create Q&A training examples from NCERT content."""
    examples = []
    for page in pages:
        content = page["content"]
        source = page["source"]
        page_num = page["page"]
        lang = page["language"]
        if len(content) < 200:
            continue
        # Template 1: Explain a concept
        examples.append({
            "messages": [
                {
                    "role": "system",
                    "content": "You are Aarohan AI, an expert NCERT tutor for Class 11-12 Science students in India. "
                               "Always explain step-by-step, reference the textbook, and be encouraging."
                },
                {
                    "role": "user",
                    "content": f"Explain the concept from {source}, page {page_num}"
                },
                {
                    "role": "model",  # NOTE: Gemma uses "model", not "assistant"
                    "content": f"**From {source}, Page {page_num}:**\n\n{content[:1500]}\n\n"
                               f"**Key Takeaway:** Study this section carefully — it's important for your exams! 📚"
                }
            ]
        })
        # Template 2: Summarize for quick revision
        if len(content) > 500:
            summary = content[:800]
            examples.append({
                "messages": [
                    {
                        "role": "system",
                        "content": "You are Aarohan AI. Provide concise summaries for exam revision."
                    },
                    {
                        "role": "user",
                        "content": f"Give me a quick summary of the content on page {page_num} of {source}"
                    },
                    {
                        "role": "model",
                        "content": f"**Quick Revision — {source}, Page {page_num}:**\n\n{summary}\n\n"
                                   f"📝 **Remember:** Focus on understanding the concepts, not memorizing!"
                    }
                ]
            })
    return examples


def main():
    pages = load_extracted()
    examples = create_training_examples(pages)
    # Shuffle and split
    random.shuffle(examples)
    split = int(len(examples) * 0.9)
    train = examples[:split]
    val = examples[split:]
    with open("train_dataset.json", "w", encoding="utf-8") as f:
        json.dump(train, f, ensure_ascii=False, indent=2)
    with open("val_dataset.json", "w", encoding="utf-8") as f:
        json.dump(val, f, ensure_ascii=False, indent=2)
    print(f"✅ Created {len(train)} training + {len(val)} validation examples")


if __name__ == "__main__":
    main()

Important
The dataset format uses "role": "model" (NOT "role": "assistant"). This is specific to Gemma.
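Before uploading the dataset, it can help to check that every example follows this shape. The sketch below is a minimal validator, assuming each example is a dict with a "messages" list of role/content dicts as produced by create_dataset.py; the function name `validate_example` and the rule that the last turn must come from "model" are illustrative choices, not part of any library.

```python
ALLOWED_ROLES = {"system", "user", "model"}


def validate_example(example):
    """Return a list of problems found in one training example (empty if OK)."""
    messages = example.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]
    problems = []
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in ALLOWED_ROLES:
            problems.append(f"message {i}: unexpected role {role!r}")
        content = msg.get("content")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {i}: empty content")
    # Assumption: every training example should end with the tutor's reply
    if messages[-1].get("role") != "model":
        problems.append("last message should come from 'model'")
    return problems
```

Running it over the loaded train_dataset.json and printing any non-empty result is usually enough to catch a stray "assistant" role before training starts.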
- Start Unsloth Studio:
  unsloth studio -H 0.0.0.0 -p 8888
- Open http://localhost:8888 in your browser
- In the Studio UI:
  - Model: Select unsloth/gemma-4-e4b-it (or point to your local models/gemma-4-transformers-gemma-4-e4b-it-v1/)
  - Dataset: Upload train_dataset.json
  - LoRA Rank: 16 (good balance of quality vs speed)
  - Learning Rate: 2e-4
  - Epochs: 3
  - Max Seq Length: 2048
  - Click Start Training
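To see why the LoRA rank trades quality against speed, it helps to count what it adds. For one linear layer of shape d_in x d_out, LoRA trains two small matrices with rank * (d_in + d_out) parameters in total. The sketch below works through that arithmetic; the 2048 x 2048 layer size is a hypothetical value chosen for illustration, not a Gemma dimension.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_in x d_out linear layer."""
    # A is (rank x d_in), B is (d_out x rank)
    return rank * (d_in + d_out)


# Hypothetical layer size, used only to show the scaling
d_in, d_out = 2048, 2048
full = d_in * d_out
for rank in (8, 16, 32):
    added = lora_params(d_in, d_out, rank)
    print(f"rank={rank}: {added} trainable params ({added / full:.2%} of the layer)")
```

Doubling the rank doubles the trainable parameters per adapted layer, which is why rank 16 is a reasonable starting point before trying 32.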
Tip
Kaggle gives you 30 hours/week of free GPU. Use this for training! If available, select 2x T4 (total ~30GB VRAM) instead of single T4 for significantly faster training.
If Kaggle gives you a P100 and you see the sm_60 / PyTorch incompatibility warning, install a CUDA 11.8 build of PyTorch before importing Unsloth. That wheel still supports the P100, while the newer CUDA 12.8 wheel does not.
# Cell 0: Fix P100-compatible PyTorch if needed
import torch
print(torch.__version__)
# If you see a CUDA 12.8 wheel or the sm_60 warning, run this cell first.
!pip uninstall -y torch torchvision torchaudio triton xformers
!pip install --no-cache-dir --index-url https://download.pytorch.org/whl/cu118 \
    torch torchvision torchaudio
!pip install --no-cache-dir unsloth datasets trl accelerate peft bitsandbytes

Create a new Kaggle notebook and run:
# Cell 1: Load Model (Multi-GPU Support)
from unsloth import FastLanguageModel
import torch
# For dual T4 GPUs, use device_map="auto" to distribute across both GPUs
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-4-e4b-it",  # Auto-downloads from HF
    max_seq_length=2048,
    dtype=None,  # Auto-detect (float16 on T4)
    load_in_4bit=True,  # QLoRA: 4-bit quantization
    device_map="auto",  # CRITICAL: Distribute model across available GPUs
)
print("✅ Model loaded!")
print(f"🖥️ Using {torch.cuda.device_count()} GPUs:")
for i in range(torch.cuda.device_count()):
    print(f"   GPU {i}: {torch.cuda.get_device_name(i)} ({torch.cuda.get_device_properties(i).total_memory / 1e9:.2f} GB)")
# Cell 2: Add LoRA Adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_alpha=32,  # 2x rank is standard
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # Unsloth's checkpointing mode, reduces VRAM use
)
print("✅ LoRA adapters added.")
# print_trainable_parameters() prints its own summary and returns None,
# so call it directly instead of embedding it in an f-string.
model.print_trainable_parameters()
# Cell 3: Load YOUR Dataset
from datasets import load_dataset
# Upload train_dataset.json to Kaggle first!
dataset = load_dataset("json", data_files="/kaggle/input/datasets/mananmonani/gemma-4-fine-tunning-curated-dataset/train_dataset.json", split="train")
print(f"✅ Dataset loaded: {len(dataset)} examples")
# Cell 4: Format for Training
def format_chat(example):
    """Apply Gemma 4 chat template to each example."""
    text = tokenizer.apply_chat_template(
        example["messages"],
        tokenize=False,
        add_generation_prompt=False,
    )
    return {"text": text}
dataset = dataset.map(format_chat)
# Cell 5: Training (Optimized for Dual T4 GPUs ~30GB VRAM)
from trl import SFTTrainer
from transformers import TrainingArguments
import torch
# With 2x T4 (30GB total), we can use much larger batch and sequence length
# Use ~80% of dataset for training, rest for validation
train_size = int(len(dataset) * 0.8)
train_data = dataset.select(range(train_size))
val_data = dataset.select(range(train_size, len(dataset)))
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_data,
    eval_dataset=val_data,  # Validation set
    dataset_text_field="text",
    max_seq_length=2048,  # Full sequence length (30GB allows this)
    args=TrainingArguments(
        output_dir="outputs",
        per_device_train_batch_size=4,  # Increased from 1 → 4 (dual T4s can handle it)
        per_device_eval_batch_size=4,
        gradient_accumulation_steps=2,  # Effective batch = 8
        warmup_steps=20,
        num_train_epochs=3,  # Full epochs
        learning_rate=2e-4,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=50,
        # Without an evaluation strategy, eval_steps is ignored.
        # Older transformers releases name this argument "evaluation_strategy".
        eval_strategy="steps",
        eval_steps=200,  # Evaluate every 200 steps
        save_strategy="steps",
        save_steps=200,
        optim="adamw_8bit",
        seed=42,
        dataloader_num_workers=4,  # Faster data loading
        dataloader_pin_memory=True,
    ),
)
print(f"🚀 Training on {len(train_data)} examples with validation on {len(val_data)} examples...")
print(f"🖥️ Using {torch.cuda.device_count()} GPUs with {torch.cuda.get_device_name(0)}")
trainer.train()
print("✅ Training complete!")
# Cell 6: Save & Export
# Save LoRA adapter
model.save_pretrained("aarohan-ai-lora")
tokenizer.save_pretrained("aarohan-ai-lora")
# Merge and save full model (for Ollama)
model.save_pretrained_merged(
    "aarohan-ai-merged",
    tokenizer,
    save_method="merged_16bit",  # LoRA merged into 16-bit weights
)
print("✅ Model saved!")
# Cell 7: Export to GGUF (for Ollama)
model.save_pretrained_gguf(
    "aarohan-ai-gguf",
    tokenizer,
    quantization_method="q4_k_m",  # Good quality-to-size ratio
)
print("✅ GGUF exported! Download aarohan-ai-gguf/ for Ollama")

If you do not want to downgrade PyTorch, the other fix is to switch the Kaggle accelerator from P100 to T4. That avoids the sm_60 compatibility issue entirely.
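The Cell 5 settings fix how many optimizer steps a run takes, which in turn decides whether the 200-step evaluation and checkpoint intervals are ever reached. The sketch below estimates this; `training_steps` is a hypothetical helper, the rounding is approximate (exact behaviour depends on the trainer version), and `num_devices` only matters for data-parallel runs, which is an assumption beyond the single-process setup shown in Cell 1.

```python
import math


def training_steps(num_examples, per_device_batch, grad_accum, epochs, num_devices=1):
    """Approximate optimizer steps per epoch and in total."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return steps_per_epoch, steps_per_epoch * epochs


# A 500-example dataset with the 80% split and Cell 5 settings
train_examples = int(500 * 0.8)
per_epoch, total = training_steps(train_examples, per_device_batch=4, grad_accum=2, epochs=3)
print(f"{per_epoch} steps/epoch, {total} steps total")
print("200-step interval reached:", total >= 200)
```

For a small pilot run the total stays below 200 steps, so no evaluation or checkpoint would happen during training; lowering eval_steps and save_steps (for example to the steps-per-epoch value) avoids that.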
Same code as Kaggle, but:
- Upload train_dataset.json to your Google Drive
- Mount Drive: from google.colab import drive; drive.mount('/content/drive')
- Change dataset path to /content/drive/MyDrive/train_dataset.json
# Create Modelfile
cat > Modelfile <<EOF
FROM ./aarohan-ai-gguf/unsloth.Q4_K_M.gguf
TEMPLATE """{{ if .System }}<start_of_turn>user
{{ .System }}
<end_of_turn>
{{ end }}<start_of_turn>user
{{ .Prompt }}
<end_of_turn>
<start_of_turn>model
{{ .Response }}<end_of_turn>"""
PARAMETER temperature 0.7
PARAMETER top_p 0.95
PARAMETER stop "<end_of_turn>"
SYSTEM "You are Aarohan AI, an expert NCERT tutor for Indian students."
EOF
# Import into Ollama
ollama create aarohan-ai -f Modelfile
# Test
ollama run aarohan-ai "Explain Newton's second law"

Convert to LiteRT-LM format using the ai-edge-torch tool:
pip install ai-edge-torch
python -c "
import ai_edge_torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained('aarohan-ai-merged')
tokenizer = AutoTokenizer.from_pretrained('aarohan-ai-merged')
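# Convert to LiteRT-LM format
# NOTE: schematic call. In published ai-edge-torch releases, convert() takes a module plus
# sample inputs and returns an object with .export(); check the docs for the version you install.
ai_edge_torch.convert(model, tokenizer, output_path='aarohan-ai.task')
print('✅ LiteRT-LM model exported!')
"

Then install in the Flutter app: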
await FlutterGemma.installModel(
  modelType: ModelType.gemma4,
).fromFile('/path/to/aarohan-ai.task').install();

| Error | Fix |
|---|---|
| OutOfMemoryError during training | Reduce max_seq_length to 512–1024, reduce per_device_train_batch_size to 1–2 (raise gradient_accumulation_steps to keep the effective batch size), and test on a smaller subset first (start with 1000 examples) |
| ValueError: Some modules are dispatched on the CPU or the disk | Use device_map="auto" in from_pretrained() to distribute the model across all GPUs. Ensure you're using dual T4 on Kaggle. |
| use_cache=True gibberish output | Unsloth handles this automatically. Don't set use_cache manually. |
| tokenizer.apply_chat_template fails | Ensure your messages use "role": "model", not "role": "assistant" |
| VRAM insufficient for E4B | Use load_in_4bit=True (QLoRA), which needs ~10GB VRAM. Further reduce max_seq_length to 512 if still OOM. |
| sm_60 / P100 compatibility warning | Use the CUDA 11.8 PyTorch install cell above, or switch Kaggle to a T4 GPU |
| Kaggle session timeout | Save checkpoints regularly (the training cell saves every 200 steps) and resume with trainer.train(resume_from_checkpoint=True) |
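For the session-timeout case, resuming needs the most recent checkpoint folder. The trainer writes checkpoints as checkpoint-N directories under output_dir; the sketch below finds the highest-numbered one using only the standard library. `latest_checkpoint` is a hypothetical helper name, and "outputs" matches the output_dir used in the training cell.

```python
import os
import re


def latest_checkpoint(output_dir):
    """Return the path of the highest-numbered checkpoint-N folder, or None."""
    if not os.path.isdir(output_dir):
        return None
    best_step, best_path = -1, None
    for name in os.listdir(output_dir):
        match = re.fullmatch(r"checkpoint-(\d+)", name)
        path = os.path.join(output_dir, name)
        if match and os.path.isdir(path) and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path
```

In a restarted notebook this could be used as trainer.train(resume_from_checkpoint=latest_checkpoint("outputs")); when no checkpoint exists the helper returns None and training starts from scratch. On Kaggle the outputs folder must have been saved from the earlier session for this to find anything.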
| Metric | Before Fine-Tuning | After Fine-Tuning |
|---|---|---|
| NCERT Accuracy | ~60-70% | ~85-95% |
| Gujarati Quality | Generic | NCERT-specific terminology |
| Exam Relevance | Low | High (JEE/NEET aligned) |
| Response Style | Generic AI | Step-by-step tutor format |
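The accuracy figures above only mean something if they are measured the same way before and after fine-tuning. One simple option for JEE/NEET-style multiple-choice questions is sketched below, assuming the prompt asks the model to finish with a line like "Answer: B"; that output convention and the helper names `extract_choice` and `accuracy` are assumptions for illustration.

```python
import re

# Matches "Answer: B" or "Answer: (B)", but not "Answer: Because ..."
ANSWER_PATTERN = re.compile(r"Answer:\s*\(?([A-D])(?![A-Za-z])")


def extract_choice(response):
    """Return the last option letter the response commits to, or None."""
    found = ANSWER_PATTERN.findall(response)
    return found[-1] if found else None


def accuracy(responses, gold):
    """Fraction of responses whose extracted choice equals the gold letter."""
    if not gold:
        return 0.0
    correct = sum(extract_choice(r) == g for r, g in zip(responses, gold))
    return correct / len(gold)
```

Running the same question set through the base model and the fine-tuned model, then comparing the two accuracy values, gives a like-for-like number for the table.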
- v1 — Extract → Train (500 examples) → Test manually → Fix issues
- v2 — Add JEE/NEET questions → Train (1000+ examples) → Benchmark
- v3 — Add Gujarati examples → Train with bilingual data → Ship
Tip
Start with a small dataset (500 examples) to validate the pipeline works, then scale up.
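A reproducible way to cut the pilot dataset is to sample with a fixed seed, so that repeated runs train on the same examples. The sketch below is one way to do it; `sample_subset` and the output filename in the comments are hypothetical.

```python
import random


def sample_subset(examples, size=500, seed=42):
    """Return a reproducible random subset of at most `size` examples."""
    if len(examples) <= size:
        return list(examples)
    return random.Random(seed).sample(examples, size)


# Usage sketch (file names are assumptions):
#   import json
#   with open("train_dataset.json", encoding="utf-8") as f:
#       examples = json.load(f)
#   with open("train_small.json", "w", encoding="utf-8") as f:
#       json.dump(sample_subset(examples), f, ensure_ascii=False, indent=2)
```

Using a private Random instance keeps the subset stable without changing the global random state used elsewhere in the pipeline.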