Production-grade pipeline for fine-tuning Large Language Models using LoRA/QLoRA with automated deployment, evaluation, and experiment tracking.
This project implements an end-to-end ML pipeline for fine-tuning open-source LLMs (Llama, Mistral) using parameter-efficient fine-tuning (PEFT) techniques. The pipeline includes:
- Efficient Training: QLoRA (4-bit quantization) for training on consumer GPUs
- Experiment Tracking: MLflow and Weights & Biases integration
- Comprehensive Evaluation: Task-specific metrics and benchmarking
- Production Serving: vLLM-powered high-performance inference API
- Cost Analysis: Detailed ROI calculations vs. commercial APIs
- Parameter-Efficient Fine-tuning with LoRA/QLoRA
- 4-bit Quantization using bitsandbytes for memory efficiency
- Automated Experiment Tracking with MLflow
- Custom Evaluation Metrics for task-specific performance
- Fast Inference Serving with vLLM (50+ tokens/sec)
- Docker Support for reproducible training and deployment
- Cloud Deployment Ready (AWS, GCP compatible)
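To see why LoRA is so memory-efficient, it helps to count trainable parameters. The sketch below assumes Llama-2-7B-style dimensions (hidden size 4096, 32 layers) and the r=16, q/k/v/o target-module setup from the example config in this README; counts for other models or target modules will differ.

```python
# Rough LoRA trainable-parameter count, assuming hidden size 4096, 32 layers,
# and LoRA (r=16) applied only to the four square attention projections.
def lora_trainable_params(hidden_size: int, num_layers: int,
                          num_target_matrices: int, rank: int) -> int:
    # Each adapted (d_out x d_in) weight gains two low-rank factors:
    # A (r x d_in) and B (d_out x r), i.e. r * (d_in + d_out) parameters.
    per_matrix = rank * (hidden_size + hidden_size)
    return per_matrix * num_target_matrices * num_layers

trainable = lora_trainable_params(4096, 32, 4, 16)
print(f"{trainable:,} trainable params")          # 16,777,216
print(f"{trainable / 6.7e9:.2%} of a 7B model")   # ~0.25%
```

Only these ~17M adapter weights need gradients and optimizer state; the 4-bit base model stays frozen, which is what makes consumer-GPU training feasible.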
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Data Pipeline  │────▶│  LoRA Training  │────▶│   Evaluation    │────▶│   Deployment    │
│   HuggingFace   │     │   QLoRA/PEFT    │     │ Custom Metrics  │     │   vLLM Server   │
└─────────────────┘     └─────────────────┘     └─────────────────┘     └─────────────────┘
         │                       │                       │                       │
         ▼                       ▼                       ▼                       ▼
    Validation              MLflow/W&B             Benchmarking            FastAPI/REST
    Augmentation            Experiment             Comparison              Load Balancing
                            Tracking               Cost Analysis
- Python 3.10+
- CUDA-capable GPU (8GB+ VRAM for training, 16GB+ recommended)
- HuggingFace account (for model access)
# Clone the repository
git clone https://github.com/ashwani65/llm-finetuner.git
cd llm-finetuner
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Set up environment variables
cp .env.example .env
# Edit .env with your API keys and configuration

# Prepare your dataset
python scripts/prepare_data.py \
--input data/raw/dataset.json \
--output data/processed/
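`prepare_data.py` consumes a JSON dataset; the exact schema is not documented here, so the sketch below uses hypothetical `question`/`sql` field names purely to illustrate converting raw records into instruction-style training examples:

```python
import json

# Hypothetical raw-record shape and the instruction-style record a trainer
# typically consumes; the actual schema used by scripts/prepare_data.py may differ.
def to_instruction_record(raw: dict) -> dict:
    return {
        "instruction": "Generate SQL for the following request.",
        "input": raw["question"],
        "output": raw["sql"],
    }

raw = {
    "question": "Find all users who signed up in 2024",
    "sql": "SELECT * FROM users WHERE YEAR(signup_date) = 2024",
}
record = to_instruction_record(raw)
print(json.dumps(record, indent=2))
```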
# Start training
python scripts/train.py \
--config configs/llama_sql_config.yaml \
--data data/processed/train.json \
--output models/llama-sql-v1
# Monitor training
mlflow ui  # Visit http://localhost:5000

# Evaluate the model
python scripts/evaluate.py \
--model models/llama-sql-v1 \
--test_data data/processed/test.json \
--output evaluation_results.json

# Start vLLM inference server
python -m src.serving.vllm_server \
--model models/llama-sql-v1 \
--port 8000
# Test the API
curl -X POST http://localhost:8000/generate \
-H "Content-Type: application/json" \
-d '{
"prompt": "Generate SQL: Find all users who signed up in 2024",
"max_tokens": 128,
"temperature": 0.1
}'

Start the complete stack with frontend and backend:
# Terminal 1: Start Production API
./start_api.sh
# API: http://localhost:8000
# Docs: http://localhost:8000/docs
# Terminal 2: Start Frontend
cd frontend
npm run dev
# UI: http://localhost:3000

Web Interface Features:
- 📊 Real-time training monitoring
- 📁 Dataset upload and management
- ⚙️ Interactive training configuration
- 📈 Evaluation results visualization
- 🚀 Model deployment controls
- 💻 GPU monitoring
llm-finetuner/
├── src/
│ ├── data/ # Data preprocessing & validation
│ │ ├── dataset_builder.py
│ │ ├── preprocessing.py
│ │ └── validation.py
│ ├── training/ # Training pipeline
│ │ ├── trainer.py
│ │ ├── config.py
│ │ └── callbacks.py
│ ├── evaluation/ # Metrics & benchmarking
│ │ ├── metrics.py
│ │ ├── benchmarks.py
│ │ └── comparison.py
│ ├── serving/ # Inference & API server
│ │ ├── vllm_server.py
│ │ ├── production_api.py # Production FastAPI backend
│ │ └── api.py
│ ├── monitoring/ # Experiment tracking
│ │ ├── mlflow_tracking.py
│ │ └── wandb_logging.py
│ └── utils/ # Utilities
│ ├── gpu_utils.py
│ ├── model_utils.py
│ ├── cost_calculator.py
│ ├── database.py # SQLite ORM models
│ └── job_manager.py # Async training jobs
├── configs/ # Configuration files
│ ├── base_config.yaml
│ ├── llama_sql_config.yaml
│ └── mistral_config.yaml
├── scripts/ # Executable scripts
│ ├── prepare_data.py
│ ├── train.py
│ ├── evaluate.py
│ └── deploy.py
├── notebooks/ # Jupyter notebooks
│ ├── 01_data_exploration.ipynb
│ ├── 02_training_experiments.ipynb
│ └── 03_evaluation_analysis.ipynb
├── frontend/ # React web interface
│ ├── src/
│ │ ├── components/ # UI components
│ │ ├── services/ # API integration
│ │ └── App.jsx
│ └── package.json
├── tests/ # Unit tests
├── docker/ # Docker configurations
│ ├── Dockerfile.training
│ ├── Dockerfile.serving
│ └── docker-compose.yml
├── start_api.sh # API startup script
└── requirements.txt
This pipeline supports various fine-tuning tasks:
Fine-tune models to generate SQL queries from natural language.
Input: "Find total revenue by product category in 2024"
Output: "SELECT category, SUM(revenue) FROM sales WHERE year = 2024 GROUP BY category"

Train models to provide code review comments and suggestions.
Create specialized chatbots for specific domains (legal, medical, financial).
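For tasks like text-to-SQL, the prompt template matters as much as the weights. The exact template the training scripts use is not shown in this README, so the field layout below is an assumption for illustration:

```python
# A minimal instruction-style prompt template for the text-to-SQL task;
# the template actually used in training may differ.
SQL_PROMPT = (
    "### Instruction:\nTranslate the request into SQL.\n\n"
    "### Input:\n{question}\n\n"
    "### Response:\n"
)

def build_prompt(question: str) -> str:
    return SQL_PROMPT.format(question=question)

prompt = build_prompt("Find total revenue by product category in 2024")
print(prompt)
```

Whatever template you choose, use the same one at inference time as during training; a mismatch silently degrades output quality.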
lora:
  r: 16                  # LoRA rank
  lora_alpha: 32         # LoRA scaling
  lora_dropout: 0.05
  target_modules:        # Attention modules to apply LoRA
    - "q_proj"
    - "v_proj"
    - "k_proj"
    - "o_proj"

quantization:
  load_in_4bit: true
  bnb_4bit_compute_dtype: "float16"
  bnb_4bit_quant_type: "nf4"
  bnb_4bit_use_double_quant: true

training:
  num_train_epochs: 3
  batch_size: 4
  gradient_accumulation_steps: 4   # effective batch size = 4 × 4 = 16
  learning_rate: 2.0e-4
  warmup_steps: 100
  weight_decay: 0.01

| Model | GPU | VRAM Usage | Training Time | Cost |
|---|---|---|---|---|
| Llama 2 7B (QLoRA) | A100 40GB | ~12GB | 8 hours | $20 |
| Mistral 7B (QLoRA) | A100 40GB | ~11GB | 6 hours | $15 |
| Llama 2 7B (Full) | A100 80GB | ~45GB | 24 hours | $60 |
| Serving Method | Tokens/sec | Latency (p50) | Throughput |
|---|---|---|---|
| vLLM | 55 | 120ms | 100 req/sec |
| HF Transformers | 12 | 850ms | 20 req/sec |
| Model | Exact Match | BLEU | Cost/1K Queries |
|---|---|---|---|
| Fine-tuned Llama 2 7B | 0.73 | 0.68 | $0.05 |
| GPT-4 Turbo | 0.89 | 0.82 | $10.00 |
| GPT-3.5 Turbo | 0.65 | 0.61 | $0.50 |
- GPU: A100 40GB @ $2.50/hour × 8 hours = $20
- Storage: S3/GCS = $2/month
- Total Training: ~$22
- Self-hosted (vLLM on T4): $0.35/hour × 24 × 30 = $252/month
- GPT-4 API: $10 per 1K queries × 5M queries = $50,000/month
- Savings: 99.5%
Break-even point vs. GPT-4: ~2,200 queries ($22 training cost ÷ $9.95 saved per 1K queries)
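The monthly-savings figure above can be reproduced in a few lines. The numbers are this README's examples; actual GPU rates vary by provider and region:

```python
# Monthly cost of an always-on self-hosted GPU vs. a commercial API bill,
# using the README's example figures (T4 at $0.35/hr, $50K/month API spend).
def monthly_savings(gpu_hourly: float, api_monthly: float) -> tuple[float, float]:
    self_hosted = gpu_hourly * 24 * 30      # always-on GPU for one month
    pct = 1 - self_hosted / api_monthly     # fraction saved vs. the API
    return self_hosted, pct

cost, pct = monthly_savings(0.35, 50_000)
print(f"Self-hosted: ${cost:.0f}/month, savings {pct:.1%}")  # $252/month, 99.5%
```

Note the sensitivity to utilization: the API bill scales with query volume, while the self-hosted cost here is fixed, so savings grow with traffic (and shrink if the GPU sits idle).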
# Automatic tracking in training
mlflow.log_params({
"model_name": "llama-2-7b",
"lora_r": 16,
"learning_rate": 2e-4
})
mlflow.log_metrics({
"train_loss": 0.234,
"eval_loss": 0.198
})

# Enable W&B in config
tracking:
  use_wandb: true
  wandb_project: "llm-finetuner"

# Build the training image
docker build -f docker/Dockerfile.training -t llm-finetuner-train .
docker run --gpus all \
-v $(pwd)/data:/app/data \
-v $(pwd)/models:/app/models \
llm-finetuner-train \
python scripts/train.py --config configs/llama_sql_config.yaml

# Build the serving image
docker build -f docker/Dockerfile.serving -t llm-finetuner-serve .
docker run --gpus all -p 8000:8000 \
-v $(pwd)/models:/app/models \
llm-finetuner-serve

# Or run the full stack with docker-compose
docker-compose up -d

from src.evaluation.metrics import SQLEvaluator
evaluator = SQLEvaluator()
results = evaluator.evaluate(predictions, references)
# Returns: exact_match, component_match, BLEU, ROUGE

from src.evaluation.comparison import ModelComparison
comparison = ModelComparison(evaluator)
comparison.add_model("fine-tuned", predictions1, references)
comparison.add_model("gpt-4", predictions2, references)
comparison.plot_comparison(save_path="comparison.png")

from src.utils.cost_calculator import CostCalculator
calculator = CostCalculator()
results = calculator.compare_total_cost(
training_cost=20,
inference_queries=10000,
gpu_type="A100-40GB"
)
print(f"Self-hosted: ${results['self_hosted']['total']:.2f}")
print(f"API cost: ${results['api']['total']:.2f}")
print(f"Savings: ${results['savings']:.2f}")

- Support for Llama 3.2 and Mistral 0.3
- Multi-GPU distributed training
- Automatic hyperparameter tuning
- Integration with LangChain/LlamaIndex
- Support for RLHF (Reinforcement Learning from Human Feedback)
- Kubernetes deployment templates
- Prompt engineering utilities
- Dataset versioning with DVC
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Core ML: PyTorch, Transformers, PEFT, bitsandbytes, Accelerate, TRL
Serving: vLLM, FastAPI, Uvicorn
Experiment Tracking: MLflow, Weights & Biases
Data: Datasets, Pandas, NumPy, scikit-learn
Evaluation: BLEU, ROUGE, Custom Metrics
Deployment: Docker, Kubernetes (optional)
This project is licensed under the MIT License - see the LICENSE file for details.
If you use this project in your research or production, please cite:
@misc{llm-finetuner,
author = {Ashwani Singh},
title = {LLM Fine-tuner: Production-Ready LoRA Fine-tuning Pipeline},
year = {2024},
publisher = {GitHub},
url = {https://github.com/ashwani65/llm-finetuner}
}

- HuggingFace for the Transformers and PEFT libraries
- vLLM team for the high-performance inference engine
- Meta AI for Llama 2
- Mistral AI for Mistral 7B
Ashwani Singh
- GitHub: @ashwani65
- LinkedIn: Ashwani Singh
- Email: [email protected]