bas3line/lora-train-pipeline

LLM Fine-Tuning Pipeline

Scalable LoRA fine-tuning for chat models using Modal's serverless GPU infrastructure. Train on your own data with MongoDB + R2 storage integration.

Features

  • LoRA adapter training (~0.2% of base parameters, ~8 MB checkpoints)
  • Modal.com serverless GPU orchestration
  • MongoDB data source with flexible filtering
  • Automatic R2 backup and versioning
  • Configurable hyperparameters and runtime options
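The small-checkpoint claim in the feature list follows from LoRA's low-rank decomposition: each adapted weight matrix of shape (d_out, d_in) gains two trainable factors of rank r, adding only r * (d_in + d_out) parameters per matrix. A back-of-envelope sketch; the hidden size, layer count, rank, and targeted projections below are illustrative assumptions, not this repo's actual settings:

```python
def lora_param_count(d_in: int, d_out: int, rank: int,
                     n_layers: int, matrices_per_layer: int) -> int:
    """Trainable parameters added by LoRA adapters across a model.

    Each adapted (d_out x d_in) matrix gets factors A (rank x d_in) and
    B (d_out x rank), i.e. rank * (d_in + d_out) trainable parameters.
    """
    per_matrix = rank * (d_in + d_out)
    return per_matrix * matrices_per_layer * n_layers

# Illustrative: rank-8 adapters on two 4096x4096 projections per layer, 32 layers.
params = lora_param_count(4096, 4096, rank=8, n_layers=32, matrices_per_layer=2)
print(params)             # 4_194_304 trainable parameters
print(params * 2)         # ~8.4 MB checkpoint at fp16 (2 bytes per parameter)
```

Under these assumed shapes the adapter is about 4.2M parameters, i.e. roughly 0.2% of a ~2B-parameter base model, which is consistent with the headline numbers above.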

Quick Start

# Setup Modal
uv run modal setup

# Configure secrets in Modal dashboard
# Required: mongo-credentials, r2-credentials

# Train model
uv run modal run main.py::train_model --offset 0

Configuration

Edit app/config.py to adjust:

  • DataConfig.chunk_size - Sample limit per training run
  • ModelConfig - LoRA rank, learning rate, batch size
  • RuntimeConfig - GPU type, timeout, Modal secrets
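A minimal sketch of what the three config groups in app/config.py might look like as dataclasses. Only DataConfig.chunk_size is named in this README; every other field name and every default value here is an illustrative assumption, not the repo's actual configuration:

```python
from dataclasses import dataclass, field

@dataclass
class DataConfig:
    chunk_size: int = 1000                 # sample limit per training run
    query_filter: dict = field(default_factory=dict)  # hypothetical MongoDB filter

@dataclass
class ModelConfig:
    lora_rank: int = 8                     # hypothetical defaults
    learning_rate: float = 2e-4
    batch_size: int = 4

@dataclass
class RuntimeConfig:
    gpu: str = "A10G"                      # Modal GPU type (illustrative)
    timeout: int = 3600                    # seconds
    secrets: tuple = ("mongo-credentials", "r2-credentials")
```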

Commands

# Training and model management
uv run modal run main.py::train_model --offset 0 --chunk_size 1000
uv run modal run main.py::check_mongodb
uv run modal run main.py::download_model
uv run modal run main.py::upload_model --offset 0
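The --offset / --chunk_size pair suggests offset-based pagination over the MongoDB collection: the first run trains on documents [0, chunk_size), the next run passes --offset equal to chunk_size, and so on. A pure-Python stand-in for that windowing logic (how main.py actually pages, e.g. via MongoDB skip/limit, is an assumption):

```python
def select_chunk(documents: list, offset: int, chunk_size: int) -> list:
    """Return the training window [offset, offset + chunk_size)."""
    return documents[offset:offset + chunk_size]

docs = list(range(2500))                                 # stand-in for MongoDB documents
run1 = select_chunk(docs, offset=0, chunk_size=1000)     # documents 0..999
run2 = select_chunk(docs, offset=1000, chunk_size=1000)  # documents 1000..1999
run3 = select_chunk(docs, offset=2000, chunk_size=1000)  # documents 2000..2499 (partial)
```

Successive runs therefore never overlap as long as each new --offset is the previous offset plus the previous chunk size.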

Development

# Install dev dependencies
uv sync --extra dev

# Format code
uv run ruff format .

# Check linting
uv run ruff check .

# Auto-fix linting issues
uv run ruff check . --fix

# Type checking
uv run mypy app/ --ignore-missing-imports

License

MIT

Contact

Kira
📧 hi@ykira.com
🌐 ykira.com
