This project provides a complete workflow for fine-tuning Stable Diffusion 3.5 Medium using LoRA (Low-Rank Adaptation). Train on custom datasets with per-image prompts and generate high-quality images on consumer GPUs.
- Stable Diffusion 3.5 Medium - Latest model with excellent quality
- LoRA Fine-tuning - Efficient training without modifying base model
- Per-image Prompts - Custom prompts loaded from sidecar `.txt` files
- Comparison Tools - Side-by-side base vs LoRA model evaluation
- Web Scraping - Automated dataset collection with prompt file generation
- Python 3.9+
- NVIDIA GPU with 16GB+ VRAM (RTX 4080/4090, A6000, etc.)
- CUDA 12.2+ installed
- pipenv for dependency management
First, set up the Python environment and install the required packages.
```bash
# Install dependencies
pipenv install
```
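Before the first run, it can help to sanity-check the environment. This small script is an illustrative addition (not part of the repo): it verifies the Python version and uses the presence of `nvidia-smi` as a rough proxy for a working NVIDIA driver:

```python
import shutil
import sys

# Require Python 3.9+, per the prerequisites above
assert sys.version_info >= (3, 9), "Python 3.9 or newer is required"

# nvidia-smi on the PATH is a quick (imperfect) proxy for a usable GPU setup
if shutil.which("nvidia-smi") is None:
    print("Warning: nvidia-smi not found; GPU training may not work")
else:
    print("NVIDIA driver detected")
```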
Use webscraper.py to download images and create empty prompt files:
```bash
pipenv run python webscraper.py
```
- Enter search query and number of images
- Images saved to `training-images/` with matching `.txt` files
- Curate your dataset: remove low-quality/irrelevant images
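The sidecar generation step can be sketched in a few lines. This is a hypothetical standalone version, not the actual code from `webscraper.py`: it creates an empty `.txt` file next to every image that lacks one, ready for you to fill in.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}

def create_prompt_sidecars(image_dir: str) -> int:
    """Create an empty .txt sidecar for every image that lacks one."""
    created = 0
    for path in Path(image_dir).iterdir():
        if path.suffix.lower() in IMAGE_EXTS:
            sidecar = path.with_suffix(".txt")
            if not sidecar.exists():
                sidecar.touch()
                created += 1
    return created
```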
Edit the .txt files to add descriptive prompts for each image:
```
training-images/
├── image1.jpg
├── image1.txt ← "A woman with brown hair smiling"
├── image2.jpg
├── image2.txt ← "A woman in a red dress outdoors"
└── ...
```
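Training needs each image paired with its prompt. A minimal loader for the layout above might look like this (illustrative only; `finetune.py`'s actual loader may differ), skipping any image whose sidecar is missing or empty:

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}

def load_dataset(image_dir: str) -> list:
    """Return (image_path, prompt) pairs from a directory of sidecar files."""
    pairs = []
    for path in sorted(Path(image_dir).iterdir()):
        if path.suffix.lower() not in IMAGE_EXTS:
            continue
        sidecar = path.with_suffix(".txt")
        prompt = sidecar.read_text().strip() if sidecar.exists() else ""
        if prompt:
            pairs.append((path, prompt))
        else:
            print(f"Skipping {path.name}: no prompt")
    return pairs
```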
Train your LoRA adapter with finetune.py:
```bash
pipenv run python finetune.py
```
Training will take some time, depending on your GPU and the number of images. The script will print the loss at each step. Once complete, the trained LoRA weights will be saved in the `lora_weights/` directory.
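The LoRA idea itself is compact: the base weight matrix `W` stays frozen, and training only learns two low-rank factors `A` and `B` whose product, scaled by `alpha / r`, is added to `W`. A dependency-free sketch of that update (for intuition; the real training uses tensor libraries):

```python
def lora_forward(x, W, A, B, alpha, r):
    """Compute y = x @ (W + (alpha / r) * B @ A) with plain lists.

    W: frozen base weights (d_in x d_out)
    B: d_in x r, A: r x d_out -- the only trained parameters
    """
    scale = alpha / r
    d_in, d_out = len(W), len(W[0])
    # Effective weight: frozen W plus the scaled low-rank update B @ A
    W_eff = [[W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(r))
              for j in range(d_out)] for i in range(d_in)]
    # y = x @ W_eff
    return [sum(x[i] * W_eff[i][j] for i in range(d_in)) for j in range(d_out)]
```

Because only `A` and `B` are trained, the adapter is tiny compared to the base model, which is why LoRA fits on consumer GPUs.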
Generate images with your fine-tuned model:
```bash
pipenv run python inference.py
```
Compare base model vs LoRA model side-by-side:
```bash
pipenv run python compare.py
```
Generates 4 comparison images:
- `base_baseline.png` - Base model + simple prompt
- `base_test.png` - Base model + your prompt
- `lora_baseline.png` - LoRA model + simple prompt
- `lora_test.png` - LoRA model + your prompt
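The four outputs are simply the cross product of two models and two prompt variants. A small sketch of that naming scheme (hypothetical helper, not code from `compare.py`):

```python
from itertools import product

def comparison_filenames(models=("base", "lora"), variants=("baseline", "test")):
    """One output image per (model, prompt-variant) combination."""
    return [f"{m}_{v}.png" for m, v in product(models, variants)]
```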
- Quality over Quantity - 20-50 high-quality images work better than hundreds of poor ones
- Diverse Prompts - Use varied, descriptive prompts for each image
- Consistent Style - Keep similar lighting/composition for style training
- Monitor Loss - Use training loss and comparison script to refine fine-tuning
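Raw per-step loss is noisy, so one common way to monitor it is a fixed-window moving average. This is a generic sketch, not code from `finetune.py`:

```python
from collections import deque

class LossTracker:
    """Smooth noisy per-step losses with a fixed-window moving average."""

    def __init__(self, window: int = 50):
        self.losses = deque(maxlen=window)

    def update(self, loss: float) -> float:
        """Record one step's loss and return the current moving average."""
        self.losses.append(loss)
        return sum(self.losses) / len(self.losses)
```

If the smoothed loss plateaus or the comparison images stop improving, that is usually the point to stop training rather than risk overfitting.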
Feel free to submit issues and pull requests to improve the project!
This project is open source. Please respect the licenses of the underlying models and libraries.