A complete end-to-end pipeline for training image relighting models. This project enables you to:
- Filter high-quality images with good lighting from large datasets
- Generate albedo/degraded images (training pairs)
- Caption images with lighting keywords using a VLM
- Train instruction-based image editing models (InstructPix2Pix)
┌────────────────────────────────────────────────────────────┐
│                 IMAGE RELIGHTING PIPELINE                  │
├────────────────────────────────────────────────────────────┤
│                                                            │
│ STEP 1: filter_images/                                     │
│   FFHQ 70k Images → CLIP Lighting Filter → Filtered Images │
│                                                            │
│ STEP 2: albedo/relightingDataGen-parallel/                 │
│   Filtered Images → SAM3 Segment, Albedo Extract,          │
│   Degradation → Degraded Outputs                           │
│                                                            │
│ STEP 3: edit_keywords/                                     │
│   Original Images → VLM (Qwen3-VL default) → CSV + Keywords│
│                                                            │
│ STEP 4: training/                                          │
│   Triplet Training → Train SD1.5/SDXL Model                │
│                                                            │
└────────────────────────────────────────────────────────────┘
TRAINING DATA MAPPING:
┌────────────────────────────────────────────────────────────────────────────────┐
│  Training Input  = Degraded Image (flat lighting from Step 2)                  │
│  Instruction     = Lighting Keywords (from Step 3: "sunlight through blinds")  │
│  Training Output = Original Image (real lighting)                              │
└────────────────────────────────────────────────────────────────────────────────┘
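In code, each training example is just one row of the final CSV re-interpreted as a triplet. A minimal sketch (the CSV column names match the Step 3 output; the dict keys are illustrative):

```python
# Illustrative mapping from one row of the Step 3 CSV to an
# InstructPix2Pix-style training triplet (dict keys are hypothetical).
import csv

def rows_to_triplets(csv_path):
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            yield {
                "input_image": row["output_image_path"],  # degraded, flat-lit image
                "instruction": row["lighting_keywords"],  # e.g. "sunlight through blinds"
                "target_image": row["image_path"],        # original, realistically lit image
            }
```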
image-relighting-diffusion/
├── filter_images/                     # Step 1: Image filtering (CLIP-based)
│   ├── filter_lighting_images.py
│   ├── verify_filtering.py
│   └── analyze_results.py
│
├── albedo/                            # Step 2: Training data generation
│   └── relightingDataGen-parallel/
│       ├── scripts/
│       │   └── run_multi_gpu_batched.py
│       ├── albedo_csv_files/          # Output CSVs saved here
│       └── src/
│           └── stages/                # SAM3, Albedo, Shadow stages
│
├── edit_keywords/                     # Step 3: Lighting keywords generation
│   ├── generate_keywords.py           # VLM-based keyword generation
│   ├── prepare_training_data.py       # Convert to training format
│   └── README.md
│
└── training/                          # Step 4: Model training
    ├── sd1_5/                         # Stable Diffusion 1.5
    ├── sdxl/                          # Stable Diffusion XL
    └── flux/                          # Flux (experimental)
- Python 3.10+
- CUDA-capable GPU (24GB+ VRAM recommended)
- For Step 3: either a GPU for Qwen3-VL (default, free) or an API key for Mistral/OpenAI
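A quick way to check the environment before starting (illustrative; adjust to your setup):

```python
# Environment sanity check: Python version and available GPU VRAM.
import sys
import torch

assert sys.version_info >= (3, 10), "Python 3.10+ is required"
assert torch.cuda.is_available(), "A CUDA-capable GPU is required"
vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
print(f"GPU: {torch.cuda.get_device_name(0)} ({vram_gb:.0f} GB VRAM)")
```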
Select high-quality, well-lit images from your dataset using CLIP-based filtering.
cd filter_images
pip install -r requirements.txt
# Filter top 12k images with best lighting
python filter_lighting_images.py \
--dataset_path /path/to/your/images \
--output_dir ./output \
--num_images 12000 \
--batch_size 64
# Create train/val/test splits
python analyze_results.py \
--results_json ./output/filtered_images.json \
--output_dir ./output \
--create_splits

Output: train_images.csv, val_images.csv, test_images.csv
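For reference, the core idea behind the CLIP filter can be sketched as follows (the actual scoring lives in filter_lighting_images.py; the model choice and prompt wording here are illustrative):

```python
# Minimal sketch of CLIP-based lighting scoring. The real logic lives in
# filter_lighting_images.py; the model and prompts here are illustrative.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
prompts = [
    "a photo with dramatic, well-defined lighting",
    "a photo with flat, dull lighting",
]

def lighting_score(image_path: str) -> float:
    """Probability that the image matches the 'good lighting' prompt."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = model(**inputs).logits_per_image.softmax(dim=-1)
    return probs[0, 0].item()
```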
📖 See filter_images/README.md for details.
Process filtered images to create degraded versions (flat lighting) for training pairs.
cd albedo/relightingDataGen-parallel
# Create and activate environment
conda create -n sam3 python=3.10 -y
conda activate sam3
# Install dependencies
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
pip install git+https://github.com/facebookresearch/sam2.git
# Run multi-GPU processing
python scripts/run_multi_gpu_batched.py \
--config config/mvp_config.yaml \
--csv ../../filter_images/output/train_images.csv \
--num-gpus 8 \
--batch-size 8

Output:
- Images in data-train/
- CSV in albedo_csv_files/train_images_with_albedo.csv
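Before moving on, it can help to verify that every original/degraded pair exists on disk. A minimal check (assuming the output CSV carries image_path and output_image_path columns, as consumed by Step 3):

```python
# Sanity-check Step 2 output: both files of every pair should exist.
# Assumes the CSV exposes image_path and output_image_path columns.
import csv
from pathlib import Path

missing = 0
with open("albedo_csv_files/train_images_with_albedo.csv", newline="") as f:
    rows = list(csv.DictReader(f))
for row in rows:
    for col in ("image_path", "output_image_path"):
        if not Path(row[col]).exists():
            missing += 1
print(f"{len(rows)} pairs checked, {missing} missing files")
```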
📖 See albedo/relightingDataGen-parallel/README.md for details.
Use a VLM to generate lighting description keywords for each original image. Default: Qwen3-VL-30B (free, runs locally with vLLM).
cd edit_keywords
pip install -r requirements.txt
# Option 1: Qwen3-VL with vLLM (DEFAULT - free, fast)
python generate_keywords.py \
--csv ../albedo/relightingDataGen-parallel/albedo_csv_files/train_images_with_albedo.csv \
--output_dir ./output \
--batch_size 8
# Option 2: Mistral API
export MISTRAL_API_KEY="your-api-key"
python generate_keywords.py \
--csv ../albedo/relightingDataGen-parallel/albedo_csv_files/train_images_with_albedo.csv \
--output_dir ./output \
--provider mistral

Output: CSV with 4 columns:
- image_path → Original image (becomes training OUTPUT)
- lighting_score → CLIP score
- output_image_path → Degraded image (becomes training INPUT)
- lighting_keywords → Edit instruction (e.g., "sunlight through blinds, indoor")
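generate_keywords.py handles the VLM calls for you; if you want to query a vLLM-served Qwen3-VL endpoint directly, here is a minimal sketch using vLLM's OpenAI-compatible API (the server URL, model name, and prompt are illustrative):

```python
# Sketch of querying a vLLM-served VLM through its OpenAI-compatible API.
# Server URL, model name, and prompt are illustrative; generate_keywords.py
# is the project's actual entry point.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def keywords_for(image_path: str, model: str = "Qwen/Qwen3-VL-30B-A3B-Instruct") -> str:
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                {"type": "text", "text": "Describe this photo's lighting in a few short keywords."},
            ],
        }],
        max_tokens=32,
    )
    return response.choices[0].message.content.strip()
```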
Example Keywords Generated:
| Image | Keywords |
|---|---|
| Portrait with window | "sunlight through the blinds, near window blinds" |
| Beach scene | "sunlight from the left side, beach" |
| Forest portrait | "magic golden lit, forest" |
| Night cityscape | "neo punk, city night" |
📖 See edit_keywords/README.md for details.
Train an InstructPix2Pix model on your generated data.
cd training/sd1_5
pip install -r requirements.txt
# Prepare training data
python ../../edit_keywords/prepare_training_data.py \
--csv ../../edit_keywords/output/train_images_with_albedo_with_keywords.csv \
--output_dir ./data_triplets
# Convert to HuggingFace dataset
python convert_to_hf_dataset.py --data_dir ./data_triplets --output_dir ./data_hf
# Configure and train
./setup_accelerate.sh
./train.sh --data_dir ./data_hf

📖 See training/README.md for details.
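After conversion you can quickly inspect the dataset with the datasets library; a minimal sketch (the column names shown follow the common InstructPix2Pix convention and may differ from what convert_to_hf_dataset.py actually emits):

```python
# Inspect the converted dataset. Column names follow the common
# InstructPix2Pix convention and may differ in this repo.
from datasets import load_from_disk

ds = load_from_disk("./data_hf")
print(ds)               # splits, sizes, features
print(ds.column_names)  # e.g. input_image, edit_prompt, edited_image
```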
# ═══════════════════════════════════════════════════════════════
# STEP 1: Filter Images (~1-2 hours)
# ═══════════════════════════════════════════════════════════════
cd filter_images
python filter_lighting_images.py \
--dataset_path /path/to/ffhq \
--output_dir ./ffhq_filtered \
--num_images 12000
python analyze_results.py \
--results_json ./ffhq_filtered/filtered_images.json \
--output_dir ./ffhq_filtered \
--create_splits
# ═══════════════════════════════════════════════════════════════
# STEP 2: Generate Albedo/Degraded Images (~2-4 hours for 10k)
# ═══════════════════════════════════════════════════════════════
cd ../albedo/relightingDataGen-parallel
conda activate sam3
python scripts/run_multi_gpu_batched.py \
--config config/mvp_config.yaml \
--csv ../../filter_images/ffhq_filtered/train_images.csv \
--num-gpus 8 \
--batch-size 8
# ═══════════════════════════════════════════════════════════════
# STEP 3: Generate Lighting Keywords (~20-30 min with Qwen3-VL)
# ═══════════════════════════════════════════════════════════════
cd ../../edit_keywords
# Default: Qwen3-VL-30B with vLLM (free, fast)
python generate_keywords.py \
--csv ../albedo/relightingDataGen-parallel/albedo_csv_files/train_images_with_albedo.csv \
--output_dir ./output \
--batch_size 8
# Prepare training format
python prepare_training_data.py \
--csv ./output/train_images_with_albedo_with_keywords.csv \
--output_dir ../training/sd1_5/data_triplets
# ═══════════════════════════════════════════════════════════════
# STEP 4: Train Model (~1.5-2 days for SD1.5)
# ═══════════════════════════════════════════════════════════════
cd ../training/sd1_5
python convert_to_hf_dataset.py \
--data_dir ./data_triplets \
--output_dir ./data_hf
./train.sh --data_dir ./data_hf
# ═══════════════════════════════════════════════════════════════
# INFERENCE
# ═══════════════════════════════════════════════════════════════
python inference.py \
--model_path ./output/instruct-pix2pix-sd15 \
--input_image test.jpg \
--instruction "sunlight through the blinds, near window" \
--output_path result.png
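If you prefer to call the trained checkpoint directly rather than through inference.py, here is a minimal sketch with diffusers (the checkpoint path and guidance values are illustrative):

```python
# Direct inference with diffusers instead of inference.py.
# Checkpoint path and guidance values are illustrative.
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "./output/instruct-pix2pix-sd15", torch_dtype=torch.float16
).to("cuda")

image = Image.open("test.jpg").convert("RGB")
result = pipe(
    prompt="sunlight through the blinds, near window",
    image=image,
    num_inference_steps=30,
    image_guidance_scale=1.5,  # how closely to preserve the input image
    guidance_scale=7.5,        # how strongly to follow the instruction
).images[0]
result.save("result.png")
```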
| Model | Quality | Training Time | Resolution | Status |
|---|---|---|---|---|
| SD 1.5 | Good ⭐⭐⭐ | ~1.5-2 days | 512×512 | ✅ Ready |
| SDXL | Excellent ⭐⭐⭐⭐⭐ | ~3-5 days | 1024×1024 | ✅ Ready |
| Flux | Best? ⭐⭐⭐⭐⭐⭐ | TBD | 1024×1024 | ⏳ Experimental |
Recommendation: Start with SD 1.5 for rapid prototyping, then scale to SDXL for production.
| Component | Minimum | Recommended |
|---|---|---|
| GPU | 1x 24GB | 8x A100 (80GB) |
| RAM | 32GB | 64GB+ |
| Storage | 500GB | 2TB+ SSD |
| Step | GPU Memory | Time (10k images) |
|---|---|---|
| 1. Filter Images | ~4GB | ~1-2 hours |
| 2. Generate Albedo | ~8-12GB/GPU | ~2-4 hours (8 GPU) |
| 3. Edit Keywords (Qwen3-VL) | ~40GB (4x24GB TP) | ~20-30 min |
| 4. Training SD1.5 | ~35-45GB/GPU | ~1.5-2 days |
| Component | Documentation |
|---|---|
| Image Filtering | filter_images/README.md |
| Albedo Generation | albedo/relightingDataGen-parallel/README.md |
| Keyword Generation | edit_keywords/README.md |
| Model Training | training/README.md |
- Original Image → Has real-world lighting (shadows, highlights, etc.)
- Albedo Extraction → Remove lighting to get flat, uniformly-lit image
- Degradation → Apply synthetic lighting variations
- Keywords → VLM describes the original image's lighting
The model learns:
"Given a flat-lit/degraded image + lighting description → Produce realistically lit output"
This is the inverse of traditional relighting:
- Input: Degraded image (flat lighting)
- Instruction: Lighting keywords ("sunlight through blinds")
- Output: Original image (with real lighting)
- Qwen3-VL: HuggingFace | GitHub
- vLLM: Docs | Qwen3-VL Guide
- SAM2/SAM3: GitHub
- CLIP: OpenAI
- HuggingFace Diffusers: GitHub
This project is provided for research and educational purposes.
Happy Relighting! 🎨✨