329 changes: 328 additions & 1 deletion INSTALL.md
@@ -101,6 +101,117 @@ If you encounter issues with video loading, ensure decord2 is properly installed
pip install --upgrade decord2
```

## Dataset Setup

### Downloading Something-Something-v2 Dataset

The Something-Something-v2 (SSv2) dataset is required for training the video classifier. Follow these steps to download it:

#### 1. Register and Request Access

1. Visit the [20BN Something-Something Dataset page](https://developer.qualcomm.com/software/ai-datasets/something-something)
2. Create an account or sign in with Qualcomm Developer credentials
3. Accept the terms and conditions
4. Request access to the Something-Something-v2 dataset

**Note**: The dataset is hosted by Qualcomm and requires registration. Access is typically granted within 24-48 hours.

#### 2. Download the Dataset

Once approved, you'll receive download links. The dataset consists of:

- **Videos**: ~220GB compressed, ~500GB uncompressed
- 168,913 training videos
- 24,777 validation videos
- 27,157 test videos (labels not publicly available)
- **Labels**: JSON files with annotations
- `train.json`: Training annotations
- `validation.json`: Validation annotations
- `labels.json`: Class label mappings (174 action classes)

**Download structure**:
```
20bn-something-something-v2/
├── 20bn-something-something-v2-?? (video archives)
└── labels/ (annotation files)
```

#### 3. Extract Videos

After downloading, extract the video archives:

```bash
# Create the videos directory
mkdir -p videos/20bn-something-something-v2

# Extract all parts (this may take a while);
# assumes the downloaded archive parts sit in the videos/ directory
cd videos
cat 20bn-something-something-v2-?? | tar -xzvf -

# Verify extraction
ls 20bn-something-something-v2/ | wc -l   # Should show ~220,847 .webm files
```

#### 4. Organize Labels

Create a labels directory with the annotation files:

```bash
mkdir -p videos/labels
# Move or copy the JSON files
mv train.json validation.json labels.json videos/labels/
```

#### 5. Expected Directory Structure

After setup, your directory should look like:

```
videos/
├── 20bn-something-something-v2/
│   ├── 1.webm
│   ├── 2.webm
│   ├── ...
│   └── 220847.webm
└── labels/
    ├── train.json
    ├── validation.json
    └── labels.json
```

#### 6. Verify Dataset

Run a quick verification to ensure the dataset is properly set up:

```bash
# Check video count
find videos/20bn-something-something-v2 -name "*.webm" | wc -l

# Check label files
for f in videos/labels/{train,validation,labels}.json; do
  echo "Checking $f..."
  python -c "import json; data=json.load(open('$f')); print(f'  Entries: {len(data)}')"
done
```

Expected output:
- Videos: ~220,847 files
- train.json: ~168,913 entries
- validation.json: ~24,777 entries
- labels.json: 174 entries
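
As an extra sanity check on the annotation format, the snippet below prints one training entry and rebuilds the label-to-index mapping. It is a minimal sketch assuming the stock SSv2 layout (`train.json` is a list of dicts with `id`, `label`, `template`, and `placeholders` fields; `labels.json` maps a bracket-stripped template string to a stringified class index); adjust the keys if your copy differs.

```python
import json
from pathlib import Path

labels_dir = Path("videos/labels")

# train.json: a list of annotation dicts (id, label, template, placeholders)
with open(labels_dir / "train.json") as f:
    train = json.load(f)
print("Example entry:", train[0])

# labels.json: {"Holding something": "0", ...} -- template string -> class index
with open(labels_dir / "labels.json") as f:
    label_map = {name: int(idx) for name, idx in json.load(f).items()}
print("Number of classes:", len(label_map))  # expected: 174

# Map the first annotation to its integer class id via its bracket-stripped template
template = train[0]["template"].replace("[", "").replace("]", "")
print("Class id of first entry:", label_map[template])
```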

#### Alternative: Using Subset for Testing

For quick testing without processing the full dataset:

```bash
# Use the --subset-size flag when training
python train_ssv2_classifier.py \
--videos-dir videos/20bn-something-something-v2 \
--labels-dir videos/labels \
--subset-size 1000 # Use only 1000 samples
```

### Memory Issues

If you encounter out-of-memory errors during training:
@@ -171,7 +282,223 @@ To remove the package:
pip uninstall vjepa2-mlx
```

## Next Steps
## Fine-tuning the Model

### Overview

The V-JEPA 2 MLX training pipeline uses a **frozen encoder** approach where:
- The pretrained V-JEPA 2 encoder remains **frozen** (no gradient updates)
- Only the **attentive classifier head** is trained
- This approach is fast and memory-efficient (a minimal training-loop sketch follows below)
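
The sketch below shows how this typically looks in MLX: the backbone is frozen, gradients flow only through the classifier, and the optimizer updates only the classifier's parameters. The tiny `nn.Sequential` stand-ins (and their dimensions) are placeholders for the repo's real encoder and attentive classifier, not its actual API.

```python
import mlx.core as mx
import mlx.nn as nn
import mlx.optimizers as optim

# Illustrative stand-ins for the real V-JEPA 2 encoder and attentive classifier
encoder = nn.Sequential(nn.Linear(768, 1024), nn.GELU())   # pretend frozen backbone
classifier = nn.Sequential(nn.Linear(1024, 174))           # pretend classifier head

encoder.freeze()  # the backbone receives no gradient updates

def loss_fn(classifier, inputs, labels):
    # stop_gradient makes the frozen-encoder intent explicit and avoids
    # building a backward graph through the backbone.
    feats = mx.stop_gradient(encoder(inputs))
    logits = classifier(feats)
    return nn.losses.cross_entropy(logits, labels, reduction="mean")

optimizer = optim.AdamW(learning_rate=1e-3, weight_decay=0.05)
loss_and_grad = nn.value_and_grad(classifier, loss_fn)

def train_step(inputs, labels):
    loss, grads = loss_and_grad(classifier, inputs, labels)
    optimizer.update(classifier, grads)   # only the classifier's parameters change
    mx.eval(classifier.parameters(), optimizer.state)
    return loss

# Smoke test with random data standing in for encoder inputs
print(train_step(mx.random.normal((4, 768)), mx.array([0, 1, 2, 3])).item())
```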

### Quick Start Fine-tuning

#### 1. Using the Training Script

```bash
# Basic training command
python train_ssv2_classifier.py \
--videos-dir videos/20bn-something-something-v2 \
--labels-dir videos/labels \
--pretrained-weights weights/vitl_mlx.safetensors \
--output-dir output_ssv2_classifier \
--batch-size 4 \
--num-epochs 10 \
--use-wandb
```

#### 2. Using the Configuration File

Edit `configs/train/ssv2_classifier_default.yaml` and run:

```bash
python train_ssv2_classifier.py --config configs/train/ssv2_classifier_default.yaml
```

#### 3. Using the Shell Script

```bash
# Make executable (first time only)
chmod +x scripts/train_ssv2.sh

# Run with defaults
./scripts/train_ssv2.sh

# Run with custom settings
BATCH_SIZE=8 NUM_EPOCHS=20 USE_WANDB=true ./scripts/train_ssv2.sh
```

### Fine-tuning Configuration

#### Key Hyperparameters

Edit `configs/train/ssv2_classifier_default.yaml` to customize:

```yaml
training:
  batch_size: 4                    # Adjust based on memory (2-8)
  num_epochs: 10                   # 10-30 for production
  learning_rate: 0.001             # 1e-3 to 5e-4 typical range
  weight_decay: 0.05               # AdamW regularization
  warmup_epochs: 1                 # LR warmup period
  gradient_accumulation_steps: 1   # For larger effective batch
  save_every_steps: 1000           # Checkpoint frequency
```
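
For a concrete picture of how `learning_rate`, `weight_decay`, and `warmup_epochs` combine, here is a hedged sketch using MLX's schedule helpers: linear warmup into cosine decay, fed to AdamW. The `steps_per_epoch` value is a placeholder, and the actual training script may wire its schedule differently.

```python
import mlx.optimizers as optim

steps_per_epoch = 1_000          # placeholder: len(train_loader) in practice
warmup_epochs, num_epochs = 1, 10
base_lr, weight_decay = 1e-3, 0.05

warmup = optim.linear_schedule(0.0, base_lr, warmup_epochs * steps_per_epoch)
cosine = optim.cosine_decay(base_lr, (num_epochs - warmup_epochs) * steps_per_epoch)
lr_schedule = optim.join_schedules([warmup, cosine], [warmup_epochs * steps_per_epoch])

optimizer = optim.AdamW(learning_rate=lr_schedule, weight_decay=weight_decay)
```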

#### Memory Optimization

For limited memory (16GB RAM):

```bash
python train_ssv2_classifier.py \
--batch-size 2 \
--gradient-accumulation-steps 4 # Effective batch size = 8
```

For more memory (32GB+ RAM):

```bash
python train_ssv2_classifier.py \
--batch-size 8 \
--gradient-accumulation-steps 1
```
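
Gradient accumulation simply averages gradients over several micro-batches before doing a single optimizer update, so the effective batch size grows without the memory cost. A minimal MLX sketch, reusing the hypothetical `classifier`, `loss_and_grad`, and `optimizer` from the training-loop sketch above:

```python
import mlx.core as mx
from mlx.utils import tree_map

accum_steps = 4  # effective batch size = batch_size * accum_steps

def accumulated_step(micro_batches):
    """micro_batches: a list of accum_steps (inputs, labels) tuples."""
    total_loss, summed = 0.0, None
    for inputs, labels in micro_batches:
        loss, grads = loss_and_grad(classifier, inputs, labels)
        grads = tree_map(lambda g: g / accum_steps, grads)  # average as we go
        summed = grads if summed is None else tree_map(lambda a, b: a + b, summed, grads)
        total_loss += loss.item()
    optimizer.update(classifier, summed)  # one update per accum_steps micro-batches
    mx.eval(classifier.parameters(), optimizer.state)
    return total_loss / accum_steps
```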

#### Quick Testing with Subset

Test your setup with a small data subset:

```bash
python train_ssv2_classifier.py \
--videos-dir videos/20bn-something-something-v2 \
--labels-dir videos/labels \
--pretrained-weights weights/vitl_mlx.safetensors \
--subset-size 1000 \
--num-epochs 2 \
--verbose
```

### Advanced Fine-tuning

#### Resume from Checkpoint

```bash
python train_ssv2_classifier.py \
--resume-from output_ssv2_classifier/classifier_step_5000.safetensors \
--output-dir output_ssv2_classifier
```

Or in config:
```yaml
output:
  resume_from: "output_ssv2_classifier/classifier_step_5000.safetensors"
```

#### Weights & Biases Integration

Enable experiment tracking:

```bash
python train_ssv2_classifier.py \
--use-wandb \
--wandb-project "my-ssv2-experiments" \
--wandb-entity "my-team" \
--wandb-run-name "vitl-bs8-lr1e3"
```

Or in config:
```yaml
wandb:
  enabled: true
  project: "my-ssv2-experiments"
  entity: "my-team"
  run_name: "vitl-bs8-lr1e3"
```

#### Adjusting Model Architecture

Customize classifier architecture in config:

```yaml
model:
  num_probe_blocks: 1    # Classifier depth (1-3)
  num_heads: 16          # Attention heads (8, 12, 16)
  frames_per_clip: 16    # Temporal resolution (8, 16, 32)
  resolution: 224        # Spatial resolution (224, 256)
```
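
For intuition about what these knobs control, below is a rough sketch of an attentive probe: a learned query token cross-attends over the frozen encoder's output tokens, and the pooled vector feeds a linear head. It is an illustrative stand-in rather than the repo's exact module; `num_heads` corresponds to the attention heads, and `num_probe_blocks` controls how many such blocks would be stacked.

```python
import mlx.core as mx
import mlx.nn as nn

class AttentiveProbe(nn.Module):
    """Cross-attention pooling over encoder tokens followed by a linear head."""

    def __init__(self, dim=1024, num_heads=16, num_classes=174):
        super().__init__()
        self.query = mx.zeros((1, 1, dim))                  # learned pooling query
        self.attn = nn.MultiHeadAttention(dim, num_heads)   # (queries, keys, values)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, num_classes)

    def __call__(self, tokens):                     # tokens: (B, N, dim) from encoder
        q = mx.broadcast_to(self.query, (tokens.shape[0], 1, tokens.shape[2]))
        pooled = self.attn(q, tokens, tokens)        # (B, 1, dim)
        return self.head(self.norm(pooled[:, 0]))    # (B, num_classes)

# Smoke test on random "encoder tokens" (1568 tokens = 16 frames at 224x224)
print(AttentiveProbe()(mx.random.normal((2, 1568, 1024))).shape)  # (2, 174)
```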

### Expected Training Performance

On Apple Silicon (M2 Max, 64GB):
- **Training speed**: ~2-3 samples/sec with batch size 4
- **Memory usage**: ~8-12GB during training
- **Full epoch time**: ~18-24 hours for full SSv2 dataset
- **Validation accuracy**: ~40-50% top-1 after 10 epochs

### Output Files

Training produces:

```
output_ssv2_classifier/
├── training_YYYYMMDD_HHMMSS.log # Training log
├── best_classifier.safetensors # Best model (highest val acc)
├── classifier_step_1000.safetensors # Periodic checkpoints
├── classifier_step_2000.safetensors
└── training_history.json # Metrics history
```

### Monitoring Training

#### View Real-time Logs

```bash
tail -f output_ssv2_classifier/training_*.log
```

#### Check Training History

```python
import json
with open('output_ssv2_classifier/training_history.json') as f:
    history = json.load(f)
print(f"Best validation accuracy: {max(history['val_acc']):.2%}")
```
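
To see the full curves rather than only the best value, something like the following works, assuming `matplotlib` is installed and that the history file stores per-epoch `train_loss` and `val_acc` lists (adjust the keys to whatever `training_history.json` actually contains):

```python
import json
import matplotlib.pyplot as plt

with open('output_ssv2_classifier/training_history.json') as f:
    history = json.load(f)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(history['train_loss'])
ax1.set(xlabel='epoch', ylabel='training loss')
ax2.plot(history['val_acc'])
ax2.set(xlabel='epoch', ylabel='top-1 validation accuracy')
fig.tight_layout()
fig.savefig('training_curves.png')
```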

#### Weights & Biases Dashboard

If using W&B, view metrics at: `https://wandb.ai/{entity}/{project}`

### Troubleshooting Fine-tuning

#### Low Validation Accuracy

- Increase `num_epochs` (try 20-30)
- Adjust `learning_rate` (try 5e-4 or 2e-3)
- Increase `batch_size` or gradient accumulation
- Verify dataset integrity

#### Out of Memory

- Reduce `batch_size`
- Enable gradient accumulation
- Reduce `frames_per_clip` (try 8 or 12)
- Lower `resolution` (try 192 or 200); see the token-count arithmetic below for why this helps
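
Frame count and resolution dominate memory because the encoder's cost scales with the number of spatiotemporal tokens. Assuming the usual V-JEPA patchification (16x16 spatial patches, tubelet size 2 in time), the arithmetic looks like this:

```python
def num_tokens(frames, resolution, patch=16, tubelet=2):
    """Spatiotemporal token count for a tubelet/patch-based video ViT."""
    return (frames // tubelet) * (resolution // patch) ** 2

print(num_tokens(16, 224))  # 8 * 14 * 14 = 1568 tokens
print(num_tokens(8, 192))   # 4 * 12 * 12 = 576 tokens, roughly 2.7x fewer
```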

#### Slow Training

- Increase `batch_size` if memory allows
- Reduce `frames_per_clip` for faster video loading
- Use `--subset-size` for initial experiments
- Close other memory-intensive applications

#### Training Crashes

- Check dataset paths are correct
- Verify pretrained weights file exists
- Ensure sufficient disk space for checkpoints
- Check video files are not corrupted

### Next Steps

After installation:
