329 changes: 328 additions & 1 deletion INSTALL.md
@@ -101,6 +101,117 @@ If you encounter issues with video loading, ensure decord2 is properly installed
pip install --upgrade decord2
```

## Dataset Setup

### Downloading Something-Something-v2 Dataset

The Something-Something-v2 (SSv2) dataset is required for training the video classifier. Follow these steps to download it:

#### 1. Register and Request Access

1. Visit the [20BN Something-Something Dataset page](https://developer.qualcomm.com/software/ai-datasets/something-something)
2. Create an account or sign in with Qualcomm Developer credentials
3. Accept the terms and conditions
4. Request access to the Something-Something-v2 dataset

**Note**: The dataset is hosted by Qualcomm and requires registration. Access is typically granted within 24-48 hours.

#### 2. Download the Dataset

Once approved, you'll receive download links. The dataset consists of:

- **Videos**: ~220GB compressed, ~500GB uncompressed
- 168,913 training videos
- 24,777 validation videos
- 27,157 test videos (labels not publicly available)
- **Labels**: JSON files with annotations
- `train.json`: Training annotations
- `validation.json`: Validation annotations
- `labels.json`: Class label mappings (174 action classes)

**Download structure**:
```
20bn-something-something-v2/
├── 20bn-something-something-v2-?? (video archives)
└── labels/ (annotation files)
```

#### 3. Extract Videos

After downloading, extract the video archives:

```bash
# Create the videos directory
mkdir -p videos/20bn-something-something-v2

# Extract all parts (this may take a while);
# assumes the downloaded archive parts sit in the videos/ directory
cd videos
cat 20bn-something-something-v2-?? | tar -xzvf -

# Verify extraction
ls 20bn-something-something-v2/ | wc -l   # Should show ~220,847 .webm files
```

#### 4. Organize Labels

Create a labels directory with the annotation files:

```bash
mkdir -p videos/labels
# Move or copy the JSON files
mv train.json validation.json labels.json videos/labels/
```

#### 5. Expected Directory Structure

After setup, your directory should look like:

```
videos/
├── 20bn-something-something-v2/
│   ├── 1.webm
│   ├── 2.webm
│   ├── ...
│   └── 220847.webm
└── labels/
    ├── train.json
    ├── validation.json
    └── labels.json
```

#### 6. Verify Dataset

Run a quick verification to ensure the dataset is properly set up:

```bash
# Check video count
find videos/20bn-something-something-v2 -name "*.webm" | wc -l

# Check label files
for f in videos/labels/{train,validation,labels}.json; do
  echo "Checking $f..."
  python -c "import json; data=json.load(open('$f')); print(f'  Entries: {len(data)}')"
done
```

Expected output:
- Videos: ~220,847 files
- train.json: ~168,913 entries
- validation.json: ~24,777 entries
- labels.json: 174 entries
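
As an extra sanity check on the annotation format, the snippet below prints one training entry and rebuilds the label-to-index mapping. It is a minimal sketch assuming the stock SSv2 layout (`train.json` is a list of dicts with `id`, `label`, `template`, and `placeholders` fields; `labels.json` maps a bracket-stripped template string to a stringified class index); adjust the keys if your copy differs.

```python
import json
from pathlib import Path

labels_dir = Path("videos/labels")

# train.json: a list of annotation dicts (id, label, template, placeholders)
with open(labels_dir / "train.json") as f:
    train = json.load(f)
print("Example entry:", train[0])

# labels.json: {"Holding something": "0", ...} -- template string -> class index
with open(labels_dir / "labels.json") as f:
    label_map = {name: int(idx) for name, idx in json.load(f).items()}
print("Number of classes:", len(label_map))  # expected: 174

# Map the first annotation to its integer class id via its bracket-stripped template
template = train[0]["template"].replace("[", "").replace("]", "")
print("Class id of first entry:", label_map[template])
```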

#### Alternative: Using Subset for Testing

For quick testing without processing the full dataset:

```bash
# Use the --subset-size flag when training
python train_ssv2_classifier.py \
--videos-dir videos/20bn-something-something-v2 \
--labels-dir videos/labels \
--subset-size 1000 # Use only 1000 samples
```

### Memory Issues

If you encounter out-of-memory errors during training:
@@ -171,7 +282,223 @@ To remove the package:
pip uninstall vjepa2-mlx
```

## Next Steps
## Fine-tuning the Model

### Overview

The V-JEPA 2 MLX training pipeline uses a **frozen encoder** approach where:
- The pretrained V-JEPA 2 encoder remains **frozen** (no gradient updates)
- Only the **attentive classifier head** is trained
- This approach is fast and memory-efficient (a minimal training-loop sketch follows below)
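
The sketch below shows how this typically looks in MLX: the backbone is frozen, gradients flow only through the classifier, and the optimizer updates only the classifier's parameters. The tiny `nn.Sequential` stand-ins (and their dimensions) are placeholders for the repo's real encoder and attentive classifier, not its actual API.

```python
import mlx.core as mx
import mlx.nn as nn
import mlx.optimizers as optim

# Illustrative stand-ins for the real V-JEPA 2 encoder and attentive classifier
encoder = nn.Sequential(nn.Linear(768, 1024), nn.GELU())   # pretend frozen backbone
classifier = nn.Sequential(nn.Linear(1024, 174))           # pretend classifier head

encoder.freeze()  # the backbone receives no gradient updates

def loss_fn(classifier, inputs, labels):
    # stop_gradient makes the frozen-encoder intent explicit and avoids
    # building a backward graph through the backbone.
    feats = mx.stop_gradient(encoder(inputs))
    logits = classifier(feats)
    return nn.losses.cross_entropy(logits, labels, reduction="mean")

optimizer = optim.AdamW(learning_rate=1e-3, weight_decay=0.05)
loss_and_grad = nn.value_and_grad(classifier, loss_fn)

def train_step(inputs, labels):
    loss, grads = loss_and_grad(classifier, inputs, labels)
    optimizer.update(classifier, grads)   # only the classifier's parameters change
    mx.eval(classifier.parameters(), optimizer.state)
    return loss

# Smoke test with random data standing in for encoder inputs
print(train_step(mx.random.normal((4, 768)), mx.array([0, 1, 2, 3])).item())
```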

### Quick Start Fine-tuning

#### 1. Using the Training Script

```bash
# Basic training command
python train_ssv2_classifier.py \
--videos-dir videos/20bn-something-something-v2 \
--labels-dir videos/labels \
--pretrained-weights weights/vitl_mlx.safetensors \
--output-dir output_ssv2_classifier \
--batch-size 4 \
--num-epochs 10 \
--use-wandb
```

#### 2. Using the Configuration File

Edit `configs/train/ssv2_classifier_default.yaml` and run:

```bash
python train_ssv2_classifier.py --config configs/train/ssv2_classifier_default.yaml
```

#### 3. Using the Shell Script

```bash
# Make executable (first time only)
chmod +x scripts/train_ssv2.sh

# Run with defaults
./scripts/train_ssv2.sh

# Run with custom settings
BATCH_SIZE=8 NUM_EPOCHS=20 USE_WANDB=true ./scripts/train_ssv2.sh
```

### Fine-tuning Configuration

#### Key Hyperparameters

Edit `configs/train/ssv2_classifier_default.yaml` to customize:

```yaml
training:
  batch_size: 4                    # Adjust based on memory (2-8)
  num_epochs: 10                   # 10-30 for production
  learning_rate: 0.001             # 1e-3 to 5e-4 typical range
  weight_decay: 0.05               # AdamW regularization
  warmup_epochs: 1                 # LR warmup period
  gradient_accumulation_steps: 1   # For larger effective batch
  save_every_steps: 1000           # Checkpoint frequency
```
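
For a concrete picture of how `learning_rate`, `weight_decay`, and `warmup_epochs` combine, here is a hedged sketch using MLX's schedule helpers: linear warmup into cosine decay, fed to AdamW. The `steps_per_epoch` value is a placeholder, and the actual training script may wire its schedule differently.

```python
import mlx.optimizers as optim

steps_per_epoch = 1_000          # placeholder: len(train_loader) in practice
warmup_epochs, num_epochs = 1, 10
base_lr, weight_decay = 1e-3, 0.05

warmup = optim.linear_schedule(0.0, base_lr, warmup_epochs * steps_per_epoch)
cosine = optim.cosine_decay(base_lr, (num_epochs - warmup_epochs) * steps_per_epoch)
lr_schedule = optim.join_schedules([warmup, cosine], [warmup_epochs * steps_per_epoch])

optimizer = optim.AdamW(learning_rate=lr_schedule, weight_decay=weight_decay)
```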

#### Memory Optimization

For limited memory (16GB RAM):

```bash
python train_ssv2_classifier.py \
--batch-size 2 \
--gradient-accumulation-steps 4 # Effective batch size = 8
```

For more memory (32GB+ RAM):

```bash
python train_ssv2_classifier.py \
--batch-size 8 \
--gradient-accumulation-steps 1
```
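
Gradient accumulation simply averages gradients over several micro-batches before doing a single optimizer update, so the effective batch size grows without the memory cost. A minimal MLX sketch, reusing the hypothetical `classifier`, `loss_and_grad`, and `optimizer` from the training-loop sketch above:

```python
import mlx.core as mx
from mlx.utils import tree_map

accum_steps = 4  # effective batch size = batch_size * accum_steps

def accumulated_step(micro_batches):
    """micro_batches: a list of accum_steps (inputs, labels) tuples."""
    total_loss, summed = 0.0, None
    for inputs, labels in micro_batches:
        loss, grads = loss_and_grad(classifier, inputs, labels)
        grads = tree_map(lambda g: g / accum_steps, grads)  # average as we go
        summed = grads if summed is None else tree_map(lambda a, b: a + b, summed, grads)
        total_loss += loss.item()
    optimizer.update(classifier, summed)  # one update per accum_steps micro-batches
    mx.eval(classifier.parameters(), optimizer.state)
    return total_loss / accum_steps
```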

#### Quick Testing with Subset

Test your setup with a small data subset:

```bash
python train_ssv2_classifier.py \
--videos-dir videos/20bn-something-something-v2 \
--labels-dir videos/labels \
--pretrained-weights weights/vitl_mlx.safetensors \
--subset-size 1000 \
--num-epochs 2 \
--verbose
```

### Advanced Fine-tuning

#### Resume from Checkpoint

```bash
python train_ssv2_classifier.py \
--resume-from output_ssv2_classifier/classifier_step_5000.safetensors \
--output-dir output_ssv2_classifier
```

Or in config:
```yaml
output:
  resume_from: "output_ssv2_classifier/classifier_step_5000.safetensors"
```

#### Weights & Biases Integration

Enable experiment tracking:

```bash
python train_ssv2_classifier.py \
--use-wandb \
--wandb-project "my-ssv2-experiments" \
--wandb-entity "my-team" \
--wandb-run-name "vitl-bs8-lr1e3"
```

Or in config:
```yaml
wandb:
  enabled: true
  project: "my-ssv2-experiments"
  entity: "my-team"
  run_name: "vitl-bs8-lr1e3"
```

#### Adjusting Model Architecture

Customize classifier architecture in config:

```yaml
model:
  num_probe_blocks: 1    # Classifier depth (1-3)
  num_heads: 16          # Attention heads (8, 12, 16)
  frames_per_clip: 16    # Temporal resolution (8, 16, 32)
  resolution: 224        # Spatial resolution (224, 256)
```
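
For intuition about what these knobs control, below is a rough sketch of an attentive probe: a learned query token cross-attends over the frozen encoder's output tokens, and the pooled vector feeds a linear head. It is an illustrative stand-in rather than the repo's exact module; `num_heads` corresponds to the attention heads, and `num_probe_blocks` controls how many such blocks would be stacked.

```python
import mlx.core as mx
import mlx.nn as nn

class AttentiveProbe(nn.Module):
    """Cross-attention pooling over encoder tokens followed by a linear head."""

    def __init__(self, dim=1024, num_heads=16, num_classes=174):
        super().__init__()
        self.query = mx.zeros((1, 1, dim))                  # learned pooling query
        self.attn = nn.MultiHeadAttention(dim, num_heads)   # (queries, keys, values)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, num_classes)

    def __call__(self, tokens):                     # tokens: (B, N, dim) from encoder
        q = mx.broadcast_to(self.query, (tokens.shape[0], 1, tokens.shape[2]))
        pooled = self.attn(q, tokens, tokens)        # (B, 1, dim)
        return self.head(self.norm(pooled[:, 0]))    # (B, num_classes)

# Smoke test on random "encoder tokens" (1568 tokens = 16 frames at 224x224)
print(AttentiveProbe()(mx.random.normal((2, 1568, 1024))).shape)  # (2, 174)
```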

### Expected Training Performance

On Apple Silicon (M2 Max, 64GB):
- **Training speed**: ~2-3 samples/sec with batch size 4
- **Memory usage**: ~8-12GB during training
- **Full epoch time**: ~18-24 hours for full SSv2 dataset
- **Validation accuracy**: ~40-50% top-1 after 10 epochs

### Output Files

Training produces:

```
output_ssv2_classifier/
├── training_YYYYMMDD_HHMMSS.log # Training log
├── best_classifier.safetensors # Best model (highest val acc)
├── classifier_step_1000.safetensors # Periodic checkpoints
├── classifier_step_2000.safetensors
└── training_history.json # Metrics history
```

### Monitoring Training

#### View Real-time Logs

```bash
tail -f output_ssv2_classifier/training_*.log
```

#### Check Training History

```python
import json
with open('output_ssv2_classifier/training_history.json') as f:
    history = json.load(f)
print(f"Best validation accuracy: {max(history['val_acc']):.2%}")
```
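
To see the full curves rather than only the best value, something like the following works, assuming `matplotlib` is installed and that the history file stores per-epoch `train_loss` and `val_acc` lists (adjust the keys to whatever `training_history.json` actually contains):

```python
import json
import matplotlib.pyplot as plt

with open('output_ssv2_classifier/training_history.json') as f:
    history = json.load(f)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(history['train_loss'])
ax1.set(xlabel='epoch', ylabel='training loss')
ax2.plot(history['val_acc'])
ax2.set(xlabel='epoch', ylabel='top-1 validation accuracy')
fig.tight_layout()
fig.savefig('training_curves.png')
```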

#### Weights & Biases Dashboard

If using W&B, view metrics at: `https://wandb.ai/{entity}/{project}`

### Troubleshooting Fine-tuning

#### Low Validation Accuracy

- Increase `num_epochs` (try 20-30)
- Adjust `learning_rate` (try 5e-4 or 2e-3)
- Increase `batch_size` or gradient accumulation
- Verify dataset integrity

#### Out of Memory

- Reduce `batch_size`
- Enable gradient accumulation
- Reduce `frames_per_clip` (try 8 or 12)
- Lower `resolution` (try 192 or 200); see the token-count arithmetic below for why this helps
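
Frame count and resolution dominate memory because the encoder's cost scales with the number of spatiotemporal tokens. Assuming the usual V-JEPA patchification (16x16 spatial patches, tubelet size 2 in time), the arithmetic looks like this:

```python
def num_tokens(frames, resolution, patch=16, tubelet=2):
    """Spatiotemporal token count for a tubelet/patch-based video ViT."""
    return (frames // tubelet) * (resolution // patch) ** 2

print(num_tokens(16, 224))  # 8 * 14 * 14 = 1568 tokens
print(num_tokens(8, 192))   # 4 * 12 * 12 = 576 tokens, roughly 2.7x fewer
```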

#### Slow Training

- Increase `batch_size` if memory allows
- Reduce `frames_per_clip` for faster video loading
- Use `--subset-size` for initial experiments
- Close other memory-intensive applications

#### Training Crashes

- Check dataset paths are correct
- Verify pretrained weights file exists
- Ensure sufficient disk space for checkpoints
- Check video files are not corrupted

### Next Steps

After installation:
