gaarutyunov
diff --git a/.dockerignore b/.dockerignore
Lines changed: 3 additions & 0 deletions
diff --git a/.gitignore b/.gitignore
Lines changed: 3 additions & 1 deletion
diff --git a/EVALUATION.md b/EVALUATION.md
Lines changed: 197 additions & 0 deletions
diff --git a/convert_videos_to_mp4.py b/convert_videos_to_mp4.py
Lines changed: 42 additions & 0 deletions
@@ -56,3 +56,6 @@ temp/
 
 # macOS
 .DS_Store
+
+# Evaluation results
+evaluation_results/
@@ -90,4 +90,6 @@ weights/
 # Temporary files
 tmp/
 temp/
-*.tmp
+*.tmp
+
+evaluation_results/
@@ -0,0 +1,197 @@
+# Multiprocess SSv2 Evaluation
+
+This directory contains `evaluate_ssv2_multithreaded.py`, a multiprocessing evaluation script for the SSv2 (Something-Something V2) video classifier.
+
+## Features
+
+- **Parallel Processing**: Uses Python's `ProcessPoolExecutor` to run multiple worker processes in parallel
+- **Independent Models**: Each worker loads its own copy of the encoder and classifier
+- **Predictable Memory Use**: Each worker uses ~2 GB of RAM, so a MacBook with 32 GB of RAM can run up to 10 workers
+- **Progress Tracking**: Real-time progress bar showing evaluation status
+- **Per-Worker Logging**: Separate log files for the main process and for each worker, with a configurable log level
+- **Comprehensive Metrics**: Generates accuracy, precision, recall, F1-score, confusion matrix, and full classification report
+
+## Usage
+
+### Quick Start
+
+```bash
+# Run with default settings (10 workers, full test set)
+./scripts/evaluate_ssv2_mt.sh
+
+# Run with custom number of workers
+NUM_WORKERS=8 ./scripts/evaluate_ssv2_mt.sh
+
+# Test with a small subset
+SUBSET_SIZE=100 NUM_WORKERS=4 ./scripts/evaluate_ssv2_mt.sh
+
+# Enable debug logging
+LOG_LEVEL=DEBUG NUM_WORKERS=4 ./scripts/evaluate_ssv2_mt.sh
+```
+
+### Direct Python Invocation
+
+```bash
+python3 evaluate_ssv2_multithreaded.py \
+    --videos-dir videos/20bn-something-something-v2 \
+    --test-csv videos/labels/test-answers.csv \
+    --labels-json videos/labels/labels.json \
+    --encoder-weights weights/vitl_mlx.safetensors \
+    --classifier-weights output_ssv2_classifier/best_classifier.safetensors \
+    --num-workers 10 \
+    --output-dir evaluation_results \
+    --log-level INFO
+```
+
+## Arguments
+
+- `--videos-dir`: Path to the videos directory
+- `--test-csv`: Path to test answers CSV file
+- `--labels-json`: Path to labels JSON file
+- `--encoder-weights`: Path to pretrained encoder weights
+- `--classifier-weights`: Path to trained classifier weights
+- `--num-frames`: Number of frames per clip (default: 16)
+- `--resolution`: Input frame resolution in pixels (default: 224)
+- `--tubelet-size`: Tubelet size for 3D patch embedding (default: 2)
+- `--num-classes`: Number of classes (default: 174)
+- `--num-workers`: Number of worker processes (default: CPU count)
+- `--output-dir`: Output directory for results (default: evaluation_results)
+- `--subset-size`: Evaluate only this many samples, for quick test runs (default: all samples)
+- `--log-level`: Logging level: DEBUG, INFO, WARNING, or ERROR (default: INFO)
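A minimal sketch of how the documented flags and defaults could be declared with `argparse`. The helper name `build_parser`, making the five path flags required, and using `os.cpu_count()` for the worker default are assumptions, not the script's actual code.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented CLI (not the script's own code)."""
    parser = argparse.ArgumentParser(description="Parallel SSv2 evaluation (sketch)")
    # Input paths; treating them as required is an assumption of this sketch.
    for flag in ("--videos-dir", "--test-csv", "--labels-json",
                 "--encoder-weights", "--classifier-weights"):
        parser.add_argument(flag, required=True)
    # Clip / model settings with the documented defaults.
    parser.add_argument("--num-frames", type=int, default=16)
    parser.add_argument("--resolution", type=int, default=224)
    parser.add_argument("--tubelet-size", type=int, default=2)
    parser.add_argument("--num-classes", type=int, default=174)
    # Execution settings.
    parser.add_argument("--num-workers", type=int, default=os.cpu_count())
    parser.add_argument("--output-dir", default="evaluation_results")
    parser.add_argument("--subset-size", type=int, default=None)  # None = all samples
    parser.add_argument("--log-level", default="INFO",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR"])
    return parser


args = build_parser().parse_args([
    "--videos-dir", "videos/20bn-something-something-v2",
    "--test-csv", "videos/labels/test-answers.csv",
    "--labels-json", "videos/labels/labels.json",
    "--encoder-weights", "weights/vitl_mlx.safetensors",
    "--classifier-weights", "output_ssv2_classifier/best_classifier.safetensors",
    "--num-workers", "4",
])
print(args.num_workers, args.num_frames, args.log_level)  # 4 16 INFO
```

`argparse` turns a hyphenated flag such as `--num-workers` into the attribute `args.num_workers`, so the documented names map directly onto Python identifiers.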
+
+## Output Files
+
+The script generates multiple files in the output directory:
+
+1. **evaluation_summary.json**: Overall metrics (accuracy, precision, recall, F1-scores) and timing information
+2. **classification_report.txt**: Per-class precision, recall, F1-score, and support from scikit-learn's `classification_report`
+3. **confusion_matrix.npy**: Confusion matrix as a NumPy array (load with `numpy.load`)
+4. **evaluation_YYYYMMDD_HHMMSS.log**: Main process log with orchestration details
+5. **worker_N_YYYYMMDD_HHMMSS.log**: Individual log file for each worker process (N = worker ID)
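`confusion_matrix.npy` can be reloaded for further analysis. The sketch below assumes scikit-learn's layout (rows are true classes, columns are predicted classes); the helper `summarize_confusion` and the toy 3-class matrix are hypothetical.

```python
import numpy as np


def summarize_confusion(cm) -> dict:
    """Accuracy plus per-class, macro, and weighted recall from a confusion matrix.

    Assumes scikit-learn's convention: cm[i, j] counts samples whose true
    class is i and whose predicted class is j.
    """
    cm = np.asarray(cm, dtype=np.float64)
    support = cm.sum(axis=1)  # number of test samples per true class
    correct = np.diag(cm)     # correctly classified samples per class
    # Per-class recall; classes with no samples get 0 instead of dividing by zero.
    recall = np.divide(correct, support, out=np.zeros_like(correct), where=support > 0)
    return {
        "accuracy": float(correct.sum() / cm.sum()),
        "macro_recall": float(recall.mean()),  # unweighted mean over classes
        "weighted_recall": float((recall * support).sum() / support.sum()),
        "per_class_recall": recall,
    }


# With real results (path assumes the default --output-dir):
# cm = np.load("evaluation_results/confusion_matrix.npy")
cm = np.array([[9, 1, 0],
               [2, 2, 1],
               [0, 0, 5]])
stats = summarize_confusion(cm)
print(round(stats["accuracy"], 4), round(stats["macro_recall"], 4))  # 0.8 0.7667
```

Because weighted recall multiplies each class's recall by its support, it always reduces to overall accuracy; that is why the Example Output section lists the same value (0.6542) for both. Macro averages, in contrast, give every class equal weight and drop when rare classes are misclassified.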
+
+### Log File Details
+
+The evaluation creates separate log files for better debugging and analysis:
+
+**Main Log (`evaluation_*.log`)**:
+- Overall configuration and setup
+- Data distribution across workers
+- Worker task submission
+- Aggregate results and metrics
+- Final timing and throughput
+
+**Worker Logs (`worker_N_*.log`)**:
+- Model initialization for each worker
+- Video processing progress
+- Per-video errors and warnings
+- Worker-specific performance metrics
+- Individual worker completion status
+
+Example main log entry:
+```
+2025-11-16 22:04:24 - ssv2_evaluation - INFO - Starting parallel evaluation...
+2025-11-16 22:04:24 - ssv2_evaluation - INFO - Submitted 2 worker tasks
+```
+
+Example worker log entries:
+```
+2025-11-16 22:04:26 - worker_0 - INFO - Worker 0: Models loaded successfully
+2025-11-16 22:04:26 - worker_0 - INFO - Worker 0: Processing 3 samples...
+2025-11-16 22:04:27 - worker_0 - INFO - Worker 0: Completed 3 samples in 1.5s (1.94 videos/sec) - Success: 3, Failed: 0
+```
+
+This separation makes it easy to:
+- Track individual worker performance
+- Identify which worker encountered errors
+- Debug specific video processing issues
+- Analyze parallel execution patterns
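A sketch of per-worker file logging that reproduces the file naming and line format shown above, using only the standard `logging` module. The helper name `setup_worker_logger` is hypothetical, and how the actual script configures its handlers is an assumption.

```python
import logging
from datetime import datetime
from pathlib import Path

LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def setup_worker_logger(worker_id: int, output_dir: str, level: str = "INFO") -> logging.Logger:
    """Create logger 'worker_N' that writes to worker_N_YYYYMMDD_HHMMSS.log."""
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    log_dir = Path(output_dir)
    log_dir.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger(f"worker_{worker_id}")
    logger.setLevel(level)      # accepts level names such as "DEBUG" or "INFO"
    logger.propagate = False    # keep worker records out of the main log
    if not logger.handlers:     # avoid duplicate handlers if called twice
        handler = logging.FileHandler(log_dir / f"worker_{worker_id}_{timestamp}.log")
        handler.setFormatter(logging.Formatter(LOG_FORMAT, datefmt=DATE_FORMAT))
        logger.addHandler(handler)
    return logger
```

Called once inside each worker process (for example from a pool initializer), this gives every worker its own file, so an error can be traced back to the specific worker that logged it.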
+
+## Performance
+
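+On a MacBook with an M-series chip (~2 GB per worker):
+- **10 workers**: ~20 GB RAM usage, suitable for 32 GB systems
+- **8 workers**: ~16 GB RAM usage, needs more than 16 GB of RAM (e.g. a 24 GB system)
+- **4 workers**: ~8 GB RAM usage, suitable for 16 GB systems
+- **1-2 workers**: ~2-4 GB RAM usage, for 8 GB systems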
+
+Processing time depends on:
+- Number of workers
+- Video resolution and length
+- Model size
+- Disk I/O speed
+
+Typical performance: ~2-5 videos/second/worker
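The per-worker throughput range translates into wall-clock time as sketched below. This back-of-the-envelope estimate assumes aggregate throughput scales linearly with the number of workers, which may not hold because all workers share one GPU; the function name is hypothetical.

```python
def estimated_minutes(num_samples: int, num_workers: int,
                      videos_per_sec_per_worker: float) -> float:
    """Wall-clock minutes, assuming aggregate throughput = workers x per-worker rate."""
    return num_samples / (num_workers * videos_per_sec_per_worker) / 60


# Full test set size from the example output, 10 workers, both ends of the range
for rate in (2.0, 5.0):
    print(f"{rate} videos/s/worker: {estimated_minutes(27158, 10, rate):.1f} min")
# 2.0 videos/s/worker: 22.6 min
# 5.0 videos/s/worker: 9.1 min
```

The example run in the Example Output section (29.12 it/s with 10 workers, about 2.9 videos/s per worker) falls inside this range: 27158 / 29.12 ≈ 933 s, which matches the 15:32 elapsed time reported by the progress bar.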
+
+## Technical Notes
+
+### Why Multiprocessing Instead of Multithreading?
+
+MLX uses GPU command buffers that cannot be safely shared across threads. Using separate processes ensures each worker has its own MLX context and GPU resources.
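The one-model-per-process pattern can be sketched with `concurrent.futures.ProcessPoolExecutor` and an `initializer` that runs once in each worker. `load_model` is a hypothetical stand-in for loading the MLX encoder and classifier, and requesting the `spawn` start method explicitly is an assumption of this sketch (it is already the default on macOS).

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

_MODEL = None  # set separately in every worker process


def load_model():
    """Hypothetical stand-in for loading the encoder and classifier weights."""
    return lambda video_path: len(video_path) % 174  # fake class id in [0, 174)


def init_worker():
    # Runs once per worker process, so each worker builds its own model
    # (and, with MLX, its own GPU state) instead of sharing one across threads.
    global _MODEL
    _MODEL = load_model()


def evaluate_batch(batch):
    """Classify every video path in one worker's batch."""
    return [(path, _MODEL(path)) for path in batch]


def run_parallel(batches, num_workers):
    # "spawn" starts fresh interpreters instead of forking the parent process.
    ctx = mp.get_context("spawn")
    with ProcessPoolExecutor(max_workers=num_workers, mp_context=ctx,
                             initializer=init_worker) as pool:
        results = []
        for batch_results in pool.map(evaluate_batch, batches):
            results.extend(batch_results)
    return results


# Call from a script guarded by `if __name__ == "__main__":`, which spawn requires:
#     results = run_parallel([["a.webm", "bb.webm"], ["ccc.webm"]], num_workers=2)
```

`pool.map` yields batch results in submission order, so predictions can be matched back to their labels without extra bookkeeping.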
+
+### Memory Considerations
+
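+Each worker loads a complete copy of both the encoder (~1.5 GB) and the classifier (~0.5 GB), so total memory grows linearly with the number of workers. Monitor memory usage with:
+
+```bash
+# macOS: main evaluation process only
+top -pid $(pgrep -f evaluate_ssv2_multithreaded | head -n 1)
+
+# macOS: all processes sorted by memory (workers appear as separate Python processes)
+top -o mem
+
+# Or use Activity Monitor
+```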
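Workers can also report their own memory. This sketch uses the standard-library `resource` module (Unix only) to read the calling process's peak resident set size; the helper name is hypothetical, and it measures CPU-side resident memory, which may not capture every allocation MLX makes in unified memory.

```python
import resource
import sys


def peak_rss_gb() -> float:
    """Peak resident set size of the calling process, in GiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS but in kilobytes on Linux.
    peak_bytes = peak if sys.platform == "darwin" else peak * 1024
    return peak_bytes / 1024**3


print(f"Peak memory: {peak_rss_gb():.2f} GiB")
```

A worker could log this value when it finishes its batch, next to its completion line, to check the ~2 GB-per-worker estimate on a given machine.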
+
+### Batch Distribution
+
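+The script splits the test set into one batch per worker. When the sample count does not divide evenly, the leftover samples are handed out one each to the first workers; for example, 27158 samples over 10 workers gives eight batches of 2716 and two of 2715.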
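The splitting rule can be sketched as follows; `split_into_batches` is a hypothetical name, and giving each worker a contiguous slice of the test set is an assumption.

```python
def split_into_batches(samples, num_workers):
    """Split samples into num_workers contiguous batches whose sizes differ by at most one."""
    base, remainder = divmod(len(samples), num_workers)
    batches, start = [], 0
    for worker_id in range(num_workers):
        # The first `remainder` workers each take one extra sample.
        size = base + (1 if worker_id < remainder else 0)
        batches.append(samples[start:start + size])
        start += size
    return batches


sizes = [len(b) for b in split_into_batches(list(range(27158)), 10)]
print(sizes)  # [2716, 2716, 2716, 2716, 2716, 2716, 2716, 2716, 2715, 2715]
```

With 27158 samples and 10 workers, `divmod` gives 2715 per worker with 8 left over, which matches the `Worker 0: 2716 samples` lines in the example output.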
+
+## Troubleshooting
+
+**Out of Memory Error**: Reduce the number of workers
+```bash
+NUM_WORKERS=4 ./scripts/evaluate_ssv2_mt.sh
+```
+
+**Slow Performance**: Check disk I/O and ensure videos are on a fast drive (SSD recommended)
+
+**Process Crashes**: Ensure you have enough available RAM and no other heavy processes running
+
+## Example Output
+
+```
+Starting multithreaded evaluation with 10 workers
+MLX device: Device(gpu, 0)
+Expected memory per worker: ~2 GB
+Total expected memory: ~20 GB
+
+Loading labels...
+Loaded 174 classes
+Loading test data...
+Loaded 27158 test samples
+
+Split data into 10 batches
+  Worker 0: 2716 samples
+  Worker 1: 2716 samples
+  ...
+
+Starting evaluation...
+Evaluating: 100%|██████████| 27158/27158 [15:32<00:00, 29.12it/s]
+
+Evaluation complete!
+Successfully evaluated: 27158 samples
+Failed: 0 samples
+
+================================================================================
+EVALUATION RESULTS
+================================================================================
+
+Test Set Accuracy: 0.6542 (65.42%)
+Correct Predictions: 17765 / 27158
+
+Macro Average:
+  Precision: 0.6234
+  Recall:    0.6189
+  F1-Score:  0.6211
+
+Weighted Average:
+  Precision: 0.6498
+  Recall:    0.6542
+  F1-Score:  0.6519
+```
@@ -0,0 +1,42 @@
+#!/usr/bin/env python3
+"""Convert example WebM videos to MP4 for better browser compatibility."""
+import subprocess
+from pathlib import Path
+import json
+
+examples_dir = Path('notebooks/examples')
+
+# Load selected samples
+with open(examples_dir / 'selected_samples.json', 'r') as f:
+    selected_samples = json.load(f)
+
+print("Converting WebM videos to MP4...")
+
+for sample in selected_samples:
+    video_id = sample['video_id']
+    webm_path = examples_dir / f'{video_id}.webm'
+    mp4_path = examples_dir / f'{video_id}.mp4'
+    
+    if webm_path.exists():
+        print(f"Converting {video_id}.webm to MP4...")
+        try:
+            # Use ffmpeg to convert (handle videos without audio and odd dimensions)
+            subprocess.run([
+                'ffmpeg', '-i', str(webm_path),
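+                '-vf', 'scale=trunc(iw/2)*2:trunc(ih/2)*2',  # Round dimensions down to even values (required for yuv420p)
+                '-c:v', 'libx264',  # H.264 codec
+                '-pix_fmt', 'yuv420p',  # 8-bit 4:2:0 output, which browsers can decode
+                '-preset', 'fast', '-crf', '23',  # Encoding speed / quality trade-off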
+                '-an',  # Remove audio track (some videos don't have audio)
+                '-y',  # Overwrite output file
+                str(mp4_path)
+            ], check=True, capture_output=True)
+            print(f"✓ Created {video_id}.mp4")
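+        except (subprocess.CalledProcessError, FileNotFoundError) as e:  # FileNotFoundError: ffmpeg not on PATH
+            print(f"✗ Failed to convert {video_id}.webm")
+            stderr = getattr(e, 'stderr', None)  # Only CalledProcessError carries ffmpeg's output
+            print(f"   Error: {stderr.decode(errors='replace') if stderr else e}")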
+    else:
+        print(f"✗ {video_id}.webm not found")
+
+print("\nConversion complete!")