
Commit 71e6389

evaluation
1 parent 3fa2de8 commit 71e6389

20 files changed: +2236 -1 lines changed

.dockerignore

Lines changed: 3 additions & 0 deletions
@@ -56,3 +56,6 @@ temp/
 
 # macOS
 .DS_Store
+
+# Evaluation results
+evaluation_results/

.gitignore

Lines changed: 3 additions & 1 deletion
@@ -90,4 +90,6 @@ weights/
 # Temporary files
 tmp/
 temp/
-*.tmp
+*.tmp
+
+evaluation_results/

EVALUATION.md

Lines changed: 197 additions & 0 deletions
@@ -0,0 +1,197 @@

# Multithreaded SSv2 Evaluation

This directory contains a high-performance multiprocessing evaluation script for the SSv2 video classifier.

## Features

- **Parallel Processing**: Uses Python's `ProcessPoolExecutor` to run multiple worker processes in parallel
- **Independent Models**: Each worker loads its own copy of the encoder and classifier
- **Memory Efficient**: Each worker uses ~2 GB RAM, allowing up to 10 workers on a MacBook with 32 GB RAM
- **Progress Tracking**: Real-time progress bar showing evaluation status
- **Comprehensive Logging**: Detailed logs saved to file with configurable log levels
- **Comprehensive Metrics**: Generates accuracy, precision, recall, F1-score, confusion matrix, and full classification report

## Usage

### Quick Start

```bash
# Run with default settings (10 workers, full test set)
./scripts/evaluate_ssv2_mt.sh

# Run with custom number of workers
NUM_WORKERS=8 ./scripts/evaluate_ssv2_mt.sh

# Test with a small subset
SUBSET_SIZE=100 NUM_WORKERS=4 ./scripts/evaluate_ssv2_mt.sh

# Enable debug logging
LOG_LEVEL=DEBUG NUM_WORKERS=4 ./scripts/evaluate_ssv2_mt.sh
```

### Direct Python Invocation

```bash
python3 evaluate_ssv2_multithreaded.py \
    --videos-dir videos/20bn-something-something-v2 \
    --test-csv videos/labels/test-answers.csv \
    --labels-json videos/labels/labels.json \
    --encoder-weights weights/vitl_mlx.safetensors \
    --classifier-weights output_ssv2_classifier/best_classifier.safetensors \
    --num-workers 10 \
    --output-dir evaluation_results \
    --log-level INFO
```

## Arguments

- `--videos-dir`: Path to the videos directory
- `--test-csv`: Path to test answers CSV file
- `--labels-json`: Path to labels JSON file
- `--encoder-weights`: Path to pretrained encoder weights
- `--classifier-weights`: Path to trained classifier weights
- `--num-frames`: Number of frames per clip (default: 16)
- `--resolution`: Video resolution (default: 224)
- `--tubelet-size`: Tubelet size for 3D patch embedding (default: 2)
- `--num-classes`: Number of classes (default: 174)
- `--num-workers`: Number of worker processes (default: CPU count)
- `--output-dir`: Output directory for results (default: evaluation_results)
- `--subset-size`: Evaluate only a subset of samples for testing
- `--log-level`: Logging level: DEBUG, INFO, WARNING, or ERROR (default: INFO)

## Output Files

The script generates multiple files in the output directory:

1. **evaluation_summary.json**: Overall metrics including accuracy, precision, recall, F1-scores, and timing information
2. **classification_report.txt**: Detailed per-class metrics from scikit-learn
3. **confusion_matrix.npy**: Confusion matrix as a NumPy array
4. **evaluation_YYYYMMDD_HHMMSS.log**: Main process log with orchestration details
5. **worker_N_YYYYMMDD_HHMMSS.log**: Individual log file for each worker process (N = worker ID)
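
All of these are plain files, so they can be inspected with a few lines of Python. A minimal sketch (illustrative only; the JSON keys shown are assumptions about the summary's layout, not a documented schema):

```python
# Sketch: load the evaluator's outputs for further analysis.
# File names match the list above; the "accuracy" key is an assumption.
import json
from pathlib import Path

import numpy as np

results_dir = Path("evaluation_results")

with open(results_dir / "evaluation_summary.json") as f:
    summary = json.load(f)
print(summary.get("accuracy"))  # overall accuracy, if stored under this key

cm = np.load(results_dir / "confusion_matrix.npy")
print(cm.shape)                        # expected (174, 174), one row/column per class
print(cm.diagonal().sum() / cm.sum())  # accuracy recomputed from the matrix
```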

### Log File Details

The evaluation creates separate log files for better debugging and analysis:

**Main Log (`evaluation_*.log`)**:
- Overall configuration and setup
- Data distribution across workers
- Worker task submission
- Aggregate results and metrics
- Final timing and throughput

**Worker Logs (`worker_N_*.log`)**:
- Model initialization for each worker
- Video processing progress
- Per-video errors and warnings
- Worker-specific performance metrics
- Individual worker completion status

Example main log entries:
```
2025-11-16 22:04:24 - ssv2_evaluation - INFO - Starting parallel evaluation...
2025-11-16 22:04:24 - ssv2_evaluation - INFO - Submitted 2 worker tasks
```

Example worker log entries:
```
2025-11-16 22:04:26 - worker_0 - INFO - Worker 0: Models loaded successfully
2025-11-16 22:04:26 - worker_0 - INFO - Worker 0: Processing 3 samples...
2025-11-16 22:04:27 - worker_0 - INFO - Worker 0: Completed 3 samples in 1.5s (1.94 videos/sec) - Success: 3, Failed: 0
```

This separation makes it easy to:
- Track individual worker performance
- Identify which worker encountered errors
- Debug specific video processing issues
- Analyze parallel execution patterns

## Performance

On a MacBook with an M-series chip:
- **10 workers**: ~20 GB RAM usage, optimal for 32 GB systems
- **8 workers**: ~16 GB RAM usage, optimal for 16 GB systems
- **4 workers**: ~8 GB RAM usage, safe for 8 GB systems

Processing time depends on:
- Number of workers
- Video resolution and length
- Model size
- Disk I/O speed

Typical performance: ~2-5 videos/second per worker
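
As a rough sanity check on these numbers: 10 workers at about 3 videos/second each is ~30 videos/second overall, so the full 27158-sample test set should take roughly 15 minutes, which is consistent with the 29.12 it/s and 15:32 elapsed time shown in the example output below.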

## Technical Notes

### Why Multiprocessing Instead of Multithreading?

MLX uses GPU command buffers that cannot be safely shared across threads. Using separate processes ensures each worker has its own MLX context and GPU resources.
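
The skeleton below illustrates the pattern (it is not the evaluator's actual code): each worker function builds its own model state inside the child process, and only plain, picklable Python data crosses the process boundary.

```python
# Illustrative multiprocessing skeleton, not the evaluator's actual code.
# The point is structural: model/MLX state lives entirely inside run_worker,
# so nothing GPU-related is ever shared or pickled between processes.
from concurrent.futures import ProcessPoolExecutor

def run_worker(worker_id, samples):
    # In the real script this is where the encoder and classifier would be
    # loaded, giving this process its own MLX context.
    return [(sample, worker_id) for sample in samples]  # picklable results only

def evaluate(batches, num_workers=10):
    with ProcessPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(run_worker, i, batch) for i, batch in enumerate(batches)]
        return [f.result() for f in futures]

if __name__ == "__main__":
    print(evaluate([["video_a", "video_b"], ["video_c"]], num_workers=2))
```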

### Memory Considerations

Each worker loads a complete copy of both the encoder (~1.5GB) and classifier (~0.5GB). Monitor memory usage with:

```bash
# macOS
top -pid $(pgrep -f evaluate_ssv2_multithreaded)

# Or use Activity Monitor
```

### Batch Distribution

The script automatically splits the test set into equal batches for each worker, with any remainder distributed evenly across the first few workers.
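
One way to implement that split (a sketch; the script's own helper may differ) is to give every worker `len(samples) // num_workers` items and one extra item to each of the first `len(samples) % num_workers` workers:

```python
# Sketch of a near-equal split; not necessarily the script's implementation.
def split_batches(samples, num_workers):
    base, rem = divmod(len(samples), num_workers)
    batches, start = [], 0
    for i in range(num_workers):
        size = base + (1 if i < rem else 0)  # first `rem` workers take one extra
        batches.append(samples[start:start + size])
        start += size
    return batches

# 27158 samples over 10 workers -> eight batches of 2716 and two of 2715
print([len(b) for b in split_batches(list(range(27158)), 10)])
```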

## Troubleshooting

**Out of Memory Error**: Reduce the number of workers:
```bash
NUM_WORKERS=4 ./scripts/evaluate_ssv2_mt.sh
```

**Slow Performance**: Check disk I/O and ensure videos are on a fast drive (SSD recommended).

**Process Crashes**: Ensure you have enough available RAM and that no other heavy processes are running.

## Example Output

```
Starting multithreaded evaluation with 10 workers
MLX device: Device(gpu, 0)
Expected memory per worker: ~2 GB
Total expected memory: ~20 GB

Loading labels...
Loaded 174 classes
Loading test data...
Loaded 27158 test samples

Split data into 10 batches
Worker 0: 2716 samples
Worker 1: 2716 samples
...

Starting evaluation...
Evaluating: 100%|██████████| 27158/27158 [15:32<00:00, 29.12it/s]

Evaluation complete!
Successfully evaluated: 27158 samples
Failed: 0 samples

================================================================================
EVALUATION RESULTS
================================================================================

Test Set Accuracy: 0.6542 (65.42%)
Correct Predictions: 17765 / 27158

Macro Average:
Precision: 0.6234
Recall: 0.6189
F1-Score: 0.6211

Weighted Average:
Precision: 0.6498
Recall: 0.6542
F1-Score: 0.6519
```

convert_videos_to_mp4.py

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@

#!/usr/bin/env python3
"""Convert example WebM videos to MP4 for better browser compatibility."""
import subprocess
from pathlib import Path
import json

examples_dir = Path('notebooks/examples')

# Load selected samples
with open(examples_dir / 'selected_samples.json', 'r') as f:
    selected_samples = json.load(f)

print("Converting WebM videos to MP4...")

for sample in selected_samples:
    video_id = sample['video_id']
    webm_path = examples_dir / f'{video_id}.webm'
    mp4_path = examples_dir / f'{video_id}.mp4'

    if webm_path.exists():
        print(f"Converting {video_id}.webm to MP4...")
        try:
            # Use ffmpeg to convert (handle videos without audio and odd dimensions)
            subprocess.run([
                'ffmpeg', '-i', str(webm_path),
                '-vf', 'scale=trunc(iw/2)*2:trunc(ih/2)*2',  # Ensure dimensions are divisible by 2
                '-c:v', 'libx264',  # H.264 codec
                '-preset', 'fast',
                '-crf', '23',
                '-an',  # Remove audio track (some videos don't have audio)
                '-y',   # Overwrite output file
                str(mp4_path)
            ], check=True, capture_output=True)
            print(f"✓ Created {video_id}.mp4")
        except subprocess.CalledProcessError as e:
            print(f"✗ Failed to convert {video_id}.webm")
            if e.stderr:
                print(f" Error: {e.stderr.decode()}")
    else:
        print(f"✗ {video_id}.webm not found")

print("\nConversion complete!")
