189 clinical interviews (sessions 300-492)
- Labels: PHQ-8 depression scores (0=healthy, 1=depressed if score ≥10)
- Data: Audio transcripts + acoustic features
- Splits: Official train/dev/test
http://dcapswoz.ict.usc.edu/wwwdaicwoz/train_split_Depression_AVEC2017.csv
http://dcapswoz.ict.usc.edu/wwwdaicwoz/dev_split_Depression_AVEC2017.csv
http://dcapswoz.ict.usc.edu/wwwdaicwoz/test_split_Depression_AVEC2017.csv
Save to: data/splits/
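The three split files can also be fetched with a short script instead of by hand. A minimal sketch using only the standard library; the helper names (`split_urls`, `download_splits`) are mine, not part of any dataset tooling:

```python
import urllib.request
from pathlib import Path

BASE_URL = "http://dcapswoz.ict.usc.edu/wwwdaicwoz"
SPLITS = [
    "train_split_Depression_AVEC2017.csv",
    "dev_split_Depression_AVEC2017.csv",
    "test_split_Depression_AVEC2017.csv",
]

def split_urls():
    """Full URL for each official split file."""
    return [f"{BASE_URL}/{name}" for name in SPLITS]

def download_splits(dest="data/splits"):
    """Download each split CSV into dest, skipping files already present."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    for url in split_urls():
        out = dest / url.rsplit("/", 1)[-1]
        if not out.exists():
            urllib.request.urlretrieve(url, out)

# download_splits()  # fetches all three CSVs into data/splits/
```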
Download 10-20 training sessions first (~5-8 GB):
http://dcapswoz.ict.usc.edu/wwwdaicwoz/300_P.zip (327M)
http://dcapswoz.ict.usc.edu/wwwdaicwoz/301_P.zip (403M)
...
Extract to: data/raw/300_P/, data/raw/301_P/, etc.
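Downloading and extracting a range of sessions can be scripted the same way. A sketch, assuming each `XXX_P.zip` extracts cleanly into its own `data/raw/XXX_P/` folder (the function names are mine):

```python
import urllib.request
import zipfile
from pathlib import Path

BASE_URL = "http://dcapswoz.ict.usc.edu/wwwdaicwoz"

def session_url(session_id):
    """Archive URL for one session, e.g. 300 -> .../300_P.zip."""
    return f"{BASE_URL}/{session_id}_P.zip"

def fetch_sessions(first=300, last=309, raw_dir="data/raw"):
    """Download and extract a range of session zips, skipping existing folders."""
    raw = Path(raw_dir)
    raw.mkdir(parents=True, exist_ok=True)
    for sid in range(first, last + 1):
        target = raw / f"{sid}_P"
        if target.exists():
            continue
        zip_path = raw / f"{sid}_P.zip"
        urllib.request.urlretrieve(session_url(sid), zip_path)
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(target)
        zip_path.unlink()  # drop the archive to save disk space

# fetch_sessions(300, 309)  # ~4 GB for the first ten sessions
```

Deleting each zip after extraction roughly halves the peak disk usage while downloading.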
data/
├── splits/
│   ├── train_split_Depression_AVEC2017.csv  ← IDs + labels
│   ├── dev_split_Depression_AVEC2017.csv
│   └── test_split_Depression_AVEC2017.csv
│
└── raw/
    ├── 300_P/
    │   ├── 300_TRANSCRIPT.csv  ← Use this (text)
    │   └── 300_COVAREP.csv     ← Use this (74 acoustic features)
    ├── 301_P/
    └── ...
For your unsupervised learning project:
- Text: XXX_TRANSCRIPT.csv (interview transcripts)
- Audio: XXX_COVAREP.csv (74 acoustic features: F0, MFCCs, jitter, etc.)
Ignore video/facial files for now (optional later).
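Loading the two files per session is straightforward with pandas. A sketch, assuming the DAIC-WOZ transcript layout of tab-separated columns including `speaker` and `value`, and a headerless COVAREP matrix; the helper names are mine:

```python
import pandas as pd

def load_transcript(path):
    """Load a XXX_TRANSCRIPT.csv file (tab-separated in DAIC-WOZ releases)."""
    return pd.read_csv(path, sep="\t")

def participant_text(df, speaker_col="speaker", text_col="value"):
    """Concatenate the participant's utterances (dropping Ellie's prompts)."""
    mask = df[speaker_col].str.strip().str.lower() == "participant"
    return " ".join(df.loc[mask, text_col].astype(str))

def load_covarep(path):
    """Load a XXX_COVAREP.csv file: headerless frames of 74 acoustic features."""
    return pd.read_csv(path, header=None)
```

For the unsupervised text side, `participant_text` gives one document per session, ready for TF-IDF or embedding.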
# 1. Create folders
cd "m:\5th sem\ML2-project"
New-Item -ItemType Directory -Path "data\splits", "data\raw" -Force
# 2. Download CSV files manually to data/splits/
# 3. Download 10 sessions
# Download 300_P.zip through 309_P.zip from URL above
# Extract each to data/raw/
# 4. Run notebook
jupyter notebook notebooks/03_DAICWOZ_analysis.ipynb

train_split_Depression_AVEC2017.csv:
Participant_ID,PHQ8_Binary,PHQ8_Score,Gender
300,0,3,Male
301,1,15,Female
...
- PHQ8_Binary: 0=No depression, 1=Depression
- PHQ8_Score: 0-24 (≥10 = depression threshold)
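The binary label is derived directly from the score with the ≥10 cutoff, which makes a cheap sanity check when loading a split file. A sketch, assuming the file carries both columns as in the train example above (the AVEC 2017 test split withholds labels):

```python
import pandas as pd

def phq8_binary(score: int) -> int:
    """Binary depression label: 1 if the PHQ-8 total score is 10 or above."""
    return 1 if score >= 10 else 0

def load_labels(split_csv):
    """Read a split CSV and verify PHQ8_Binary is consistent with PHQ8_Score."""
    df = pd.read_csv(split_csv)
    assert (df["PHQ8_Binary"] == df["PHQ8_Score"].map(phq8_binary)).all()
    return df
```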
- Minimal (10 sessions): ~4 GB
- Training set only: ~50 GB
- Full dataset: ~85 GB
Start with 10-20 sessions, then download more as needed.