Commit fa8ae65 (parent ebdd413): High level README.md for analysis scripts

1 file changed: scripts/analysis/README.md (+235 −0 lines)
# Analysis Scripts

This directory contains scripts for analysing and aggregating results across multiple experiments. The scripts operate on the outputs of individual experiments and provide higher-level insights into model performance, sampling-strategy effectiveness, and cross-experiment comparisons.

## Script Descriptions

### `aggregate_analysis.py`

Aggregates results across multiple experiments to determine which sampling strategies perform better than the random baseline. Computes percentage improvements and generates LaTeX tables.

**Usage:**

```bash
python scripts/analysis/aggregate_analysis.py \
  outputs/reddit_dataset_12/one-vs-all/football/42_05 \
  --models distilbert gpt2 ModernBERT \
  --imbalances 05 01 001 \
  --strategies accuracy_lightgbm info_gain_lightgbm minority ssepy isolation \
  --metrics accuracy f1_score precision recall
```

**Parameters:**

- `base_path` (required): Base experiment directory path
- `--models, -m` (required): List of model names to analyse
- `--imbalances, -i` (required): Test imbalance levels (e.g., '05', '01', '001')
- `--strategies, -s` (required): Sampling strategies to evaluate
- `--metrics, -mt` (optional): Metrics to compare (defaults to all metrics available from the random baseline)

**Description:**

- Iterates over all combinations of models, imbalances, and strategies
- Generates LaTeX tables in `tmp/tables/aggregate_performance/`
- Computes average improvements across models for each configuration
- Uses sample sizes: 110, 260, 510, 1000

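The percentage-improvement computation can be sketched as a standalone function (a minimal illustration; the function name and sign convention are assumptions, not taken from the script):

```python
def pct_improvement(strategy_rmse: float, random_rmse: float) -> float:
    """Percentage improvement of a sampling strategy over the random baseline.

    Positive values mean the strategy's error is lower than random's.
    """
    if random_rmse == 0:
        raise ValueError("random baseline RMSE must be non-zero")
    return 100.0 * (random_rmse - strategy_rmse) / random_rmse

# Example: a strategy with RMSE 0.08 vs a random baseline of 0.10
print(f"{pct_improvement(0.08, 0.10):.1f}%")  # 20.0%
```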
### `compare_models.py`

Compares the performance of different models under the same sampling strategy and imbalance level. Creates detailed visualizations comparing model performance.

**Usage:**

```bash
python scripts/analysis/compare_models.py \
  outputs/reddit_dataset_12/one-vs-all/football/42_05 \
  05 \
  accuracy_lightgbm \
  --models distilbert gpt2 ModernBERT zero-shot
```

**Parameters:**

- `base_path` (required): Base path like 'outputs/reddit_dataset_12/one-vs-all/football/42_05'
- `test_imbalance` (required): Test imbalance level (e.g., '05', '01', '001')
- `sampling_method` (required): Sampling method directory name (e.g., 'random', 'distance')
- `--models` (required): List of model names to compare

**Description:**

- Generates model comparison plots (raw metrics and MSE)
- Creates visualizations in an organized directory structure at `{base_path}/figures/model_comparison/`
- Produces both raw performance curves and MSE comparison plots
- Automatically finds model directories matching the specified pattern
- Creates separate subdirectories for different plot types (`raw/`, `mse/`)

### `compare_sampling_strategies.py`

Compares different sampling strategies for a given model and imbalance level. Generates comprehensive visualizations comparing strategy effectiveness.

**Usage:**

```bash
python scripts/analysis/compare_sampling_strategies.py \
  outputs/reddit_dataset_12/one-vs-all/football/42_05 \
  distilbert \
  --imbalances 05 01 001 \
  --strategies random distance isolation accuracy_lightgbm info_gain_lightgbm
```

**Parameters:**

- `base_path` (required): Base path like 'outputs/reddit_dataset_12/one-vs-all/football/42_05'
- `model` (required): Model name (e.g., 'distilbert', 'gpt2', 'ModernBERT')
- `--imbalances` (required): Test imbalance levels to process (e.g., '05', '01', '001')
- `--strategies` (required): List of sampling strategies to compare

**Description:**

- Side-by-side comparison of sampling strategies for a single model
- Generates three types of plots: raw metrics, MSE comparison, and improvement analysis
- Creates an organized output directory structure at `{base_path}/figures/sampling_comparison/`
- Processes multiple imbalance levels in a single run

### `model_performance.py`

Analyses overall model performance across different configurations.

**Description:**

- Comprehensive performance metrics analysis
- Cross-configuration performance comparison
- Statistical summaries and visualizations

### `model_performance_summary.py`

Generates summary reports of model performance across all experiments.

**Description:**

- High-level performance summaries
- LaTeX table generation for reports
- Consolidated performance metrics

### `model_transfer_analysis.py`

Analyses how models perform when transferred across different imbalance levels or datasets.

**Description:**

- Label transfer performance analysis
- Produces a set of LaTeX tables

### `bootstrapping_rmse.py`

Performs bootstrap resampling analysis to assess confidence intervals for RMSE and other metrics. Provides robust statistical analysis of sampling strategy performance.

**Usage:**

```bash
python scripts/analysis/bootstrapping_rmse.py \
  outputs/reddit_dataset_12/one-vs-all/football/42_05 \
  --models distilbert gpt2 ModernBERT zero-shot \
  --imbalances 05 01 001 \
  --sampling-methods random ssepy info_gain_lightgbm accuracy_lightgbm minority isolation \
  --verbose \
  --selected-steps 10 100 500
```

**Parameters:**

- `base_path` (required): Base experiment directory path
- `--models` (optional): Model names to include (default: distilbert, ModernBERT, gpt2, zero-shot)
- `--imbalances` (optional): Test imbalance levels (default: 05, 01, 001)
- `--sampling-methods` (optional): Sampling strategies to evaluate (default: random, ssepy, info_gain_lightgbm, accuracy_lightgbm, minority, isolation)
- `--verbose, -v` (optional): Print detailed output and tables
- `--selected-steps` (optional): Only include specific evaluation steps (e.g., 10 100 500)

**Description:**

- Bootstrap confidence interval estimation with configurable sample sizes
- Statistical significance testing across all metrics
- Error estimation using bootstrap resampling
- Generates grouped analysis tables and JSON results
- Can filter to specific evaluation steps for focused analysis
- Creates tables in multiple formats: raw RMSE, differences from random, and grouped metrics
- Outputs saved to `{base_path}/tables/bootstrap_rmse/` with subdirectories for different analysis types

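As a rough illustration of the resampling step, a bootstrap confidence interval for RMSE can be computed as follows (a self-contained sketch with hypothetical names, not the script's actual implementation):

```python
import math
import random

def bootstrap_rmse_ci(errors, n_boot=2000, alpha=0.05, seed=42):
    """Estimate a (1 - alpha) confidence interval for RMSE by
    resampling the per-sample errors with replacement."""
    rng = random.Random(seed)
    n = len(errors)
    stats = []
    for _ in range(n_boot):
        resample = [errors[rng.randrange(n)] for _ in range(n)]
        stats.append(math.sqrt(sum(e * e for e in resample) / n))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical per-sample errors (prediction minus ground truth)
errors = [0.1, -0.2, 0.05, 0.3, -0.15, 0.07, -0.02, 0.12]
low, high = bootstrap_rmse_ci(errors)
print(f"95% CI for RMSE: [{low:.3f}, {high:.3f}]")
```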
### `bias_correction_impact.py`

Analyses the impact of bias correction techniques on sampling strategy performance.

**Description:**

- Before/after bias correction comparisons
- Impact quantification across strategies

### `aggregate_class_distribution.py`

Analyses class distribution patterns across different sampling strategies and datasets from metrics files.

**Usage:**

```bash
python scripts/analysis/aggregate_class_distribution.py \
  outputs/reddit_dataset_12/one-vs-all/football/42_05 \
  --imbalances 01 05 001 \
  --models distilbert ModernBERT gpt2 zero-shot \
  --sampling-methods random ssepy minority info_gain_lightgbm accuracy_lightgbm isolation \
  --sample-sizes 10 100 1000 \
  --max-runs 5
```

**Parameters:**

- `base_path` (required): Base experiment directory path
- `--imbalances` (required): Imbalance levels (e.g., '01', '05')
- `--models` (optional): Models to analyse (default: distilbert, ModernBERT, gpt2, zero-shot)
- `--sampling-methods` (optional): Sampling methods to analyse (default: random, ssepy, minority, info_gain_lightgbm, accuracy_lightgbm, isolation)
- `--sample-sizes` (optional): Specific sample sizes to analyse (e.g., 10 100 1000)
- `--max-runs` (optional): Maximum number of runs per sampling method (None = use all available)

**Description:**

- Class imbalance analysis across different sample sizes
- Sampling strategy bias assessment with statistical summaries
- Distribution visualization and percentage tables
- Outputs summary CSV files and LaTeX tables to `{base_path}/tables/class_distributions/`
- Creates visualizations in `{base_path}/figures/class_distributions/`

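The kind of per-class percentage summary described above can be sketched like this (a hypothetical helper, not taken from the script):

```python
from collections import Counter

def class_percentages(labels):
    """Return the percentage of each class in a sample of labels."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: 100.0 * n / total for cls, n in counts.items()}

# Example: a heavily imbalanced sample of 100 labels
sample = ["minority"] * 3 + ["majority"] * 97
print(class_percentages(sample))  # {'minority': 3.0, 'majority': 97.0}
```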
### `aggregate_replay.py`

Aggregates replay experiments across multiple models and imbalance levels. This script uses a hardcoded configuration and analyses label-transfer performance.

**Usage:**

```bash
python scripts/analysis/aggregate_replay.py
```

**Parameters:**

- No command-line arguments (uses a hardcoded configuration)
- Hardcoded paths: `outputs/reddit_dataset_12/one-vs-all/football/42_05`
- Hardcoded models: distilbert, gpt2, ModernBERT, zero-shot
- Hardcoded imbalances: 05, 01, 001
- Hardcoded samplers: accuracy_lightgbm, info_gain_lightgbm, minority, ssepy

**Description:**

- Replay experiment aggregation with bootstrap RMSE analysis
- Cross-model strategy consistency analysis
- Performance stability assessment across configurations
- Generates tables and figures in `outputs/.../tables/aggregate_replay/` and `outputs/.../figures/aggregate_replay/`

### `analyse_replay.py`

Analyses individual replay transfer results for a specific source model.

**Usage:**

```bash
python scripts/analysis/analyse_replay.py \
  /path/to/source/model/results/directory
```

**Parameters:**

- `source_model_dir` (required): Path to the source model's results directory

**Description:**

- Individual replay experiment analysis
- Generates plots comparing replay results across different target configurations
- Uses predefined configs: distilbert_default, gpt2_default, ModernBERT_short, zero-shot_default

## Data Requirements

These scripts expect data in the following structure:

```
outputs/
├── reddit_dataset_12/
│   └── one-vs-all/
│       └── football/
│           └── 42_05/
│               ├── distilbert/
│               │   └── default/
│               │       └── eval_outputs/
│               │           ├── 05/
│               │           ├── 01/
│               │           └── 001/
│               │               ├── random/
│               │               ├── accuracy_lightgbm/
│               │               └── ...
│               ├── gpt2/
│               └── ModernBERT/
```

Each strategy directory should contain:
- `stats_full.json`: Complete statistics for all samples
- `sample_*.json`: Sample selection files for different seeds
- `metrics_*.json`: Computed metrics for different sample sizes
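A minimal sketch of loading the metrics files from one strategy directory (the helper name is hypothetical, and the JSON schema inside each file is not specified here):

```python
import json
from pathlib import Path

def load_strategy_metrics(strategy_dir):
    """Load all metrics_*.json files in a strategy directory,
    keyed by file stem (e.g. 'metrics_100')."""
    results = {}
    for path in sorted(Path(strategy_dir).glob("metrics_*.json")):
        with path.open() as f:
            results[path.stem] = json.load(f)
    return results
```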

0 commit comments

Comments
 (0)