@@ -166,17 +166,23 @@ Chameleon/
 
 ## 📈 Analysis Output
 
-After running `python cli.py analyze --project YourProject`, all outputs are saved to `Projects/YourProject/results/analysis/`:
+After running `python cli.py analyze --project YourProject`, all outputs are saved to `Projects/YourProject/results/analysis/` (~23 files):
 
-### 📊 Core Metrics
+### 📊 Core Metrics (Data + Charts)
 
 | File | Description | Key Insight |
 |------|-------------|-------------|
-| `01_accuracy_by_miu.csv/png` | Accuracy curve across μ levels | How quickly does accuracy degrade? |
-| `02_accuracy_by_subject_miu.csv` | Per-subject breakdown | Which subjects are most vulnerable? |
+| `01_accuracy_by_miu.csv` | Accuracy data by μ level | Raw numbers for each distortion level |
+| `01_accuracy_by_miu.png` | 📈 **Line chart**: accuracy vs distortion | Visualize degradation curve |
+| `02_accuracy_by_subject_miu.csv` | Per-subject accuracy data | Which subjects are most vulnerable? |
+| `02_subject_ranking.png` | 📊 **Bar chart**: subject performance | Rank subjects by baseline accuracy |
+| `02_subject_miu_heatmap.png` | 🔥 **Heatmap**: absolute accuracy (Subject × μ) | See accuracy patterns |
+| `02_degradation_heatmap.png` | 🔥 **Heatmap**: % degradation from baseline | Identify vulnerable subjects |
 | `03_chameleon_robustness_index.csv` | CRI scores (global + per-subject) | Single metric for model ranking |
-| `04_elasticity.csv/png` | Linear regression of degradation | Quantify fragility with slope |
-| `05_model_comparison.csv/png` | Head-to-head comparison table | Compare all metrics in one view |
+| `04_elasticity.csv` | Degradation slope data | Quantify fragility numerically |
+| `04_elasticity.png` | 📈 **Scatter + regression**: degradation rate | Visualize slope |
+| `05_model_comparison.csv` | Head-to-head comparison table | Compare all metrics |
+| `05_model_comparison.png` | 📊 **Scatter plot**: CRI vs accuracy | Compare models visually |
 
 ### 🔬 Error Analysis
 
@@ -190,17 +196,20 @@ After running `python cli.py analyze --project YourProject`, all outputs are sav
 | File | Description | Key Insight |
 |------|-------------|-------------|
 | `08_bootstrap_intervals.csv` | 95% confidence intervals (500 samples) | Are differences statistically significant? |
-| `mcnemar_distortion_results.csv` | McNemar's test: μ=0 vs μ>0 | Paired significance testing |
-| `mcnemar_subject_results.csv` | Per-subject McNemar tests | Subject-specific significance |
-| `mcnemar_pairwise_results.csv` | Adjacent μ level comparisons | Which μ jumps matter most? |
+| `11_mcnemar_distortion.csv` | McNemar's test: μ=0 vs each μ>0 | Paired significance testing |
+| `11_mcnemar_distortion.png` | 📊 **Bar chart**: baseline vs distorted (* = p<0.05) | Visualize significant differences |
+| `12_mcnemar_subject.csv` | Per-subject McNemar tests | Subject-specific significance |
+| `12_mcnemar_subject.png` | 📊 **Bar chart**: per-subject significance | Which subjects show real degradation? |
 
 ### 🎯 Advanced Analysis
 
 | File | Description | Key Insight |
 |------|-------------|-------------|
-| `09_delta_accuracy_heatmap.csv/png` | Subject × μ degradation matrix | Visual: Red = high degradation |
+| `09_delta_accuracy_heatmap.csv` | Subject × μ degradation matrix (data) | Raw delta values |
+| `09_delta_accuracy_heatmap.png` | 🔥 **Heatmap**: change from baseline | Visual: Red = high degradation |
 | `10_question_difficulty_tiers.json` | Easy/Medium/Hard/Chameleon Breakers | Find pattern-matching evidence |
-| `11_executive_summary.md` | **START HERE** - Full findings report | Comprehensive interpretation |
+| `13_key_insights.png` | 📊 **4-panel summary**: curve + bars + pie + stats | Quick visual overview |
+| `EXECUTIVE_REPORT.md` | 📄 **START HERE** - Full findings report | Comprehensive interpretation |
 
 ---
 
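Reviewer note: since this change renames several outputs and splits the combined `.csv/png` rows into separate files, a small presence check can confirm that a results directory matches the updated tables. The sketch below is a minimal example under stated assumptions: the filenames are copied from the post-change rows visible in this diff (the Error Analysis rows are not shown in these hunks, so they are left out), and `missing_outputs` is a hypothetical helper, not part of the Chameleon CLI.

```python
from pathlib import Path

# Filenames copied from the post-change table rows visible in this diff.
# The Error Analysis rows are not shown in the hunks, so they are omitted.
EXPECTED_OUTPUTS = [
    "01_accuracy_by_miu.csv",
    "01_accuracy_by_miu.png",
    "02_accuracy_by_subject_miu.csv",
    "02_subject_ranking.png",
    "02_subject_miu_heatmap.png",
    "02_degradation_heatmap.png",
    "03_chameleon_robustness_index.csv",
    "04_elasticity.csv",
    "04_elasticity.png",
    "05_model_comparison.csv",
    "05_model_comparison.png",
    "08_bootstrap_intervals.csv",
    "11_mcnemar_distortion.csv",
    "11_mcnemar_distortion.png",
    "12_mcnemar_subject.csv",
    "12_mcnemar_subject.png",
    "09_delta_accuracy_heatmap.csv",
    "09_delta_accuracy_heatmap.png",
    "10_question_difficulty_tiers.json",
    "13_key_insights.png",
    "EXECUTIVE_REPORT.md",
]


def missing_outputs(results_dir):
    """Return the expected filenames that are not present in results_dir.

    Hypothetical helper for illustration; it only checks that files exist.
    """
    root = Path(results_dir)
    return [name for name in EXPECTED_OUTPUTS if not (root / name).is_file()]


if __name__ == "__main__":
    # Directory layout as described in the README text; "YourProject" is a placeholder.
    for name in missing_outputs("Projects/YourProject/results/analysis"):
        print(f"missing: {name}")
```

Running this after `python cli.py analyze --project YourProject` lists any output from the tables that was not written. An empty result only means the listed files exist; it says nothing about their contents.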
@@ -233,6 +242,26 @@ Linear regression of accuracy vs μ:
 
 ## 🐳 Docker Usage
 
+### Option 1: Docker Compose (Recommended)
+
+```bash
+# Set your API keys in .env or export them
+export MISTRAL_API_KEY="your-mistral-key"
+export OPENAI_API_KEY="your-openai-key"
+
+# Build and run
+docker-compose build
+docker-compose run chameleon python cli.py init
+docker-compose run chameleon python cli.py distort -p MyProject
+docker-compose run chameleon python cli.py evaluate -p MyProject
+docker-compose run chameleon python cli.py analyze -p MyProject
+
+# Or run analysis only (no API keys needed)
+PROJECT=MyProject docker-compose run analyze
+```
+
+### Option 2: Docker Direct
+
 ```bash
 # Build
 docker build -t chameleon .