Commit 0b6cf2a

committed
Update notes.md
1 parent 51f3b06 commit 0b6cf2a

1 file changed

Lines changed: 71 additions & 4 deletions

notes.md

@@ -10693,10 +10693,77 @@ Are patient_0 saliency maps identical between heads?: True
1069310693
- Save predictions and top-10 feature importance summaries.
1069410694
- Provide optional interactive CLI for single-patient predictions.
1069510695
- Ensure reproducible, dataset-agnostic outputs without requiring true labels.
10696-
- **Process (summary)**
10697-
- **Outputs**
10698-
- **Reasonings**
10699-
10696+
- **Process (Summary)**
10697+
0. **Initialise Imports & Paths**
10698+
- Import standard-library, data, ML, and interpretability packages (`json`, `pandas`, `numpy`, `torch`, `joblib`, `shap`).
10699+
- Resolve project directories (`SCRIPT_DIR`, `SRC_DIR`, `PROJECT_ROOT`).
10700+
- Load input paths:
10701+
- Patient-level features (`news2_features_patient.csv`)
10702+
- Test splits (`patient_splits.json`)
10703+
- TCN: padding config (`padding_config.json`), prepared tensors (`prepared_datasets/`)
10704+
- Load model paths:
10705+
- LightGBM models (`lightgbm_results/`)
10706+
- TCN weights and config (`tcn_best_refined.pt`, `config_refined.json`)
10707+
- Create output folder: `deployment_lite_outputs/`.
10708+
1. **Load Test Data for LightGBM**
10709+
- Load test patient IDs → `test_ids`.
10710+
- Subset patient-level features → `test_df`.
10711+
- Define model input features (`feature_cols`) by excluding non-features (`subject_id`, `max_risk`, `median_risk`, `pct_time_high`).
10712+
- **Rationale:** Only input features needed; binary targets and labels unnecessary for inference.
10713+
2. **Compute LightGBM Inference**
10714+
- Load and run each model (`max_risk`, `median_risk`, `pct_time_high`) on `X_test`.
10715+
- Classification → positive-class probabilities; regression → continuous predictions.
10716+
- Clip regression outputs at 0.
10717+
- Save `lightgbm_inference_outputs.csv`.
10718+
3. **Compute TCN Inference**
10719+
- Load `TCNModel`, test tensors (`x_test`, `mask_test`), and configuration.
10720+
- Reconstruct model architecture and load weights; set to evaluation mode.
10721+
- Run the forward pass under `torch.no_grad()` (no gradient tracking); with the model in evaluation mode, outputs are deterministic.
10722+
- Extract outputs:
10723+
- Convert logits → sigmoid probabilities
10724+
- Inverse-transform the regression output with `expm1` (the inverse of `log1p`), then clip negatives to 0.
10725+
- Build `df_tcn` and save `tcn_inference_outputs.csv`.
10726+
- **Rationale:** No binary targets needed; the architecture must be reconstructed before the weights can be loaded; masks distinguish real timesteps from padding.
10727+
4. **Compute LightGBM Interpretability (SHAP)**
10728+
- Compute mean absolute SHAP values per feature for each target.
10729+
- Keep **top-10 features** per target in dataframe.
10730+
- Store numeric summary → `lightgbm_top10` (written to disk in step 6).
10731+
- **Rationale:** Mirrors the Phase 6 top-10 summary; omitting plots keeps outputs lightweight and deployment-ready.
10732+
5. **Compute TCN Interpretability (Gradient × Input Saliency)**
10733+
- Load feature names from `padding_config.json`, map to TCN tensor features.
10734+
- For each output head (`max_risk`, `median_risk`, `pct_time_high`):
10735+
- Compute |gradient × input| saliency across patients and timesteps.
10736+
- Aggregate to mean per feature (average across patients and timesteps), keep top-10 features.
10737+
- Store numeric summary → `tcn_top10` (written to disk in step 6).
10738+
- **Rationale:** Matches Phase 6 methodology; numeric-only output keeps pipeline lightweight.
10739+
6. **Merge Feature Summaries**
10740+
- Concatenate LightGBM `lightgbm_top10` and TCN `tcn_top10` top-10 summaries → `combined_summary`.
10741+
- Columns: `feature`, `mean_abs_shap`, `target`, `model`, `mean_abs_saliency`
10742+
- Save as `top10_features_summary.csv` (60 rows: 2 models × 3 targets × 10 features).
10744+
- **Rationale:** One consolidated, deployment-ready file; no plots or per-patient arrays; easy dashboard/reporting.
10745+
7. **Interactive CLI: Single-Patient Inference**
10746+
- Optional CLI, available once batch inference has finished.
10747+
- Input patient ID → validate against `test_ids`.
10748+
- Display that patient’s predictions for LightGBM (`lightgbm_preds`) and TCN (`prob_max`, `prob_median`, `y_pred_reg_raw`).
10749+
- Loop until user exits.
10750+
- **Rationale:** Optional, lightweight, reproducible CLI for quick inspection; uses precomputed outputs; supports deployment without extra artefacts.
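
The single-patient CLI in step 7 can be sketched as a lookup over precomputed predictions. This is a minimal illustration, not the script's actual code: the names `lookup_patient` and `run_cli`, the `exit` keyword, integer patient IDs, and the dictionary layout of `preds` are all assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical container: patient ID -> precomputed predictions from both models.
Predictions = Dict[int, Dict[str, float]]


def lookup_patient(
    raw: str, test_ids: Iterable[int], preds: Predictions
) -> Optional[Dict[str, float]]:
    """Return precomputed predictions for a valid test patient, else None."""
    try:
        patient_id = int(raw.strip())  # assumes integer patient IDs
    except ValueError:
        return None  # non-numeric input
    if patient_id not in set(test_ids):
        return None  # not part of the test split
    return preds.get(patient_id)


def run_cli(
    test_ids: Iterable[int],
    preds: Predictions,
    read: Callable[[str], str] = input,
    write: Callable[[str], None] = print,
) -> None:
    """Prompt for patient IDs until the user types 'exit' (assumed keyword)."""
    ids = list(test_ids)
    while True:
        raw = read("Patient ID (or 'exit'): ")
        if raw.strip().lower() == "exit":
            break
        row = lookup_patient(raw, ids, preds)
        if row is None:
            write(f"'{raw.strip()}' is not a valid test patient ID.")
            continue
        # Only display values computed during batch inference; nothing is re-run.
        for name, value in row.items():
            write(f"{name}: {value:.4f}")
```

Passing `read` and `write` as parameters keeps the loop testable without a terminal; in interactive use the defaults (`input` and `print`) apply.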
10751+
**Outputs**
10752+
- **Batch Predictions**
10753+
- `lightgbm_inference_outputs.csv` → classification probabilities (`max_risk`, `median_risk`) + regression (`pct_time_high`) for all test patients.
10754+
- `tcn_inference_outputs.csv` → probabilities and regression outputs for TCN model.
10755+
- **Interpretability**
10756+
- `top10_features_summary.csv` → combined top-10 features per target from LightGBM (SHAP) and TCN (Gradient×Input Saliency).
10757+
- **Interactive CLI**
10758+
- Optional terminal output for single-patient predictions, read from the same precomputed batch outputs.
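
As a rough illustration of how the combined interpretability file reaches 60 rows, the sketch below builds it from per-target importance scores using only the standard library. The pipeline described above most likely does this with `pandas`; here the helper names (`top_k`, `build_summary`, `to_csv`), the model labels `"LightGBM"` and `"TCN"`, and the choice to leave the other model's score column blank are assumptions for the example.

```python
import csv
import io
from typing import Dict, List, Tuple

TARGETS = ["max_risk", "median_risk", "pct_time_high"]
COLUMNS = ["feature", "mean_abs_shap", "target", "model", "mean_abs_saliency"]

# Hypothetical input shape: target name -> {feature name -> mean absolute score}.
Scores = Dict[str, Dict[str, float]]


def top_k(scores: Dict[str, float], k: int = 10) -> List[Tuple[str, float]]:
    """Return the k (feature, score) pairs with the largest scores."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


def build_summary(lgbm: Scores, tcn: Scores, k: int = 10) -> List[dict]:
    """Stack top-k rows for each model and target into one list of records."""
    rows = []
    sources = (
        ("LightGBM", lgbm, "mean_abs_shap"),
        ("TCN", tcn, "mean_abs_saliency"),
    )
    for model, per_target, score_col in sources:
        for target in TARGETS:
            for feature, score in top_k(per_target[target], k):
                # Assumption: the other model's score column is left blank.
                row = dict.fromkeys(COLUMNS, "")
                row.update(feature=feature, target=target, model=model)
                row[score_col] = score
                rows.append(row)
    return rows


def to_csv(rows: List[dict]) -> str:
    """Serialise the records with the column order listed in step 6."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

With at least ten scored features per target, `build_summary` returns 2 × 3 × 10 = 60 records, matching the row count given for `top10_features_summary.csv`.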
10759+
**Reasoning / Rationale**
10760+
- **Reproducibility:** Running inference as a single batch gives deterministic outputs and avoids variation from per-patient loops or incremental processing.
10761+
- **Unified pipeline:** Consolidates separate evaluation and interpretability scripts into a single workflow for both LightGBM and TCN.
10762+
- **Interpretability tailored to model type:** LightGBM → SHAP; TCN → Gradient×Input Saliency. Only top-10 features retained to keep outputs lightweight, consistent with Phase 6 methodology.
10763+
- **Binary targets omitted:** Inference does not compute metrics; outputs are generated from input features only, without label reconstruction or calibration.
10764+
- **Regression clipping:** Ensures numeric predictions are valid (no negative percentages).
10765+
- **Deployment-ready and dataset-agnostic:** Any dataset with the required feature columns / tensors can be passed directly into the script without outcome labels.
10766+
- **Optional CLI:** Lightweight inspection of single-patient predictions without recomputation; aligns with batch outputs for consistency.
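
The output conversions and regression clipping noted above can be sketched in plain Python. This is a stand-in for the tensor operations, written under stated assumptions: the regression head is assumed to predict on a `log1p` scale (which is what makes `expm1` the right inverse), and the function names and the `pct_time_high` output key are illustrative rather than taken from the script.

```python
import math
from typing import Dict, List, Sequence


def sigmoid(logit: float) -> float:
    """Numerically stable logistic function mapping a logit to a probability."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)  # safe: logit < 0, so z < 1
    return z / (1.0 + z)


def invert_log1p_and_clip(pred_log: float) -> float:
    """Undo an assumed log1p target transform with expm1, then clip negatives to 0."""
    return max(0.0, math.expm1(pred_log))


def postprocess(
    logits_max: Sequence[float],
    logits_median: Sequence[float],
    reg_log: Sequence[float],
) -> Dict[str, List[float]]:
    """Convert raw head outputs into probabilities and a non-negative regression value."""
    return {
        "prob_max": [sigmoid(x) for x in logits_max],
        "prob_median": [sigmoid(x) for x in logits_median],
        "pct_time_high": [invert_log1p_and_clip(x) for x in reg_log],
    }
```

Clipping after the inverse transform matters because `expm1` of a negative prediction is itself negative, which would otherwise surface as a negative percentage.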
1070010767
### End Products of Phase 7A
1070110768

1070210769
### Summary
