Changes from all commits (89 commits)
b19fd9e
bring back cass specifics
rvandewater Oct 30, 2024
943d32b
sort indicators
rvandewater Oct 30, 2024
1aea731
dependencies update
rvandewater Oct 30, 2024
57815a5
alarm toy
rvandewater Nov 15, 2024
92ed52d
preprocessing exclusion added
rvandewater Dec 2, 2024
e16cdb7
hpc
rvandewater Dec 20, 2024
60fb351
specifying 0 hyperparameter calls
rvandewater Dec 20, 2024
64b38ed
logging
rvandewater Dec 20, 2024
d943d03
add shap and update optuna
rvandewater Dec 20, 2024
ff0358b
modality mapping failsafe
rvandewater Dec 20, 2024
d894332
move to utils
rvandewater Dec 20, 2024
ab313fb
explain features
rvandewater Dec 20, 2024
0d5fa05
explain features and tuning folds mismatch
rvandewater Dec 20, 2024
baaba5b
remove grouping value from features
rvandewater Dec 20, 2024
0d25e43
explanation features
rvandewater Jan 20, 2025
7db89ef
Merge branch 'development' into cass
rvandewater Jan 21, 2025
af984aa
Support for static prediction
rvandewater Feb 26, 2025
179c098
hyperparameter checkpoint finding fix
rvandewater Feb 26, 2025
0afa916
Merge branch 'development' into cass
rvandewater Feb 26, 2025
ee186e8
failsafe for computing scores
rvandewater Apr 4, 2025
63ea7be
explainer corrections
rvandewater Apr 4, 2025
0ecdb59
modality exclusions and explainability
rvandewater Apr 4, 2025
320007a
configs
rvandewater Apr 4, 2025
22c8741
Merge branch 'development' into cass
rvandewater Apr 4, 2025
de09ec8
return vars
rvandewater Apr 4, 2025
31ff2e6
linting
rvandewater Apr 4, 2025
cf29a5a
file_names option
rvandewater Apr 15, 2025
4671626
cleanup and loader fix
rvandewater Apr 15, 2025
2bb47ae
fix xgboost hyperparam binding
rvandewater Apr 16, 2025
d585996
fix xgboost unsupported argument
rvandewater Apr 16, 2025
623ea09
fix xgboost hyperparam binding
rvandewater May 14, 2025
b01cf29
outcome time experiment
rvandewater May 14, 2025
381a968
delete ids (for explainer)
rvandewater Jun 4, 2025
118e1cf
xgb gpu (not fully working yet)
rvandewater Jun 4, 2025
042ceb7
utils
rvandewater Jun 4, 2025
2e419aa
explainer fixes
rvandewater Jun 4, 2025
1270d5e
utils fix
rvandewater Jun 4, 2025
4194538
copy checkpoint
rvandewater Jun 4, 2025
f1e7973
matthews correlation (not implemented yet; thresholding needed)
rvandewater Jun 4, 2025
c1385f9
experiments
rvandewater Jun 4, 2025
02fe208
Merge branch 'development' into cass
rvandewater Jun 4, 2025
9d6ed4c
experiments
rvandewater Jun 4, 2025
db347fe
Merge branch 'development' into cass
rvandewater Jun 4, 2025
5efa7d7
ruff
rvandewater Jun 4, 2025
ff4b47d
ruff
rvandewater Jun 4, 2025
9a780af
Merge branch 'development' into cass
rvandewater Jun 4, 2025
2126046
fixes
rvandewater Jun 4, 2025
cb25cfb
fixes
rvandewater Jun 4, 2025
ec6bc0d
fixes
rvandewater Jun 4, 2025
cbe9a64
preprocessor type hint
rvandewater Jun 4, 2025
ff7dcec
restore exclusion
rvandewater Jun 4, 2025
fba0bce
add explain_features flag
rvandewater Jun 10, 2025
c58574f
explain features
rvandewater Jun 11, 2025
22587e7
explicit conversion to amount of hours for row indicators
rvandewater Jun 11, 2025
347b453
ruff and incidence computation
rvandewater Jun 13, 2025
18165c2
update experiments
rvandewater Jun 18, 2025
3ebdaed
Merge branch 'development' into cass
rvandewater Jun 18, 2025
80f33cb
checkpoint dir failsafe
rvandewater Jun 23, 2025
15e37ea
add option to append data vars automatically
rvandewater Jun 23, 2025
08ff68f
logging curves
rvandewater Jun 23, 2025
754c441
make logging more informative/cleaner
rvandewater Jun 23, 2025
9ce70b0
clean logging
rvandewater Jun 23, 2025
05eb6dc
logging metrics to file for curves
rvandewater Jun 23, 2025
ae77b6d
ruff
rvandewater Jun 23, 2025
60081ed
no alarm toy notebook
rvandewater Jun 23, 2025
0aa66c7
utils refactor
rvandewater Jun 23, 2025
4a0f725
data dir check
rvandewater Jul 10, 2025
2546b27
allow preprocessing to pass if excluding modalities and failsafe for …
rvandewater Jul 10, 2025
95ff49d
adding website
rvandewater Jul 15, 2025
de49bc6
docs
rvandewater Jul 15, 2025
5b71c64
fixes for just using static data. Added repetitions to tune on.
rvandewater Jul 15, 2025
187cb2c
add append predictions
rvandewater Aug 1, 2025
aac4d18
logging run dir
rvandewater Aug 13, 2025
5e93636
added preds for neural networks
rvandewater Aug 13, 2025
7559c5f
Add more hyperparameter ranges lgbm and remove callback that crashed run
rvandewater Aug 25, 2025
02889e2
sphinx config
rvandewater Aug 25, 2025
d01674e
allow val change
rvandewater Aug 25, 2025
1a564e4
experiment with imblearn ensemble
rvandewater Aug 25, 2025
8af76b1
more time for running experiments
rvandewater Aug 25, 2025
9a2aa2a
persist all shaps, reps, and labels for shap value/heatmap plots
rvandewater Aug 25, 2025
2632228
reduce complexity for shaps
rvandewater Aug 27, 2025
b9443de
Added calibaration capabilities
rvandewater Sep 2, 2025
52895ac
new experiments
rvandewater Sep 2, 2025
ead4582
cleanup
rvandewater Sep 2, 2025
ba742cd
init
rvandewater Sep 2, 2025
cd25d4e
adding functionality to reduce stay steps
rvandewater Sep 8, 2025
235c02f
caliration update
rvandewater Sep 8, 2025
7701b00
reduce stay steps option
rvandewater Oct 20, 2025
88b63e7
lgbm explainer
rvandewater Oct 20, 2025
33 changes: 0 additions & 33 deletions .github/workflows/ci.yml

This file was deleted.

57 changes: 50 additions & 7 deletions configs/prediction_models/LGBMClassifier.gin
@@ -6,11 +6,54 @@ include "configs/prediction_models/common/MLCommon.gin"
# Train params
train_common.model = @LGBMClassifier

# Hyperparameter tuning configuration
model/hyperparameter.class_to_tune = @LGBMClassifier
model/hyperparameter.colsample_bytree = (0.33, 1.0)
model/hyperparameter.max_depth = (3, 7)
model/hyperparameter.min_child_samples = 1000
model/hyperparameter.n_estimators = 100000
model/hyperparameter.num_leaves = (8, 128, "log", 2)
model/hyperparameter.subsample = (0.33, 1.0)
model/hyperparameter.subsample_freq = 1

# Core tree parameters
model/hyperparameter.n_estimators = (500, 2000, 5000, 10000, 100000)
model/hyperparameter.max_depth = (3, 5, 7, 10, 15)
model/hyperparameter.num_leaves = (8, 16, 31, 64, 128, 256, "log", 2)
model/hyperparameter.min_child_samples = (10, 20, 50, 100, 500, 1000)
model/hyperparameter.min_child_weight = (1e-3, 10.0, "log")

Comment on lines +13 to +18

🛠️ Refactor suggestion

⚠️ Potential issue

Fix malformed num_leaves spec and consider simplifying n_estimators.

  • The num_leaves tuple mixes discrete values with "log", 2. This is likely to break your parser (or yield invalid values). Prefer either a discrete list or a proper range tuple.
  • Optional: Given early stopping is enabled below, a very large discrete set for n_estimators adds search noise/cost. A log-range or a single high ceiling with early stopping is more efficient.

Apply this diff to make both entries unambiguous:

-model/hyperparameter.n_estimators = (500, 2000, 5000, 10000, 100000)
+model/hyperparameter.n_estimators = (500, 20000, "log")
@@
-model/hyperparameter.num_leaves = (8, 16, 31, 64, 128, 256, "log", 2)
+model/hyperparameter.num_leaves = (8, 256, "log")

Notes:

  • If you prefer discrete leaves, use e.g. [15, 31, 63, 127, 255] instead.
  • Consider adding max_depth = [-1, 3, 5, 7, 10, 15] so the unconstrained depth case is explored.
📝 Committable suggestion


Suggested change
model/hyperparameter.n_estimators = (500, 2000, 5000, 10000, 100000)
model/hyperparameter.max_depth = (3, 5, 7, 10, 15)
model/hyperparameter.num_leaves = (8, 16, 31, 64, 128, 256, "log", 2)
model/hyperparameter.min_child_samples = (10, 20, 50, 100, 500, 1000)
model/hyperparameter.min_child_weight = (1e-3, 10.0, "log")
# configs/prediction_models/LGBMClassifier.gin
model/hyperparameter.n_estimators = (500, 20000, "log")
model/hyperparameter.max_depth = (3, 5, 7, 10, 15)
model/hyperparameter.num_leaves = (8, 256, "log")
model/hyperparameter.min_child_samples = (10, 20, 50, 100, 500, 1000)
model/hyperparameter.min_child_weight = (1e-3, 10.0, "log")
🤖 Prompt for AI Agents
In configs/prediction_models/LGBMClassifier.gin around lines 13 to 18, the
num_leaves hyperparameter is malformed (mixes discrete values with `"log", 2`)
and n_estimators is an overly large discrete set; fix num_leaves to be a clear
discrete list (e.g. 15, 31, 63, 127, 255) or a valid range spec so the parser
won’t break, and simplify n_estimators to either a log-range (e.g. 100, 500,
1000, 5000) or a single high ceiling relying on early stopping to control
training; optionally add max_depth = (-1, 3, 5, 7, 10, 15) so the unconstrained
depth case is included.
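For reference, a minimal Python sketch of how a (low, high, "log") range differs from a discrete list when handed to an Optuna-style tuner (this PR updates Optuna, but the repository's actual gin-to-search-space translation is not shown in this diff, so the mapping below is an assumption):

import optuna

def suggest_lgbm_params(trial: optuna.Trial) -> dict:
    # (low, high, "log") entries become single log-scaled range dimensions;
    # discrete lists become categorical choices over explicit values
    return {
        "n_estimators": trial.suggest_int("n_estimators", 500, 20000, log=True),
        "num_leaves": trial.suggest_int("num_leaves", 8, 256, log=True),
        "min_child_weight": trial.suggest_float("min_child_weight", 1e-3, 10.0, log=True),
        "max_depth": trial.suggest_categorical("max_depth", [-1, 3, 5, 7, 10, 15]),
    }

Keeping each entry in exactly one of these two shapes avoids the ambiguity flagged above.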

# Learning rate and regularization
model/hyperparameter.learning_rate = (0.01, 0.3, "log")
model/hyperparameter.reg_alpha = (1e-6, 1.0, "log")
model/hyperparameter.reg_lambda = (1e-6, 1.0, "log")

# Sampling parameters
model/hyperparameter.subsample = (0.4, 1.0)
model/hyperparameter.subsample_freq = (0, 1, 5, 10)
model/hyperparameter.colsample_bytree = (0.4, 1.0)
model/hyperparameter.colsample_bynode = (0.4, 1.0)

Comment on lines +24 to +29

🛠️ Refactor suggestion

⚠️ Potential issue

Remove LightGBM/XGBoost alias duplication and an invalid parameter name.

  • subsample/subsample_freq are LightGBM aliases of bagging_fraction/bagging_freq. Keeping both creates conflicting grids and harder-to-debug overrides.
  • colsample_bytree is an alias of feature_fraction in LightGBM; again, duplicate knobs.
  • colsample_bynode is not a LightGBM parameter (XGBoost-ism). If you want per-node feature sampling, use feature_fraction_bynode.

Apply this diff to remove the aliases here and manage sampling via the canonical keys in the Performance block (Lines 42-47):

-# Sampling parameters
-model/hyperparameter.subsample = (0.4, 1.0)
-model/hyperparameter.subsample_freq = (0, 1, 5, 10)
-model/hyperparameter.colsample_bytree = (0.4, 1.0)
-model/hyperparameter.colsample_bynode = (0.4, 1.0)

Then, add the per-node variant where the canonical feature/bagging params are declared (see the suggested change on Lines 42-47).

📝 Committable suggestion


Suggested change
# Sampling parameters
model/hyperparameter.subsample = (0.4, 1.0)
model/hyperparameter.subsample_freq = (0, 1, 5, 10)
model/hyperparameter.colsample_bytree = (0.4, 1.0)
model/hyperparameter.colsample_bynode = (0.4, 1.0)
🤖 Prompt for AI Agents
In configs/prediction_models/LGBMClassifier.gin around lines 24 to 29, remove
the LightGBM/XGBoost alias and invalid parameters — delete
model/hyperparameter.subsample, model/hyperparameter.subsample_freq,
model/hyperparameter.colsample_bytree and model/hyperparameter.colsample_bynode
— and instead manage sampling via the canonical LightGBM keys in the Performance
block (lines 42-47); then add the per-node variant
model/hyperparameter.feature_fraction_bynode alongside feature_fraction and
bagging_fraction in that Performance block so per-node sampling is available
under the canonical LightGBM name.
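For illustration, a short sketch of a canonical-name parameter dict (placeholder values, not this config's) that avoids the alias pairs called out above; feature_fraction_bynode is LightGBM's per-node counterpart of XGBoost's colsample_bynode:

import lightgbm as lgbm

params = {
    "objective": "binary",
    "bagging_fraction": 0.8,         # canonical form of subsample
    "bagging_freq": 1,               # canonical form of subsample_freq
    "feature_fraction": 0.8,         # canonical form of colsample_bytree
    "feature_fraction_bynode": 0.8,  # per-node feature sampling (no colsample_bynode in LightGBM)
    "min_split_gain": 1e-3,
}
# booster = lgbm.train(params, lgbm.Dataset(X, y))  # assuming X, y are already prepared

With one name per concept, LightGBM never has to reconcile conflicting alias values at train time.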

# Boosting parameters
model/hyperparameter.boosting_type = ["gbdt", "dart"]

# Advanced DART parameters (active when boosting_type="dart")
model/hyperparameter.drop_rate = (0.1, 0.5)
model/hyperparameter.max_drop = (10, 50)
model/hyperparameter.skip_drop = (0.1, 0.9)

# GOSS parameters (active when boosting_type="goss")
model/hyperparameter.top_rate = (0.1, 0.5)
model/hyperparameter.other_rate = (0.05, 0.2)

Comment on lines +30 to +41

🛠️ Refactor suggestion

GOSS parameters present without enabling GOSS; make the search space consistent.

You’re tuning top_rate/other_rate for GOSS but boosting_type doesn’t include "goss". Either add "goss" to boosting_type or drop those knobs to avoid dead/ignored params.

Apply one of the following:

Option A — enable GOSS:

-model/hyperparameter.boosting_type = ["gbdt", "dart"]
+model/hyperparameter.boosting_type = ["gbdt", "dart", "goss"]

Option B — keep only GBDT/DART (remove GOSS-only params):

-# GOSS parameters (active when boosting_type="goss")
-model/hyperparameter.top_rate = (0.1, 0.5)
-model/hyperparameter.other_rate = (0.05, 0.2)

Note: When using GOSS, ensure top_rate + other_rate < 1. If your tuner can’t enforce relational constraints, narrow ranges to, e.g., top_rate=(0.1,0.3) and other_rate=(0.05,0.15).

📝 Committable suggestion


Suggested change (Option A: enable GOSS)
# Boosting parameters
model/hyperparameter.boosting_type = ["gbdt", "dart", "goss"]
# Advanced DART parameters (active when boosting_type="dart")
model/hyperparameter.drop_rate = (0.1, 0.5)
model/hyperparameter.max_drop = (10, 50)
model/hyperparameter.skip_drop = (0.1, 0.9)
# GOSS parameters (active when boosting_type="goss")
model/hyperparameter.top_rate = (0.1, 0.5)
model/hyperparameter.other_rate = (0.05, 0.2)
Suggested change (Option B: keep only GBDT/DART)
# Boosting parameters
model/hyperparameter.boosting_type = ["gbdt", "dart"]
# Advanced DART parameters (active when boosting_type="dart")
model/hyperparameter.drop_rate = (0.1, 0.5)
model/hyperparameter.max_drop = (10, 50)
model/hyperparameter.skip_drop = (0.1, 0.9)
🤖 Prompt for AI Agents
In configs/prediction_models/LGBMClassifier.gin around lines 30 to 41, the file
defines GOSS-specific hyperparameters (top_rate, other_rate) but the
boosting_type list does not include "goss", so those knobs will be ignored; fix
by either (A) add "goss" to model/hyperparameter.boosting_type so the GOSS
params become active and, if enabling GOSS, ensure top_rate + other_rate < 1 (or
tighten ranges, e.g., top_rate=(0.1,0.3) and other_rate=(0.05,0.15) if the tuner
can’t enforce relational constraints), or (B) remove the GOSS-only params
(top_rate and other_rate) to keep the search space consistent with only "gbdt"
and "dart".

# Performance and stability
model/hyperparameter.feature_fraction = (0.4, 1.0)
model/hyperparameter.bagging_fraction = (0.4, 1.0)
model/hyperparameter.bagging_freq = (0, 1, 5, 10)
model/hyperparameter.min_split_gain = (1e-6, 1.0, "log")

Comment on lines +42 to +47

🛠️ Refactor suggestion

Consolidate sampling on canonical LightGBM params and add by-node feature sampling.

Given aliases were removed above, keep tuning here via canonical names. If you want per-node feature sampling, add feature_fraction_bynode.

Apply this diff to extend the canonical block:

 model/hyperparameter.feature_fraction = (0.4, 1.0)
 model/hyperparameter.bagging_fraction = (0.4, 1.0)
 model/hyperparameter.bagging_freq = (0, 1, 5, 10)
 model/hyperparameter.min_split_gain = (1e-6, 1.0, "log")
+model/hyperparameter.feature_fraction_bynode = (0.4, 1.0)
📝 Committable suggestion


Suggested change
# Performance and stability
model/hyperparameter.feature_fraction = (0.4, 1.0)
model/hyperparameter.bagging_fraction = (0.4, 1.0)
model/hyperparameter.bagging_freq = (0, 1, 5, 10)
model/hyperparameter.min_split_gain = (1e-6, 1.0, "log")
# Performance and stability
model/hyperparameter.feature_fraction = (0.4, 1.0)
model/hyperparameter.bagging_fraction = (0.4, 1.0)
model/hyperparameter.bagging_freq = (0, 1, 5, 10)
model/hyperparameter.min_split_gain = (1e-6, 1.0, "log")
model/hyperparameter.feature_fraction_bynode = (0.4, 1.0)
🤖 Prompt for AI Agents
In configs/prediction_models/LGBMClassifier.gin around lines 42 to 47, the
tuning block uses LightGBM aliases but should use canonical parameter names and
include per-node feature sampling; replace/extend the current entries so they
use canonical names (feature_fraction, bagging_fraction, bagging_freq,
min_split_gain) and add a new feature_fraction_bynode hyperparameter with an
appropriate search range (e.g., (0.4, 1.0)) to enable by-node feature sampling;
ensure the value formats match the file's existing hyperparameter tuple
conventions.

# Categorical handling
model/hyperparameter.cat_smooth = (1.0, 100.0, "log")
model/hyperparameter.cat_l2 = (1.0, 100.0, "log")

# Early stopping and validation
model/hyperparameter.early_stopping_rounds = 100
model/hyperparameter.eval_metric = ["binary_logloss", "auc", "binary_error"]

Comment on lines +52 to +55

💡 Verification agent

🧩 Analysis chain

Confirm fit-time plumbing for early stopping and evaluation metric.

early_stopping_rounds and eval_metric are fit-time arguments in the LGBMClassifier sklearn API. If your trainer only instantiates the estimator and never passes these to fit(), these settings won’t take effect. Please confirm they’re forwarded at training time.

Run the following script to verify that your training pipeline passes these through to LightGBM’s fit() (or wraps them in callbacks):

Optional: For stability, consider dropping binary_error from the metric grid for early-stopping-driven tuning and relying on auc or binary_logloss instead.

Apply if desired:

-model/hyperparameter.eval_metric = ["binary_logloss", "auc", "binary_error"]
+model/hyperparameter.eval_metric = ["binary_logloss", "auc"]

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect how tuning params are materialized and passed to LGBM
rg -n -C3 -e 'early_stopping_rounds|eval_metric|class_to_tune|LGBMClassifier|fit\('

# Common places:
# - Tuning utilities that map gin "model/hyperparameter.*" into estimator kwargs
# - Trainer code that calls estimator.fit(..., eval_set=..., eval_metric=..., early_stopping_rounds=...)

Length of output: 45667


🏁 Script executed:

#!/bin/bash
# Display the beginning of the LGBM wrapper to inspect how fit() is implemented
sed -n '1,200p' icu_benchmarks/models/ml_models/lgbm.py

Length of output: 1753


🏁 Script executed:

#!/bin/bash
# Locate the MLWrapper class to inspect how hyperparameters are mapped
rg -C5 -n 'class MLWrapper' icu_benchmarks/models/wrappers.py

Length of output: 485


🏁 Script executed:

#!/bin/bash
# Inspect how model hyperparameters (including eval_metric, early_stopping_rounds) are applied during model instantiation
rg -C5 -n 'def set_model_args' icu_benchmarks/models/wrappers.py

Length of output: 654


🏁 Script executed:

#!/bin/bash
python3 - << 'EOF'
import lightgbm as lgb
import inspect
print(inspect.signature(lgb.LGBMClassifier.__init__))
EOF

Length of output: 132


🏁 Script executed:

#!/bin/bash
sed -n '630,730p' icu_benchmarks/models/wrappers.py

Length of output: 4091


Update LGBMWrapper.fit_model to wire through Gin-configured early stopping and eval metrics

Your current implementation in icu_benchmarks/models/ml_models/lgbm.py hard-codes:

  • an early-stopping callback using self.hparams.patience (not your Gin model/hyperparameter.early_stopping_rounds)
  • a generic log_evaluation callback (ignoring your Gin model/hyperparameter.eval_metric)

As a result, neither early_stopping_rounds nor eval_metric from your Gin config ever reach the LightGBM fit() call. You’ll need to refactor the fit_model method to read from self.hparams and pass those values into the model, for example:

--- a/icu_benchmarks/models/ml_models/lgbm.py
+++ b/icu_benchmarks/models/ml_models/lgbm.py
@@ class LGBMWrapper(MLWrapper):
-        # Early stopping and logging (hard-coded)
-        callbacks = [
-            lgbm.early_stopping(self.hparams.patience, verbose=True),
-            lgbm.log_evaluation(period=-1)
-        ]
-
-        self.model = self.model.fit(
-            train_data,
-            train_labels,
-            eval_set=(val_data, val_labels),
-            callbacks=callbacks,
-        )
+        # Wire through Gin hyperparameters for early stopping and evaluation metrics
+        callbacks = []
+        # Use Gin’s early_stopping_rounds if set
+        if getattr(self.hparams, "early_stopping_rounds", None):
+            callbacks.append(lgbm.early_stopping(self.hparams.early_stopping_rounds, verbose=True))
+
+        # Override the eval_metric on the model if configured
+        if getattr(self.hparams, "eval_metric", None):
+            # scikit-learn wrapper will pass this metric to the Booster
+            self.model.set_params(eval_metric=self.hparams.eval_metric)
+
+        self.model = self.model.fit(
+            train_data,
+            train_labels,
+            eval_set=(val_data, val_labels),
+            callbacks=callbacks,
+        )

This change ensures that:

  • model/hyperparameter.early_stopping_rounds drives your early-stopping callback
  • model/hyperparameter.eval_metric is passed to LightGBM and used for evaluation logging and stopping

References:
LightGBM scikit-learn API expects callbacks for early stopping and log evaluation; early_stopping_rounds as a fit arg is deprecated in favor of early_stopping() callback (blog.csdn.net).
The wrapper’s fit(...) signature supports eval_metric and custom callbacks for metric tracking (lightgbm.readthedocs.io).

📝 Committable suggestion


Suggested change
# Early stopping and validation
model/hyperparameter.early_stopping_rounds = 100
model/hyperparameter.eval_metric = ["binary_logloss", "auc", "binary_error"]
# Wire through Gin hyperparameters for early stopping and evaluation metrics
callbacks = []
# Use Gin’s early_stopping_rounds if set
if getattr(self.hparams, "early_stopping_rounds", None):
    callbacks.append(lgbm.early_stopping(self.hparams.early_stopping_rounds, verbose=True))

# Override the eval_metric on the model if configured
if getattr(self.hparams, "eval_metric", None):
    # scikit-learn wrapper will pass this metric to the Booster
    self.model.set_params(eval_metric=self.hparams.eval_metric)

self.model = self.model.fit(
    train_data,
    train_labels,
    eval_set=(val_data, val_labels),
    callbacks=callbacks,
)
🤖 Prompt for AI Agents
In icu_benchmarks/models/ml_models/lgbm.py around the fit_model method (where
callbacks and eval metrics are assembled), the code currently hardcodes an
early-stopping callback using self.hparams.patience and always uses a generic
log_evaluation callback, so Gin-configured
model/hyperparameter.early_stopping_rounds and model/hyperparameter.eval_metric
never reach LightGBM; update fit_model to read early_stopping_rounds and
eval_metric from self.hparams, construct callbacks using
lightgbm.callback.early_stopping(self.hparams.early_stopping_rounds) and a
log_evaluation configured appropriately (or omit if not desired), and pass
eval_metric=self.hparams.eval_metric into the LGBMClassifier.fit(...) call along
with the validation set (eval_set/eval_names) and the assembled callbacks so
LightGBM uses the Gin-provided values for evaluation and stopping.

# Class imbalance handling
model/hyperparameter.is_unbalance = [True, False]
model/hyperparameter.scale_pos_weight = (0.1, 10.0, "log")

Comment on lines +56 to +59

🛠️ Refactor suggestion

Avoid combining is_unbalance with scale_pos_weight; prefer one strategy.

LightGBM treats these as alternative approaches to class imbalance. Searching both simultaneously can produce ambiguous or counteracting settings. Prefer tuning scale_pos_weight and fixing is_unbalance=False, or vice versa.

Apply this diff to prefer scale_pos_weight only:

-# Class imbalance handling
-model/hyperparameter.is_unbalance = [True, False]
-model/hyperparameter.scale_pos_weight = (0.1, 10.0, "log")
+# Class imbalance handling (prefer explicit weighting)
+model/hyperparameter.is_unbalance = [False]
+model/hyperparameter.scale_pos_weight = (0.1, 10.0, "log")

If you want to compare strategies, introduce a higher-level toggle (e.g., imbalance_strategy in {"none","is_unbalance","scale_pos_weight"}) and conditionally set params accordingly to avoid invalid combos.

📝 Committable suggestion


Suggested change
# Class imbalance handling
model/hyperparameter.is_unbalance = [True, False]
model/hyperparameter.scale_pos_weight = (0.1, 10.0, "log")
# Class imbalance handling (prefer explicit weighting)
model/hyperparameter.is_unbalance = [False]
model/hyperparameter.scale_pos_weight = (0.1, 10.0, "log")
🤖 Prompt for AI Agents
In configs/prediction_models/LGBMClassifier.gin around lines 56 to 59, the
current config exposes both model/hyperparameter.is_unbalance and
model/hyperparameter.scale_pos_weight which are mutually exclusive; update the
file to prefer scale_pos_weight only by fixing model/hyperparameter.is_unbalance
to False and keeping model/hyperparameter.scale_pos_weight as the tunable range
(0.1, 10.0, "log"); alternatively, if you need to compare strategies, introduce
a single high-level toggle (e.g., imbalance_strategy in
{"none","is_unbalance","scale_pos_weight"}) and conditionally set either
is_unbalance or scale_pos_weight based on that toggle to avoid invalid
combinations.
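A hedged sketch of what such an imbalance_strategy toggle could look like on the Python side (the strategy name and helper below are hypothetical, not part of this repository):

def imbalance_params(strategy: str, scale_pos_weight: float = 1.0) -> dict:
    """Map one high-level choice onto mutually exclusive LightGBM settings."""
    if strategy == "is_unbalance":
        return {"is_unbalance": True}
    if strategy == "scale_pos_weight":
        return {"is_unbalance": False, "scale_pos_weight": scale_pos_weight}
    return {}  # "none": keep the default, unweighted loss

# e.g. lgbm.LGBMClassifier(**imbalance_params("scale_pos_weight", 5.0), n_estimators=1000)

Because exactly one branch fires, the tuner can never emit a configuration where both knobs fight each other.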

22 changes: 18 additions & 4 deletions configs/prediction_models/RUSBClassifier.gin
@@ -1,14 +1,28 @@
# Settings for ImbLearn Balanced Random Forest Classifier.
# Settings for ImbLearn RUSBoost Classifier (Random Under-sampling with Boosting)

# Common settings for ML models
include "configs/prediction_models/common/MLCommon.gin"

# Train params
train_common.model = @RUSBClassifier

# Hyperparameter tuning configuration
model/hyperparameter.class_to_tune = @RUSBClassifier
model/hyperparameter.n_estimators = (10, 50, 100, 200, 500)
model/hyperparameter.learning_rate = (0.005, 1, "log")
model/hyperparameter.sampling_strategy = "auto"

Comment on lines 7 to 11

💡 Verification agent

🧩 Analysis chain

Align model name with public API choice (RUSB vs XGBEnsemble)

You removed RUSBClassifier from the public API but this config still binds train_common.model and class_to_tune to RUSBClassifier. Choose one:

  • Keep RUSBClassifier supported (re-export it in __init__.py as suggested), or
  • Migrate this config to XGBEnsembleClassifier and rename the file accordingly.

If migrating:

-train_common.model = @RUSBClassifier
-model/hyperparameter.class_to_tune = @RUSBClassifier
+train_common.model = @XGBEnsembleClassifier
+model/hyperparameter.class_to_tune = @XGBEnsembleClassifier

🏁 Script executed:

#!/usr/bin/env bash
set -e

echo "Finding RUSBClassifier class definitions..."
rg -nP "class\s+RUSBClassifier\b" .

echo "Searching for RUSBClassifier exports in __init__.py..."
rg -n "RUSBClassifier" -g "**/__init__.py" -n

echo "Finding XGBEnsembleClassifier class definitions..."
rg -nP "class\s+XGBEnsembleClassifier\b" .

echo "Searching for XGBEnsembleClassifier references..."
rg -n "XGBEnsembleClassifier" -g "**/*.py" -n

Length of output: 143


🏁 Script executed:

#!/usr/bin/env bash
set -e

echo "Searching for any occurrences of RUSBClassifier..."
rg -n "RUSBClassifier" .

echo "Searching for any occurrences of XGBEnsembleClassifier..."
rg -n "XGBEnsembleClassifier" .

echo "Searching for filenames containing RUSBClassifier or XGBEnsembleClassifier..."
fd --hidden --no-ignore RUSBClassifier
fd --hidden --no-ignore XGBEnsembleClassifier

Length of output: 2022


Align config with public API—migrate or re-export RUSBClassifier
configs/prediction_models/RUSBClassifier.gin still references the removed RUSBClassifier class; either re-export RUSBClassifier in icu_benchmarks/models/__init__.py or update this config (and filename) to use XGBEnsembleClassifier.

If migrating:

-train_common.model = @RUSBClassifier
-model/hyperparameter.class_to_tune = @RUSBClassifier
+train_common.model = @XGBEnsembleClassifier
+model/hyperparameter.class_to_tune = @XGBEnsembleClassifier
📝 Committable suggestion


Suggested change
train_common.model = @RUSBClassifier
# Hyperparameter tuning configuration
model/hyperparameter.class_to_tune = @RUSBClassifier
model/hyperparameter.n_estimators = (10, 50, 100, 200, 500)
model/hyperparameter.learning_rate = (0.005, 1, "log")
model/hyperparameter.sampling_strategy = "auto"
train_common.model = @XGBEnsembleClassifier
# Hyperparameter tuning configuration
model/hyperparameter.class_to_tune = @XGBEnsembleClassifier
🤖 Prompt for AI Agents
In configs/prediction_models/RUSBClassifier.gin around lines 7 to 11, the config
references the removed RUSBClassifier; either re-export RUSBClassifier from
icu_benchmarks/models/__init__.py or update this config (and its filename) to
use the current public API XGBEnsembleClassifier. If you choose re-exporting,
add an import and export of RUSBClassifier in icu_benchmarks/models/__init__.py
so existing gin configs keep working. If you choose migration, rename the file
to match the new model (e.g., XGBEnsembleClassifier.gin), replace both
occurrences of RUSBClassifier with XGBEnsembleClassifier, and verify any
hyperparameter names match the XGBEnsembleClassifier API.
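If the re-export route is chosen, it is a two-line change; note that the module path below is an assumption (the wrapper's actual location is not visible in this diff):

# icu_benchmarks/models/__init__.py
from icu_benchmarks.models.ml_models.imblearn import RUSBClassifier  # hypothetical path

__all__ = ["RUSBClassifier"]  # extend the existing __all__ list rather than replacing it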

# Number of estimators (boosting rounds)
model/hyperparameter.n_estimators = (50, 100, 200, 300, 500)

# Learning rate for boosting
model/hyperparameter.learning_rate = (0.01, 2.0, "log")

# Sampling strategy for random under-sampling
model/hyperparameter.sampling_strategy = ["auto", "majority", "not minority"]

# Base estimator parameters (typically DecisionTreeClassifier)
model/hyperparameter.base_estimator__max_depth = [1, 2, 3, 4, 5, 6]
model/hyperparameter.base_estimator__min_samples_split = [2, 5, 10, 20]
model/hyperparameter.base_estimator__min_samples_leaf = [1, 2, 5, 10]

# Replacement strategy for under-sampling
model/hyperparameter.replacement = [True, False]

10 changes: 7 additions & 3 deletions configs/prediction_models/XGBClassifier.gin
@@ -8,10 +8,14 @@ train_common.model = @XGBClassifier

model/hyperparameter.class_to_tune = @XGBClassifier
model/hyperparameter.learning_rate = (0.01, 0.1, "log")
model/hyperparameter.n_estimators = [50, 100, 250, 500, 750, 1000,1500,2000]
model/hyperparameter.n_estimators = [50, 100, 250, 500, 750, 1000,1500,2000, 2500, 3000, 3500, 4000, 4500, 5000]

💡 Verification agent

🧩 Analysis chain

Expanded n_estimators range allows much larger ensembles.

The maximum of the n_estimators grid has been increased from 2000 to 5000, allowing considerably larger ensembles (more boosting rounds). This can improve performance but may also increase training time substantially.

Make sure this expanded range is compatible with your computational resources. Run the following to check if the tuning is actually enabled:


🏁 Script executed:

#!/bin/bash
# Check if hyperparameter tuning is enabled in MLTuning.gin
rg "tune_hyperparameters.n_calls" configs/prediction_models/common/MLTuning.gin

Length of output: 112


Action Required: Verify Hyperparameter Tuning Configuration

  • The expanded n_estimators range in configs/prediction_models/XGBClassifier.gin now goes up to 5000 (previous max was 2000). This change could improve model performance but also significantly increase training time.
  • However, the output from configs/prediction_models/common/MLTuning.gin shows tune_hyperparameters.n_calls = 0, indicating that hyperparameter tuning is currently disabled.
  • Please confirm whether hyperparameter tuning is intended to be enabled. If not, consider whether expanding the estimator range is still necessary. If tuning should be enabled to take advantage of the broader range, update the tuning configuration accordingly and verify that your computational resources can handle the increased training time.

model/hyperparameter.max_depth = [3, 5, 10, 15]
model/hyperparameter.scale_pos_weight = [1, 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 99, 100, 1000]
model/hyperparameter.min_child_weight = [1, 0.5]
model/hyperparameter.min_child_weight = [0.1, 0.5, 1, 2, 5, 10]
model/hyperparameter.max_delta_step = [0, 1, 2, 3, 4, 5, 10]
model/hyperparameter.colsample_bytree = [0.1, 0.25, 0.5, 0.75, 1.0]
model/hyperparameter.eval_metric = "aucpr"
# model/hyperparameter.eval_metric = "aucpr"
model/hyperparameter.gamma = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 2.0]
# model/hyperparameter.early_stopping_rounds = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
model/hyperparameter.reg_lambda = [0, 0.01, 0.1, 1, 10, 100]
model/hyperparameter.reg_alpha = [0, 0.01, 0.1, 1, 10, 100]
21 changes: 21 additions & 0 deletions configs/prediction_models/XGBClassifierGPU.gin
@@ -0,0 +1,21 @@
# Settings for XGBoost classifier.

# Common settings for ML models
include "configs/prediction_models/common/MLCommon.gin"

# Train params
train_common.model = @XGBClassifierGPU

model/hyperparameter.class_to_tune = @XGBClassifierGPU
model/hyperparameter.learning_rate = (0.01, 0.1, "log")
model/hyperparameter.n_estimators = [50, 100, 250, 500, 750, 1000,1500,2000, 2500, 3000, 3500, 4000, 4500, 5000]
model/hyperparameter.max_depth = [3, 5, 10, 15]
model/hyperparameter.scale_pos_weight = [1, 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 99, 100, 1000]
model/hyperparameter.min_child_weight = [0.1, 0.5, 1, 2, 5, 10]
model/hyperparameter.max_delta_step = [0, 1, 2, 3, 4, 5, 10]
model/hyperparameter.colsample_bytree = [0.1, 0.25, 0.5, 0.75, 1.0]
# model/hyperparameter.eval_metric = "aucpr"
model/hyperparameter.gamma = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 2.0]
# model/hyperparameter.early_stopping_rounds = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
model/hyperparameter.reg_lambda = [0, 0.01, 0.1, 1, 10, 100]
model/hyperparameter.reg_alpha = [0, 0.01, 0.1, 1, 10, 100]
5 changes: 3 additions & 2 deletions configs/prediction_models/common/MLTuning.gin
@@ -1,5 +1,6 @@
# Hyperparameter tuner settings for classical Machine Learning.
tune_hyperparameters.scopes = ["model"]
tune_hyperparameters.n_initial_points = 5
tune_hyperparameters.n_calls = 30
tune_hyperparameters.folds_to_tune_on = 5
tune_hyperparameters.n_calls = 100
tune_hyperparameters.folds_to_tune_on = 1
tune_hyperparameters.repetitions_to_tune_on = 5
6 changes: 6 additions & 0 deletions configs/tasks/BinaryClassification.gin
@@ -22,6 +22,12 @@ preprocess.preprocessor = @base_classification_preprocessor
preprocess.modality_mapping = %modality_mapping
preprocess.vars = %vars
preprocess.use_static = True
preprocess.required_segments = ["OUTCOME", "STATIC"]
preprocess.file_names = {
"DYNAMIC": "dyn.parquet",
"OUTCOME": "outc.parquet",
"STATIC": "sta.parquet",
}

# SELECTING DATASET
include "configs/tasks/common/Dataloader.gin"
4 changes: 3 additions & 1 deletion configs/tasks/common/Dataloader.gin
@@ -3,6 +3,8 @@ PredictionPandasDataset.vars = %vars
PredictionPandasDataset.ram_cache = True
PredictionPolarsDataset.vars = %vars
PredictionPolarsDataset.ram_cache = True
PredictionPolarsDataset.mps = True
# Imputation
ImputationPandasDataset.vars = %vars
ImputationPandasDataset.ram_cache = True
ImputationPandasDataset.ram_cache = True
PredictionPolarsDataset.mps = True
3 changes: 3 additions & 0 deletions demo_data/mortality24/mimic_demo_static/attrition.csv
@@ -0,0 +1,3 @@
incl_n,excl_n_total,excl_n
125,10,7
99,34,26
20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build

# Put it first so that "make" without argument is like "make help".
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
35 changes: 35 additions & 0 deletions docs/make.bat
@@ -0,0 +1,35 @@
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.https://www.sphinx-doc.org/
exit /b 1
)

if "%1" == "" goto help

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd
4 changes: 4 additions & 0 deletions docs/requirements.txt
@@ -0,0 +1,4 @@
sphinx
sphinx-rtd-theme
sphinx-autoapi
sphinx-autobuild