Improvements from Cass project #164
base: development
Changes from all commits
LightGBM classifier config (diff):

```gin
@@ -6,11 +6,54 @@ include "configs/prediction_models/common/MLCommon.gin"
# Train params
train_common.model = @LGBMClassifier

# Hyperparameter tuning configuration
model/hyperparameter.class_to_tune = @LGBMClassifier
model/hyperparameter.colsample_bytree = (0.33, 1.0)
model/hyperparameter.max_depth = (3, 7)
model/hyperparameter.min_child_samples = 1000
model/hyperparameter.n_estimators = 100000
model/hyperparameter.num_leaves = (8, 128, "log", 2)
model/hyperparameter.subsample = (0.33, 1.0)
model/hyperparameter.subsample_freq = 1

# Core tree parameters
model/hyperparameter.n_estimators = (500, 2000, 5000, 10000, 100000)
model/hyperparameter.max_depth = (3, 5, 7, 10, 15)
model/hyperparameter.num_leaves = (8, 16, 31, 64, 128, 256, "log", 2)
model/hyperparameter.min_child_samples = (10, 20, 50, 100, 500, 1000)
model/hyperparameter.min_child_weight = (1e-3, 10.0, "log")

# Learning rate and regularization
model/hyperparameter.learning_rate = (0.01, 0.3, "log")
model/hyperparameter.reg_alpha = (1e-6, 1.0, "log")
model/hyperparameter.reg_lambda = (1e-6, 1.0, "log")

# Sampling parameters
model/hyperparameter.subsample = (0.4, 1.0)
model/hyperparameter.subsample_freq = (0, 1, 5, 10)
model/hyperparameter.colsample_bytree = (0.4, 1.0)
model/hyperparameter.colsample_bynode = (0.4, 1.0)
```
Comment on lines +24 to +29

🛠️ Refactor suggestion: Remove LightGBM/XGBoost alias duplication and an invalid parameter name.

`subsample`, `subsample_freq`, and `colsample_bytree` are aliases of the canonical LightGBM keys `bagging_fraction`, `bagging_freq`, and `feature_fraction`, which are already tuned in the Performance block (lines 42-47), so searching both creates conflicting settings. `colsample_bynode` is an XGBoost name; the LightGBM equivalent is `feature_fraction_bynode`.

Apply this diff to remove the aliases here and manage sampling via the canonical keys in the Performance block (lines 42-47):

```diff
-# Sampling parameters
-model/hyperparameter.subsample = (0.4, 1.0)
-model/hyperparameter.subsample_freq = (0, 1, 5, 10)
-model/hyperparameter.colsample_bytree = (0.4, 1.0)
-model/hyperparameter.colsample_bynode = (0.4, 1.0)
```

Then add the per-node variant where the canonical feature/bagging params are declared (see the suggested change on lines 42-47).
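The alias collision is easy to see outside this repo. A minimal sketch on synthetic data (not project code) of what happens when an sklearn-style alias and its canonical LightGBM key are both supplied:

```python
# Minimal reproduction of the alias conflict described above.
import numpy as np
from lightgbm import LGBMClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 5))
y = rng.integers(0, 2, 200)

clf = LGBMClassifier(
    subsample=0.5,           # sklearn-style alias
    bagging_fraction=0.9,    # canonical LightGBM key for the same setting
    subsample_freq=1,        # needed so row sampling is active at all
    n_estimators=20,
)
# LightGBM warns that one of the duplicated values will be ignored, so the
# tuner's choice for the alias may silently have no effect.
clf.fit(X, y)
```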
```gin
# Boosting parameters
model/hyperparameter.boosting_type = ["gbdt", "dart"]

# Advanced DART parameters (active when boosting_type="dart")
model/hyperparameter.drop_rate = (0.1, 0.5)
model/hyperparameter.max_drop = (10, 50)
model/hyperparameter.skip_drop = (0.1, 0.9)

# GOSS parameters (active when boosting_type="goss")
model/hyperparameter.top_rate = (0.1, 0.5)
model/hyperparameter.other_rate = (0.05, 0.2)
```
Comment on lines +30 to +41

🛠️ Refactor suggestion: GOSS parameters present without enabling GOSS; make the search space consistent.

You're tuning `top_rate` and `other_rate`, which only take effect when `boosting_type="goss"`, yet the `boosting_type` search space contains only `"gbdt"` and `"dart"`. Apply one of the following:

Option A — enable GOSS:

```diff
-model/hyperparameter.boosting_type = ["gbdt", "dart"]
+model/hyperparameter.boosting_type = ["gbdt", "dart", "goss"]
```

Option B — keep only GBDT/DART (remove GOSS-only params):

```diff
-# GOSS parameters (active when boosting_type="goss")
-model/hyperparameter.top_rate = (0.1, 0.5)
-model/hyperparameter.other_rate = (0.05, 0.2)
```

Note: when using GOSS, ensure bagging is disabled — GOSS performs its own gradient-based sampling and is incompatible with `bagging_fraction`/`bagging_freq`.
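If GOSS is enabled, the incompatibility can also be handled at the tuner level by sampling GOSS-only and bagging parameters conditionally. A sketch of that idea using an Optuna-style define-by-run objective (an assumption for illustration — this repo's tuner is gin-driven and may not support conditional spaces):

```python
# Sketch: sample GOSS-only parameters only when GOSS is the chosen boosting type.
import optuna
from lightgbm import LGBMClassifier

def objective(trial: optuna.Trial) -> float:
    params = {
        "boosting_type": trial.suggest_categorical("boosting_type", ["gbdt", "dart", "goss"]),
        "num_leaves": trial.suggest_int("num_leaves", 8, 256, log=True),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
    }
    if params["boosting_type"] == "goss":
        # GOSS replaces bagging, so sample its own rates and leave bagging off.
        params["top_rate"] = trial.suggest_float("top_rate", 0.1, 0.5)
        params["other_rate"] = trial.suggest_float("other_rate", 0.05, 0.2)
    else:
        params["bagging_fraction"] = trial.suggest_float("bagging_fraction", 0.4, 1.0)
        params["bagging_freq"] = trial.suggest_int("bagging_freq", 1, 10)
    model = LGBMClassifier(**params)
    # ... fit `model` on a training split and return a validation score here ...
    return 0.0  # placeholder
```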
```gin
# Performance and stability
model/hyperparameter.feature_fraction = (0.4, 1.0)
model/hyperparameter.bagging_fraction = (0.4, 1.0)
model/hyperparameter.bagging_freq = (0, 1, 5, 10)
model/hyperparameter.min_split_gain = (1e-6, 1.0, "log")
```
Comment on lines +42 to +47

🛠️ Refactor suggestion: Consolidate sampling on canonical LightGBM params and add by-node feature sampling.

Given the aliases were removed above, keep tuning here via the canonical names. If you want per-node feature sampling, add `feature_fraction_bynode`. Apply this diff to extend the canonical block:

```diff
 model/hyperparameter.feature_fraction = (0.4, 1.0)
 model/hyperparameter.bagging_fraction = (0.4, 1.0)
 model/hyperparameter.bagging_freq = (0, 1, 5, 10)
 model/hyperparameter.min_split_gain = (1e-6, 1.0, "log")
+model/hyperparameter.feature_fraction_bynode = (0.4, 1.0)
```
```gin
# Categorical handling
model/hyperparameter.cat_smooth = (1.0, 100.0, "log")
model/hyperparameter.cat_l2 = (1.0, 100.0, "log")

# Early stopping and validation
model/hyperparameter.early_stopping_rounds = 100
model/hyperparameter.eval_metric = ["binary_logloss", "auc", "binary_error"]
```
Comment on lines +52 to +55

💡 Verification agent · 🧩 Analysis chain

Confirm fit-time plumbing for early stopping and evaluation metric.

`early_stopping_rounds` and `eval_metric` only matter if the training pipeline actually forwards them (together with an `eval_set`) to `LGBMClassifier.fit`. Run the following script to verify that your training pipeline passes these through:

```bash
#!/bin/bash
set -euo pipefail
# Inspect how tuning params are materialized and passed to LGBM
rg -n -C3 -e 'early_stopping_rounds|eval_metric|class_to_tune|LGBMClassifier|fit\('
# Common places:
# - Tuning utilities that map gin "model/hyperparameter.*" into estimator kwargs
# - Trainer code that calls estimator.fit(..., eval_set=..., eval_metric=..., early_stopping_rounds=...)
```

Optional: for stability, consider dropping `binary_error`. Apply if desired:

```diff
-model/hyperparameter.eval_metric = ["binary_logloss", "auc", "binary_error"]
+model/hyperparameter.eval_metric = ["binary_logloss", "auc"]
```

Follow-up scripts inspected `icu_benchmarks/models/ml_models/lgbm.py`, the `MLWrapper` class and `set_model_args` in `icu_benchmarks/models/wrappers.py`, and the `LGBMClassifier.__init__` signature.

Update LGBMWrapper.fit_model to wire through Gin-configured early stopping and eval metrics. The current implementation hard-codes its callbacks (early stopping on `self.hparams.patience`, silenced evaluation logging) and never forwards `early_stopping_rounds` or `eval_metric`. As a result, neither gin setting reaches LightGBM at fit time.

```diff
--- a/icu_benchmarks/models/ml_models/lgbm.py
+++ b/icu_benchmarks/models/ml_models/lgbm.py
@@ class LGBMWrapper(MLWrapper):
-        # Early stopping and logging (hard-coded)
-        callbacks = [
-            lgbm.early_stopping(self.hparams.patience, verbose=True),
-            lgbm.log_evaluation(period=-1)
-        ]
-
-        self.model = self.model.fit(
-            train_data,
-            train_labels,
-            eval_set=(val_data, val_labels),
-            callbacks=callbacks,
-        )
+        # Wire through Gin hyperparameters for early stopping and evaluation metrics
+        callbacks = []
+        # Use Gin's early_stopping_rounds if set
+        if getattr(self.hparams, "early_stopping_rounds", None):
+            callbacks.append(lgbm.early_stopping(self.hparams.early_stopping_rounds, verbose=True))
+
+        # Override the eval_metric on the model if configured
+        if getattr(self.hparams, "eval_metric", None):
+            # scikit-learn wrapper will pass this metric to the Booster
+            self.model.set_params(eval_metric=self.hparams.eval_metric)
+
+        self.model = self.model.fit(
+            train_data,
+            train_labels,
+            eval_set=(val_data, val_labels),
+            callbacks=callbacks,
+        )
```

This change ensures that the gin-configured `early_stopping_rounds` drives the early-stopping callback and that the configured `eval_metric` is applied to the model before fitting.
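For reference, a self-contained sketch (synthetic data, independent of this repo) of how LightGBM's scikit-learn API consumes an eval set, eval metrics, and an early-stopping callback:

```python
# Stand-alone illustration of LightGBM early stopping via fit-time arguments.
import lightgbm as lgbm
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((1000, 10))
y = (X[:, 0] + 0.3 * rng.random(1000) > 0.6).astype(int)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = lgbm.LGBMClassifier(n_estimators=10000, learning_rate=0.05)
model.fit(
    X_tr,
    y_tr,
    eval_set=[(X_val, y_val)],                     # validation data the metrics are computed on
    eval_metric=["binary_logloss", "auc"],         # metrics tracked on the eval set
    callbacks=[
        lgbm.early_stopping(stopping_rounds=100),  # stop after 100 rounds without improvement
        lgbm.log_evaluation(period=0),             # silence per-iteration logging
    ],
)
print("best iteration:", model.best_iteration_)
```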
```gin
# Class imbalance handling
model/hyperparameter.is_unbalance = [True, False]
model/hyperparameter.scale_pos_weight = (0.1, 10.0, "log")
```
Comment on lines +56 to +59

🛠️ Refactor suggestion: Avoid combining is_unbalance with scale_pos_weight; prefer one strategy.

LightGBM treats these as alternative approaches to class imbalance. Searching both simultaneously can produce ambiguous or counteracting settings. Prefer tuning `scale_pos_weight` explicitly and pin `is_unbalance` to `False`:

```diff
-# Class imbalance handling
-model/hyperparameter.is_unbalance = [True, False]
-model/hyperparameter.scale_pos_weight = (0.1, 10.0, "log")
+# Class imbalance handling (prefer explicit weighting)
+model/hyperparameter.is_unbalance = [False]
+model/hyperparameter.scale_pos_weight = (0.1, 10.0, "log")
```

If you want to compare strategies, introduce a higher-level toggle (e.g., separate configs per strategy) rather than letting the tuner mix the two.
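As a side note on picking the weighting range: a common heuristic (an assumption here, not something this PR prescribes) anchors `scale_pos_weight` near the negative-to-positive ratio of the training labels:

```python
# Heuristic anchor for scale_pos_weight: ratio of negative to positive samples.
import numpy as np

def default_scale_pos_weight(y: np.ndarray) -> float:
    n_pos = int((y == 1).sum())
    n_neg = int((y == 0).sum())
    return n_neg / max(n_pos, 1)  # guard against an empty positive class

y = np.array([0] * 950 + [1] * 50)
print(default_scale_pos_weight(y))  # 19.0 — useful for centring or bounding the tuning range
```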
ImbLearn RUSBoost classifier config (diff):

```gin
@@ -1,14 +1,28 @@
# Settings for ImbLearn Balanced Random Forest Classifier.
# Settings for ImbLearn RUSBoost Classifier (Random Under-sampling with Boosting)

# Common settings for ML models
include "configs/prediction_models/common/MLCommon.gin"

# Train params
train_common.model = @RUSBClassifier

# Hyperparameter tuning configuration
model/hyperparameter.class_to_tune = @RUSBClassifier
model/hyperparameter.n_estimators = (10, 50, 100, 200, 500)
model/hyperparameter.learning_rate = (0.005, 1, "log")
model/hyperparameter.sampling_strategy = "auto"
```
Comment on lines 7 to 11

💡 Verification agent · 🧩 Analysis chain

Align model name with public API choice (RUSB vs XGBEnsemble).

You removed RUSBClassifier from the public API but this config still binds `train_common.model` and `model/hyperparameter.class_to_tune` to `@RUSBClassifier`. Either keep RUSBClassifier exported, or migrate this config to the replacement class.

Verification scripts searched the repository for `RUSBClassifier` and `XGBEnsembleClassifier` class definitions, `__init__.py` exports, and other references.

Align config with public API — migrate or re-export RUSBClassifier. If migrating:

```diff
-train_common.model = @RUSBClassifier
-model/hyperparameter.class_to_tune = @RUSBClassifier
+train_common.model = @XGBEnsembleClassifier
+model/hyperparameter.class_to_tune = @XGBEnsembleClassifier
```
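Whichever name is kept, the `@`-references in the gin file only resolve if that class is registered as a configurable under exactly that name. A hedged sketch of the usual gin-config mechanism (the repo's actual registration code may differ):

```python
# Hypothetical registration so that "@RUSBClassifier" resolves in .gin files.
import gin
from imblearn.ensemble import RUSBoostClassifier

# Expose the imbalanced-learn estimator to gin under the name used by the config.
RUSBClassifier = gin.external_configurable(RUSBoostClassifier, name="RUSBClassifier")
```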
```gin
# Number of estimators (boosting rounds)
model/hyperparameter.n_estimators = (50, 100, 200, 300, 500)

# Learning rate for boosting
model/hyperparameter.learning_rate = (0.01, 2.0, "log")

# Sampling strategy for random under-sampling
model/hyperparameter.sampling_strategy = ["auto", "majority", "not minority"]

# Base estimator parameters (typically DecisionTreeClassifier)
model/hyperparameter.base_estimator__max_depth = [1, 2, 3, 4, 5, 6]
model/hyperparameter.base_estimator__min_samples_split = [2, 5, 10, 20]
model/hyperparameter.base_estimator__min_samples_leaf = [1, 2, 5, 10]

# Replacement strategy for under-sampling
model/hyperparameter.replacement = [True, False]
```
XGBoost classifier config (diff):

```gin
@@ -8,10 +8,14 @@ train_common.model = @XGBClassifier

model/hyperparameter.class_to_tune = @XGBClassifier
model/hyperparameter.learning_rate = (0.01, 0.1, "log")
model/hyperparameter.n_estimators = [50, 100, 250, 500, 750, 1000, 1500, 2000]
model/hyperparameter.n_estimators = [50, 100, 250, 500, 750, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000]
```
💡 Verification agent · 🧩 Analysis chain

Expanded n_estimators range for potentially larger ensemble models.

The n_estimators grid now reaches 5000 instead of the previous maximum of 2000, allowing much larger ensembles. This can improve performance but may also increase training time substantially; make sure the expanded range is compatible with your computational resources. Run the following to check whether tuning is actually enabled:

```bash
# Check if hyperparameter tuning is enabled in MLTuning.gin
rg "tune_hyperparameters.n_calls" configs/prediction_models/common/MLTuning.gin
```

Action required: verify the hyperparameter tuning configuration.
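To put the budget in perspective, a back-of-the-envelope estimate using the MLTuning.gin values from this PR (the per-fit time, and the assumption that every call trains on each repetition, are illustrative rather than measured):

```python
# Rough tuning-cost estimate for a single model/dataset combination.
n_calls = 100        # tune_hyperparameters.n_calls (this PR)
folds = 1            # tune_hyperparameters.folds_to_tune_on (this PR)
repetitions = 5      # tune_hyperparameters.repetitions_to_tune_on (this PR)

fits = n_calls * folds * repetitions
print(fits)  # 500 model fits per tuning run, if each call evaluates every repetition

minutes_per_fit = 3  # assumption; grows with n_estimators and dataset size
print(f"{fits * minutes_per_fit / 60:.0f} hours")  # ~25 hours under that assumption
```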
```gin
model/hyperparameter.max_depth = [3, 5, 10, 15]
model/hyperparameter.scale_pos_weight = [1, 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 99, 100, 1000]
model/hyperparameter.min_child_weight = [1, 0.5]
model/hyperparameter.min_child_weight = [0.1, 0.5, 1, 2, 5, 10]
model/hyperparameter.max_delta_step = [0, 1, 2, 3, 4, 5, 10]
model/hyperparameter.colsample_bytree = [0.1, 0.25, 0.5, 0.75, 1.0]
model/hyperparameter.eval_metric = "aucpr"
# model/hyperparameter.eval_metric = "aucpr"
model/hyperparameter.gamma = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 2.0]
# model/hyperparameter.early_stopping_rounds = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
model/hyperparameter.reg_lambda = [0, 0.01, 0.1, 1, 10, 100]
model/hyperparameter.reg_alpha = [0, 0.01, 0.1, 1, 10, 100]
```
GPU variant of the XGBoost classifier config (new file):

```gin
@@ -0,0 +1,21 @@
# Settings for XGBoost classifier.

# Common settings for ML models
include "configs/prediction_models/common/MLCommon.gin"

# Train params
train_common.model = @XGBClassifierGPU

model/hyperparameter.class_to_tune = @XGBClassifierGPU
model/hyperparameter.learning_rate = (0.01, 0.1, "log")
model/hyperparameter.n_estimators = [50, 100, 250, 500, 750, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000]
model/hyperparameter.max_depth = [3, 5, 10, 15]
model/hyperparameter.scale_pos_weight = [1, 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 99, 100, 1000]
model/hyperparameter.min_child_weight = [0.1, 0.5, 1, 2, 5, 10]
model/hyperparameter.max_delta_step = [0, 1, 2, 3, 4, 5, 10]
model/hyperparameter.colsample_bytree = [0.1, 0.25, 0.5, 0.75, 1.0]
# model/hyperparameter.eval_metric = "aucpr"
model/hyperparameter.gamma = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 2.0]
# model/hyperparameter.early_stopping_rounds = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
model/hyperparameter.reg_lambda = [0, 0.01, 0.1, 1, 10, 100]
model/hyperparameter.reg_alpha = [0, 0.01, 0.1, 1, 10, 100]
```
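The `@XGBClassifierGPU` configurable itself is not part of this diff; presumably it is the same estimator with GPU training switched on. A hedged sketch of what such a wrapper could look like (parameter names follow current XGBoost releases, not necessarily this repo's wrapper):

```python
# Hypothetical GPU-enabled XGBoost classifier matching the config's @XGBClassifierGPU.
from xgboost import XGBClassifier

class XGBClassifierGPU(XGBClassifier):
    """XGBClassifier that defaults to CUDA histogram training."""

    def __init__(self, **kwargs):
        # XGBoost >= 2.0 selects the GPU via device="cuda";
        # older releases used tree_method="gpu_hist" instead.
        kwargs.setdefault("tree_method", "hist")
        kwargs.setdefault("device", "cuda")
        super().__init__(**kwargs)
```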
Hyperparameter tuner defaults (diff):

```gin
@@ -1,5 +1,6 @@
# Hyperparameter tuner settings for classical Machine Learning.
tune_hyperparameters.scopes = ["model"]
tune_hyperparameters.n_initial_points = 5
tune_hyperparameters.n_calls = 30
tune_hyperparameters.folds_to_tune_on = 5
tune_hyperparameters.n_calls = 100
tune_hyperparameters.folds_to_tune_on = 1
tune_hyperparameters.repetitions_to_tune_on = 5
```

rvandewater marked this conversation as resolved.
New CSV file (inclusion/exclusion counts):

```
@@ -0,0 +1,3 @@
incl_n,excl_n_total,excl_n
125,10,7
99,34,26
```
New Sphinx documentation Makefile:

```make
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS    ?=
SPHINXBUILD   ?= sphinx-build
SOURCEDIR     = source
BUILDDIR      = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
```
New Sphinx make.bat for Windows:

```bat
@@ -0,0 +1,35 @@
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
	set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
	echo.
	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
	echo.installed, then set the SPHINXBUILD environment variable to point
	echo.to the full path of the 'sphinx-build' executable. Alternatively you
	echo.may add the Sphinx directory to PATH.
	echo.
	echo.If you don't have Sphinx installed, grab it from
	echo.https://www.sphinx-doc.org/
	exit /b 1
)

if "%1" == "" goto help

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd
```
New docs requirements file:

```
@@ -0,0 +1,4 @@
sphinx
sphinx-rtd-theme
sphinx-autoapi
sphinx-autobuild
```
🛠️ Refactor suggestion (on the LGBMClassifier core tree parameters): Fix malformed num_leaves spec and consider simplifying n_estimators.

- The `num_leaves` tuple mixes discrete values with `"log", 2`. This is likely to break the parser (or yield invalid values). Prefer either a discrete list or a proper range tuple.
- The long discrete `n_estimators` grid adds search noise and cost. A log-range, or a single high ceiling combined with early stopping, is more efficient.

Make both entries unambiguous. Notes:

- For a discrete choice of leaf counts, use e.g. `[15, 31, 63, 127, 255]` instead.
- Consider `max_depth = [-1, 3, 5, 7, 10, 15]` so the unconstrained-depth case is explored.
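For context on why the mixed tuple is fragile, here is a hedged sketch of how a spec such as `(low, high, "log", base)` is commonly mapped onto a scikit-optimize search dimension (the repo's real parser may differ — the function below is illustrative only):

```python
# Hypothetical interpreter for the hyperparameter specs used in these gin files:
#   (low, high)               -> uniform range
#   (low, high, "log")        -> log-uniform range
#   (low, high, "log", base)  -> log-uniform range with an explicit base
#   [a, b, c, ...]            -> discrete categorical choices
from skopt.space import Categorical, Integer, Real

def to_dimension(spec):
    if isinstance(spec, list):
        return Categorical(spec)
    if isinstance(spec, tuple) and len(spec) in (2, 3, 4):
        low, high = spec[0], spec[1]
        prior = "log-uniform" if "log" in spec[2:] else "uniform"
        base = spec[3] if len(spec) == 4 else 10
        cls = Integer if isinstance(low, int) and isinstance(high, int) else Real
        return cls(low, high, prior=prior, base=base)
    raise ValueError(f"Cannot interpret hyperparameter spec: {spec!r}")

# An 8-element tuple like (8, 16, 31, 64, 128, 256, "log", 2) matches neither pattern,
# which is why the review asks for a clean list or a clean (low, high, ...) range.
print(to_dimension((8, 128, "log", 2)))       # Integer(8, 128) with a log-uniform prior, base 2
print(to_dimension([15, 31, 63, 127, 255]))   # Categorical over the listed leaf counts
```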