Commit 7c2fda8

Merge remote-tracking branch 'origin/rel-3.46.0'

Author: h2o-ops
Parents: eb3f6a4 + 91f4ffa
8 files changed, 36 additions and 45 deletions

h2o-algos/src/main/java/hex/gam/GAM.java (2 additions, 1 deletion)

@@ -67,7 +67,8 @@ public class GAM extends ModelBuilder<GAMModel, GAMModel.GAMParameters, GAMModel
 
   @Override
   public ModelCategory[] can_build() {
-    return new ModelCategory[]{ModelCategory.Regression};
+    return new ModelCategory[]{ModelCategory.Regression, ModelCategory.Binomial, ModelCategory.Multinomial,
+            ModelCategory.Ordinal};
   }
 
   @Override
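The GAM change above widens `can_build()` from regression-only to the full set of classification categories. As a rough illustration of what these categories correspond to (this is a sketch, not h2o-3's actual logic, which derives the category from the training frame and family), a mapping from a response column to a model category might look like:

```python
def infer_model_category(levels, ordered=False):
    """Map a response column's factor levels to an H2O-style model category.

    Illustrative sketch only -- h2o-3 does not expose a helper like this;
    it infers the category from the response column and the chosen family.
    """
    if levels is None:          # numeric response -> regression
        return "Regression"
    if len(levels) == 2:        # two-level factor -> binomial classification
        return "Binomial"
    # multi-level factor: ordinal if the levels are ordered, else multinomial
    return "Ordinal" if ordered else "Multinomial"
```

After this commit, GAM advertises the same four categories as GLM below, which matches the families (gaussian, binomial, multinomial, ordinal) that both builders accept.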
h2o-algos/src/main/java/hex/glm/GLM.java (4 additions, 2 deletions)

@@ -108,8 +108,10 @@ public boolean isSupervised() {
   @Override
   public ModelCategory[] can_build() {
     return new ModelCategory[]{
-      ModelCategory.Regression,
-      ModelCategory.Binomial,
+      ModelCategory.Regression,
+      ModelCategory.Binomial,
+      ModelCategory.Multinomial,
+      ModelCategory.Ordinal
     };
   }
 
h2o-algos/src/main/java/hex/modelselection/ModelSelectionModel.java (1 addition, 1 deletion)

@@ -48,7 +48,7 @@ protected double[] score0(double[] data, double[] preds) {
 
   @Override
   public Frame score(Frame fr, String destination_key, Job j, boolean computeMetrics, CFuncRef customMetricFunc) {
-    throw new UnsupportedOperationException("AnovaGLM does not support scoring on data. It only provide " +
+    throw new UnsupportedOperationException("ModelSelection does not support scoring on data. It only provide " +
         "information on predictor relevance");
   }
 
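The one-line fix above corrects an error message that was evidently copied from AnovaGLM into ModelSelection. A defensive pattern that avoids this kind of copy-paste drift is to derive the name from the class at runtime. A hypothetical sketch (not code from h2o, and in Python rather than Java for brevity):

```python
class UnsupportedScoringMixin:
    """Hypothetical helper: the error message names the concrete class
    via type(self).__name__, so copied code cannot drift the way the
    hard-coded AnovaGLM/ModelSelection message did."""

    def score(self, frame):
        raise NotImplementedError(
            f"{type(self).__name__} does not support scoring on data; "
            "it only provides information on predictor relevance.")


class ModelSelection(UnsupportedScoringMixin):
    pass
```

Any subclass inheriting the mixin reports its own name automatically, so the message stays correct without edits like this commit's.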
h2o-bindings/bin/custom/python/gen_gam.py (0 additions, 6 deletions)

@@ -15,12 +15,6 @@ def update_param(name, param):
 
 
 def class_extensions():
-    def _additional_used_columns(self, parms):
-        """
-        :return: Gam columns if specified.
-        """
-        return parms["gam_columns"]
-
     def _summary(self):
         """Return a detailed summary of the model."""
         model = self._model_json["output"]

h2o-docs/src/product/automl.rst (26 additions, 26 deletions)

@@ -3,49 +3,49 @@
    :scale: 50%
    :align: center
 
-H2O AutoML: Automatic Machine Learning
+H2O AutoML: Automatic machine learning
 ==================================
 
-In recent years, the demand for machine learning experts has outpaced the supply, despite the surge of people entering the field. To address this gap, there have been big strides in the development of user-friendly machine learning software that can be used by non-experts. The first steps toward simplifying machine learning involved developing simple, unified interfaces to a variety of machine learning algorithms (e.g. H2O).
+In recent years, the demand for machine learning experts has outpaced supply, despite a surge of people entering the field. To address this gap, significant progress has been made in developing user-friendly machine learning software that non-experts can use. The initial steps toward simplifying machine learning involved creating simple, unified interfaces for a variety of machine learning algorithms, such as H2O.
 
-Although H2O has made it easy for non-experts to experiment with machine learning, there is still a fair bit of knowledge and background in data science that is required to produce high-performing machine learning models. Deep Neural Networks in particular are notoriously difficult for a non-expert to tune properly. In order for machine learning software to truly be accessible to non-experts, we have designed an easy-to-use interface which automates the process of training a large selection of candidate models. H2O's AutoML can also be a helpful tool for the advanced user, by providing a simple wrapper function that performs a large number of modeling-related tasks that would typically require many lines of code, and by freeing up their time to focus on other aspects of the data science pipeline tasks such as data-preprocessing, feature engineering and model deployment.
+Although H2O has made it easier for non-experts to experiment with machine learning, a fair bit of knowledge and background in data science is still required to produce high-performing models. Deep neural networks, in particular, are notoriously difficult for a non-expert to tune properly. To make machine learning software truly accessible to non-experts, we have designed an easy-to-use interface that automates the process of training a large selection of candidate models. H2O's AutoML is also a helpful tool for advanced users. It provides a simple wrapper function that performs many modeling-related tasks, typically requiring extensive code, freeing up time to focus on other data science tasks such as data preprocessing, feature engineering, and model deployment.
 
 H2O's AutoML can be used for automating the machine learning workflow, which includes automatic training and tuning of many models within a user-specified time-limit.
 
 H2O offers a number of `model explainability <http://docs.h2o.ai/h2o/latest-stable/h2o-docs/explain.html>`__ methods that apply to AutoML objects (groups of models), as well as individual models (e.g. leader model). Explanations can be generated automatically with a single function call, providing a simple interface to exploring and explaining the AutoML models.
 
 
-AutoML Interface
+AutoML interface
 ----------------
 
-The H2O AutoML interface is designed to have as few parameters as possible so that all the user needs to do is point to their dataset, identify the response column and optionally specify a time constraint or limit on the number of total models trained. Below are the parameters that can be set by the user in the R and Python interfaces. See the `Web UI via H2O Wave <#web-ui-via-h2o-wave>`__ section below for information on how to use the H2O Wave web interface for AutoML.
+The H2O AutoML interface is designed to have as few parameters as possible so that all the user needs to do is point to their dataset, identify the response column, and optionally specify a time constraint or limit on the number of total models trained. Below are the parameters that can be set by the user in the R and Python interfaces. See the `Web UI via H2O Wave <#web-ui-via-h2o-wave>`__ section below for information on how to use the H2O Wave web interface for AutoML.
 
 In both the R and Python API, AutoML uses the same data-related arguments, ``x``, ``y``, ``training_frame``, ``validation_frame``, as the other H2O algorithms. Most of the time, all you'll need to do is specify the data arguments. You can then configure values for ``max_runtime_secs`` and/or ``max_models`` to set explicit time or number-of-model limits on your run.
 
-Required Parameters
+Required parameters
 ~~~~~~~~~~~~~~~~~~~
 
-Required Data Parameters
+Required data parameters
 ''''''''''''''''''''''''
 
 - `y <data-science/algo-params/y.html>`__: This argument is the name (or index) of the response column.
 
 - `training_frame <data-science/algo-params/training_frame.html>`__: Specifies the training set.
 
-Required Stopping Parameters
+Required stopping parameters
 ''''''''''''''''''''''''''''
 
-One of the following stopping strategies (time or number-of-model based) must be specified. When both options are set, then the AutoML run will stop as soon as it hits one of either When both options are set, then the AutoML run will stop as soon as it hits either of these limits.
+One of the following stopping strategies (time or number-of-model based) must be specified. When both options are set, the AutoML run will stop as soon as it reaches either of these limits.
 
 - `max_runtime_secs <data-science/algo-params/max_runtime_secs.html>`__: This argument specifies the maximum time that the AutoML process will run for. The default is 0 (no limit), but dynamically sets to 1 hour if none of ``max_runtime_secs`` and ``max_models`` are specified by the user.
 
 - `max_models <data-science/algo-params/max_models.html>`__: Specify the maximum number of models to build in an AutoML run, excluding the Stacked Ensemble models. Defaults to ``NULL/None``. Always set this parameter to ensure AutoML reproducibility: all models are then trained until convergence and none is constrained by a time budget.
 
 
-Optional Parameters
+Optional parameters
 ~~~~~~~~~~~~~~~~~~~
 
-Optional Data Parameters
+Optional data parameters
 ''''''''''''''''''''''''
 
 - `x <data-science/algo-params/x.html>`__: A list/vector of predictor column names or indexes. This argument only needs to be specified if the user wants to exclude columns from the set of predictors. If all columns (other than the response) should be used in prediction, then this does not need to be set.
@@ -60,7 +60,7 @@ Optional Data Parameters
 
 - `weights_column <data-science/algo-params/weights_column.html>`__: Specifies a column with observation weights. Giving some observation a weight of zero is equivalent to excluding it from the dataset; giving an observation a relative weight of 2 is equivalent to repeating that row twice. Negative weights are not allowed.
 
-Optional Miscellaneous Parameters
+Optional miscellaneous parameters
 '''''''''''''''''''''''''''''''''
 
 - `nfolds <data-science/algo-params/nfolds.html>`__: Specify a value >= 2 for the number of folds for k-fold cross-validation of the models in the AutoML run or specify "-1" to let AutoML choose if k-fold cross-validation or blending mode should be used. Blending mode will use part of ``training_frame`` (if no ``blending_frame`` is provided) to train Stacked Ensembles. Use 0 to disable cross-validation; this will also disable Stacked Ensembles (thus decreasing the overall best model performance). This value defaults to "-1".
@@ -142,17 +142,17 @@ Optional Miscellaneous Parameters
 Notes
 ~~~~~
 
-Validation Options
+Validation options
 ''''''''''''''''''
 
 If the user turns off cross-validation by setting ``nfolds == 0``, then cross-validation metrics will not be available to populate the leaderboard. In this case, we need to make sure there is a holdout frame (i.e. the "leaderboard frame") to score the models on so that we can generate model performance metrics for the leaderboard. Without cross-validation, we will also require a validation frame to be used for early stopping on the models. Therefore, if either of these frames are not provided by the user, they will be automatically partitioned from the training data. If either frame is missing, 10% of the training data will be used to create a missing frame (if both are missing then a total of 20% of the training data will be used to create a 10% validation and 10% leaderboard frame).
 
-XGBoost Memory Requirements
+XGBoost memory requirements
 '''''''''''''''''''''''''''
 
 XGBoost, which is included in H2O as a third party library, requires its own memory outside the H2O (Java) cluster. When running AutoML with XGBoost (it is included by default), be sure you allow H2O no more than 2/3 of the total available RAM. Example: If you have 60G RAM, use ``h2o.init(max_mem_size = "40G")``, leaving 20G for XGBoost.
 
-Scikit-learn Compatibility
+Scikit-learn compatibility
 ''''''''''''''''''''''''''
 
 ``H2OAutoML`` can interact with the ``h2o.sklearn`` module. The ``h2o.sklearn`` module exposes 2 wrappers for ``H2OAutoML`` (``H2OAutoMLClassifier`` and ``H2OAutoMLRegressor``), which expose the standard API familiar to ``sklearn`` users: ``fit``, ``predict``, ``fit_predict``, ``score``, ``get_params``, and ``set_params``. It accepts various formats as input data (H2OFrame, ``numpy`` array, ``pandas`` Dataframe) which allows them to be combined with pure ``sklearn`` components in pipelines. For an example using ``H2OAutoML`` with the ``h2o.sklearn`` module, click `here <https://github.com/h2oai/h2o-tutorials/blob/master/tutorials/sklearn-integration/H2OAutoML_as_sklearn_estimator.ipynb>`__.
@@ -164,7 +164,7 @@ Explainability
 AutoML objects are fully supported though the `H2O Model Explainability <http://docs.h2o.ai/h2o/latest-stable/h2o-docs/explain.html>`__ interface. A large number of multi-model comparison and single model (AutoML leader) plots can be generated automatically with a single call to ``h2o.explain()``. We invite you to learn more at page linked above.
 
 
-Code Examples
+Code examples
 -------------
 
 Training
@@ -323,7 +323,7 @@ Using the previous code example, you can generate test set predictions as follow
     preds = aml.leader.predict(test)
 
 
-AutoML Output
+AutoML output
 -------------
 
 Leaderboard
@@ -365,7 +365,7 @@ Here is an example of a leaderboard (with all columns) for a binary classificati
 
 To create a leaderboard with metrics from a new ``leaderboard_frame`` `h2o.make_leaderboard <performance-and-prediction.html#leaderboard>`__ can be used.
 
-Examine Models
+Examine models
 ~~~~~~~~~~~~~~
 
 To examine the trained models more closely, you can interact with the models, either by model ID, or a convenience function which can grab the best model of each model type (ranked by the default metric, or a metric of your choosing).
@@ -438,7 +438,7 @@ Once you have retreived the model in R or Python, you can inspect the model para
     xgb.params['ntrees']
 
 
-AutoML Log
+AutoML log
 ~~~~~~~~~~
 
 When using Python or R clients, you can also access meta information with the following AutoML object properties:
@@ -496,7 +496,7 @@ Below are a few screenhots of the app, though more visualizations are available
    :align: center
 
 
-Experimental Features
+Experimental features
 ---------------------
 
 Preprocessing
@@ -614,15 +614,15 @@ Information about how to cite the H2O software in general is covered in the `H2O
 We would love to hear how you've used H2O AutoML,
 so if you have a paper that references it, please let us know by opening an issue or submitting a PR to the `Awesome H2O repo <https://github.com/h2oai/awesome-h2o#research-papers>`__ on Github. This is the place that we keep track of papers that use H2O AutoML, and H2O generally.
 
-Random Grid Search Parameters
+Random grid search parameters
 -----------------------------
 
 AutoML performs a hyperparameter search over a variety of H2O algorithms in order to deliver the best model. In the table below, we list the hyperparameters, along with all potential values that can be randomly chosen in the search. If these models also have a non-default value set for a hyperparameter, we identify it in the list as well. Random Forest and Extremely Randomized Trees are not grid searched (in the current version of AutoML), so they are not included in the list below.
 
 **Note**: AutoML does not run a standard grid search for GLM (returning all the possible models). Instead AutoML builds a single model with ``lambda_search`` enabled and passes a list of ``alpha`` values. It returns only the model with the best alpha-lambda combination rather than one model for each alpha-lambda combination.
 
 
-GLM Hyperparameters
+GLM hyperparameters
 ~~~~~~~~~~~~~~~~~~~
 
 This table shows the GLM values that are searched over when performing AutoML grid search. Additional information is available `here <https://github.com/h2oai/h2o-3/blob/master/h2o-automl/src/main/java/ai/h2o/automl/modeling/GLMStepsProvider.java>`__.
@@ -636,7 +636,7 @@ This table shows the GLM values that are searched over when performing AutoML gr
 +-----------------------------+---------------------------------------------------------------------------------------------+
 
 
-XGBoost Hyperparameters
+XGBoost hyperparameters
 ~~~~~~~~~~~~~~~~~~~~~~~
 
 This table shows the XGBoost values that are searched over when performing AutoML grid search. Additional information is available `here <https://github.com/h2oai/h2o-3/blob/master/h2o-automl/src/main/java/ai/h2o/automl/modeling/XGBoostSteps.java>`__.
@@ -664,7 +664,7 @@ This table shows the XGBoost values that are searched over when performing AutoM
 +------------------------------+---------------------------------------------------------------------------------------------+
 
 
-GBM Hyperparameters
+GBM hyperparameters
 ~~~~~~~~~~~~~~~~~~~
 
 This table shows the GLM values that are searched over when performing AutoML grid search. Additional information is available `here <https://github.com/h2oai/h2o-3/blob/master/h2o-automl/src/main/java/ai/h2o/automl/modeling/GBMStepsProvider.java>`__.
@@ -690,7 +690,7 @@ This table shows the GLM values that are searched over when performing AutoML gr
 +------------------------------+---------------------------------------------------------------------------------------------+
 
 
-Deep Learning Hyperparameters
+Deep learning hyperparameters
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 This table shows the Deep Learning values that are searched over when performing AutoML grid search. Additional information is available `here <https://github.com/h2oai/h2o-3/blob/master/h2o-automl/src/main/java/ai/h2o/automl/modeling/DeepLearningStepsProvider.java>`__.
@@ -718,7 +718,7 @@ This table shows the Deep Learning values that are searched over when performing
 +------------------------------+----------------------------------------------------------------------------------------------------------+
 
 
-Additional Information
+Additional information
 ----------------------
 
 H2O AutoML development is tracked in the `h2o-3 Github repo <https://github.com/h2oai/h2o-3/issues>`__.
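The automl.rst changes are mostly heading-case fixes, but two of the paragraphs they touch describe concrete behavior: if neither ``max_runtime_secs`` nor ``max_models`` is set, the runtime limit defaults to 1 hour, and when running with XGBoost, H2O should get at most 2/3 of total RAM. A rough sketch of both rules (illustrative helpers, not h2o-3's code):

```python
def resolve_stopping(max_runtime_secs=0, max_models=None):
    """Sketch of the documented stopping rules (not h2o-3's implementation):
    with no explicit limit set, the time budget becomes 1 hour (3600 s)."""
    if max_runtime_secs == 0 and max_models is None:
        max_runtime_secs = 3600
    return max_runtime_secs, max_models


def h2o_max_mem_gb(total_ram_gb):
    """Apply the documented XGBoost guidance: give the H2O (Java) cluster
    at most 2/3 of available RAM, leaving the rest for XGBoost's
    native-memory allocations."""
    return total_ram_gb * 2 // 3
```

For the doc's own example, 60 GB of total RAM yields a 40 GB cap, matching ``h2o.init(max_mem_size = "40G")``.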

h2o-logging/impl-log4j2/build.gradle (2 additions, 2 deletions)

@@ -5,8 +5,8 @@ compileJava {
 }
 
 dependencies {
-    api("org.apache.logging.log4j:log4j-1.2-api:2.17.1")
-    api("org.apache.logging.log4j:log4j-core:2.17.1")
+    api("org.apache.logging.log4j:log4j-1.2-api:2.25.3")
+    api("org.apache.logging.log4j:log4j-core:2.25.3")
 
     testImplementation group: 'junit', name: 'junit', version: '4.12'
 }
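The Gradle diff bumps both Log4j artifacts in lockstep, which matters: `log4j-1.2-api` is the compatibility bridge that routes legacy Log4j 1.x API calls to the Log4j 2 core, and its version should match `log4j-core`. As context, a minimal `log4j2.xml` that a module like this might use (an illustrative sketch, not a file from this repository):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN">
  <Appenders>
    <!-- Single console appender; real deployments typically add a rolling file -->
    <Console name="Console" target="SYSTEM_OUT">
      <PatternLayout pattern="%d{ISO8601} %-5p [%t] %c: %m%n"/>
    </Console>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="Console"/>
    </Root>
  </Loggers>
</Configuration>
```

Because the 1.2 bridge is on the classpath, code still calling the old `org.apache.log4j.Logger` API is routed through this same configuration.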

h2o-py/h2o/estimators/gam.py (0 additions, 6 deletions)

@@ -1595,12 +1595,6 @@ def gainslift_bins(self, gainslift_bins):
 
     Lambda = deprecated_property('Lambda', lambda_)
 
-    def _additional_used_columns(self, parms):
-        """
-        :return: Gam columns if specified.
-        """
-        return parms["gam_columns"]
-
     def _summary(self):
         """Return a detailed summary of the model."""
         model = self._model_json["output"]

h2o-py/tests/testdir_algos/glm/pyunit_benign_glm.py (1 addition, 1 deletion)

@@ -7,7 +7,7 @@
 
 def test_benign():
     training_data = h2o.import_file(pyunit_utils.locate("smalldata/logreg/benign.csv"))
-
+    training_data[3] = training_data[3].asfactor()
     Y = 3
     X = [0, 1, 2, 4, 5, 6, 7, 8, 9, 10]
 