@syzonyuliia
Contributor

@syzonyuliia syzonyuliia commented Jun 19, 2024

@wendycwong
Contributor

Please extend your CompressLeaf class to accommodate leaf nodes that contain an array of probabilities, one for each multinomial class. Currently, it only takes a single value, which works for binary classification only.
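
To illustrate the request, here is a minimal sketch of a leaf that stores one probability per class. It is written in Python for brevity rather than in the Java of the actual class, and the name `MultinomialLeaf` and its methods are hypothetical, not taken from this PR:

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class MultinomialLeaf:
    """Hypothetical leaf node holding one probability per response class."""

    probabilities: List[float]

    def __post_init__(self):
        # Reject empty leaves and probability vectors that do not sum to 1.
        if not self.probabilities:
            raise ValueError("a leaf needs at least one class probability")
        if abs(sum(self.probabilities) - 1.0) > 1e-9:
            raise ValueError("class probabilities must sum to 1")

    @classmethod
    def from_counts(cls, class_counts: Sequence[float]) -> "MultinomialLeaf":
        # Turn per-class row counts observed in the leaf into probabilities.
        total = sum(class_counts)
        if total <= 0:
            raise ValueError("a leaf must contain at least one row")
        return cls([count / total for count in class_counts])

    def decision(self) -> int:
        # Index of the most probable class; ties resolve to the lowest index.
        return max(range(len(self.probabilities)),
                   key=self.probabilities.__getitem__)
```

Under this layout the binary case is simply a leaf with two entries, so a single code path could serve both binomial and multinomial models.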

@wendycwong
Contributor

I ran into an NPE (NullPointerException) with this dataset:
sdt_3EnumCols_10kRows_multinomial.csv

@wendycwong
Contributor

With this code:

import sys
sys.path.insert(1, "../../../")
import h2o
from tests import pyunit_utils
from h2o.estimators import H2ODecisionTreeEstimator


def test_dt_multinomial():
    data = h2o.import_file(pyunit_utils.locate("smalldata/sdt/sdt_3EnumCols_10kRows_multinomial.csv"))
    response_col = "response"
    data[response_col] = data[response_col].asfactor()

    predictors = ["C1", "C2", "C3"]

    # train model
    dt = H2ODecisionTreeEstimator(max_depth=3)
    dt.train(x=predictors, y=response_col, training_frame=data)

    dt.show()


if __name__ == "__main__":
    pyunit_utils.standalone_test(test_dt_multinomial)
else:
    test_dt_multinomial()

@syzonyuliia syzonyuliia requested a review from valenad1 July 19, 2024 13:12
@valenad1
Collaborator

I would like to have a basic:

  • Java test for multinomial
  • Python test for multinomial
  • R test for multinomial
  • Comparison with a single-tree DRF multinomial model



private static double calculateCriterionOfSplit(SplitStatistics binStatistics) {
// if(binStatistics.() == 2) // todo - fix bin statistics first, they are binomial-only now
Collaborator

Please finish those TODOs

@wendycwong
Contributor

@syzonyuliia

Thank you for your work. It looks great.

"categorical_encoding",
"response_column",
"seed",
"distribution",
Collaborator

@wendycwong what is the idea behind the distribution parameter here? I see that we optimize entropy in the splits, but the attribute is not used in the code.
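
For reference, an entropy-based split criterion generalizes to any number of classes once the bin statistics carry per-class counts. The sketch below is in Python, not the PR's Java, and uses hypothetical function names. It assumes base-2 logarithms and size-weighted averaging of the two children, neither of which is confirmed by the code shown in this review:

```python
import math
from typing import Sequence


def entropy(class_counts: Sequence[float]) -> float:
    """Shannon entropy (base 2) of a node given its per-class row counts."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    result = 0.0
    for count in class_counts:
        if count > 0:  # empty classes contribute nothing (0 * log 0 := 0)
            p = count / total
            result -= p * math.log2(p)
    return result


def split_criterion(left_counts: Sequence[float],
                    right_counts: Sequence[float]) -> float:
    """Entropy of the two children, weighted by child size; lower is better."""
    left_total = sum(left_counts)
    right_total = sum(right_counts)
    total = left_total + right_total
    if total == 0:
        return 0.0
    return (left_total / total) * entropy(left_counts) \
        + (right_total / total) * entropy(right_counts)
```

With two classes this reduces to the usual binomial entropy, so the criterion itself would not need a separate `distribution` setting to tell the cases apart.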

@valenad1 valenad1 marked this pull request as draft January 19, 2026 16:17
5 participants