Add Histogram split search supports for Oblique forests #245
ariellubonja wants to merge 4 commits into
Thank you for the PR! I think there are still some issues with the code. Can you check the failing build? https://github.com/google/yggdrasil-decision-forests/actions/runs/19373708468/job/55591997498?pr=245 I wonder if adding a field of type NumericalSplit to SparseObliqueSplit is a good idea here, e.g. This could be used to trigger the new histogram split
Hi Richard! You are right, I should have mentioned this in the original PR. The change requires adding the
EvaluateProjection's `random` lived in the middle of the parameter list with no default, forcing every caller to pass it explicitly even on the Cart path that doesn't need it. Move it to a trailing parameter with `= nullptr` default in the header, and propagate the reorder through the impl, the three explicit template instantiations, EvaluateProjectionAndSetCondition, and the call sites in oblique.cc and vector_sequence.cc. No behavior change.
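The reorder described above can be illustrated with a minimal sketch. It is not the real `EvaluateProjection` signature: `CandidateThresholds`, `SplitSearch`, and the parameter list are hypothetical stand-ins, and `std::mt19937` is assumed in place of whatever generator type the library uses. The point is only that a trailing pointer parameter defaulting to `nullptr` lets the exact (CART-style) callers omit the generator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for the split-search mode; not a YDF type.
enum class SplitSearch { kExact, kHistogramRandom };

// `random` is the last parameter and defaults to nullptr, so callers on the
// exact path do not have to pass a generator they never use.
inline std::vector<float> CandidateThresholds(std::vector<float> values,
                                              SplitSearch search,
                                              int num_candidates = 0,
                                              std::mt19937* random = nullptr) {
  std::vector<float> thresholds;
  if (values.size() < 2) return thresholds;
  std::sort(values.begin(), values.end());

  if (search == SplitSearch::kExact) {
    // Exact scan: one midpoint between each pair of distinct neighbours.
    for (std::size_t i = 1; i < values.size(); ++i) {
      if (values[i] > values[i - 1]) {
        thresholds.push_back(values[i - 1] + (values[i] - values[i - 1]) / 2);
      }
    }
    return thresholds;
  }

  // Randomized histogram scan: only this branch needs the generator.
  assert(random != nullptr && "the randomized path needs a generator");
  std::uniform_real_distribution<float> dist(values.front(), values.back());
  for (int i = 0; i < num_candidates; ++i) {
    thresholds.push_back(dist(*random));
  }
  return thresholds;
}
```

An exact caller writes `CandidateThresholds(values, SplitSearch::kExact)`, while a histogram caller passes `num_candidates` and `&rng` explicitly.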
…plit
Per Richard's review: add a dedicated NumericalSplit field to
SparseObliqueSplit so the projection-axis split type is orthogonal to
the top-level DecisionTreeTrainingConfig.numerical_split (which only
controls axis-aligned numerical features). Setting
sparse_oblique_split.projection_split.type to HISTOGRAM_RANDOM or
HISTOGRAM_EQUAL_WIDTH routes the projected-value scan through the
histogram finder; absent or EXACT keeps the existing CART path.
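The routing rule in this paragraph can be sketched as a small dispatch function. The structs below are hypothetical plain-C++ mirrors of the proto messages named above, not the generated YDF classes; `SelectProjectionFinder` and `ProjectionFinder` are illustrative names, and `std::optional` is an assumed way to model an unset field.

```cpp
#include <optional>

// Hypothetical mirrors of the proto messages; not the generated YDF classes.
enum class SplitType { EXACT, HISTOGRAM_RANDOM, HISTOGRAM_EQUAL_WIDTH };

struct NumericalSplit {
  SplitType type = SplitType::EXACT;
};

struct SparseObliqueSplit {
  // An empty optional models "projection_split not set".
  std::optional<NumericalSplit> projection_split;
};

// Illustrative name for the two scan strategies.
enum class ProjectionFinder { kCart, kHistogram };

// An absent field behaves like a default NumericalSplit, i.e. type EXACT,
// so both "absent" and "EXACT" stay on the existing CART path.
inline ProjectionFinder SelectProjectionFinder(
    const SparseObliqueSplit& config) {
  const NumericalSplit split =
      config.projection_split.value_or(NumericalSplit{});
  switch (split.type) {
    case SplitType::HISTOGRAM_RANDOM:
    case SplitType::HISTOGRAM_EQUAL_WIDTH:
      return ProjectionFinder::kHistogram;
    case SplitType::EXACT:
      return ProjectionFinder::kCart;
  }
  return ProjectionFinder::kCart;  // Unreachable; silences compiler warnings.
}
```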
Implementation notes:
- The histogram finder reads dt_config.numerical_split().{type,
num_candidates}, so EvaluateProjection clones dt_config and
overwrites its top-level numerical_split with the projection_split
before calling. Avoids changing the histogram fn's signature, which
would expand the diff into training.cc/h unnecessarily.
- MHLD callers (vector_sequence.cc) don't set sparse_oblique_split, so
projection_split() returns a default NumericalSplit with type=EXACT;
the MHLD path stays on the CART path with no behavior change.
- SetDefaultHyperParameters mirrors the existing num_candidates
auto-default (1 for HISTOGRAM_RANDOM, 255 for HISTOGRAM_EQUAL_WIDTH)
for the new projection_split field.
- Classification only in this commit; regression histogram dispatch is
a mechanical follow-up
(FindSplitLabelRegressionFeatureNumericalHistogram exists upstream).
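The first and third implementation notes can be sketched together as follows. Everything here is a simplified assumption: the structs are plain-C++ stand-ins for the proto messages, `projection_split` is flattened onto the config instead of living under `sparse_oblique_split`, and `SetProjectionSplitDefaults` and `ConfigForProjectionScan` are hypothetical helper names. Treating an unset `num_candidates` as an empty `std::optional` is also an assumption about how "not set" is detected.

```cpp
#include <optional>

// Simplified stand-ins for the proto messages; not the generated YDF classes.
enum class SplitType { EXACT, HISTOGRAM_RANDOM, HISTOGRAM_EQUAL_WIDTH };

struct NumericalSplit {
  SplitType type = SplitType::EXACT;
  std::optional<int> num_candidates;  // Empty means "not set by the user".
};

struct DecisionTreeConfig {
  NumericalSplit numerical_split;   // Axis-aligned numerical features.
  NumericalSplit projection_split;  // Stands in for
                                    // sparse_oblique_split.projection_split.
};

// Fills num_candidates only when the user left it unset:
// 1 for HISTOGRAM_RANDOM, 255 for HISTOGRAM_EQUAL_WIDTH.
inline void SetProjectionSplitDefaults(DecisionTreeConfig& config) {
  NumericalSplit& split = config.projection_split;
  if (split.num_candidates.has_value()) return;
  if (split.type == SplitType::HISTOGRAM_RANDOM) {
    split.num_candidates = 1;
  } else if (split.type == SplitType::HISTOGRAM_EQUAL_WIDTH) {
    split.num_candidates = 255;
  }
}

// Returns a copy whose top-level numerical_split is replaced by the
// projection split, so a finder that only reads numerical_split sees the
// projection settings. The caller's config is left untouched.
inline DecisionTreeConfig ConfigForProjectionScan(
    const DecisionTreeConfig& dt_config) {
  DecisionTreeConfig local = dt_config;
  local.numerical_split = dt_config.projection_split;
  return local;
}
```

Copying the config costs one extra object per call but keeps the finder's signature unchanged, which is the trade-off the note describes.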
Hi Richard! I would like to incorporate random-histogram SPORF into YDF going forward. I implemented the plumbing of the RNG
Hi Richard, Mathieu!
These are minimal changes to add histogram split search support to oblique forests.
As for user-side toggling, Richard and I discussed whether decision_tree.proto:NumericalSplit should have an analogous version for histograms only. This requires further discussion.