Skip to content

Add Histogram split search supports for Oblique forests#245

Open
ariellubonja wants to merge 4 commits into
google:mainfrom
ariellubonja:feature-add-oblique-histogramming
Open

Add Histogram split search supports for Oblique forests#245
ariellubonja wants to merge 4 commits into
google:mainfrom
ariellubonja:feature-add-oblique-histogramming

Conversation

@ariellubonja

Copy link
Copy Markdown
Contributor

Hi Richard, Mathieu!

This is minimal changes to add Histogramming support to Oblique forests

As for user-side toggling, we discussed w/ Richard whether decision_tree.proto:NumericalSplit should have an analogous version for only histograms. This requires further discussion

@rstz

rstz commented Nov 17, 2025

Copy link
Copy Markdown
Collaborator

Thank you for the PR! I think there's still some issues with the code, can you check the failing build at https://github.com/google/yggdrasil-decision-forests/actions/runs/19373708468/job/55591997498?pr=245

I wonder if adding a field of type NumericalSplit to SparseObliqueSplit is a good idea here, e.g.

// Controls how the split in the projection space is computed.
optional NumericalSplit projection_split = 12;

https://github.com/google/yggdrasil-decision-forests/blob/main/yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto#L165

This could be used to trigger the new histogram split

@ariellubonja

Copy link
Copy Markdown
Contributor Author

Hi Richard! You are right, I should have mentioned this in the original PR. The change requires adding the random argument to EvaluateProjection, which requires changing files that depend on it, e.g. vector_sequence.cc, so I decided to leave it out (which obviously causes build to fail). I will very shortly push a fix that adds these, and we can discuss best practices

ariellubonja and others added 3 commits November 17, 2025 10:33
EvaluateProjection's `random` lived in the middle of the parameter list
with no default, forcing every caller to pass it explicitly even on the
Cart path that doesn't need it. Move it to a trailing parameter with
`= nullptr` default in the header, and propagate the reorder through
the impl, the three explicit template instantiations,
EvaluateProjectionAndSetCondition, and the call sites in oblique.cc
and vector_sequence.cc. No behavior change.
…plit

Per Richard's review: add a dedicated NumericalSplit field to
SparseObliqueSplit so the projection-axis split type is orthogonal to
the top-level DecisionTreeTrainingConfig.numerical_split (which only
controls axis-aligned numerical features). Setting
sparse_oblique_split.projection_split.type to HISTOGRAM_RANDOM or
HISTOGRAM_EQUAL_WIDTH routes the projected-value scan through the
histogram finder; absent or EXACT keeps the existing CART path.

Implementation notes:
- The histogram finder reads dt_config.numerical_split().{type,
  num_candidates}, so EvaluateProjection clones dt_config and
  overwrites its top-level numerical_split with the projection_split
  before calling. Avoids changing the histogram fn's signature, which
  would expand the diff into training.cc/h unnecessarily.
- MHLD callers (vector_sequence.cc) don't set sparse_oblique_split, so
  projection_split() returns a default NumericalSplit with type=EXACT;
  the MHLD path stays on the CART path with no behavior change.
- SetDefaultHyperParameters mirrors the existing num_candidates
  auto-default (1 for HISTOGRAM_RANDOM, 255 for HISTOGRAM_EQUAL_WIDTH)
  for the new projection_split field.
- Classification only in this commit; regression histogram dispatch is
  a mechanical follow-up (FindSplitLabelRegressionFeatureNumerical
  Histogram exists upstream).
@ariellubonja

Copy link
Copy Markdown
Contributor Author

Hi Richard! I would like to incorporate Random histogram SPORF into YDF going forward. I implemented the plumbing of the RNG generator random. Please take a look. Apologies for forgetting this PR open all this time

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants