15% Random Forest and 30% Sparse Oblique speedup by restructuring Split Search in Random Histogram #356
Open
ariellubonja wants to merge 1 commit into
…e Random Histogram
Collaborator
Hi Ariel, thank you very much, this looks very interesting. I quickly ran a very limited benchmark on end-to-end datasets and didn't see an improvement yet. I'll re-run on your datasets and share what I see. Maybe I can also finally open-source some of the code we have for end-to-end benchmarks to make comparisons easier.
Contributor
Author
Hi Richard! I figured it out! Sparse Oblique with Random Histograms is not in your YDF; I had added it myself and forgot. It seems your YDF Sparse Oblique only has Exact. Good news: I re-ran with Axis-aligned and the benefit is there! I've edited the PR.
Contributor
Author
It would also be interesting to check whether the Regression branch would benefit from this or a similar approach.
Contributor
Author
Hi Richard! Can you see the speedup for this PR now?
Hi again Richard & Mathieu!
A more significant and interesting speedup: pulling invariants out of the Random Histogram's "scan candidate thresholds" loop gives a 15% speedup in Axis-aligned RF (Random Hist) and 30% in Sparse Oblique across many datasets. Accuracy is identical save for floating-point differences (across 35 datasets × 10 different seeds).
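For readers unfamiliar with the technique, here is a minimal Python sketch of hoisting loop invariants out of a threshold scan. It is not the code in this PR or in YDF: the function names, the `(num_examples, num_positives)` bin layout, and the binary Gini score are all assumptions chosen to keep the example short. The hoisted version multiplies by a precomputed reciprocal instead of dividing, which is one way results can differ in the last floating-point digits.

```python
from typing import Sequence, Tuple

Bin = Tuple[int, int]  # hypothetical layout: (num_examples, num_positives)


def gini(pos: float, total: float) -> float:
    """Binary Gini impurity of a node with `pos` positives out of `total`."""
    if total == 0:
        return 0.0
    p = pos / total
    return 2.0 * p * (1.0 - p)


def scan_naive(bins: Sequence[Bin]) -> Tuple[int, float]:
    """Return (best_index, best_gain); the split falls after bin best_index."""
    best_idx, best_gain = -1, 0.0
    left_n = left_pos = 0
    for i in range(len(bins) - 1):
        left_n += bins[i][0]
        left_pos += bins[i][1]
        # These three values never change, yet are recomputed per candidate.
        total_n = sum(n for n, _ in bins)
        total_pos = sum(p for _, p in bins)
        parent = gini(total_pos, total_n)
        right_n, right_pos = total_n - left_n, total_pos - left_pos
        if left_n == 0 or right_n == 0:
            continue
        gain = (parent
                - (left_n / total_n) * gini(left_pos, left_n)
                - (right_n / total_n) * gini(right_pos, right_n))
        if gain > best_gain:
            best_idx, best_gain = i, gain
    return best_idx, best_gain


def scan_hoisted(bins: Sequence[Bin]) -> Tuple[int, float]:
    """Same search, with the loop-invariant values computed once up front."""
    total_n = sum(n for n, _ in bins)
    total_pos = sum(p for _, p in bins)
    if total_n == 0:
        return -1, 0.0
    parent = gini(total_pos, total_n)
    inv_total = 1.0 / total_n  # one division instead of two per candidate
    best_idx, best_gain = -1, 0.0
    left_n = left_pos = 0
    for i in range(len(bins) - 1):
        left_n += bins[i][0]
        left_pos += bins[i][1]
        right_n, right_pos = total_n - left_n, total_pos - left_pos
        if left_n == 0 or right_n == 0:
            continue
        gain = (parent
                - (left_n * inv_total) * gini(left_pos, left_n)
                - (right_n * inv_total) * gini(right_pos, right_n))
        if gain > best_gain:
            best_idx, best_gain = i, gain
    return best_idx, best_gain
```

Both functions pick the same split on the same histogram; only the amount of work done per candidate threshold differs.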
Per-function speedup:
In my CHRONO times below, Scanning Thresholds is only 20% of the overall time, yet the end-to-end speedup is 30%+. This suggests my CHRONO has overhead that distorts the underlying relative weights of functions.
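A quick Amdahl's-law check supports this reading. The sketch below is illustrative only; the helper names are hypothetical, and it assumes "30% speedup" means a 1.3x throughput ratio (if it instead means 30% less wall time, the required fraction is even larger).

```python
def max_end_to_end_speedup(fraction: float) -> float:
    """Amdahl upper bound: the optimized part (a `fraction` of total
    runtime) is assumed to become infinitely fast."""
    return 1.0 / (1.0 - fraction)


def min_fraction_for_speedup(speedup: float) -> float:
    """Smallest share of runtime the optimized part must have had
    for the given end-to-end speedup to be possible at all."""
    return 1.0 - 1.0 / speedup


# If threshold scanning were truly 20% of runtime, the best possible
# end-to-end speedup would be 1 / (1 - 0.2) = 1.25x.
ceiling = max_end_to_end_speedup(0.2)

# A 1.3x end-to-end speedup needs the optimized code to account for
# at least 1 - 1/1.3 (about 23%) of the original runtime.
needed = min_fraction_for_speedup(1.3)
```

Since the observed end-to-end gain exceeds the 1.25x ceiling implied by a 20% share, the true share of threshold scanning must be larger than the instrumented timings report.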
Full runtime tests + Accuracy