Catboost native multi-output with RMSE #2659

Merged

jonasblanc
Contributor

@jonasblanc jonasblanc commented Jan 30, 2025

Checklist before merging this PR:

  • Mentioned all issues that this PR fixes or addresses.
  • Summarized the updates of this PR under Summary.
  • Added an entry under Unreleased in the Changelog.

Fixes #1306

Summary

Refactor logic handling native support for multi-output.

  • Move model-specific logic into the newly defined RegressionModel._native_support_multioutput() (overridden in sub-classes), which returns whether the underlying model supports multi-output natively.
  • Refactor the logic that decides whether to wrap a model in a MultiOutputRegressor.
  • Switch from sklearn _get_tags to __sklearn_tags__, as it will be required by sklearn 1.7. This raises the minimum supported versions of sklearn and XGBoost to 1.6 and 2.1.4, respectively. See the sklearn release notes for more information about tags and the XGBoost release notes for sklearn 1.6 compatibility.
  • Improve CatBoostModel documentation by describing how to use native multioutput regression.
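
For readers who want a picture of the refactor described above, here is a minimal, self-contained sketch of the decision flow. It is not the merged code: only the method name `_native_support_multioutput()` and the `__sklearn_tags__` hook come from this PR. The classes `FakeEstimator`, `FakeMultiOutputWrapper`, `SketchRegressionModel`, `SketchCatBoostModel`, the helper `_prepare_model()`, and the rule that ties native support to a `"MultiRMSE"` loss are all hypothetical stand-ins chosen for illustration.

```python
from types import SimpleNamespace


class FakeEstimator:
    """Hypothetical stand-in for an sklearn-style estimator."""

    def __init__(self, multi_output: bool):
        self._multi_output = multi_output

    def __sklearn_tags__(self):
        # Mimics only the single field read below; the real sklearn
        # tags object carries many more attributes.
        return SimpleNamespace(
            target_tags=SimpleNamespace(multi_output=self._multi_output)
        )


class FakeMultiOutputWrapper:
    """Hypothetical stand-in for a wrapper that fits one estimator per target."""

    def __init__(self, estimator):
        self.estimator = estimator


class SketchRegressionModel:
    """Hypothetical base class holding the generic decision logic."""

    def __init__(self, model):
        self.model = model

    def _native_support_multioutput(self) -> bool:
        # Assumed default: ask the estimator through its tags.
        return bool(self.model.__sklearn_tags__().target_tags.multi_output)

    def _prepare_model(self, n_targets: int):
        # Wrap only when several targets are requested and the
        # underlying model cannot handle them on its own.
        if n_targets > 1 and not self._native_support_multioutput():
            return FakeMultiOutputWrapper(self.model)
        return self.model


class SketchCatBoostModel(SketchRegressionModel):
    """Hypothetical sub-class with model-specific logic."""

    def __init__(self, model, loss_function: str):
        super().__init__(model)
        self.loss_function = loss_function

    def _native_support_multioutput(self) -> bool:
        # Assumption for this sketch: native support depends on the loss.
        return self.loss_function == "MultiRMSE"


if __name__ == "__main__":
    plain = SketchRegressionModel(FakeEstimator(multi_output=False))
    print(type(plain._prepare_model(n_targets=2)).__name__)

    cb = SketchCatBoostModel(FakeEstimator(multi_output=False), "MultiRMSE")
    print(type(cb._prepare_model(n_targets=2)).__name__)
```

The point of the design is that the base class owns the single wrapping decision, while each sub-class only answers one yes/no question about its own model.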

Other Information

codecov bot commented Feb 3, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.10%. Comparing base (b1f7327) to head (76f29d1).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2659      +/-   ##
==========================================
- Coverage   94.16%   94.10%   -0.07%     
==========================================
  Files         141      141              
  Lines       15601    15596       -5     
==========================================
- Hits        14691    14676      -15     
- Misses        910      920      +10     

Collaborator

@dennisbader dennisbader left a comment

Thanks a lot @jonasblanc, this looks great and is going in the right direction 🚀

I added a couple of suggestions. The main point is that I think we should let the user decide whether to use native or darts' multi-output regression.

@jonasblanc jonasblanc marked this pull request as ready for review February 18, 2025 08:05
Collaborator

@madtoinou madtoinou left a comment

Thanks a lot @jonasblanc for the PR, just minor changes but it looks very good to me!

Collaborator

@dennisbader dennisbader left a comment

Beautiful 😍 Thanks a lot for this great PR. It also looks much cleaner now with the refactoring to sklearn >= 1.6.0 🚀

Made some last minor updates.

@dennisbader dennisbader merged commit 62dca8f into unit8co:master Mar 7, 2025
9 checks passed
@jonasblanc jonasblanc deleted the feat/native-multioutput-catboost branch March 7, 2025 08:35
Successfully merging this pull request may close these issues.

Make CatboostModel use native Catboost multi-output when possible