Add refactored LGBM model to experimental emulators #399


Merged
merged 68 commits into main from 351-refactor-lgbm on Apr 24, 2025

Conversation

edwardchalstrey1 (Collaborator) commented Apr 14, 2025

  • Closes Refactor: initial impl Light GBM #351
  • Adds LightGBM to experimental, set up with a get_tune_config method
  • Refactors the original emulator to use numpy instead of scipy.stats
  • Adds tests that run the model's predict method and the tuner
  • Adds a _convert_to_numpy function to utils, used by the fit method (a sketch of what this could look like follows below)
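
A minimal sketch of what a _convert_to_numpy helper could look like (the signature and behaviour here are assumptions for illustration, not the implementation merged in this PR):

    import numpy as np

    def _convert_to_numpy(x, y=None):
        """Convert array-like or tensor inputs to numpy arrays."""
        def _to_numpy(arr):
            if arr is None:
                return None
            if hasattr(arr, "detach"):  # e.g. a torch.Tensor, without importing torch
                return arr.detach().cpu().numpy()
            return np.asarray(arr)
        return _to_numpy(x), _to_numpy(y)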

github-actions bot (Contributor) commented Apr 14, 2025

Coverage report

File                                                Lines missing
autoemulate/emulators/gaussian_process.py
autoemulate/experimental/data/utils.py              105, 111-112
autoemulate/experimental/emulators/base.py
autoemulate/experimental/emulators/lightgbm.py      78-79, 81-82
tests/experimental/test_experimental_lightgbm.py
Project Total

This report was generated by python-coverage-comment-action

codecov-commenter commented Apr 14, 2025

Codecov Report

Attention: Patch coverage is 91.46341% with 7 lines in your changes missing coverage. Please review.

Project coverage is 90.13%. Comparing base (fa9ae91) to head (cf2042b).
Report is 88 commits behind head on main.

Files with missing lines                           Patch %    Lines
autoemulate/experimental/emulators/lightgbm.py     92.15%     4 Missing ⚠️
autoemulate/experimental/data/utils.py             66.66%     3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #399      +/-   ##
==========================================
- Coverage   91.95%   90.13%   -1.82%     
==========================================
  Files          84       91       +7     
  Lines        4721     5678     +957     
==========================================
+ Hits         4341     5118     +777     
- Misses        380      560     +180     

☔ View full report in Codecov by Sentry.

self.n_features_in_ = x.shape[1]

x, y = check_X_y(
Member

What does check_X_y do? I feel like any data checks and conversions we want to do should be in the InputTypeMixin (or something similar)?

edwardchalstrey1 (Collaborator, Author) Apr 17, 2025

This checks that X and y have the same length, and that X is 2D and y is 1D. There are also other checks you can include, which in the old Autoemulate have been set up differently depending on the emulator class, e.g. for SVM:

X, y = check_X_y(
    X,
    y,
    multi_output=self._more_tags()["multioutput"],
    y_numeric=True,
    ensure_min_samples=2,
)

and for polynomials:

X, y = check_X_y(X, y, multi_output=True, y_numeric=True, dtype=np.float64)

Perhaps for now it's best to leave these checks in the fit methods, then refactor them into the InputTypeMixin later if several of the models we include in experimental end up doing basically the same check, or if we disagree with some of the differences and think that a simple dimensionality check is sufficient?

Collaborator

Agree with both the above (#399 (comment)); keeping these checks for now and refactoring them in a new issue that adds a class for validation/checks seems like a good option.
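
For illustration, a minimal sketch of what such a shared validation class could look like (the name ValidationMixin and the configurable options are assumptions, not something introduced in this PR):

    from sklearn.utils.validation import check_X_y

    class ValidationMixin:
        """Hypothetical shared home for the per-emulator input checks."""

        # Subclasses override these if they need different behaviour,
        # mirroring the SVM/polynomial differences quoted above.
        multi_output: bool = False
        y_numeric: bool = True
        ensure_min_samples: int = 1

        def _check_x_y(self, x, y):
            # Same-length and dimensionality checks, plus conversion to numpy arrays
            return check_X_y(
                x,
                y,
                multi_output=self.multi_output,
                y_numeric=self.y_numeric,
                ensure_min_samples=self.ensure_min_samples,
            )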

def model_name(self):
    return self.__class__.__name__

def _more_tags(self):
Member

Can we remove this method and just not call it above?

Collaborator Author

Yes, this seems like a roundabout way to do it; multi_output defaults to False, so in this case it can be removed completely.
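
For example, the call could then simply be (a sketch keeping the other SVM-style options for illustration):

    import numpy as np
    from sklearn.utils.validation import check_X_y

    X = np.random.rand(10, 3)
    y = np.random.rand(10)

    # multi_output defaults to False, so the _more_tags() indirection can be dropped
    X, y = check_X_y(X, y, y_numeric=True, ensure_min_samples=2)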

def predict(self, x: InputLike) -> OutputLike:
    """Predicts the output of the emulator for a given input."""
    x = check_array(x)
    check_is_fitted(self, "is_fitted_")
Member

If using this check function depends on this object inheriting from the sklearn base classes, I'd be in favour of not doing the inheritance and just getting rid of this (and replacing it with our own check if we think that's necessary).
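
A minimal sketch of a dependency-free alternative (assuming the emulator sets self.is_fitted_ at the end of fit; the names here are illustrative):

    class NotFittedError(RuntimeError):
        """Raised when predict is called before fit."""

    def check_is_fitted(emulator) -> None:
        # Stand-in for sklearn's check_is_fitted that needs no BaseEstimator inheritance
        if not getattr(emulator, "is_fitted_", False):
            raise NotFittedError(
                f"{type(emulator).__name__} must be fitted before calling predict()."
            )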

edwardchalstrey1 mentioned this pull request Apr 22, 2025
edwardchalstrey1 (Collaborator, Author) commented Apr 23, 2025

Final todos:

  • Finish adding type hints for __init__ in LightGBM and update docstring (see LightGBM docs)
  • See if y = y.ravel()  # Ensure y is 1-dimensional can be moved from the utils into the LightGBM fit method. Double-check whether this makes sense by looking at the docs; this dimension change shouldn't be part of the numpy conversion function (see the sketch below).
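
A sketch of what that move could look like (assuming LightGBM's sklearn-style LGBMRegressor and the _convert_to_numpy helper described above; not the final implementation):

    import lightgbm as lgb

    # _convert_to_numpy is the utils helper added in this PR; the import path is
    # assumed from the coverage report above.
    from autoemulate.experimental.data.utils import _convert_to_numpy

    class LightGBM:
        def fit(self, x, y):
            # Keep the generic numpy conversion in utils ...
            x, y = _convert_to_numpy(x, y)
            # ... and flatten here, where LightGBM's 1D-target expectation actually lives.
            y = y.ravel()  # Ensure y is 1-dimensional
            self.model_ = lgb.LGBMRegressor()
            self.model_.fit(x, y)
            self.is_fitted_ = True
            return self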

radka-j (Member) left a comment

🚀

edwardchalstrey1 merged commit 35cccf8 into main Apr 24, 2025
4 checks passed
edwardchalstrey1 deleted the 351-refactor-lgbm branch April 24, 2025 10:32