Add refactored LGBM model to experimental emulators #399


Merged
merged 68 commits into main from 351-refactor-lgbm on Apr 24, 2025

Conversation

edwardchalstrey1 (Collaborator) commented Apr 14, 2025

  • Closes Refactor: initial impl Light GBM #351
  • Adds LightGBM to experimental, set up with a get_tune_config method
  • Refactors the original emulator to use numpy instead of scipy.stats
  • Adds tests that run the model's predict method and the tuner
  • Adds a _convert_to_numpy function to utils, used by the fit method (a sketch of what this could look like follows below)
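
A minimal sketch of what a _convert_to_numpy helper could look like (the signature and behaviour here are assumptions for illustration, not the implementation merged in this PR):

    import numpy as np

    def _convert_to_numpy(x, y=None):
        """Convert array-like or tensor inputs to numpy arrays."""
        def _to_numpy(arr):
            if arr is None:
                return None
            if hasattr(arr, "detach"):  # e.g. a torch.Tensor, without importing torch
                return arr.detach().cpu().numpy()
            return np.asarray(arr)
        return _to_numpy(x), _to_numpy(y)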

github-actions bot (Contributor) commented Apr 14, 2025

Coverage report

File                                                Lines missing
autoemulate/emulators/gaussian_process.py
autoemulate/experimental/data/utils.py              105, 111-112
autoemulate/experimental/emulators/base.py
autoemulate/experimental/emulators/lightgbm.py      78-79, 81-82
tests/experimental/test_experimental_lightgbm.py
Project Total

This report was generated by python-coverage-comment-action

codecov-commenter commented Apr 14, 2025

Codecov Report

Attention: Patch coverage is 91.46341% with 7 lines in your changes missing coverage. Please review.

Project coverage is 90.13%. Comparing base (fa9ae91) to head (cf2042b).
Report is 88 commits behind head on main.

Files with missing lines                           Patch %    Lines
autoemulate/experimental/emulators/lightgbm.py     92.15%     4 Missing ⚠️
autoemulate/experimental/data/utils.py             66.66%     3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #399      +/-   ##
==========================================
- Coverage   91.95%   90.13%   -1.82%     
==========================================
  Files          84       91       +7     
  Lines        4721     5678     +957     
==========================================
+ Hits         4341     5118     +777     
- Misses        380      560     +180     

☔ View full report in Codecov by Sentry.

self.n_features_in_ = x.shape[1]

x, y = check_X_y(
Member

What does check_X_y do? I feel like any data checks and conversions we want to do should be in the InputTypeMixin (or something similar)?

edwardchalstrey1 (Collaborator, Author) Apr 17, 2025

This checks that X and y have the same length, and that X is 2D and y is 1D. There are also other checks you can include, which in the old Autoemulate have been set up differently depending on the emulator class, e.g. for SVM:

X, y = check_X_y(
    X,
    y,
    multi_output=self._more_tags()["multioutput"],
    y_numeric=True,
    ensure_min_samples=2,
)

and for polynomials:

X, y = check_X_y(X, y, multi_output=True, y_numeric=True, dtype=np.float64)

Perhaps for now it's best to leave these checks in the fit methods, then refactor them into the InputTypeMixin later if several of the models we include in experimental end up doing basically the same check, or if we disagree with some of the differences and think that a simple dimensionality check is sufficient?

Collaborator

Agree with both the above (#399 (comment)); keeping these checks for now and refactoring them in a new issue that adds a class for validation/checks seems like a good option.
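
For illustration, a minimal sketch of what such a shared validation class could look like (the name ValidationMixin and the configurable options are assumptions, not something introduced in this PR):

    from sklearn.utils.validation import check_X_y

    class ValidationMixin:
        """Hypothetical shared home for the per-emulator input checks."""

        # Subclasses override these if they need different behaviour,
        # mirroring the SVM/polynomial differences quoted above.
        multi_output: bool = False
        y_numeric: bool = True
        ensure_min_samples: int = 1

        def _check_x_y(self, x, y):
            # Same-length and dimensionality checks, plus conversion to numpy arrays
            return check_X_y(
                x,
                y,
                multi_output=self.multi_output,
                y_numeric=self.y_numeric,
                ensure_min_samples=self.ensure_min_samples,
            )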

def model_name(self):
    return self.__class__.__name__

def _more_tags(self):
Member

Can we remove this method and just not call it above?

Collaborator Author

Yes, this seems like a roundabout way to do it; multi_output defaults to False, so in this case it can be removed completely.
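
For example, the call could then simply be (a sketch keeping the other SVM-style options for illustration):

    import numpy as np
    from sklearn.utils.validation import check_X_y

    X = np.random.rand(10, 3)
    y = np.random.rand(10)

    # multi_output defaults to False, so the _more_tags() indirection can be dropped
    X, y = check_X_y(X, y, y_numeric=True, ensure_min_samples=2)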

def predict(self, x: InputLike) -> OutputLike:
    """Predicts the output of the emulator for a given input."""
    x = check_array(x)
    check_is_fitted(self, "is_fitted_")
Member

If using this check function depends on this object inheriting from the sklearn base classes, I'd be in favour of not doing the inheritance and just getting rid of this (and replacing it with our own check if we think that's necessary).
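
A minimal sketch of a dependency-free alternative (assuming the emulator sets self.is_fitted_ at the end of fit; the names here are illustrative):

    class NotFittedError(RuntimeError):
        """Raised when predict is called before fit."""

    def check_is_fitted(emulator) -> None:
        # Stand-in for sklearn's check_is_fitted that needs no BaseEstimator inheritance
        if not getattr(emulator, "is_fitted_", False):
            raise NotFittedError(
                f"{type(emulator).__name__} must be fitted before calling predict()."
            )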

edwardchalstrey1 mentioned this pull request Apr 22, 2025
edwardchalstrey1 (Collaborator, Author) commented Apr 23, 2025

Final todos:

  • Finish adding type hints for __init__ in LightGBM and update docstring (see LightGBM docs)
  • See if y = y.ravel()  # Ensure y is 1-dimensional can be moved from the utils into the LightGBM fit method. Double-check whether this makes sense by looking at the docs; this dimension change shouldn't be part of the numpy conversion function (see the sketch below).
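
A sketch of what that move could look like (assuming LightGBM's sklearn-style LGBMRegressor and the _convert_to_numpy helper described above; not the final implementation):

    import lightgbm as lgb

    # _convert_to_numpy is the utils helper added in this PR; the import path is
    # assumed from the coverage report above.
    from autoemulate.experimental.data.utils import _convert_to_numpy

    class LightGBM:
        def fit(self, x, y):
            # Keep the generic numpy conversion in utils ...
            x, y = _convert_to_numpy(x, y)
            # ... and flatten here, where LightGBM's 1D-target expectation actually lives.
            y = y.ravel()  # Ensure y is 1-dimensional
            self.model_ = lgb.LGBMRegressor()
            self.model_.fit(x, y)
            self.is_fitted_ = True
            return self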

radka-j (Member) left a comment

🚀

edwardchalstrey1 merged commit 35cccf8 into main Apr 24, 2025
4 checks passed
edwardchalstrey1 deleted the 351-refactor-lgbm branch April 24, 2025 10:32