Hoeffding races implementation #1656
base: main
Conversation
e10e3 left a comment
Thank you for this PR!
Sorry for the delay before the review.
Your proposal looks good; I have a few questions to better understand the code.
Where does this implementation come from? Is it based on a paper, or maybe on an existing piece of code?
If this is an original work, can you explain its benefits?
If this is based on an existing work, please cite your sources.
Where does this file come from? Is it necessary to build River?
```python
    Computes the hoeffding bound according to n, the number of iterations done.
    """
    return math.sqrt((math.log(1 / self.delta)) / (2 * n))
```
The usual formula for the Hoeffding bound includes the numeric range of the variable (sometimes written as R): eps = sqrt(R^2 * ln(1/delta) / (2n)).
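For reference, a minimal sketch of the bound with the range term included (`r` below stands for the variable's range; with `r = 1` this reduces to the formula currently in the PR):

```python
import math

def hoeffding_bound(r: float, delta: float, n: int) -> float:
    # eps = sqrt(r^2 * ln(1/delta) / (2n)); the range term r accounts
    # for variables that are not bounded in [0, 1].
    return math.sqrt((r * r * math.log(1 / delta)) / (2 * n))
```

For accuracy the range is 1 and the PR's formula is recovered, but for an unbounded regression error the range would need to be estimated or supplied by the user.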
```python
        return len(self.remaining_models) == 1


class HoeffdingRaceRegressor(base.Regressor):
```
The regressor class is almost identical to the classifier. Have you considered factoring the common parts? (You don't have to do it, it's a suggestion)
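To illustrate the suggestion, here is a hypothetical sketch of how the shared racing state could move into a common base class (class and attribute layout are illustrative, not taken from the PR):

```python
class BaseHoeffdingRace:
    """Shared state and update logic for both racing classes (sketch)."""

    def __init__(self, models, metric, delta=0.05):
        self.models = models
        self.delta = delta
        self.n = 0
        self.model_metrics = {name: metric.clone() for name in models}
        self.remaining_models = list(models)

    def learn_one(self, x, y):
        self.n += 1
        for name in list(self.remaining_models):
            model = self.models[name]
            self.model_metrics[name].update(y, model.predict_one(x))
            model.learn_one(x, y)
        # The Hoeffding-bound elimination step, which is identical for
        # both tasks, would also live here.


class HoeffdingRaceClassifier(BaseHoeffdingRace):
    """Only classification-specific behaviour would remain here."""


class HoeffdingRaceRegressor(BaseHoeffdingRace):
    """Only regression-specific behaviour would remain here."""
```

The only real differences between the two classes (the metric direction and the prediction type) could then be expressed as small overrides.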
```python
    def predict_one(self, x):
        if len(self.remaining_models) == 1:
            return self.models[list(self.remaining_models)[0]].predict_one(x)
```
Isn't self.remaining_models already a list? If so, you don't need to convert it to a list again.
```python
    def predict_one(self, x):
        if len(self.remaining_models) == 1:
            return self.models[list(self.remaining_models)[0]].predict_one(x)
        return None
```
I see that your model selector will not make any prediction until it has converged to one model. Is there a reason to do so?
Could the best model so far be used for prediction?
What happens if the model selector never converges, i.e. two or three models are always equivalent?
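As a hypothetical illustration (written as a free function over the PR's attribute names `models`, `remaining_models` and `model_performance`), the selector could fall back to the currently best-scoring model instead of returning None:

```python
def predict_one(self, x):
    # Before the race converges, use the remaining model whose
    # performance is currently best. For regression this would use
    # min() instead, since errors are minimised.
    if not self.remaining_models:
        return None
    best = max(self.remaining_models, key=self.model_performance.get)
    return self.models[best].predict_one(x)
```

This would also give a well-defined behaviour when several models stay equivalent forever: the selector keeps predicting with whichever of them is ahead.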
```python
        self.n = 0
        self.model_metrics = {name: metric.clone() for name in models.keys()}
        self.model_performance = {name: 0 for name in models.keys()}
        self.remaining_models = [i for i in models.keys()]
```
```diff
-        self.remaining_models = [i for i in models.keys()]
+        self.remaining_models = list(models.keys())
```
Would this list conversion be more readable/explicit?
```python
        return math.sqrt((math.log(1 / self.delta)) / (2 * n))

    def learn_one(self, x, y):
        best_perf = max(self.model_performance.values()) if self.n > 0 else 0
```
Regression metrics improve as they decrease (we want to minimise the error). For regression, wouldn't one want to select the model with the smallest error as the best model?

```diff
-        best_perf = max(self.model_performance.values()) if self.n > 0 else 0
+        best_perf = min(self.model_performance.values()) if self.n > 0 else math.inf
```
```python
        self.model_metrics = {name: metric.clone() for name in models.keys()}
        self.model_performance = {name: 0 for name in models.keys()}
```
What is the difference between self.model_metrics and self.model_performance? It looks to me like model_performance holds the inner value of the metrics and nothing else.
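If that is the case, one option (a sketch, assuming the metrics expose their current value via `.get()` as River metrics do) is to derive the performances from the metrics on demand instead of maintaining a parallel dict:

```python
def current_performances(model_metrics):
    # model_metrics maps a model name to a River-style metric object;
    # the metric's .get() returns its current value.
    return {name: metric.get() for name, metric in model_metrics.items()}
```

That removes the risk of the two dicts drifting out of sync.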
```python
    """
    HoeffdingRace-based model selection for Classification.

    Each models is associated to a performance (here its accuracy). When the model is considered too inaccurate by the hoeffding bound,
```
```diff
-    Each models is associated to a performance (here its accuracy). When the model is considered too inaccurate by the hoeffding bound,
+    Each model is associated to a performance (here its accuracy). When the model is considered too inaccurate by the Hoeffding bound,
```
```python
    """
    HoeffdingRace-based model selection for regression.

    Each models is associated to a performance (here its accuracy). When the model is considered too inaccurate by the hoeffding bound,
```
The metric for regression is an error; the term "accuracy" is only used for classification.

```diff
-    Each models is associated to a performance (here its accuracy). When the model is considered too inaccurate by the hoeffding bound,
+    Each model is associated to a performance (here its error). When the model is considered too inaccurate by the Hoeffding bound,
```
No description provided.