
Conversation


@slach31 slach31 commented Nov 30, 2024

No description provided.

Contributor

@e10e3 e10e3 left a comment

Thank you for this PR!
Sorry for the delay before the review.

Your proposal looks good; I have a few questions to better understand the code.

Where does this implementation come from? Is it based on a paper, or maybe on an existing piece of code?
If this is an original work, can you explain its benefits?
If this is based on an existing work, please cite your sources.

Contributor

Where does this file come from? Is it necessary to build River?

Computes the hoeffding bound according to n, the number of iterations done.

"""
return math.sqrt((math.log(1 / self.delta)) / (2 * n))
Contributor

The usual formula for the Hoeffding bound includes the numeric range of the variable (sometimes expressed as $B$ or $R$). Is there a reason why it's not used here?
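For reference, the bound with the range term included could look like the sketch below (the `hoeffding_bound` function and its signature are illustrative, not part of the PR; with r == 1, e.g. an accuracy in [0, 1], it reduces to the PR's formula):

```python
import math

def hoeffding_bound(r: float, delta: float, n: int) -> float:
    # Hoeffding bound for a variable with numeric range r, confidence
    # 1 - delta, after n observations. The r**2 factor is what the PR's
    # version omits, implicitly assuming r == 1.
    return math.sqrt((r ** 2) * math.log(1 / delta) / (2 * n))
```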

return len(self.remaining_models) == 1


class HoeffdingRaceRegressor(base.Regressor):
Contributor

The regressor class is almost identical to the classifier. Have you considered factoring the common parts? (You don't have to do it, it's a suggestion)
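One possible shape for that refactor, sketched with hypothetical names (`BaseHoeffdingRace` is not River's actual API), would move the shared racing state and the bound into a common base class:

```python
import math

class BaseHoeffdingRace:
    # Hypothetical shared base class: the racing state and the Hoeffding
    # bound are identical in the classifier and the regressor, so they
    # could live here. A sketch of the suggestion, not a real River class.

    def __init__(self, models, metric, delta=0.05):
        self.models = models
        self.delta = delta
        self.n = 0
        self.model_metrics = {name: metric.clone() for name in models}
        self.remaining_models = list(models)

    def _hoeffding_bound(self, n):
        return math.sqrt(math.log(1 / self.delta) / (2 * n))


class HoeffdingRaceClassifier(BaseHoeffdingRace):
    """Only classification-specific pieces (e.g. predict_proba_one) remain."""


class HoeffdingRaceRegressor(BaseHoeffdingRace):
    """Only regression-specific pieces remain."""
```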


def predict_one(self, x):
if len(self.remaining_models) == 1:
return self.models[list(self.remaining_models)[0]].predict_one(x)
Contributor

Isn't self.remaining_models already a list? If so, you don't need to convert it to a list again.

Comment on lines +90 to +93
def predict_one(self, x):
if len(self.remaining_models) == 1:
return self.models[list(self.remaining_models)[0]].predict_one(x)
return None
Contributor

I see that your model selector will not make any prediction while it has not converged to one model. Is there a reason to do so?
Could the best model be used for prediction?
What happens if the model selector never converges, i.e. two or three models are always equivalent?
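One way the "predict with the best model so far" idea could look, as a hypothetical variant of the PR's method (assuming a higher model_performance value is better, as in the classification case):

```python
def predict_one(self, x):
    # Hypothetical variant: before the race has converged, predict with
    # the current front-runner among the remaining models instead of
    # returning None.
    best_name = max(self.remaining_models, key=self.model_performance.get)
    return self.models[best_name].predict_one(x)
```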

self.n = 0
self.model_metrics = {name: metric.clone() for name in models.keys()}
self.model_performance = {name: 0 for name in models.keys()}
self.remaining_models = [i for i in models.keys()]
Contributor

Suggested change
self.remaining_models = [i for i in models.keys()]
self.remaining_models = list(models.keys())

Would this list conversion be more readable/explicit?

return math.sqrt((math.log(1 / self.delta)) / (2 * n))

def learn_one(self, x, y):
best_perf = max(self.model_performance.values()) if self.n > 0 else 0
Contributor

Regression metrics are better with decreasing values (we want to minimize the error). For regression, wouldn't one want to select the model with the smallest error as the best model?

Suggested change
best_perf = max(self.model_performance.values()) if self.n > 0 else 0
best_perf = min(self.model_performance.values()) if self.n > 0 else math.inf

Comment on lines +61 to +62
self.model_metrics = {name: metric.clone() for name in models.keys()}
self.model_performance = {name: 0 for name in models.keys()}
Contributor

What is the difference between self.model_metrics and self.model_performance? It looks to me like model_performance holds the inner value of the metrics and nothing else.
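If model_performance really only mirrors the metric values, it could likely be dropped and the values read from the metrics on demand (a sketch; `current_performances` is a hypothetical helper, relying on River metrics exposing `get()`):

```python
def current_performances(model_metrics):
    # Hypothetical simplification: derive the performance values from the
    # metrics themselves instead of keeping a parallel model_performance
    # dict in sync manually.
    return {name: metric.get() for name, metric in model_metrics.items()}
```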

"""
HoeffdingRace-based model selection for Classification.

Each models is associated to a performance (here its accuracy). When the model is considered too inaccurate by the hoeffding bound,
Contributor

Suggested change
Each models is associated to a performance (here its accuracy). When the model is considered too inaccurate by the hoeffding bound,
Each model is associated to a performance (here its accuracy). When the model is considered too inaccurate by the Hoeffding bound,

"""
HoeffdingRace-based model selection for regression.

Each models is associated to a performance (here its accuracy). When the model is considered too inaccurate by the hoeffding bound,
Contributor

The metric for regression is an error; the term "accuracy" is only used for classification.

Suggested change
Each models is associated to a performance (here its accuracy). When the model is considered too inaccurate by the hoeffding bound,
Each model is associated to a performance (here its error). When the model is considered too inaccurate by the Hoeffding bound,
