Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ScaleKernel to get_covar_module_with_dim_scaled_prior #2619

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

dai08srhg
Copy link

@dai08srhg dai08srhg commented Nov 10, 2024

Motivation

Since version 0.12.0, dim_scaled_lognormal_prior[Hvarfner2024vanilla] has become the default. However, as ScaleKernel is not applied in get_covar_module_with_dim_scaled_prior(), performance may deteriorate in some cases.
(The selection of the prior distribution depends on the task, but adjusting the scale seems beneficial and unproblematic across tasks.)

An example is shown below.
Run the task of finding the minimum of Styblinski-Tang (D=40) three times and compare the average performance.
StyblinskiTang40_performance-1
StyblinskiTang40_all-1

Have you read the Contributing Guidelines on pull requests?

I have read it.

Test Plan

Since this is a performance-related change, testing will involve a performance comparison on benchmark functions (such as Styblinski-Tang).

Related PRs

(If this PR adds or changes functionality, please take some time to update the docs at https://github.com/pytorch/botorch, and link to your PR here.)

@facebook-github-bot facebook-github-bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label Nov 10, 2024
@Balandat
Copy link
Contributor

@hvarfner you have done some evaluations here - any thoughts? My understanding was that we didn't really see any benefit from using a scale kernel across a variety of functions if we standardize the outcomes.

@hvarfner
Copy link
Contributor

Hi @dai08srhg ,

Thanks for checking this out, and sorry for the late reply! When the new priors were implemented, the ScaleKernel was dropped after a lot of ablation. In fact, there are some cases where the outputscale is actually problematic.

The results in the paper (e.g. Fig. 19-22) shows that the performance was frequently substantially worse with a ScaleKernel. For high-dimensional problems, the outputscale parameter tends to shrink quite rapidly, leading to very local behavior - where worse perormance may follow. On some internal testing on mid- and high-dimensional problems, the performance with the inclusion of a ScaleKernel was generally not better, either.

Now, the shrinkage does not always happen, and is not always bad for performance. I re-ran your specific experiment, and also noticed that the ScaleKernel inclusion was slightly better --> stybtang40.pdf, but the difference I saw was not as stark (10 runs).

With that said, I think the effect that the outputscale (whether learned or not, when it shrinks and whether that is good or bad) is very interesting and would like to understand it better. However, we concluded that the inclusion of a ScaleKernel does more harm than good - both in terms of regret performance and the exploration-exploitation trade-off.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed Do not delete this pull request or issue due to inactivity.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants