Description
🐛 Bug
botorch.models.utils.gpytorch_modules implements a few utility functions that return kernels and likelihoods. Those functions are supposed to place constraints on kernel.lengthscale and likelihood.noise so that these hyperparameters are always positive. However, in some cases the constraints are not enforced.
To reproduce
The following is a minimal working example showing that the parameter constraint is not enforced.
import torch
from botorch.models.utils.gpytorch_modules import get_gaussian_likelihood_with_gamma_prior
# By default the noise has a constraint GreaterThan(1.000E-04)
likelihood = get_gaussian_likelihood_with_gamma_prior()
# Let's say the gradient of the raw noise is 5 at some point during hyperparameter optimization
# In general, the gradient could be any value.
likelihood.raw_noise.grad = torch.tensor([5.])
# Do a single step gradient descent on the noise
with torch.no_grad():
    for param in likelihood.parameters():
        param -= param.grad
# tensor([-3.0000], requires_grad=True) violates the constraint!
print(likelihood.noise)
# Let's evaluate the log prior of the likelihood noise as in gpytorch.mlls._approximate_mll
# https://github.com/cornellius-gp/gpytorch/blob/8825cdd7abd1db7dea5803265067d598f21d6962/gpytorch/mlls/_approximate_mll.py#L70-L71
name, module, prior, closure, _ = next(likelihood.named_priors())
log_prior = prior.log_prob(closure(module))
print("name {:s}, log prior {:f}".format(name, log_prior.item()))
The following is the output. The log prior is NaN because the noise is outside the support of the Gamma distribution.
Parameter containing:
tensor([-3.0000], requires_grad=True)
noise_covar.noise_prior, log prior nan
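For completeness, the NaN comes from evaluating the Gamma log-density at a negative value, which lies outside its support. A minimal sketch (the concentration/rate values below are only illustrative; validate_args=False mirrors the silent NaN seen above rather than an exception):
import torch
from torch.distributions import Gamma

# log_prob involves log(value), which is NaN for value < 0
prior = Gamma(concentration=1.1, rate=0.05, validate_args=False)
print(prior.log_prob(torch.tensor([-3.0])))  # tensor([nan])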
Expected Behavior
likelihood.noise is supposed to be greater than 1e-4, and log_prior should not be NaN. Both should hold for any gradient values.
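For example, repeating the gradient step above with a GaussianLikelihood whose noise constraint keeps GPyTorch's default softplus transform, I would expect the noise to stay positive no matter where the raw parameter ends up. A minimal sketch of that expectation (my own construction, not botorch code):
import torch
from gpytorch.constraints import GreaterThan
from gpytorch.likelihoods import GaussianLikelihood

# Same 1e-4 bound, but with the default softplus transform kept.
likelihood = GaussianLikelihood(noise_constraint=GreaterThan(1e-4))
likelihood.raw_noise.grad = torch.tensor([5.])
with torch.no_grad():
    for param in likelihood.parameters():
        param -= param.grad
# The raw noise is now negative, but the reported noise is
# softplus(raw_noise) + 1e-4, which is still above the bound.
print(likelihood.noise)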
System information
- BoTorch Version v0.11.0
- GPyTorch Version v1.11
- PyTorch Version 2.4.0+cu121
- OS: Fedora 40
Additional context
I am working on aepsych (which relies heavily on botorch), where we use similar outputscale/lengthscale priors. I was fitting a GP model on a synthetic dataset and ran into NaN issues during hyperparameter optimization (I was using Adam); the same issues could break LBFGS as well, e.g., through line search failures.
The root cause is the argument transform=None in the following lines:
botorch/botorch/models/utils/gpytorch_modules.py, lines 63 to 71 at commit 8536468
GPyTorch enforces GreaterThan constraints through a softplus transformation of the raw parameter. However, when the argument is overridden with transform=None, the constraint has no transform to apply: the raw value is passed through unchanged and the lower bound is never enforced.
In most cases the prior pushes the hyperparameter towards the mode of the prior distribution, so this NaN issue does not show up very often. However, I believe it can still lead to unintended behavior, e.g., line search failures and early termination of LBFGS.