[Bug] Parameter Constraints Do Not Work #2542

Open
@kayween

Description

🐛 Bug

botorch.models.utils.gpytorch_modules implements a few utility functions that return kernels and likelihoods. Those functions should enforce constraints on kernel.lengthscale and likelihood.noise to make sure they are always positive.

However, the constraints do not work in some cases.
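
The constraint in question can be inspected directly. A minimal sketch (this assumes GPyTorch's usual naming, under which the constraint is registered as raw_noise_constraint on the noise covariance module):

from botorch.models.utils.gpytorch_modules import get_gaussian_likelihood_with_gamma_prior

likelihood = get_gaussian_likelihood_with_gamma_prior()
# Prints something like GreaterThan(1.000E-04)
print(likelihood.noise_covar.raw_noise_constraint)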

To reproduce

The following is a minimal working example showing that the parameter constraint does not work properly.

import torch
from botorch.models.utils.gpytorch_modules import get_gaussian_likelihood_with_gamma_prior

# By default the noise has a constraint GreaterThan(1.000E-04)
likelihood = get_gaussian_likelihood_with_gamma_prior()

# Suppose the gradient of the raw noise is 5 at some point during hyperparameter optimization.
# In general, the gradient could be any value.
likelihood.raw_noise.grad = torch.tensor([5.])

# Do a single step gradient descent on the noise
with torch.no_grad():
    for param in likelihood.parameters():
        param -= param.grad

# tensor([-3.0000], requires_grad=True) violates the constraint!
print(likelihood.noise)
 
# Let's evaluate the log prior of the likelihood noise as in gpytorch.mlls._approximate_mll
# https://github.com/cornellius-gp/gpytorch/blob/8825cdd7abd1db7dea5803265067d598f21d6962/gpytorch/mlls/_approximate_mll.py#L70-L71
name, module, prior, closure, _ = next(likelihood.named_priors())
log_prior = prior.log_prob(closure(module))

print("name {:s}, log prior {:f}".format(name, log_prior.item()))

The following is the output. The log prior is NaN because the noise falls outside the support of the Gamma distribution:

Parameter containing:
tensor([-3.0000], requires_grad=True)
noise_covar.noise_prior, log prior nan
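
The violation can also be confirmed programmatically. GPyTorch constraints expose a check method; a sketch, continuing from the snippet above:

constraint = likelihood.noise_covar.raw_noise_constraint
# False: -3.0 is below the lower bound of 1e-4.
print(constraint.check(likelihood.noise))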

Expected Behavior

  1. likelihood.noise is supposed to be greater than 1e-4.
  2. log_prior should not be NaN.

Both should hold in all cases, for any gradient value.
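
For contrast, here is a minimal sketch of the behavior I would expect, constructing the likelihood by hand with the same lower bound but leaving GPyTorch's default softplus transform in place (instead of transform=None):

import torch
from gpytorch.constraints import GreaterThan
from gpytorch.likelihoods import GaussianLikelihood

likelihood = GaussianLikelihood(noise_constraint=GreaterThan(1e-4))

# Same gradient step as in the reproduction above.
likelihood.raw_noise.grad = torch.tensor([5.])
with torch.no_grad():
    for param in likelihood.parameters():
        param -= param.grad

# noise = softplus(raw_noise) + 1e-4 stays positive no matter how negative raw_noise gets.
# Here raw_noise ends up at -5, so this prints approximately tensor([0.0068]).
print(likelihood.noise)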

System information

  • BoTorch Version v0.11.0
  • GPyTorch Version v1.11
  • PyTorch Version 2.4.0+cu121
  • Fedora 40

Additional context

I am working on aepsych (which relies heavily on botorch), where we use similar outputscale/lengthscale priors. I was fitting a GP model on a synthetic dataset and ran into NaN issues during hyperparameter optimization (I was using Adam). These NaN issues might break L-BFGS as well, e.g., through line search failures.

These NaN issues are caused by the argument transform=None:

return GaussianLikelihood(
    noise_prior=noise_prior,
    batch_shape=batch_shape,
    noise_constraint=GreaterThan(
        MIN_INFERRED_NOISE_LEVEL,
        transform=None,
        initial_value=noise_prior_mode,
    ),
)

GPyTorch implements the constraint via a softplus transformation of the raw parameter. However, if we override the argument with transform=None, the constraint's transform is disabled. As a result, no transformation is applied and the constraint is not enforced:

https://github.com/cornellius-gp/gpytorch/blob/8825cdd7abd1db7dea5803265067d598f21d6962/gpytorch/constraints/constraints.py#L173-L175
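
A minimal sketch of the difference, using only the constraint objects:

import torch
from gpytorch.constraints import GreaterThan

raw = torch.tensor([-3.0])

enforced = GreaterThan(1e-4)                  # default softplus transform
disabled = GreaterThan(1e-4, transform=None)  # transform disabled

print(enforced.transform(raw))  # softplus(-3) + 1e-4, approximately tensor([0.0487])
print(disabled.transform(raw))  # tensor([-3.]), the raw value passes through unchanged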

In most cases, the prior pushes the hyperparameter towards the mode of the prior distribution, so this NaN issue does not surface very often. However, I believe it can still lead to unintended behavior, e.g., line search failures and early termination of L-BFGS.
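
As a user-side stopgap (a sketch, not necessarily the intended fix), one can re-register the constraint with the default softplus transform via GPyTorch's register_constraint:

from gpytorch.constraints import GreaterThan

likelihood = get_gaussian_likelihood_with_gamma_prior()
# Replace the non-enforcing constraint with one that applies the default softplus transform.
# Note that this changes the raw-to-actual mapping, so the effective noise value shifts.
likelihood.noise_covar.register_constraint("raw_noise", GreaterThan(1e-4))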
