Description
🐛 Bug
botorch.models.utils.gpytorch_modules implements a few utility functions that return kernels and likelihoods. Those functions are supposed to place constraints on kernel.lengthscale and likelihood.noise so that these hyperparameters are always positive. However, in some cases the constraints are not enforced.
To reproduce
The following is a minimal working example showing that the parameter constraint is not enforced.
import torch
from botorch.models.utils.gpytorch_modules import get_gaussian_likelihood_with_gamma_prior
# By default the noise has a constraint GreaterThan(1.000E-04)
likelihood = get_gaussian_likelihood_with_gamma_prior()
# Let's say the gradient of the raw noise is 5 at some point during hyperparameter optimization
# In general, the gradient could be any value.
likelihood.raw_noise.grad = torch.tensor([5.])
# Do a single step gradient descent on the noise
with torch.no_grad():
    for param in likelihood.parameters():
        param -= param.grad
# tensor([-3.0000], requires_grad=True) violates the constraint!
print(likelihood.noise)
# Let's evaluate the log prior of the likelihood noise as in gpytorch.mlls._approximate_mll
# https://github.com/cornellius-gp/gpytorch/blob/8825cdd7abd1db7dea5803265067d598f21d6962/gpytorch/mlls/_approximate_mll.py#L70-L71
name, module, prior, closure, _ = next(likelihood.named_priors())
log_prior = prior.log_prob(closure(module))
print("name {:s}, log prior {:f}".format(name, log_prior.item()))
The following is the output. The log prior is NaN because the noise is outside the support of the Gamma distribution.
Parameter containing:
tensor([-3.0000], requires_grad=True)
noise_covar.noise_prior, log prior nan
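For completeness, the NaN comes from evaluating the Gamma log-density at a negative value, which lies outside its support. A minimal sketch (the concentration/rate values below are only illustrative; validate_args=False mirrors the silent NaN seen above rather than an exception):
import torch
from torch.distributions import Gamma

# log_prob involves log(value), which is NaN for value < 0
prior = Gamma(concentration=1.1, rate=0.05, validate_args=False)
print(prior.log_prob(torch.tensor([-3.0])))  # tensor([nan])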
Expected Behavior
likelihood.noise is supposed to be greater than 1e-4, and log_prior should not be NaN. Both should hold for any gradient values.
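For example, repeating the gradient step above with a GaussianLikelihood whose noise constraint keeps GPyTorch's default softplus transform, I would expect the noise to stay positive no matter where the raw parameter ends up. A minimal sketch of that expectation (my own construction, not botorch code):
import torch
from gpytorch.constraints import GreaterThan
from gpytorch.likelihoods import GaussianLikelihood

# Same 1e-4 bound, but with the default softplus transform kept.
likelihood = GaussianLikelihood(noise_constraint=GreaterThan(1e-4))
likelihood.raw_noise.grad = torch.tensor([5.])
with torch.no_grad():
    for param in likelihood.parameters():
        param -= param.grad
# The raw noise is now negative, but the reported noise is
# softplus(raw_noise) + 1e-4, which is still above the bound.
print(likelihood.noise)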
System information
- BoTorch Version v0.11.0
- GPyTorch Version v1.11
- PyTorch Version 2.4.0+cu121
- OS: Fedora 40
Additional context
I am working on aepsych (which relies heavily on botorch), where we use similar outputscale/lengthscale priors. I was fitting a GP model on a synthetic dataset and ran into NaN issues during hyperparameter optimization (I was using Adam); the same issues could break LBFGS as well, e.g., through line search failures.
The root cause is the argument transform=None in the following lines:
botorch/botorch/models/utils/gpytorch_modules.py, lines 63 to 71 at commit 8536468
GPyTorch enforces GreaterThan constraints through a softplus transformation of the raw parameter. However, when the argument is overridden with transform=None, the constraint has no transform to apply: the raw value is passed through unchanged and the lower bound is never enforced.
In most cases the prior pushes the hyperparameter towards the mode of the prior distribution, so this NaN issue does not show up very often. However, I believe it can still lead to unintended behavior, e.g., line search failures and early termination of LBFGS.