Description
What happened?
Hi @esantorella & @saitcakmak 👋🏼 After a long time, I've finally had a moment to get back to #641, because I now have an actual minimal reproducing example.
For me, the process gets consistently killed after ~20 iterations. Until that point, it keeps allocating memory/swap and eventually crashes. Could you perhaps confirm if this is also the case for you?
I haven't yet used the gc.get_objects method suggested by @esantorella here to verify whether this is an actual leak or just over-allocation. In any case, a crash is unexpected, since the code should not hold on to any long-term resources across the independent optimizations happening in the loop.
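For reference, a minimal sketch of what such a gc.get_objects check could look like. It uses only the standard library; the helper names `count_live_objects` and `growth_since` are hypothetical, and the loop body is a stand-in for the `optimize_acqf` call rather than the real thing:

```python
import gc
from collections import Counter


def count_live_objects():
    """Count gc-tracked objects by type name."""
    gc.collect()  # collect unreachable cycles first so only live objects remain
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth_since(baseline, top=5):
    """Return the `top` type names whose live count grew most since `baseline`."""
    # Counter subtraction keeps only positive differences, i.e. types that grew.
    delta = count_live_objects() - baseline
    return delta.most_common(top)


baseline = count_live_objects()
retained = []  # stand-in for state that accidentally survives across iterations
for i in range(3):
    # Placeholder workload; in the reproducer this would be the optimize_acqf call.
    retained.append([[] for _ in range(1000)])
    print(i, growth_since(baseline))
```

If the per-type counts keep climbing from one iteration to the next, something is probably holding references (a leak at the Python level). If the counts stay flat while process memory still grows, the growth more likely sits below the Python object layer, which this check cannot see.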
Please provide a minimal, reproducible example of the unexpected behavior.
Adapted from the landing page code:
import torch
from botorch.acquisition import qNegIntegratedPosteriorVariance
from botorch.fit import fit_gpytorch_mll
from botorch.models import SingleTaskGP
from botorch.models.transforms import Normalize, Standardize
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood
d = 10
train_X = torch.rand(100, d, dtype=torch.double)
mc_points = torch.rand(100, d, dtype=torch.double)
train_Y = torch.rand(100, 1, dtype=torch.double)
gp = SingleTaskGP(
    train_X=train_X,
    train_Y=train_Y,
    input_transform=Normalize(d=d),
    outcome_transform=Standardize(m=1),
)
mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
fit_gpytorch_mll(mll)
acq = qNegIntegratedPosteriorVariance(model=gp, mc_points=mc_points)
bounds = torch.stack([torch.zeros(d), torch.ones(d)]).to(torch.double)
for i in range(1000):
    candidate, acq_value = optimize_acqf(
        acq, bounds=bounds, q=10, num_restarts=20, raw_samples=64, sequential=False
    )
    print(i)
Please paste any relevant traceback/logs produced by the example provided.
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
[1] 63481 killed
BoTorch Version
0.12.0
Python Version
3.10
Operating System
macOS
Code of Conduct
- I agree to follow BoTorch's Code of Conduct