Replies: 4 comments 4 replies
Hi! NLE and NRE do have advantages for iid observations: during training, they need only a single simulation per parameter set. This typically makes them less simulation-hungry and less prone to issues such as running out of memory. The downside of these methods is that sampling can become relatively slow, especially for many iid data points. I would still give one of those two methods a shot. Hope this helps!
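A minimal sketch of how this looks with the likelihood-based route in sbi (class names such as SNLE vs. NLE and keyword names vary between sbi versions; the prior, the dimensions, and the stand-in simulator below are placeholders, not the setup from this thread):

```python
import torch
from sbi.inference import SNLE          # called NLE in newer sbi releases
from sbi.utils import BoxUniform

# Placeholder prior, parameters, and "simulations": a single simulation per
# parameter set is enough to train the likelihood estimator.
prior = BoxUniform(low=torch.zeros(2), high=torch.ones(2))
theta = prior.sample((10_000,))
x = torch.randn(10_000, 20) + theta[:, :1]   # stand-in for (summarized) observations

inference = SNLE(prior=prior)
density_estimator = inference.append_simulations(theta, x).train()
posterior = inference.build_posterior(density_estimator)

# Several iid observations are passed as one batch at sampling time; the
# learned likelihood is combined across the trials.
x_o = torch.randn(5, 20)                     # five iid observations
samples = posterior.sample((1_000,), x=x_o)
```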
Adding to the GPU memory problem: at inference time you would only need the network and the batch of iid samples on the GPU, so I am surprised about the overflow. Just an idea, but you could try recreating the posterior object from scratch, e.g. from your already trained network. Does this help?
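A minimal sketch of that idea, assuming `inference`, `density_estimator`, and `posterior` are the objects from an already trained run (as in the sketch above):

```python
import torch

# Drop the old posterior object and any cached tensors before rebuilding.
del posterior
torch.cuda.empty_cache()

# Rebuild the posterior from the already trained estimator; no retraining needed.
posterior = inference.build_posterior(density_estimator)
```

Depending on the sbi version, it may also be possible to move the trained estimator to the CPU (`density_estimator.to("cpu")`) before rebuilding and sample there, trading speed for memory.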
Hi, thank you for the answers! I initially started with NRE since I needed an embedding network for my observations and had many iid data points. Right now, my biggest limitation is GPU memory at inference time. The whole pipeline gives reasonable results as long as I use a single observation, but when I try to use five or more, I always run out of GPU memory. I started to play around with MCMC and VI sampling, but I ran into VRAM issues with both. Is there maybe a batch-size parameter somewhere that could be tuned?
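For illustration, the kind of knobs being asked about would look roughly like this: fewer parallel MCMC chains via `mcmc_parameters` (keyword names below follow recent sbi versions and may differ; `inference` and `density_estimator` are assumed to be the trained objects), plus, as a manual workaround, summing the log-ratio over iid observations in chunks, where `ratio_net` and its call signature are hypothetical stand-ins rather than the sbi API:

```python
import torch

# (1) Fewer MCMC chains and vectorized slice sampling usually lower peak memory.
posterior = inference.build_posterior(
    density_estimator,
    sample_with="mcmc",
    mcmc_method="slice_np_vectorized",
    mcmc_parameters=dict(num_chains=10, thin=5, warmup_steps=100),
)

# (2) Workaround sketch: sum the log-ratio over iid observations in chunks so
#     that only `chunk_size` observations sit on the GPU at a time.
#     `ratio_net(theta, x)` is a hypothetical stand-in, not the sbi API.
def chunked_log_ratio(ratio_net, theta, x_iid, chunk_size=8, device="cuda"):
    theta = theta.to(device)                      # shape (1, parameter_dim)
    total = torch.zeros((), device=device)
    with torch.no_grad():
        for start in range(0, x_iid.shape[0], chunk_size):
            x_chunk = x_iid[start:start + chunk_size].to(device)
            theta_rep = theta.expand(x_chunk.shape[0], -1)
            total = total + ratio_net(theta_rep, x_chunk).sum()
    return total
```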
Hello, thank you for your responses! I created a minimal example that produces the same error. Currently, the simulator has two parameters, but it may be expanded to between 15 and 30 parameters in the future. In this minimal example the simulator generates only noise, while the actual simulator produces a time series of similar dimension. I also simplified the embedding network, but the total number of trainable parameters is the same.
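For readers without the attachment, a rough sketch of the kind of minimal example described here (the series length, the noise simulator, and the embedding architecture are assumptions, and the import path of `classifier_nn` differs between sbi versions):

```python
import torch
from torch import nn
from sbi.utils import classifier_nn   # newer sbi: from sbi.neural_nets import classifier_nn

def simulator(theta: torch.Tensor) -> torch.Tensor:
    # Two parameters per sample; the output is pure noise with roughly the
    # dimensionality of the real time series (length assumed to be 1000 here).
    return torch.randn(theta.shape[0], 1000)

# Simplified embedding network that compresses the series to a few features.
embedding_net = nn.Sequential(
    nn.Linear(1000, 64),
    nn.ReLU(),
    nn.Linear(64, 16),
)

# The embedding net is attached to the observation branch of the ratio
# estimator, e.g. passed later as SNRE(prior=prior, classifier=classifier).
classifier = classifier_nn(model="resnet", embedding_net_x=embedding_net)
```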
Hello everybody,
I have a problem in which I need to combine many iid observations to get a precise inference. I will probably have around 100 to 1000 iid samples. The observations are time series, which for now are compressed with an embedding network. Right now I'm using NRE with MCMC sampling, but my problem is that I quickly run out of GPU memory when I try to run inference on many iid observations.
Are there other methods which are more suitable in this case?
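For illustration, the described NRE-plus-MCMC setup with stacked iid observations looks roughly like this (placeholder prior and dimensions; the embedding network is omitted; class names vary between sbi versions):

```python
import torch
from sbi.inference import SNRE                   # NRE in newer sbi releases
from sbi.utils import BoxUniform

# Placeholder prior and simulated training data (the real series would be
# compressed by an embedding network inside the classifier; omitted here).
prior = BoxUniform(low=torch.zeros(2), high=torch.ones(2))
theta = prior.sample((10_000,))
x = torch.randn(10_000, 100)

inference = SNRE(prior=prior)
ratio_estimator = inference.append_simulations(theta, x).train()
posterior = inference.build_posterior(ratio_estimator)   # MCMC sampling by default

# All iid observed series stacked along the first dimension; with hundreds of
# them on the GPU, this sampling step is where memory runs out in the setup
# described above.
x_o = torch.randn(500, 100)
samples = posterior.sample((1_000,), x=x_o)
```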