Commit ad06a85

Merge branch 'main' into refactor-tracker-interface
2 parents 9ee671d + 1bd7fe5 commit ad06a85

File tree

2 files changed: +184 -0 lines changed

docs/how_to_guide.rst

Lines changed: 1 addition & 0 deletions
@@ -46,6 +46,7 @@ Training
    how_to_guide/07_gpu_training.ipynb
    how_to_guide/07_save_and_load.ipynb
    how_to_guide/07_resume_training.ipynb
+   how_to_guide/21_hyperparameter_tuning.ipynb
    how_to_guide/22_experiment_tracking.ipynb

docs/how_to_guide/21_hyperparameter_tuning.ipynb

Lines changed: 183 additions & 0 deletions
@@ -0,0 +1,183 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "7fb27b941602401d91542211134fc71a",
   "metadata": {},
   "source": [
    "# How to tune hyperparameters with Optuna"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "acae54e37e7d407bbb7b55eff062a284",
   "metadata": {},
   "source": [
    "This guide shows a minimal `optuna` ([documentation](https://optuna.org/)) loop for hyperparameter\n",
    "tuning in `sbi`. Optuna is a lightweight hyperparameter optimization library. You define\n",
    "an objective function that trains a model (e.g., NPE) and returns a validation metric,\n",
    "and Optuna runs multiple trials to explore the search space and track the best\n",
    "configuration. As the validation metric, we recommend using the negative log probability of\n",
    "a held-out validation set `(theta, x)` under the current posterior estimate (see\n",
    "Lueckmann et al. 2021 for details).\n",
    "\n",
    "Note that Optuna is not a dependency of `sbi`; you need to install it yourself in your\n",
    "environment.\n",
    "\n",
    "Here, we use a toy simulator and run `NPE` with an embedding network built using the `posterior_nn` helper. We tune just two hyperparameters: the embedding dimension and the number of flow transforms in an `nsf` density estimator."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9a63283cbaf04dbcab1f6479b197f3a8",
   "metadata": {},
   "source": [
    "## Set up a tiny simulation task"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3iwctp8e9hj",
   "metadata": {},
   "outputs": [],
   "source": [
    "import optuna\n",
    "import torch\n",
    "\n",
    "from sbi.inference import NPE\n",
    "from sbi.neural_nets import posterior_nn\n",
    "from sbi.neural_nets.embedding_nets import FCEmbedding\n",
    "from sbi.utils import BoxUniform\n",
    "\n",
    "torch.manual_seed(0)\n",
    "\n",
    "\n",
    "def simulator(theta):\n",
    "    return theta + 0.1 * torch.randn_like(theta)\n",
    "\n",
    "\n",
    "prior = BoxUniform(low=-2 * torch.ones(2), high=2 * torch.ones(2))\n",
    "\n",
    "theta = prior.sample((6000,))\n",
    "x = simulator(theta)\n",
    "# Use a separate validation data set for Optuna.\n",
    "theta_train, x_train = theta[:5000], x[:5000]\n",
    "theta_val, x_val = theta[5000:], x[5000:]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "panj815v3nd",
   "metadata": {},
   "source": [
    "## Define the Optuna objective\n",
    "\n",
    "Optuna expects the objective function to return a scalar value that it will optimize. When creating a study, you specify the optimization direction: `direction=\"minimize\"` to find the configuration with the lowest objective value, or `direction=\"maximize\"` for the highest. Here, we minimize the negative log probability (NLL) on a held-out validation set, so lower is better."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "gcmp410rk97",
   "metadata": {},
   "outputs": [],
   "source": [
    "def objective(trial):\n",
    "    # Optuna will track these parameters internally.\n",
    "    embedding_dim = trial.suggest_categorical(\"embedding_dim\", [16, 32, 64])\n",
    "    num_transforms = trial.suggest_int(\"num_transforms\", 2, 6)\n",
    "\n",
    "    embedding_net = FCEmbedding(input_dim=x_train.shape[1], output_dim=embedding_dim)\n",
    "    density_estimator = posterior_nn(\n",
    "        model=\"nsf\",\n",
    "        embedding_net=embedding_net,\n",
    "        num_transforms=num_transforms,\n",
    "    )\n",
    "\n",
    "    inference = NPE(prior=prior, density_estimator=density_estimator)\n",
    "    inference.append_simulations(theta_train, x_train)\n",
    "    estimator = inference.train(\n",
    "        training_batch_size=128,\n",
    "        show_train_summary=False,\n",
    "    )\n",
    "    posterior = inference.build_posterior(estimator)\n",
    "\n",
    "    with torch.no_grad():\n",
    "        nll = -posterior.log_prob_batched(theta_val.unsqueeze(0), x=x_val).mean().item()\n",
    "    # Return the metric to be optimized by Optuna.\n",
    "    return nll"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aad395b1",
   "metadata": {},
   "source": [
    "## Run the study and retrain\n",
    "\n",
    "Optuna defaults to the TPE (Tree-structured Parzen Estimator) sampler, which is a good starting point for many experiments. TPE is a Bayesian optimization method that\n",
    "models good vs. bad trials with nonparametric densities and samples new points\n",
    "that are likely to improve the objective. You can swap in other samplers (random\n",
    "search, Gaussian Process-based, etc.) by passing a different sampler instance to `create_study`.\n",
    "\n",
    "The TPE sampler uses `n_startup_trials` random trials to seed the model. With\n",
    "`n_trials=25` and `n_startup_trials=10`, the first 10 trials are random and the\n",
    "remaining 15 are guided by the acquisition function. If you want to make sure the\n",
    "default configuration is evaluated, _enqueue_ it before optimization."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "qp1lf4lzzie",
   "metadata": {},
   "outputs": [],
   "source": [
    "sampler = optuna.samplers.TPESampler(n_startup_trials=10)\n",
    "study = optuna.create_study(direction=\"minimize\", sampler=sampler)\n",
    "# Optional: ensure the default config is evaluated.\n",
    "study.enqueue_trial({\"embedding_dim\": 32, \"num_transforms\": 4})\n",
    "# This will run the above NPE training up to 25 times.\n",
    "study.optimize(objective, n_trials=25)\n",
    "\n",
    "# Retrain on all simulations with the best hyperparameters.\n",
    "best_params = study.best_params\n",
    "embedding_net = FCEmbedding(\n",
    "    input_dim=x_train.shape[1],\n",
    "    output_dim=best_params[\"embedding_dim\"],\n",
    ")\n",
    "density_estimator = posterior_nn(\n",
    "    model=\"nsf\",\n",
    "    embedding_net=embedding_net,\n",
    "    num_transforms=best_params[\"num_transforms\"],\n",
    ")\n",
    "\n",
    "inference = NPE(prior=prior, density_estimator=density_estimator)\n",
    "inference.append_simulations(theta, x)\n",
    "final_estimator = inference.train(training_batch_size=128)\n",
    "posterior = inference.build_posterior(final_estimator)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
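
As a follow-up to the sampler discussion in the notebook above, here is a minimal sketch (not part of the committed notebook) of swapping in a different Optuna sampler and inspecting a finished study. It assumes the notebook's `objective` and `study` are in scope; `RandomSampler`, `best_value`, `best_params`, and `trials_dataframe` are standard Optuna APIs, and the selected columns assume the two parameters tuned above.

    import optuna

    # Swap in a different sampler: plain random search instead of TPE.
    random_study = optuna.create_study(
        direction="minimize",
        sampler=optuna.samplers.RandomSampler(seed=0),
    )
    random_study.optimize(objective, n_trials=25)

    # Inspect the finished TPE study from the notebook.
    print(study.best_value)   # lowest validation NLL found across trials
    print(study.best_params)  # e.g. {"embedding_dim": ..., "num_transforms": ...}
    df = study.trials_dataframe()  # one row per trial (requires pandas)
    print(df[["number", "value", "params_embedding_dim", "params_num_transforms"]])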
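
Similarly, a short sketch of using the retrained posterior from the notebook's final cell, assuming its `simulator` and `posterior` are in scope; the parameter `theta_o` and observation `x_o` below are made up for illustration.

    import torch

    # Hypothetical observation: simulate one data point from a known parameter.
    theta_o = torch.tensor([[0.5, -1.0]])
    x_o = simulator(theta_o)

    # Condition the tuned posterior on x_o, draw samples, and evaluate them.
    samples = posterior.sample((1000,), x=x_o)
    log_probs = posterior.log_prob(samples, x=x_o)
    print(samples.mean(dim=0), samples.std(dim=0))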
