alan-turing-institute
diff --git a/.all-contributorsrc b/.all-contributorsrc
Lines changed: 9 additions & 0 deletions
diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml
Lines changed: 1 addition & 1 deletion
diff --git a/.github/workflows/docs.yaml b/.github/workflows/docs.yaml
Lines changed: 1 addition & 1 deletion
diff --git a/.github/workflows/github-repo-stats.yml b/.github/workflows/github-repo-stats.yml
Lines changed: 34 additions & 0 deletions
diff --git a/.github/workflows/precommit.yaml b/.github/workflows/precommit.yaml
Lines changed: 1 addition & 1 deletion
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
Lines changed: 3 additions & 2 deletions
diff --git a/README.md b/README.md
Lines changed: 5 additions & 0 deletions
diff --git a/autoemulate/calibration/base.py b/autoemulate/calibration/base.py
Lines changed: 256 additions & 0 deletions
@@ -256,6 +256,15 @@
       "contributions": [
         "bug"
       ]
+    },
+    {
+      "login": "ritkaarsingh30",
+      "name": "Ritkaar Singh",
+      "avatar_url": "https://avatars.githubusercontent.com/u/85431642?v=4",
+      "profile": "https://github.com/ritkaarsingh30",
+      "contributions": [
+        "doc"
+      ]
     }
   ]
 }
@@ -32,7 +32,7 @@ jobs:
       - name: Install dependencies
         run: |
           python -m pip install --upgrade pip
-          pip install -e .[dev]
+          pip install -e .[dev,spatiotemporal]
 
       - name: Test with pytest
         run: |
 
@@ -23,7 +23,7 @@ jobs:
             - name: "Install jupyterbook"
               run: pip install -r docs/requirements.txt
             - name: "Install autoemulate"
-              run: pip install git+https://github.com/alan-turing-institute/autoemulate.git
+              run: pip install -e .
             - name: "Run jupyterbook"
               run: jupyter-book build docs --all 
             - name: "Deploy"
 
@@ -0,0 +1,34 @@
+name: github-repo-stats-for-autoemulate
+
+# This workflow uses a github action to fetch GitHub repository stats
+# (traffic data, clones, views, referrers, popular content) for the
+# repository `alan-turing-institute/autoemulate` and to generate a
+# report file as well as stats on the branch "github-repo-stats".
+# There is a link to the stats at the top of the readme.
+
+on:
+  schedule:
+    # Run this once per day, towards the end of the day for keeping the most
+    # recent data point most meaningful (hours are interpreted in UTC).
+    - cron: "0 23 * * *"
+  workflow_dispatch: # Allow for running this manually.
+
+jobs:
+  j1:
+    name: repostats-for-autoemulate
+    runs-on: ubuntu-latest
+    steps:
+      - name: run-ghrs
+        uses: jgehrcke/github-repo-stats@RELEASE
+        with:
+          # Define the stats repository (the repo to fetch
+          # stats for and to generate the report for).
+          # Remove the parameter when the stats repository
+          # and the data repository are the same.
+          repository: alan-turing-institute/autoemulate
+          # Set a GitHub API token that can read the GitHub
+          # repository traffic API for the stats repository,
+          # and that can push commits to the data repository
+          # (which this workflow file lives in, to store data
+          # and the report files).
+          ghtoken: ${{ secrets.ghrs_github_api_token }}
@@ -33,7 +33,7 @@ jobs:
               python -m venv .venv
               source .venv/bin/activate
               python -m pip install --upgrade pip
-              pip install -e .[dev]
+              pip install -e .[dev,spatiotemporal]
 
           - uses: pre-commit/[email protected]
             with:
 
@@ -1,7 +1,7 @@
 repos:
 - repo: https://github.com/astral-sh/ruff-pre-commit
   # Ruff version.
-  rev: v0.11.4
+  rev: v0.12.11
   hooks:
   # Run the linter.
   - id: ruff
@@ -13,11 +13,12 @@ repos:
     types_or: [ python, pyi ]
     files: ^autoemulate/|^tests/|^benchmarks/
 - repo: https://github.com/RobertCraigie/pyright-python
-  rev: v1.1.398
+  rev: v1.1.405
   hooks:
   - id: pyright
     files: ^autoemulate/|^tests/|^benchmarks/
 - repo: https://github.com/kynan/nbstripout
   rev: 0.8.1
   hooks:
   - id: nbstripout
+    exclude: ^case_studies/
@@ -5,10 +5,14 @@
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
 [![All Contributors](https://img.shields.io/github/all-contributors/alan-turing-institute/autoemulate?color=ee8449&style=flat-square)](#contributors)
 [![Documentation](https://img.shields.io/badge/documentation-blue)](https://alan-turing-institute.github.io/autoemulate/)
+[![Github Stats](https://img.shields.io/badge/repostats-orange)](https://github.com/alan-turing-institute/autoemulate/blob/github-repo-stats/alan-turing-institute/autoemulate/latest-report/report.pdf)
+
 
 <!-- SPHINX-START -->
 Simulations of physical systems are often slow and need lots of compute, which makes them impractical for real-world applications like digital twins, or when they have to run thousands of times for sensitivity analyses. The goal of `AutoEmulate` is to make it easy to replace simulations with fast, accurate emulators. To do this, `AutoEmulate` automatically fits and compares various emulators, ranging from simple models like Radial Basis Functions and Second Order Polynomials to more complex models like Support Vector Machines and Gaussian Processes, to find the best emulator for a simulation.
 
+>[!WARNING]
+>Although AutoEmulate is currently on version 1.x, we are not following semantic versioning at the moment. The convention for v1 is that breaking and major changes will be made between minor versions (1.1 -> 1.2). Bug fixes will be made in patch versions (1.1.1 -> 1.1.2). We plan to implement true semantic versioning in v2 of the package. We recommend pinning the minor version of AutoEmulate if using it downstream and carefully reading the release notes.
 
 ## Documentation
 
@@ -63,6 +67,7 @@ You can find the project documentation [here](https://alan-turing-institute.gith
     <tr>
       <td align="center" valign="top" width="14.28%"><a href="https://jvwilliams23.github.io"><img src="https://avatars.githubusercontent.com/u/48445365?v=4?s=100" width="100px;" alt="Josh Williams"/><br /><sub><b>Josh Williams</b></sub></a><br /><a href="#bug-jvwilliams23" title="Bug reports">🐛</a> <a href="#ideas-jvwilliams23" title="Ideas, Planning, & Feedback">🤔</a></td>
       <td align="center" valign="top" width="14.28%"><a href="https://github.com/LevanBokeria"><img src="https://avatars.githubusercontent.com/u/7816766?v=4?s=100" width="100px;" alt="Levan Bokeria"/><br /><sub><b>Levan Bokeria</b></sub></a><br /><a href="#bug-LevanBokeria" title="Bug reports">🐛</a></td>
+      <td align="center" valign="top" width="14.28%"><a href="https://github.com/ritkaarsingh30"><img src="https://avatars.githubusercontent.com/u/85431642?v=4?s=100" width="100px;" alt="Ritkaar Singh"/><br /><sub><b>Ritkaar Singh</b></sub></a><br /><a href="#doc-ritkaarsingh30" title="Documentation">📖</a></td>
     </tr>
   </tbody>
 </table>
 
@@ -0,0 +1,256 @@
+import logging
+from collections.abc import Callable
+
+import arviz as az
+import numpy as np
+from getdist import MCSamples
+from pyro.infer import HMC, MCMC, NUTS, Predictive
+from pyro.infer.mcmc import RandomWalkKernel
+
+from autoemulate.core.types import TensorLike
+
+
+class BayesianMixin:
+    """Mixin class for Bayesian calibration methods."""
+
+    logger: logging.Logger
+    model: Callable
+    observations: dict[str, TensorLike] | None
+
+    def _get_kernel(
+        self,
+        sampler: str,
+        model_kwargs: dict[str, TensorLike] | None = None,
+        **sampler_kwargs,
+    ):
+        """Get the appropriate MCMC kernel based on sampler choice."""
+        # TODO: consider how to pass model args, functools.partial?
+        model_kwargs = model_kwargs or {}
+        sampler = sampler.lower()
+        if sampler == "nuts":
+            self.logger.debug("Using NUTS kernel.")
+            return NUTS(self.model, **sampler_kwargs)
+        if sampler == "hmc":
+            step_size = sampler_kwargs.pop("step_size", 0.01)
+            trajectory_length = sampler_kwargs.pop("trajectory_length", 1.0)
+            self.logger.debug(
+                "Using HMC kernel with step_size=%s, trajectory_length=%s",
+                step_size,
+                trajectory_length,
+            )
+            return HMC(
+                self.model,
+                step_size=step_size,
+                trajectory_length=trajectory_length,
+                **sampler_kwargs,
+            )
+        if sampler == "metropolis":
+            self.logger.debug("Using Metropolis (RandomWalkKernel).")
+            return RandomWalkKernel(self.model, **sampler_kwargs)
+        self.logger.error("Unknown sampler: %s", sampler)
+        raise ValueError(f"Unknown sampler: {sampler}")
+
+    def run_mcmc(
+        self,
+        warmup_steps: int = 500,
+        num_samples: int = 1000,
+        num_chains: int = 1,
+        initial_params: dict[str, TensorLike] | None = None,
+        model_kwargs: dict | None = None,
+        sampler: str = "nuts",
+        **sampler_kwargs,
+    ) -> MCMC:
+        """
+        Run Markov Chain Monte Carlo (MCMC). Defaults to using the NUTS sampler.
+
+        Parameters
+        ----------
+        warmup_steps: int
+            Number of warm up steps to run per chain (i.e., burn-in). These samples are
+            discarded. Defaults to 500.
+        num_samples: int
+            Number of samples to draw after warm up. Defaults to 1000.
+        num_chains: int
+            Number of parallel chains to run. Defaults to 1.
+        initial_params: dict[str, TensorLike] | None
+            Optional dictionary specifying initial values for each calibration
+            parameter per chain. The tensors must be of length `num_chains`.
+        model_kwargs: dict | None
+            Optional dictionary of keyword arguments to pass to the model.
+        sampler: str
+            The MCMC kernel to use, one of "hmc", "nuts" or "metropolis".
+        **sampler_kwargs
+            Additional keyword arguments to pass to the MCMC kernel.
+
+        Returns
+        -------
+        MCMC
+            The Pyro MCMC object. Methods include `summary()` and `get_samples()`.
+        """
+        # Check initial param values match number of chains
+
+        if initial_params is not None:
+            for param, init_vals in initial_params.items():
+                if init_vals.shape[0] != num_chains:
+                    msg = (
+                        "An initial value must be provided for each chain, parameter "
+                        f"{param} tensor only has {init_vals.shape[0]} values."
+                    )
+                    self.logger.error(msg)
+                    raise ValueError(msg)
+            self.logger.debug(
+                "Initial parameters provided for MCMC: %s", initial_params
+            )
+
+        # Build the selected kernel and run MCMC
+        kernel = self._get_kernel(sampler, model_kwargs=model_kwargs, **sampler_kwargs)
+        mcmc = MCMC(
+            kernel,
+            warmup_steps=warmup_steps,
+            num_samples=num_samples,
+            num_chains=num_chains,
+            # If None, init values are sampled from the prior.
+            initial_params=initial_params,
+            # Multiprocessing
+            mp_context="spawn" if num_chains > 1 else None,
+        )
+        self.logger.info("Starting MCMC run.")
+        mcmc.run()
+        self.logger.info("MCMC run completed.")
+        return mcmc
+
+    def posterior_predictive(self, mcmc: MCMC) -> dict[str, TensorLike]:
+        """
+        Return posterior predictive samples.
+
+        Parameters
+        ----------
+        mcmc: MCMC
+            The MCMC object.
+
+        Returns
+        -------
+        dict[str, TensorLike]
+            Posterior predictive samples per site [n_mcmc_samples, n_obs, n_outputs].
+        """
+        posterior_samples = mcmc.get_samples()
+        posterior_predictive = Predictive(self.model, posterior_samples)
+        samples = posterior_predictive(predict=True)
+        self.logger.debug("Posterior predictive samples generated.")
+        return samples
+
+    def to_arviz(
+        self, mcmc: MCMC, posterior_predictive: bool = False
+    ) -> az.InferenceData:
+        """
+        Convert MCMC object to Arviz InferenceData object for plotting.
+
+        Parameters
+        ----------
+        mcmc: MCMC
+            The MCMC object.
+        posterior_predictive: bool
+            Whether to include posterior predictive samples. Defaults to False.
+
+        Returns
+        -------
+        az.InferenceData
+        """
+        pp_samples = None
+        if posterior_predictive:
+            self.logger.info("Including posterior predictive samples in Arviz output.")
+            pp_samples = self.posterior_predictive(mcmc)
+
+        # Need to create dataset manually for Metropolis Hastings
+        # This is because az.from_pyro expects kernel with `divergences`
+        if isinstance(mcmc.kernel, RandomWalkKernel):
+            self.logger.debug(
+                "Using manual conversion for Metropolis (RandomWalkKernel) kernel."
+            )
+            if posterior_predictive:
+                if self.observations is None:
+                    msg = (
+                        "Observations must be provided to include observed_data in "
+                        "Arviz InferenceData."
+                    )
+                    self.logger.error(msg)
+                    raise ValueError(msg)
+                az_data = az.InferenceData(
+                    posterior=az.convert_to_dataset(
+                        mcmc.get_samples(group_by_chain=True)
+                    ),
+                    posterior_predictive=az.convert_to_dataset(pp_samples),
+                    observed_data=az.convert_to_dataset(self.observations),
+                )
+            else:
+                az_data = az.InferenceData(
+                    posterior=az.convert_to_dataset(
+                        mcmc.get_samples(group_by_chain=True)
+                    ),
+                )
+        else:
+            self.logger.debug("Using az.from_pyro for conversion.")
+            az_data = az.from_pyro(mcmc, posterior_predictive=pp_samples)
+
+        self.logger.info("Arviz InferenceData conversion complete.")
+        return az_data
+
+    @staticmethod
+    def to_getdist(
+        data: MCMC | az.InferenceData,
+        label: str,
+        use_weights: bool = True,
+        weight_name: str = "weight",
+    ) -> MCSamples:
+        """Convert Pyro MCMC or ArviZ InferenceData to GetDist MCSamples.
+
+        This lightweight helper extends the original implementation to also accept
+        SMC / other results already converted to ArviZ InferenceData. If a weight
+        variable (`weight_name`, default "weight") is present in `sample_stats`, it
+        is used as importance weights.
+
+        Parameters
+        ----------
+        data: MCMC | az.InferenceData
+            The Pyro MCMC object or an ArviZ InferenceData object containing posterior
+            samples.
+        label: str
+            Label for the MCSamples object.
+        use_weights: bool
+            If True and `data` is an `InferenceData` with `weight_name` in
+            `sample_stats` then those weights are applied. Defaults to True.
+        weight_name: str
+            Name of the weight variable inside `sample_stats` to look up.
+
+        Returns
+        -------
+        MCSamples
+            The GetDist MCSamples object.
+        """
+        if isinstance(data, MCMC):
+            samples_dict = data.get_samples()
+            arr = np.array(list(samples_dict.values())).T
+            names = list(samples_dict.keys())
+            weights = None
+        else:
+            posterior = data.posterior  # type: ignore[attr-defined]
+            names = list(posterior.data_vars)
+            cols = []
+            for name in names:
+                vals = np.asarray(posterior[name].values)
+                # Expect shape (chain, draw) for scalar parameters
+                if vals.ndim != 2:
+                    msg = (
+                        f"Posterior variable '{name}' has shape {vals.shape}; "
+                        "only scalar parameter sites (chain, draw) supported here."
+                    )
+                    raise ValueError(msg)
+                cols.append(vals.reshape(-1))
+            arr = np.vstack(cols).T  # (n_total_draws, n_params)
+            weights = None
+            sample_stats = getattr(data, "sample_stats", None)  # type: ignore[attr-defined]
+            if use_weights and sample_stats is not None and weight_name in sample_stats:
+                w = np.asarray(sample_stats[weight_name].values)
+                if w.ndim == 2:  # (chain, draw)
+                    weights = w.reshape(-1)
+        return MCSamples(samples=arr, names=names, label=label, weights=weights)
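Review note: the `InferenceData` branch of `to_getdist` flattens each scalar `(chain, draw)` variable into one column and stacks the columns into an `(n_total_draws, n_params)` array, flattening 2-D weights the same way. The following is a minimal NumPy-only sketch of that reshaping, not the code from this PR. The helper name `flatten_posterior`, the plain-dict input, and the parameter names `a` and `b` are assumptions made for illustration; the real method reads from `data.posterior` and passes the result to `MCSamples`.

```python
import numpy as np


def flatten_posterior(posterior, weights=None):
    """Stack scalar (chain, draw) arrays into an (n_total_draws, n_params) matrix.

    `posterior` is a hypothetical dict of name -> array, standing in for
    the posterior group of an ArviZ InferenceData object.
    """
    names = list(posterior)
    cols = []
    for name in names:
        vals = np.asarray(posterior[name])
        # Only scalar sites with shape (chain, draw) are handled.
        if vals.ndim != 2:
            raise ValueError(
                f"'{name}' has shape {vals.shape}; expected (chain, draw)"
            )
        # Row-major flattening: all draws of chain 0, then chain 1, ...
        cols.append(vals.reshape(-1))
    arr = np.vstack(cols).T  # (n_total_draws, n_params)

    flat_weights = None
    if weights is not None:
        w = np.asarray(weights)
        if w.ndim == 2:  # (chain, draw), flattened in the same order
            flat_weights = w.reshape(-1)
    return arr, names, flat_weights


# Two chains of three draws each for two hypothetical parameters.
posterior = {
    "a": np.arange(6.0).reshape(2, 3),
    "b": 10.0 + np.arange(6.0).reshape(2, 3),
}
arr, names, flat_weights = flatten_posterior(posterior, weights=np.ones((2, 3)))
```

Because samples and weights are flattened with the same row-major order, row `i` of `arr` stays aligned with entry `i` of `flat_weights`.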
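Review note: `run_mcmc` rejects `initial_params` whose leading dimension does not equal `num_chains`. The check can be sketched in isolation as below. This is an illustrative stand-in, not the PR's code: the function name `check_initial_params` is hypothetical, and it uses `len()` on plain lists where the PR inspects `init_vals.shape[0]` on tensors.

```python
def check_initial_params(initial_params, num_chains):
    """Raise ValueError unless every parameter has one initial value per chain.

    Hypothetical helper: `initial_params` maps parameter names to sequences;
    the PR performs the equivalent check on tensors via `.shape[0]`.
    """
    for param, init_vals in initial_params.items():
        n_values = len(init_vals)
        if n_values != num_chains:
            raise ValueError(
                "An initial value must be provided for each chain, parameter "
                f"{param} only has {n_values} values."
            )


# One initial value per chain for a hypothetical parameter "theta": accepted.
check_initial_params({"theta": [0.1, 0.2]}, num_chains=2)

# A single initial value for two chains: rejected.
try:
    check_initial_params({"theta": [0.1]}, num_chains=2)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

Validating before the kernel is constructed means a shape mismatch is reported with the parameter name rather than surfacing later inside the sampler.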