GSoC 2026 projects

Getting started

New contributors should first read the contributing guide and learn the basics of PyTensor. They should also read through some of the examples in the PyMC docs.

To be considered as a GSoC student, you should make a PR to PyMC or PyTensor. It can be something small, like a documentation fix or a simple bug fix. Some beginner-friendly issues can be found here.

If you are a student interested in participating, please contact us via our Discourse site.

Projects

Below is a list of possible topics for your GSoC project; we are also open to other topics, so contact us on Discourse. Keep in mind that these are only ideas and that some of them cannot be completely solved in a single GSoC project. When writing your proposal, choose specific tasks and make sure your proposal fits the GSoC time commitment. We expect all projects to be 350h projects; if you would like to be considered for a 175h project, you must reach out on Discourse first. We will not accept 175h applications from people with whom we have not discussed their time commitments before the application is submitted.

  1. Spatial modeling
  2. Streaming inference
  3. Guide programs for Variational Inference
  4. Linear algebra rewrites
  5. Predictively Oriented Posteriors
  6. Survival Models

Spatial modeling

This project will build on previous GSoC projects to continue improving PyMC's support for modeling spatial processes. There are many possible algorithms to work on, such as Gaussian-process-based methods for point processes, like nearest-neighbor GPs or the Vecchia approximation, and models that are types of Gaussian Markov random fields, like CAR, ICAR, and BYM models. Implementations of these can be found in the R packages CARBayes and INLA.
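As a rough sketch of where one of these models could start, the snippet below builds a small areal model around PyMC's existing pm.CAR distribution. The adjacency matrix, counts, expected counts, and priors are made-up placeholders, not part of any proposed design.

```python
import numpy as np
import pymc as pm

# Hypothetical areal data: W is a symmetric binary adjacency matrix between regions,
# y holds observed counts per region, and E the expected counts (offsets).
W = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
y = np.array([3, 7, 4])
E = np.array([2.5, 6.0, 3.5])

with pm.Model() as car_model:
    tau = pm.Gamma("tau", alpha=2.0, beta=2.0)          # precision of the spatial effect
    rho = pm.Uniform("rho", lower=0, upper=1)            # strength of spatial dependence
    # Conditional autoregressive spatial random effect, one value per region
    phi = pm.CAR("phi", mu=np.zeros(W.shape[0]), W=W, alpha=rho, tau=tau)
    intercept = pm.Normal("intercept", 0, 5)
    theta = pm.math.exp(intercept + phi)
    pm.Poisson("obs", mu=E * theta, observed=y)
```

ICAR, BYM, and the GP-based approximations mentioned above would go beyond what pm.CAR already provides, which is where the project work lies.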

Potential mentors:

  • Bill Engels
  • Chris Fonnesbeck

Info

  • Hours: 350
  • Expected outcome: An implementation of one or more of the methods listed above, along with one or more notebook examples that can be added to the PyMC docs demonstrating these techniques.
  • Skills required: Python, statistics, GPs
  • Difficulty: Medium

Streaming inference

This project extends the existing Minibatch functionality to support the streaming case, allowing PyMC's variational inference methods to be used on data larger than can fit in memory. It would also introduce Minibatch support to all other inference methods in the library that would benefit from it, such as the recently introduced Pathfinder functionality.

We strongly suspect this project should integrate with Dask APIs, so prior knowledge of Dask would help in this project.
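For reference, the snippet below shows how the current in-memory Minibatch API is used with ADVI; the streaming case would replace the in-memory array with a source that cannot be loaded at once. The data and model here are illustrative only.

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(42)
data = rng.normal(loc=1.0, scale=2.0, size=100_000)   # stand-in for a large dataset

with pm.Model():
    mu = pm.Normal("mu", 0.0, 10.0)
    sigma = pm.HalfNormal("sigma", 5.0)
    # Random 128-row slices of the (currently in-memory) data on each update
    batch = pm.Minibatch(data, batch_size=128)
    pm.Normal("obs", mu, sigma, observed=batch, total_size=data.shape[0])
    # Stochastic ELBO optimization over minibatches
    approx = pm.fit(n=10_000, method="advi")
```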

Info

  • Hours: 350
  • Expected outcome: An improved Minibatch implementation for all inference methods that support it. A notebook demonstrating inference using a streaming data source.
  • Skills required: Python, Dask, Optimization
  • Difficulty: Medium

Potential mentors:

  • Chris Fonnesbeck
  • Rob Zinkov

Guide programs for Variational Inference

PyMC supports variational inference using black-box methods that rely on a hardcoded guide program autogenerated for every model. It would be nice to give users the ability to write their own guide programs, as is done in libraries like Pyro. This project would introduce a guide program module as well as generalise the existing inference algorithms to support it.
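For context, this is roughly what a user-written guide looks like in Pyro: the guide is an ordinary Python function that samples the same latent sites as the model from a user-chosen variational family. The PyMC equivalent would need to be designed as part of the project; this snippet only illustrates the concept, not a proposed interface.

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

def model(data):
    mu = pyro.sample("mu", dist.Normal(0.0, 10.0))
    with pyro.plate("data", len(data)):
        pyro.sample("obs", dist.Normal(mu, 1.0), obs=data)

def guide(data):
    # Variational parameters for q(mu); the user picks the family explicitly
    loc = pyro.param("mu_loc", torch.tensor(0.0))
    scale = pyro.param("mu_scale", torch.tensor(1.0), constraint=dist.constraints.positive)
    pyro.sample("mu", dist.Normal(loc, scale))

svi = SVI(model, guide, Adam({"lr": 0.01}), loss=Trace_ELBO())
```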

Info

  • Hours: 350
  • Expected outcome: A working implementation of guide programs for blackbox optimization using the ELBO as the loss. This should also include an example notebook showcasing the feature.
  • Skills required: Python, Variational Inference, Optimization
  • Difficulty: Hard

Potential mentors:

  • Rob Zinkov

Linear algebra rewrites

The COLA library implements several optimizations for speeding up linear algebra operations. This project would introduce these optimizations to PyTensor as a collection of graph rewrites. This issue tracks the current state of the effort; there is potential for massive speedups.
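As a toy illustration of the kind of structure-aware shortcut such rewrites would encode (not an actual rewrite implementation), compare a generic dense solve with the elementwise computation that becomes possible once diagonal structure is recognized:

```python
import numpy as np
import pytensor
import pytensor.tensor as pt
from pytensor.tensor.slinalg import solve

d = pt.vector("d")
b = pt.vector("b")

# Generic O(n^3) solve on a dense matrix that happens to be diagonal
dense_solve = pytensor.function([d, b], solve(pt.diag(d), b))
# O(n) computation that a structure-aware rewrite could substitute automatically
structured_solve = pytensor.function([d, b], b / d)

rng = np.random.default_rng(0)
dv = rng.uniform(1.0, 2.0, size=500)
bv = rng.normal(size=500)
np.testing.assert_allclose(dense_solve(dv, bv), structured_solve(dv, bv), rtol=1e-8)
```

The project itself would express transformations like this as PyTensor graph rewrites so that users get the speedup without changing their model code.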

Info

  • Hours: 350
  • Expected outcome: Implementation of a sizeable portion of these rewrites, along with a notebook demonstrating the potential speedups they offer on typical PyMC programs.
  • Skills required: Python, Linear Algebra
  • Difficulty: Medium

Potential mentors:

  • Jesse Grabowski
  • Rob Zinkov

Predictively Oriented Posteriors

Predictively oriented (PrO) posteriors express uncertainty as a consequence of predictive ability. The fundamental difference between the asymptotic behaviour of PrO posteriors and that of standard posteriors is that, while the latter will concentrate onto a point mass under very light assumptions, the former will only concentrate onto a point mass when the induced predictive is exactly the true data-generating process. In other words, they stabilise towards a predictively optimal posterior whose degree of irreducible uncertainty admits an interpretation as the degree of model misspecification. PrO posteriors can be sampled from by evolving particles based on mean-field Langevin dynamics. The goal of this project is to implement a sampling engine that takes a PyMC (and/or Bambi) model as input and returns a PrO posterior in a fully automatic way.

This project will require interacting with PyTensor, which is the backend used by PyMC. See https://www.pymc.io/projects/docs/en/v5.0.2/learn/core_notebooks/pymc_pytensor.html for more details.
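As a purely illustrative sketch (not PyMC or project API), unadjusted Langevin updates for a cloud of particles look roughly like the following. The actual mean-field Langevin dynamics used for PrO posteriors couples the particles through the induced predictive, which this toy version omits.

```python
import numpy as np

def langevin_steps(particles, grad_log_density, step_size=1e-3, n_steps=1000, seed=0):
    """Evolve a particle cloud with unadjusted Langevin dynamics (toy sketch)."""
    rng = np.random.default_rng(seed)
    for _ in range(n_steps):
        noise = rng.normal(size=particles.shape)
        particles = (
            particles
            + step_size * grad_log_density(particles)
            + np.sqrt(2.0 * step_size) * noise
        )
    return particles

# Example: particles drift toward a standard normal target (grad log p(x) = -x).
samples = langevin_steps(np.zeros((256, 1)), lambda x: -x)
```

In the project, the gradient would come from the PyTensor graph of the PyMC model rather than a hand-written function.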

Info

  • Hours: 350
  • Expected outcome: A working sampling engine that takes a PyMC (and/or Bambi) model as input and returns a PrO posterior automatically, along with an example notebook demonstrating the method.
  • Skills required: Python, Pytensor
  • Difficulty: Hard

Potential mentors:

  • Osvaldo Martin
  • Chris Fonnesbeck

Survival Models

This project aims to develop a high-level Python module for Bayesian survival analysis, leveraging the capabilities of PyMC and PyTensor to provide a declarative interface analogous to CausalPy in the causal inference domain. The primary intent is to streamline the implementation of time-to-event models by abstracting the tensor operations required for handling right, left, and interval censoring, as well as truncation, which currently require manual log-likelihood specifications in existing Python modeling tools. The library will offer a high-level API to fit standard parametric estimators (e.g., Exponential, Weibull, Log-Normal) and semi-parametric specifications, such as the Piecewise Exponential and Cox Proportional Hazards models, facilitating robust inference without the need for custom model graph construction.
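To illustrate the kind of manual setup the proposed library would abstract away, here is a hand-written right-censored Weibull model in PyMC with made-up data; the priors and the explicit log-likelihood are illustrative only.

```python
import numpy as np
import pymc as pm

# Hypothetical data: observed or censoring time per subject, and an event indicator
# (1 = event observed, 0 = right-censored).
time = np.array([5.0, 8.0, 3.0, 12.0, 7.0])
event = np.array([1, 0, 1, 0, 1])

with pm.Model() as weibull_surv:
    alpha = pm.HalfNormal("alpha", 2.0)   # Weibull shape
    beta = pm.HalfNormal("beta", 10.0)    # Weibull scale
    dist = pm.Weibull.dist(alpha=alpha, beta=beta)
    # Uncensored rows contribute the log-density, censored rows the log-survival function
    logdens = pm.logp(dist, time)
    logsurv = -((time / beta) ** alpha)   # Weibull log-survival: S(t) = exp(-(t/beta)^alpha)
    pm.Potential("likelihood", (event * logdens + (1 - event) * logsurv).sum())
    idata = pm.sample()
```

A formula-based survival library would generate this machinery (and its interval-censored and semi-parametric counterparts) from a declarative model specification.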

Info

  • Hours: 350
  • Expected outcome: A draft library for building Bayesian survival models based on formula specification
  • Skills required: Python, survival analysis, PyTensor
  • Difficulty: Medium

Potential mentors:

  • Chris Fonnesbeck
  • Osvaldo Martin
  • Bill Engels

AI Tooling

We appreciate the utility of artificial intelligence (AI) tools in modern software development. However, we expect full disclosure of any AI tools used to assist in the development of PyMC projects during Google Summer of Code.