
Conversation

@maresb (Collaborator) commented Sep 21, 2025

I'm not sure if it makes a difference, but we should be attenuating the coefficients in our GP according to the standard decay for the (in this case Matérn) GP prior. Namely, the variance should be the eigenvalue. @aseyboldt, for some reason this is running really slowly for me (also with main). Could you check if this works for you?

Before: all coefficients have std = 1 for elastic or std = 10 for coupling
After: coefficients have std $= C\sqrt{\lambda}$, where $C$ is chosen so that the mode with the top eigenvalue keeps the previous default std of 1 or 10.
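A minimal sketch of the change, assuming nothing about celeri's internals (the function name `scale_coefficient_stds` is hypothetical):

```python
import numpy as np

def scale_coefficient_stds(eigenvalues, top_std):
    """Attenuate KL coefficient priors by sqrt(eigenvalue), normalized so
    that the mode with the largest eigenvalue keeps the previous default std."""
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    # Choose C so that C * sqrt(lambda_max) == top_std.
    C = top_std / np.sqrt(eigenvalues.max())
    return C * np.sqrt(eigenvalues)

# Example: with top_std=10 (the coupling default), the leading mode keeps
# std 10 and the higher modes are damped in proportion to sqrt(lambda).
stds = scale_coefficient_stds([4.0, 1.0, 0.25], top_std=10.0)
```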

@maresb requested a review from aseyboldt September 21, 2025 14:48
@aseyboldt (Collaborator)

Sorry for the slow review, but this looks great, I think we can merge this.


@maresb marked this pull request as draft October 17, 2025 20:00
@maresb (Collaborator, Author) commented Oct 17, 2025

Converted to draft because it contains a bunch of results in notebooks that I wouldn't want in the main branch...

@maresb (Collaborator, Author) commented Oct 17, 2025

Preliminary results

My hypothesis for this experiment was a slight difference in this direction. I'm pretty surprised about the major difference, especially with "strike slip coupling". Maybe the higher modes are indeed prior-dominated. I'm wondering if @aseyboldt has any thoughts.

What changed?

"Before", we used a simple white-noise prior in which each coefficient had a standard normal distribution. Here I was already suspicious about the prominence of the high eigenfunctions.

"After", we implemented a Matérn $\nu=\tfrac{1}{2}$ prior. This damps the eigenfunction priors to normals with variance equal to the eigenvalue. In other words, it makes the plots look less wobbly.

🤓 details: The kernel we use to compute the KL modes is $\exp(-d/\ell)$, which is the Matérn kernel with $\nu=\tfrac{1}{2}$. In contrast, the white-noise prior is effectively $\nu=-1$, so in various technical senses it has 1.5 fewer derivatives. Note that in the limit of infinitely many eigenfunctions, $\nu=\tfrac{1}{2}$ samples are almost surely $\alpha$-Hölder continuous for $\alpha<\tfrac{1}{2}$, which means that as you crank up the number of eigenfunctions, samples will remain continuous and come very close to satisfying $|f(x+\delta) - f(x)| < C \sqrt{\delta}$. (A differentiable function will, essentially by definition, satisfy this with $C\left|\delta\right|$ instead.) In contrast, with white noise, things diverge as you crank up the number of eigenfunctions.
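The contrast between the two priors can be illustrated with a small 1-D sketch (illustrative only; the grid, length scale, and seed are arbitrary choices, not values from the notebooks):

```python
import numpy as np

# 1-D illustration: KL modes of the Matérn nu=1/2 kernel k(d) = exp(-d/ell)
# on a grid of points, compared against a white-noise prior.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
ell = 0.2
K = np.exp(-np.abs(x[:, None] - x[None, :]) / ell)

# Eigendecomposition of the symmetric positive-definite kernel matrix,
# reordered so that the largest eigenvalue comes first.
eigvals, eigvecs = np.linalg.eigh(K)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

# Matérn draw: coefficient variance equals the eigenvalue, so high modes
# are strongly attenuated.
z = rng.standard_normal(len(x))
matern_sample = eigvecs @ (np.sqrt(np.clip(eigvals, 0.0, None)) * z)

# White-noise draw: every coefficient has unit variance, so the high
# eigenfunctions dominate and the sample is much rougher.
white_sample = eigvecs @ z
```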

It's a bit of an open question how to set the overall scale for the prior. In the code I made it so the top before/after eigenvalues agree, and those are hard-coded to 10 for coupling and 1 for elastic.

Samples from "before"

Source: run of the notebook without the Matérn prior: https://github.com/brendanjmeade/celeri/blob/461530e9a6ea93b12650fde83d56995dbee37eb3/notebooks/doc_solve_mcmc.ipynb

[three sample images]

Samples from "after"

Source: run of the notebook with the Matérn prior: https://github.com/brendanjmeade/celeri/blob/f21cd2d9017c6f90bfec2906eccb4d830be0ae0a/notebooks/doc_solve_mcmc.ipynb

[three sample images]

@aseyboldt (Collaborator) commented Oct 18, 2025 via email

@maresb (Collaborator, Author) commented Oct 18, 2025

Ya, all the other results are consistent with what I expected: toning down the excessive "leopard spots". I'm really surprised though that the strike slip coupling looks so smooth.

@maresb (Collaborator, Author) commented Oct 20, 2025

> It's a bit of an open question how to set the overall scale for the prior. In the code I made it so the top before/after eigenvalues agree, and those are hard-coded to 10 for coupling and 1 for elastic.

This was the problem. It's better not to hard-code the scale of the top eigenvalue.

I'm still not very confident about how to choose the overall scale for the priors.

@maresb (Collaborator, Author) commented Oct 27, 2025

Let me outline how I plan to finish this out.

There are a lot of skeletons lurking in the closet with a GP like this. My goal is to bring them all out into the open, make them explicit, identify all sources of inaccuracy, and make deliberate, principled choices for all the parameters. I will provide a notebook that illustrates all the choices involved and their effect on the priors. This way we can figure out which parameters are identifiable, correspondingly fix parameter values or parameter priors according to domain knowledge, and enumerate all our assumptions. The relevant factors include:

  • The covariance kernel, e.g. "Matérn" or "squared exponential" (the smoothness $\nu\to\infty$ limit of Matérn). Celeri currently uses Matérn with $\nu=\tfrac{1}{2}$, which is probably not as smooth as desired, and we made the tentative decision to switch to "squared exponential". The kernel has two more parameters: the amplitude scale $\sigma$ and the correlation length scale $\ell$, which is a distance between two mesh elements.
  • The cutoff effects of using finitely many eigenmodes. This introduces a Nyquist length scale.
  • The triangulation. There are two main issues here:
    1. There's a third distance scale for the resolution of the mesh: the distribution of distances between adjacent centroids.
    2. A weight must be chosen for each triangle. Currently celeri uses a constant weight, and this has the advantage of providing more detail along refined regions. The more geometrically principled approach is to weight by area, which plays nicely with continuum limits. To get the best of both worlds, we could default to weighting by area, and then when desired, have an optional "measure" factor on top of the area for detail control.
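As a sketch of the area-weighting idea (hypothetical code, not celeri's current implementation; `weighted_kl_modes` is an invented name), one can solve the generalized eigenproblem $K W v = \lambda v$ with the triangle areas on the diagonal of $W$:

```python
import numpy as np

def weighted_kl_modes(K, areas):
    """Solve K W v = lambda v, with W = diag(areas), via the symmetrized
    matrix M = W^{1/2} K W^{1/2}, which shares eigenvalues with K W."""
    sqrt_a = np.sqrt(np.asarray(areas, dtype=float))
    M = (sqrt_a[:, None] * K) * sqrt_a[None, :]
    eigvals, u = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    eigvals, u = eigvals[order], u[:, order]
    # Transform back: v = W^{-1/2} u are eigenvectors of K W.
    v = u / sqrt_a[:, None]
    return eigvals, v

# Tiny example: three centroids with an exp(-d/ell) kernel and unequal areas.
pts = np.array([0.0, 1.0, 3.0])
K = np.exp(-np.abs(pts[:, None] - pts[None, :]) / 2.0)
eigvals, modes = weighted_kl_modes(K, areas=[1.0, 2.0, 0.5])
```

Refining the mesh then shrinks the areas of the new elements instead of changing the prior, which is what makes the continuum limit work out.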

In addition to the above-mentioned distance-based kernels, there are also neighbor-based Laplacian kernels. One of the wishes was to check "distance-along-the-mesh" kernels, and the Laplacians are effectively both the best and easiest way to achieve this. (The naive distance-along-the-mesh kernels are not even positive-definite!!!) It's pretty straightforward to trade out the KL modes for the Laplacian modes, so this provides another "model selection" check.
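A minimal sketch of trading the KL modes for graph-Laplacian modes (hypothetical; `laplacian_modes` and the edge-list input are invented for illustration):

```python
import numpy as np

def laplacian_modes(n_triangles, edges, n_modes):
    """Eigenmodes of the unweighted graph Laplacian built from mesh
    adjacency. `edges` lists pairs of adjacent triangle indices."""
    L = np.zeros((n_triangles, n_triangles))
    for i, j in edges:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0
    # The smallest Laplacian eigenvalues correspond to the smoothest modes.
    eigvals, eigvecs = np.linalg.eigh(L)
    return eigvals[:n_modes], eigvecs[:, :n_modes]

# Tiny 4-element "mesh": a path graph 0-1-2-3. The first mode is the
# constant vector with eigenvalue 0.
vals, vecs = laplacian_modes(4, [(0, 1), (1, 2), (2, 3)], n_modes=3)
```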

When it comes to modeling the coupling constant, we enforce the constraint with a logit transform, and thus in logit space we need to set the amplitude scale, and also the mean! (Usually when discussing GPs we set the mean to zero.) The amplitude scale in logit space determines how sharply we transition between the 0 and 1 extremes. The mean determines our default guess for the coupling. Based on previous discussion, rather than using mean 0 in logit space, corresponding to default coupling of 0.5, we want the default coupling to be closer to 0.8. Also, we want to choose the amplitude so that coupling transitions gradually rather than sharply.
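A sketch of the logit-space construction under these choices (the amplitude value here is illustrative, not a celeri default):

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

# Coupling prior built in logit space. The mean logit(0.8) centers the
# prior on coupling ~0.8 instead of 0.5, and a modest amplitude keeps
# transitions between 0 and 1 gradual rather than sharp.
rng = np.random.default_rng(1)
mean = logit(0.8)   # ~1.386, so expit(mean) == 0.8
amplitude = 0.5     # illustrative choice
latent = mean + amplitude * rng.standard_normal(10_000)
coupling = expit(latent)  # all values lie strictly inside (0, 1)
```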

So what does this look like in the end? I see a module that takes a mesh and a bunch of configuration parameters and:

  • computes the eigenmodes according to the configuration parameters
  • given the additional non-covariance parameters, visualizes a bunch of prior draws on the mesh
  • computes and compares various analytic heuristics (effective correlation distance, Nyquist length, Weyl expansion, mesh properties) and identifies, for example, which length scale is dominant among correlation, Nyquist, and mesh.

@brendanjmeade (Owner)

@maresb and @aseyboldt, it took me a while to read through this. The investigations here are smart, and they continue to bake nice properties into the solution, improving on the cavalier welded-together approach that I brought.

> • The covariance kernel, e.g. "Matérn" or "squared exponential" (the smoothness $\nu\to\infty$ limit of Matérn). Celeri currently uses Matérn with $\nu=\tfrac{1}{2}$ which is probably not as smooth as desired, and we made the tentative decision to switch to "squared exponential". The kernel has two more parameters: the amplitude scale $\sigma$ and the correlation length scale $\ell$ which is a distance between two mesh elements.

I'd like to understand the relative merits of these. Is there a theory that suggests the appropriateness of one form or another?

> • The triangulation. There are two main issues here:
>   1. There's yet another third distance scale for the resolution of the mesh: the distribution of distances between adjacent centroids
>   2. A weight must be chosen for each triangle. Currently celeri uses a constant weight, and this has the advantage of providing more detail along refined regions. The more geometrically principled approach is to weight by area, which plays nicely with continuum limits. To get the best of both worlds, we could default to weighting by area, and then when desired, have an optional "measure" factor on top of the area for detail control.

Very thoughtful and reasonable.

> In addition to the above-mentioned distance-based kernels, there are also neighbor-based Laplacian kernels. One of the wishes was to check "distance-along-the-mesh" kernels, and the Laplacians are effectively both the best and easiest way to achieve this. (The naive distance-along-the-mesh kernels are not even positive-definite!!!) It's pretty straightforward to trade out the KL modes for the Laplacian modes, so this provides another "model selection" check.

I'm certainly open-minded about this. I found distance-weighted eigenmodes, and they just seemed to work, so I went with it. Wise people have warned me off the geodesic distance!

> When it comes to modeling the coupling constant, we enforce the constraint with a logit transform, and thus in logit space we need to set the amplitude scale, and also the mean! (Usually when discussing GPs we set the mean to zero.) The amplitude scale in logit space determines how sharply we transition between the 0 and 1 extremes. The mean determines our default guess for the coupling. Based on previous discussion, rather than using mean 0 in logit space, corresponding to default coupling of 0.5, we want the default coupling to be closer to 0.8. Also, we want to choose the amplitude so that coupling transitions gradually rather than sharply.

I agree with you on 0.8 as a starting point. I'm less sure about how to tune the transitions.

@maresb (Collaborator, Author) commented Oct 29, 2025

> Is there a theory that suggests the appropriateness of one form or another?

Short answer: if you're modeling stock prices, then $\nu=\tfrac{1}{2}$ is probably a good bet. If everything has a Taylor series, then $\nu=\infty$ is good. If things are "mostly smooth", then $\nu=\tfrac{3}{2}$ and $\nu=\tfrac{5}{2}$ are popular choices. (Half-integers are popular because the Bessel function drops out.) Due to a well-defined limit, there's not a huge difference between $\nu=\tfrac{5}{2}$ and $\nu=\infty$. Practically, there is even less difference due to the eigenmode cutoff.

Longer answer: there's a fairly elementary and straightforward intuition for thinking about this stuff, but all the literature I've seen totally butchers it. I'll send a notebook soon that will explain things and let you experiment with the parameters.

> I found distance weighted eigenmodes, and they just seemed to work, so I went with it. Wise people have warned me off the geodesic distance!

To be clear, the Laplacian methods are not geodesic distance, but rather capture the notion of distance "along" the mesh (rather than ambient 3D distance) in the "right" way to avoid the pathologies of geodesic distance. They're firmly grounded in theory, and actually my favored tool.

> I'm less sure about how to tune the transitions.

Me too, but we can lay out the various distance scales, figure out the analytic heuristics, and use this to calibrate to your expertise and make an informed choice.

@brendanjmeade (Owner)

@maresb

> Short answer: if you're modeling stock prices, then $\nu=\tfrac{1}{2}$ is probably a good bet. If everything has a Taylor series, then $\nu=\infty$ is good. If things are "mostly smooth", then $\nu=\tfrac{3}{2}$ and $\nu=\tfrac{5}{2}$ are popular choices. (Half-integers are popular because the Bessel function drops out.) Due to a well-defined limit, there's not a huge difference between $\nu=\tfrac{5}{2}$ and $\nu=\infty$. Practically, there is even less difference due to the eigenmode cutoff.

That helps. It's effectively a smoothing. Let's make this a tuning parameter set in the *_config.json file.

> To be clear, the Laplacian methods are not geodesic distance, but rather capture the notion of distance "along" the mesh (rather than ambient 3D distance) in the "right" way to avoid the pathologies of geodesic distance. They're firmly grounded in theory, and actually my favored tool.

I'm pretty sure I tried this with some meshing tool a long time ago. IIRC, it was a nightmare because it was highly sensitive to element geometry. We have a lot of meshes with occasional very thin and/or highly scalene triangles, and I want to spend 0% of my life cleaning up meshes. A huge advantage of the DWE approach is that it depends only on distances rather than on the full triangle geometry. With this in mind, unless there's an easy win here, I think we should pass on the Laplace modes because of the mesh-geometry dependency.

Summary:

  • Implement the Matérn prior and allow setting $\nu$ in a *_config.json file.
  • Do not pursue the Laplace modes because I think this is likely to become a mesh detail disaster zone.

I'm optimistic that these can be implemented cleanly. Thoughts?

Repository owner deleted a comment from brendanjmeade Nov 15, 2025
@github-actions (bot) commented Nov 15, 2025

Hi @maresb,

main has updated pixi.lock since this branch was last synced. Merge or rebase the latest main so dependency metadata stays current.

Remote upstream should point at brendanjmeade/celeri; your fork is maresb/celeri.

git checkout matern-prior
git fetch upstream
git merge upstream/main

If merge conflicts occur in pixi.lock, accept the incoming changes from main, regenerate the lockfile, then continue the merge once everything is staged:

git checkout --theirs pixi.lock
pixi lock
git add pixi.lock
git merge --continue

This reminder will be minimized automatically once pixi.lock matches main.


Generated automatically by .github/workflows/lockfile-alert.yaml.
