
Conversation

@maresb (Collaborator) commented Sep 21, 2025

I'm not sure if it makes a difference, but we should be attenuating the coefficients in our GP according to the standard decay for the (in this case Matérn) GP prior. Namely, the variance should be the eigenvalue. @aseyboldt, for some reason this is running really slowly for me (also with main). Could you check if this works for you?

Before: all coefficients have std = 1 for elastic or std = 10 for coupling
After: coefficients have std $= C\sqrt{\lambda}$, where $C$ is chosen so that the mode with the top eigenvalue keeps the previous default std of 1 or 10.
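A minimal sketch of the change, assuming nothing about celeri's internals (the function name `scale_coefficient_stds` is hypothetical):

```python
import numpy as np

def scale_coefficient_stds(eigenvalues, top_std):
    """Attenuate KL coefficient priors by sqrt(eigenvalue), normalized so
    that the mode with the largest eigenvalue keeps the previous default std."""
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    # Choose C so that C * sqrt(lambda_max) == top_std.
    C = top_std / np.sqrt(eigenvalues.max())
    return C * np.sqrt(eigenvalues)

# Example: with top_std=10 (the coupling default), the leading mode keeps
# std 10 and the higher modes are damped in proportion to sqrt(lambda).
stds = scale_coefficient_stds([4.0, 1.0, 0.25], top_std=10.0)
```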

@maresb requested a review from aseyboldt September 21, 2025 14:48
@aseyboldt (Collaborator)

Sorry for the slow review, but this looks great, I think we can merge this.


@maresb marked this pull request as draft October 17, 2025 20:00
@maresb (Collaborator, Author) commented Oct 17, 2025

Converted to draft because it contains a bunch of results in notebooks that I wouldn't want in the main branch...

@maresb (Collaborator, Author) commented Oct 17, 2025

Preliminary results

My hypothesis for this experiment was a slight difference in this direction. I'm pretty surprised about the major difference, especially with "strike slip coupling". Maybe the higher modes are indeed prior-dominated. I'm wondering if @aseyboldt has any thoughts.

What changed?

"Before", we used a simple white-noise prior in which each coefficient had a standard normal distribution. Here I was already suspicious about the prominence of the high eigenfunctions.

"After", we implemented a Matérn $\nu=\tfrac{1}{2}$ prior. This damps the eigenfunction priors to normals with variance equal to the eigenvalue. In other words, it makes the plots look less wobbly.

🤓 details: The kernel we use to compute the KL modes is $\exp(-d/\ell)$, which is the Matérn kernel with $\nu=\tfrac{1}{2}$. In contrast, the white-noise prior is effectively $\nu=-1$, so in various technical senses it has 1.5 fewer derivatives. Note that in the limit of infinitely many eigenfunctions, $\nu=\tfrac{1}{2}$ samples are almost surely $\alpha$-Hölder continuous for $\alpha<\tfrac{1}{2}$, which means that as you crank up the number of eigenfunctions, samples will remain continuous and come very close to satisfying $|f(x+\delta) - f(x)| < C \sqrt{\delta}$. (A differentiable function will, essentially by definition, satisfy this with $C\left|\delta\right|$ instead.) In contrast, with white noise, things diverge as you crank up the number of eigenfunctions.
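The contrast between the two priors can be illustrated with a small 1-D sketch (illustrative only; the grid, length scale, and seed are arbitrary choices, not values from the notebooks):

```python
import numpy as np

# 1-D illustration: KL modes of the Matérn nu=1/2 kernel k(d) = exp(-d/ell)
# on a grid of points, compared against a white-noise prior.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
ell = 0.2
K = np.exp(-np.abs(x[:, None] - x[None, :]) / ell)

# Eigendecomposition of the symmetric positive-definite kernel matrix,
# reordered so that the largest eigenvalue comes first.
eigvals, eigvecs = np.linalg.eigh(K)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

# Matérn draw: coefficient variance equals the eigenvalue, so high modes
# are strongly attenuated.
z = rng.standard_normal(len(x))
matern_sample = eigvecs @ (np.sqrt(np.clip(eigvals, 0.0, None)) * z)

# White-noise draw: every coefficient has unit variance, so the high
# eigenfunctions dominate and the sample is much rougher.
white_sample = eigvecs @ z
```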

It's a bit of an open question how to set the overall scale for the prior. In the code I made it so the top before/after eigenvalues agree, and those are hard-coded to 10 for coupling and 1 for elastic.

Samples from "before"

Source: run of the notebook without the Matérn prior: https://github.com/brendanjmeade/celeri/blob/461530e9a6ea93b12650fde83d56995dbee37eb3/notebooks/doc_solve_mcmc.ipynb

[three sample images]

Samples from "after"

Source: run of the notebook with the Matérn prior: https://github.com/brendanjmeade/celeri/blob/f21cd2d9017c6f90bfec2906eccb4d830be0ae0a/notebooks/doc_solve_mcmc.ipynb

[three sample images]

@aseyboldt (Collaborator) commented Oct 18, 2025 via email

@maresb (Collaborator, Author) commented Oct 18, 2025

Ya, all the other results are consistent with what I expected: toning down the excessive "leopard spots". I'm really surprised though that the strike slip coupling looks so smooth.

@maresb (Collaborator, Author) commented Oct 20, 2025

> It's a bit of an open question how to set the overall scale for the prior. In the code I made it so the top before/after eigenvalues agree, and those are hard-coded to 10 for coupling and 1 for elastic.

This was the problem. It's better not to hard-code the scale of the top eigenvalue.

I'm still not very confident about how to choose the overall scale for the priors.

@maresb (Collaborator, Author) commented Oct 27, 2025

Let me outline how I plan to finish this out.

There are a lot of skeletons lurking in the closet with a GP like this. My goal is to bring them all out into the open, make them explicit, identify all sources of inaccuracy, and make deliberate, principled choices for all the parameters. I will provide a notebook that illustrates all the choices involved and their effect on the priors. This way we can figure out which parameters are identifiable, correspondingly fix parameter values or parameter priors according to domain knowledge, and enumerate all our assumptions. The relevant factors include:

  • The covariance kernel, e.g. "Matérn" or "squared exponential" (the smoothness $\nu\to\infty$ limit of Matérn). Celeri currently uses Matérn with $\nu=\tfrac{1}{2}$, which is probably not as smooth as desired, and we made the tentative decision to switch to "squared exponential". The kernel has two more parameters: the amplitude scale $\sigma$ and the correlation length scale $\ell$, which is a distance between two mesh elements.
  • The cutoff effects of using finitely many eigenmodes. This introduces a Nyquist length scale.
  • The triangulation. There are two main issues here:
    1. There's a third distance scale for the resolution of the mesh: the distribution of distances between adjacent centroids.
    2. A weight must be chosen for each triangle. Currently celeri uses a constant weight, and this has the advantage of providing more detail along refined regions. The more geometrically principled approach is to weight by area, which plays nicely with continuum limits. To get the best of both worlds, we could default to weighting by area, and then when desired, have an optional "measure" factor on top of the area for detail control.
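As a sketch of the area-weighting idea (hypothetical code, not celeri's current implementation; `weighted_kl_modes` is an invented name), one can solve the generalized eigenproblem $K W v = \lambda v$ with the triangle areas on the diagonal of $W$:

```python
import numpy as np

def weighted_kl_modes(K, areas):
    """Solve K W v = lambda v, with W = diag(areas), via the symmetrized
    matrix M = W^{1/2} K W^{1/2}, which shares eigenvalues with K W."""
    sqrt_a = np.sqrt(np.asarray(areas, dtype=float))
    M = (sqrt_a[:, None] * K) * sqrt_a[None, :]
    eigvals, u = np.linalg.eigh(M)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    eigvals, u = eigvals[order], u[:, order]
    # Transform back: v = W^{-1/2} u are eigenvectors of K W.
    v = u / sqrt_a[:, None]
    return eigvals, v

# Tiny example: three centroids with an exp(-d/ell) kernel and unequal areas.
pts = np.array([0.0, 1.0, 3.0])
K = np.exp(-np.abs(pts[:, None] - pts[None, :]) / 2.0)
eigvals, modes = weighted_kl_modes(K, areas=[1.0, 2.0, 0.5])
```

Refining the mesh then shrinks the areas of the new elements instead of changing the prior, which is what makes the continuum limit work out.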

In addition to the above-mentioned distance-based kernels, there are also neighbor-based Laplacian kernels. One of the wishes was to check "distance-along-the-mesh" kernels, and the Laplacians are effectively both the best and easiest way to achieve this. (The naive distance-along-the-mesh kernels are not even positive-definite!!!) It's pretty straightforward to trade out the KL modes for the Laplacian modes, so this provides another "model selection" check.
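A minimal sketch of trading the KL modes for graph-Laplacian modes (hypothetical; `laplacian_modes` and the edge-list input are invented for illustration):

```python
import numpy as np

def laplacian_modes(n_triangles, edges, n_modes):
    """Eigenmodes of the unweighted graph Laplacian built from mesh
    adjacency. `edges` lists pairs of adjacent triangle indices."""
    L = np.zeros((n_triangles, n_triangles))
    for i, j in edges:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0
    # The smallest Laplacian eigenvalues correspond to the smoothest modes.
    eigvals, eigvecs = np.linalg.eigh(L)
    return eigvals[:n_modes], eigvecs[:, :n_modes]

# Tiny 4-element "mesh": a path graph 0-1-2-3. The first mode is the
# constant vector with eigenvalue 0.
vals, vecs = laplacian_modes(4, [(0, 1), (1, 2), (2, 3)], n_modes=3)
```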

When it comes to modeling the coupling constant, we enforce the constraint with a logit transform, and thus in logit space we need to set the amplitude scale, and also the mean! (Usually when discussing GPs we set the mean to zero.) The amplitude scale in logit space determines how sharply we transition between the 0 and 1 extremes. The mean determines our default guess for the coupling. Based on previous discussion, rather than using mean 0 in logit space, corresponding to default coupling of 0.5, we want the default coupling to be closer to 0.8. Also, we want to choose the amplitude so that coupling transitions gradually rather than sharply.
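A sketch of the logit-space construction under these choices (the amplitude value here is illustrative, not a celeri default):

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

# Coupling prior built in logit space. The mean logit(0.8) centers the
# prior on coupling ~0.8 instead of 0.5, and a modest amplitude keeps
# transitions between 0 and 1 gradual rather than sharp.
rng = np.random.default_rng(1)
mean = logit(0.8)   # ~1.386, so expit(mean) == 0.8
amplitude = 0.5     # illustrative choice
latent = mean + amplitude * rng.standard_normal(10_000)
coupling = expit(latent)  # all values lie strictly inside (0, 1)
```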

So what does this look like in the end? I see a module that takes a mesh and a bunch of configuration parameters and:

  • computes the eigenmodes according to the configuration parameters
  • given the additional non-covariance parameters, visualizes a bunch of prior draws on the mesh
  • computes and compares various analytic heuristics (effective correlation distance, Nyquist length, Weyl expansion, mesh properties) and identifies, for example, which length scale is dominant among correlation, Nyquist, and mesh.

@brendanjmeade (Owner)

@maresb and @aseyboldt, it took me a while to read through this. The investigations here are smart, and they continue to bake nice properties into the solution, improving on the cavalier welded-together approach that I brought.

> • The covariance kernel, e.g. "Matérn" or "squared exponential" (the smoothness $\nu\to\infty$ limit of Matérn). Celeri currently uses Matérn with $\nu=\tfrac{1}{2}$ which is probably not as smooth as desired, and we made the tentative decision to switch to "squared exponential". The kernel has two more parameters: the amplitude scale $\sigma$ and the correlation length scale $\ell$ which is a distance between two mesh elements.

I'd like to understand the relative merits of these. Is there a theory that suggests the appropriateness of one form or another?

> • The triangulation. There are two main issues here:
>   1. There's yet another third distance scale for the resolution of the mesh: the distribution of distances between adjacent centroids
>   2. A weight must be chosen for each triangle. Currently celeri uses a constant weight, and this has the advantage of providing more detail along refined regions. The more geometrically principled approach is to weight by area, which plays nicely with continuum limits. To get the best of both worlds, we could default to weighting by area, and then when desired, have an optional "measure" factor on top of the area for detail control.

Very thoughtful and reasonable.

> In addition to the above-mentioned distance-based kernels, there are also neighbor-based Laplacian kernels. One of the wishes was to check "distance-along-the-mesh" kernels, and the Laplacians are effectively both the best and easiest way to achieve this. (The naive distance-along-the-mesh kernels are not even positive-definite!!!) It's pretty straightforward to trade out the KL modes for the Laplacian modes, so this provides another "model selection" check.

I'm certainly open-minded about this. I found distance-weighted eigenmodes, and they just seemed to work, so I went with it. Wise people have warned me off the geodesic distance!

> When it comes to modeling the coupling constant, we enforce the constraint with a logit transform, and thus in logit space we need to set the amplitude scale, and also the mean! (Usually when discussing GPs we set the mean to zero.) The amplitude scale in logit space determines how sharply we transition between the 0 and 1 extremes. The mean determines our default guess for the coupling. Based on previous discussion, rather than using mean 0 in logit space, corresponding to default coupling of 0.5, we want the default coupling to be closer to 0.8. Also, we want to choose the amplitude so that coupling transitions gradually rather than sharply.

I agree with you on 0.8 as a starting point. I'm less sure about how to tune the transitions.

@maresb (Collaborator, Author) commented Oct 29, 2025

> Is there a theory that suggests the appropriateness of one form or another?

Short answer: if you're modeling stock prices, then $\nu=\tfrac{1}{2}$ is probably a good bet. If everything has a Taylor series, then $\nu=\infty$ is good. If things are "mostly smooth", then $\nu=\tfrac{3}{2}$ and $\nu=\tfrac{5}{2}$ are popular choices. (Half-integers are popular because the Bessel function drops out.) Due to a well-defined limit, there's not a huge difference between $\nu=\tfrac{5}{2}$ and $\nu=\infty$. Practically, there is even less difference due to the eigenmode cutoff.

Longer answer: there's a fairly elementary and straightforward intuition for thinking about this stuff, but all the literature I've seen totally butchers it. I'll send a notebook soon that will explain things and let you experiment with the parameters.

> I found distance weighted eigenmodes, and they just seemed to work, so I went with it. Wise people have warned me off the geodesic distance!

To be clear, the Laplacian methods are not geodesic distance, but rather capture the notion of distance "along" the mesh (rather than ambient 3D distance) in the "right" way to avoid the pathologies of geodesic distance. They're firmly grounded in theory, and actually my favored tool.

> I'm less sure about how to tune the transitions.

Me too, but we can lay out the various distance scales, figure out the analytic heuristics, and use this to calibrate to your expertise and make an informed choice.

@brendanjmeade (Owner)

@maresb

> Short answer: if you're modeling stock prices, then $\nu=\tfrac{1}{2}$ is probably a good bet. If everything has a Taylor series, then $\nu=\infty$ is good. If things are "mostly smooth", then $\nu=\tfrac{3}{2}$ and $\nu=\tfrac{5}{2}$ are popular choices. (Half-integers are popular because the Bessel function drops out.) Due to a well-defined limit, there's not a huge difference between $\nu=\tfrac{5}{2}$ and $\nu=\infty$. Practically, there is even less difference due to the eigenmode cutoff.

That helps. It's effectively a smoothing. Let's make this a tuning parameter set in the *_config.json file.

> To be clear, the Laplacian methods are not geodesic distance, but rather capture the notion of distance "along" the mesh (rather than ambient 3D distance) in the "right" way to avoid the pathologies of geodesic distance. They're firmly grounded in theory, and actually my favored tool.

I'm pretty sure I tried this with some meshing tool a long time ago. IIRC, it was a nightmare because it was highly sensitive to element geometry. We have a lot of meshes with occasional very thin and/or highly scalene triangles, and I want to spend 0% of my life cleaning up meshes. A huge advantage of the DWE approach is that it depends only on distances rather than on the full triangle geometry. With this in mind, unless there's an easy win here, I think we should pass on the Laplace modes because of the mesh-geometry dependency.

Summary:

  • Implement the Matérn prior and allow setting $\nu$ in a *_config.json file.
  • Do not pursue the Laplace modes because I think this is likely to become a mesh detail disaster zone.

I'm optimistic that these can be implemented cleanly. Thoughts?

Repository owner deleted a comment from brendanjmeade Nov 15, 2025
@github-actions (bot) commented Nov 15, 2025

Hi @maresb,

main has updated pixi.lock since this branch was last synced. Merge or rebase the latest main so dependency metadata stays current.

Remote upstream should point at brendanjmeade/celeri; your fork is maresb/celeri.

git checkout matern-prior
git fetch upstream
git merge upstream/main

If merge conflicts occur in pixi.lock, accept the incoming changes from main, regenerate the lockfile, then continue the merge once everything is staged:

git checkout --theirs pixi.lock
pixi lock
git add pixi.lock
git merge --continue

This reminder will be minimized automatically once pixi.lock matches main.


Generated automatically by .github/workflows/lockfile-alert.yaml.
