Use the eigenvalues to impose the standard Matérn prior for the GP #269
Conversation
|
Sorry for the slow review, but this looks great, I think we can merge this. |
|
Converted to draft because it contains a bunch of results in notebooks that I wouldn't want in the |
Preliminary results

My hypothesis for this experiment was a slight difference in this direction. I'm pretty surprised about the major difference, especially with "strike slip coupling". Maybe the higher modes are indeed prior-dominated. I'm wondering if @aseyboldt has any thoughts.

What changed?

"Before" we had implemented a simple white noise prior, where each coefficient had a standard normal prior. Here I was suspicious about the prominence of the high eigenfunctions.

"After" we implemented a Matérn $\nu=\tfrac{1}{2}$ prior. This dampens the eigenfunction priors to be normal with variance equal to the eigenvalue. In other words, it makes the plots look less wobbly.

🤓 details: The kernel we use to compute the KL modes is $\exp(-d/\ell)$, which is the Matérn kernel with $\nu=\tfrac{1}{2}$. In contrast, the white noise prior is effectively $\nu=-1$, so in various technical senses it has 1.5 fewer derivatives. Note that in the limit of infinitely many eigenfunctions, $\nu=\tfrac{1}{2}$ samples are almost surely $\alpha$-Hölder continuous for $\alpha<\tfrac{1}{2}$, which means that as you crank up the number of eigenfunctions, samples will be continuous and very nearly satisfy $|f(x+\delta) - f(x)| < C\sqrt{\delta}$. (A differentiable function will basically by definition satisfy this with $< C|\delta|$.) In contrast, with white noise, things diverge as you crank up the number of eigenfunctions.

It's a bit of an open question how to set the overall scale for the prior. In the code I made it so the top before/after eigenvalues agree, and those are hard-coded to 10 for coupling and 1 for elastic.

Samples from "before"

Source: "Run notebook without" (the Matérn prior)
https://github.com/brendanjmeade/celeri/blob/461530e9a6ea93b12650fde83d56995dbee37eb3/notebooks/doc_solve_mcmc.ipynb

Samples from "after"

Source: "Run notebook with" (the Matérn prior)
https://github.com/brendanjmeade/celeri/blob/f21cd2d9017c6f90bfec2906eccb4d830be0ae0a/notebooks/doc_solve_mcmc.ipynb
|
|
I'm not sure yet, but to me that looks more like a bug than an actual difference due to the prior. How could the model possibly know that there are definitely no small-scale variations?

I haven't spotted a concrete problem in the code, but could it be related to how the eigenvectors are stored in celeri? Maybe we somehow ended up treating the eigenvectors of the strike slip component as higher-frequency components and scaled them down more?
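If it helps narrow that down, a quick sanity check along those lines might look like the following. The names `eigvals`/`eigvecs` are hypothetical; I don't know offhand how celeri actually stores the decomposition.

```python
import numpy as np

def check_mode_alignment(K, eigvals, eigvecs, atol=1e-8):
    """Sanity-check stored KL modes against the kernel matrix K.

    Columns of `eigvecs` are modes; `eigvals` are the variances used
    to scale the corresponding coefficients.
    """
    # Modes should be ordered smoothest first (descending eigenvalue);
    # otherwise the wrong modes get scaled down as "high frequency".
    assert np.all(np.diff(eigvals) <= 0), "eigenvalues not sorted descending"
    # Each column must satisfy K @ v = lambda * v for *its own* lambda,
    # i.e. eigenvectors and eigenvalues are actually paired up.
    for lam, v in zip(eigvals, eigvecs.T):
        np.testing.assert_allclose(K @ v, lam * v, atol=atol)
```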
|
|
Ya, all the other results are consistent with what I expected: toning down the excessive "leopard spots". I'm really surprised though that the strike slip coupling looks so smooth. |
This was the problem. It's better not to hard-code the scale of the top eigenvalue. I'm still not very confident about how to choose the overall scale for the priors. |
|
Let me outline how I plan to finish this out. There are a lot of skeletons lurking in the closet with a GP like this. My goal is to bring them all out into the open, make them explicit, ensure that we identify all sources of inaccuracy, and make deliberate, principled choices for all the parameters. I will provide a notebook that illustrates all the choices involved and their effect on the priors. This way we can figure out which parameters are identifiable, correspondingly fix parameter values or parameter priors according to domain knowledge, and enumerate all our assumptions. The relevant factors include:
In addition to the above-mentioned distance-based kernels, there are also neighbor-based Laplacian kernels. One of the wishes was to check "distance-along-the-mesh" kernels, and the Laplacians are effectively both the best and easiest way to achieve this. (The naive distance-along-the-mesh kernels are not even positive-definite!) It's pretty straightforward to swap out the KL modes for the Laplacian modes, so this provides another "model selection" check.

When it comes to modeling the coupling constant, we enforce the constraint with a logit transform, and thus in logit space we need to set the amplitude scale, and also the mean! (Usually when discussing GPs we set the mean to zero.) The amplitude scale in logit space determines how sharply we transition between the 0 and 1 extremes. The mean determines our default guess for the coupling. Based on previous discussion, rather than using mean 0 in logit space, corresponding to a default coupling of 0.5, we want the default coupling to be closer to 0.8. Also, we want to choose the amplitude so that coupling transitions gradually rather than sharply; see the sketch below.

So what does this look like in the end? I see a module that takes a mesh and a bunch of configuration parameters and:
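Picking up the logit-space paragraph above, here's a minimal sketch of that construction, assuming precomputed modes. The function name, signature, and default values are illustrative, not the proposed API.

```python
import numpy as np

def sample_coupling(eigvecs, eigvals, amplitude, mean_coupling=0.8, rng=None):
    """Draw one prior sample of coupling in (0, 1) via a logit-space GP.

    `mean_coupling` sets the default guess (the logit-space mean), and
    `amplitude` controls how sharply we move between the 0/1 extremes.
    """
    rng = np.random.default_rng() if rng is None else rng
    mu = np.log(mean_coupling / (1 - mean_coupling))  # logit(0.8) ≈ 1.386
    coefs = amplitude * np.sqrt(eigvals) * rng.standard_normal(len(eigvals))
    f = mu + eigvecs @ coefs         # GP sample in logit space
    return 1 / (1 + np.exp(-f))     # sigmoid back to coupling in (0, 1)
```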
|
|
@maresb and @aseyboldt It took me a while to read through this. The investigations here are smart, and they continue to bake nice properties into the solution, improving on the cavalier welded-together approach that I brought.
I'd like to understand the relative merits of these. Is there a theory that suggests the appropriateness of one form or another?
Very thoughtful and reasonable.
I'm certainly open-minded about this. I found distance-weighted eigenmodes, and they just seemed to work, so I went with them. Wise people have warned me off the geodesic distance!
I agree with you on 0.8 as a starting point. I'm less sure about how to tune the transitions. |
Short answer: if you're modeling stock prices, then $\nu=\tfrac{1}{2}$ is the right choice.

Longer answer: there's a fairly elementary and straightforward intuition for thinking about this stuff, but all the literature I've seen totally butchers it. I'll send a notebook soon that will explain things and let you experiment with the parameters.
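As a tiny preview of that kind of experiment, here's a pure-NumPy sketch (grid, length scales, and jitter are arbitrary choices of mine) that draws one path from a Matérn $\nu=\tfrac{1}{2}$ kernel and one from a squared-exponential ($\nu\to\infty$) kernel, using the same white noise, so you can compare roughness directly.

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 500)
d = np.abs(x[:, None] - x[None, :])
jitter = 1e-8 * np.eye(len(x))  # numerical stabilizer for the Cholesky factorization

K_matern12 = np.exp(-d / 1.0)   # Matérn nu = 1/2: rough, Brownian-like paths
K_sqexp = np.exp(-d**2 / 2.0)   # nu -> infinity limit: very smooth paths

z = rng.standard_normal(len(x))
path_rough = np.linalg.cholesky(K_matern12 + jitter) @ z
path_smooth = np.linalg.cholesky(K_sqexp + jitter) @ z
```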
To be clear, the Laplacian methods are not geodesic distance, but rather capture the notion of distance "along" the mesh (rather than ambient 3D distance) in the "right" way to avoid the pathologies of geodesic distance. They're firmly grounded in theory, and actually my favored tool.
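For concreteness, a sketch of the simplest variant: an unweighted graph Laplacian built purely from mesh connectivity (illustrative code, not celeri's; a cotangent Laplacian would additionally weight by triangle geometry). Following the SPDE view of Matérn fields, one would then give the coefficients variances that decay in the Laplacian eigenvalue, e.g. proportional to $(\kappa^2+\lambda)^{-\alpha}$.

```python
import numpy as np

def laplacian_modes(n_vertices, edges, n_modes):
    """Low eigenmodes of the unweighted graph Laplacian of a mesh.

    `edges` is an (m, 2) array of vertex index pairs; only connectivity
    is used, so distance is measured "along" the mesh by construction.
    """
    L = np.zeros((n_vertices, n_vertices))
    for i, j in edges:
        L[i, i] += 1
        L[j, j] += 1
        L[i, j] -= 1
        L[j, i] -= 1
    # Note: for a Laplacian, *small* eigenvalues are the smooth modes,
    # the reverse of the kernel-matrix convention used for the KL modes.
    eigvals, eigvecs = np.linalg.eigh(L)  # ascending order
    return eigvals[:n_modes], eigvecs[:, :n_modes]
```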
Me too, but we can lay out the various distance scales, figure out the analytic heuristics, and use this to calibrate to your expertise and make an informed choice. |
That helps. It's effectively a smoothing. Let's let this be a tuning parameter set in the
I'm pretty sure I tried this with some meshing tool a long time ago. IIRC, it was a nightmare because it was highly sensitive to element geometry. We have a lot of meshes with occasional very thin and/or highly scalene triangles. I want to spend 0% of my life cleaning up meshes. A huge advantage of the DWE approach is that it depends only on distances rather than the full triangle geometry.

With this in mind, unless there's an easy win here, I'd say we should pass on the Laplace modes because of the mesh-geometry dependency. Summary:
I'm optimistic that these can be implemented cleanly. Thoughts? |
…he GP" This reverts commit 1c8e92d38fb12da16f5403bf3751011a6e8dec3b.
…or for the GP"" This reverts commit 4985ae8.
|
Hi @maresb,

git checkout matern-prior
git fetch upstream
git merge upstream/main

If merge conflicts occur in pixi.lock:

git checkout --theirs pixi.lock
pixi lock
git add pixi.lock
git merge --continue

This reminder will be minimized automatically once […]. Generated automatically by […]. |






I'm not sure if it makes a difference, but we should be attenuating the coefficients in our GP according to the standard decay for the (in this case Matérn) GP prior. Namely, the variance should be the eigenvalue.

@aseyboldt, for some reason this is running really slowly for me (also with main). Could you check if this works for you?

Before: all coefficients have std = 1 for elastic or std = 10 for coupling.
After: coefficients have std = $C\sqrt{\lambda}$, where $C$ is chosen so that the top eigenvalue has the previous default std of 1 or 10.
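In code, the change amounts to something like the following sketch (my paraphrase, not the diff itself), assuming `eigenvalues` is sorted descending and `scale` is the previous hard-coded std of 1 or 10:

```python
import numpy as np

def coefficient_stds(eigenvalues, scale):
    """Per-coefficient prior stds under the Matérn decay.

    Before: np.full(len(eigenvalues), scale)  -- white noise, all equal.
    After:  C * sqrt(lambda_k), with C fixed so the top mode keeps `scale`.
    """
    C = scale / np.sqrt(eigenvalues[0])
    return C * np.sqrt(eigenvalues)
```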