Decrease default value to prevent overflow in 32-bit. by johannahaffner · Pull Request #176 · patrick-kidger/lineax

johannahaffner · 2025-11-14T08:08:39Z

Fixes #175 (comment).

johannahaffner · 2025-11-14T08:09:58Z

Created a dev branch.

johannahaffner · 2025-11-14T08:18:25Z

We're failing pre-commit checks due to a new version of pyright; I'll fix this when I have time.

patrick-kidger · 2025-11-15T05:51:21Z

Should this be something like jnp.finfo(jnp.float32).max? If nothing else, to document the choice here.
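As a rough illustration of why a dtype-derived bound documents the choice better than a hard-coded literal, the sketch below uses NumPy's finfo and iinfo; jnp.finfo is expected to report the same limits for these dtypes. The value 1e100 is a made-up stand-in for an overly large default, not the value changed in this PR.

```python
import numpy as np

# Largest finite float32 value, derived from the dtype rather than hard-coded.
f32_max = np.finfo(np.float32).max

# A hypothetical oversized default: fine in 64-bit, but not representable in 32-bit.
too_big = 1e100
with np.errstate(over="ignore"):
    overflowed = np.float32(too_big)  # casts to inf

# The integer analogue: the largest value a 32-bit signed integer can hold.
i32_max = np.iinfo(np.int32).max

print(f32_max, overflowed, i32_max)
```

Deriving the bound from the dtype also means the same expression stays valid if the runtime dtype changes.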

johannahaffner · 2025-11-15T11:34:37Z

Sure, I can add a comment! Using anything from jnp in the default value of a public function causes JAX to initialise when the library is imported. I'm not sure whether that would be triggered here as well, so I'll check.

patrick-kidger · 2025-11-15T12:16:23Z

This one should be safe from JAX initialisation, I believe. :)

johannahaffner · 2025-11-15T18:09:07Z

Indeed it is. To address the pyright complaints, I've added ignore statements wherever I could not find an ergonomic way to use astype or cast.

Regarding our two_norm, it looks like pyright now disregards the Scalar return type and looks at what the function may return on all code paths, which leads to a union that includes a None, due to an assert False branch. Is that what pyright should be doing, seeing as assert False, if ever hit, cannot lead to a type-related bug? It probably has no concept of what happens in a branch other than checking what it may return (?), but I'm not sure why it only started picking up on this starting from the newest version.
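One way to sidestep the question, sketched here with a hypothetical stand-in rather than lineax's actual two_norm, is to end the unreachable branch with an explicit raise, so that no code path falls through and implicitly returns None. Whether pyright treats assert False as terminating is not something this sketch settles.

```python
import math


def two_norm_sketch(x) -> float:
    """Hypothetical stand-in for a norm with several input branches."""
    if isinstance(x, (int, float)):
        return abs(float(x))
    elif isinstance(x, (list, tuple)):
        return math.sqrt(sum(float(v) ** 2 for v in x))
    else:
        # Explicit raise instead of `assert False`: this branch can never
        # fall through, so the function cannot implicitly return None.
        raise TypeError(f"unsupported input type: {type(x).__name__}")
```

An explicit raise also survives running Python with -O, which strips assert statements.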

johannahaffner · 2025-11-15T18:44:33Z

We're getting the test failure reported here, which seems to be due to numerical bad luck: #172

johannahaffner · 2025-11-15T20:27:00Z

...and another one in GMRES, although I don't see how that one could be due to the changes I made.

johannahaffner · 2025-11-26T00:36:51Z

@PTNobel could you take a look at this? We're getting test failures for LSMR from tests that involve the computation of a JVP. The generated matrices should have permissible condition numbers: we've previously set a cutoff for the condition number at 1e3, which is the default we use for solvers that do not normalise the equations, and conlim is 1e8 for LSMR.

Right now I'm decreasing condition numbers to see at which values these tests will pass, but that is a little empirical and unsatisfactory. Besides, if there is a numerical problem underneath this that we should catch, then this is maybe a good canary in the coal mine.

PTNobel · 2025-11-26T04:37:45Z

Happy to look at it. It is Thanksgiving here in the US, so I'm a bit busy the next few days. Feel free to ping me if I haven't finished in a week.

tests/helpers.py

+    elif isinstance(solver, lx.GMRES):
+        cond_cutoff = 900
+    elif isinstance(solver, lx.LSMR):
+        cond_cutoff = 800
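For context, a condition-number cutoff of this kind can be enforced by rejection sampling. The following is a minimal sketch in NumPy, assuming Gaussian random matrices; the function name and the max_tries parameter are hypothetical and this is not the code in tests/helpers.py.

```python
import numpy as np


def well_conditioned_matrix(rng, size, cond_cutoff=1e3, max_tries=1000):
    """Draw random square matrices until one has a condition number below the cutoff."""
    for _ in range(max_tries):
        matrix = rng.normal(size=(size, size))
        # np.linalg.cond defaults to the 2-norm condition number.
        if np.linalg.cond(matrix) < cond_cutoff:
            return matrix
    raise RuntimeError("no matrix below the condition-number cutoff was found")


rng = np.random.default_rng(0)
matrix = well_conditioned_matrix(rng, size=3, cond_cutoff=800)
```

Lowering the cutoff only makes the sampled problems easier; it does not explain why a solver fails on matrices that should already be within its tolerance.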


johannahaffner · 2025-12-04T09:01:01Z

@PTNobel here is the requested ping ☺️

Hope you had a wonderful Thanksgiving with your family!

johannahaffner · 2025-12-13T13:58:13Z

The LSMR test failures in vmap_jvp are really weird. I cannot reproduce them run-by-run locally - tests will sometimes fail, and sometimes won't. By default the seed is set using the built-in random, so it will be different each time (we do not specify EQX_GETKEY_SEED).

So everything is a bit heuristic here, and I only have some observations:

- Anecdotally, failures seem to happen more often in CI (on Linux) than they do on my Mac.
- Printing + checking the result does seem to "improve" the success rate - this changes something along the chain of custody and I've seen before that this can affect numerical weirdness.
- It does not seem to come from the backward pass after all - at least things pass more often if throw=False is passed to the wrapped linear_solve in the body of the test definition.

What I think is going on here is that some unlucky random streams cause things to fail here. Given that we do guard against high condition numbers, this is potentially a serious issue affecting the reliability of lx.LSMR, which seems to fail for condition numbers below the stated/promised conlim.

As a practical next step, I've reset tests/helpers to what we currently have in dev. This reduces the changes made in this PR down to prevention of overflow due to hyperparameter initialisation when using 32-bit. Since LSMR is already in main, this is a change worth making - and I think we should merge this and fix the JVP issues for LSMR in a different PR.
This then means that ahead of the next release, LSMR will require a second look. (Not sure when I can get around to this, someone taking this on would be greatly appreciated.)
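To make an intermittent failure like the one described above reproducible, the seed can be pinned through an environment variable and logged otherwise. The sketch below only illustrates that pattern under the assumption that the variable holds an integer; pick_seed is a hypothetical helper and does not show how Equinox itself consumes EQX_GETKEY_SEED.

```python
import os
import random


def pick_seed(env_var="EQX_GETKEY_SEED"):
    """Return the seed from the environment if set, else a fresh random one."""
    raw = os.environ.get(env_var)
    if raw is not None:
        return int(raw)
    # No fixed seed: draw one, so every run differs unless the seed is pinned.
    return random.randint(0, 2**31 - 1)


seed = pick_seed()
print(f"using seed {seed}")  # logging the seed lets a failing run be replayed
```

Recording the seed of each failing CI run would make it possible to check whether specific random streams trigger the failures.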

johannahaffner · 2025-12-13T14:08:27Z

I also note that the CI passed in #182. While the initialisation of the max_steps value in LSMR should not contribute to these test failures, the altered value for minrbar could do so.

patrick-kidger · 2025-12-13T14:15:08Z

lineax/_solver/lsmr.py

+            if jnp.issubdtype(dtype, jnp.complexfloating):
+                real_dtype = jnp.finfo(dtype).dtype
+            else:
+                real_dtype = dtype


We have a complex_to_real_dtype function that can be used in place of this. (And I think the if statement here is unnecessary; the jnp.finfo trick should work for real dtypes too.)
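The finfo behaviour referred to here can be checked quickly. The sketch uses NumPy's finfo, which jnp.finfo is expected to mirror for these dtypes; real_dtype_of is a hypothetical name and not the complex_to_real_dtype helper mentioned above.

```python
import numpy as np


def real_dtype_of(dtype):
    """Return the real floating dtype underlying a real or complex dtype."""
    # For complex dtypes, finfo describes the real component type;
    # for real dtypes, it returns the dtype itself, so no branch is needed.
    return np.finfo(dtype).dtype


for dtype in (np.float32, np.float64, np.complex64, np.complex128):
    print(np.dtype(dtype).name, "->", real_dtype_of(dtype).name)
```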

johannahaffner · 2025-12-13T14:17:33Z

Just crossed my mind - do you think conlim should potentially depend on the dtype used?

johannahaffner · 2025-12-13T16:22:29Z

Just noting that the last CI run actually passed :D

johannahaffner changed the base branch from main to dev November 14, 2025 08:09

This was referenced Nov 21, 2025

LSMR init overflow, pt 2. #177

Closed

Sparse operators in Lineax: CSR/CSC and COO format #178

Closed

Update guidelines for installation of doc dependencies #179

Closed

johannahaffner force-pushed the lsmr-init-overflow branch from cba733f to 4f6a054 Compare November 25, 2025 22:41

johannahaffner mentioned this pull request Nov 26, 2025

Preconditioner and initial guess for backward pass in Krylov solvers #181

Open

Johanna Haffner added 7 commits December 13, 2025 12:08

Decrease default value to prevent overflow in 32-bit.

71d1296

use maximum value of runtime dtype

60d654a

limit condition numbers for sensitive, iterative solvers

3c8df62

cond_cutoff -> 900 for GMRES, LSMR

fadd714

decrease condition number cutoff for LSMR

895fa23

implement safeguard for int overflow in 32 bit

daed0a4

add extra safeguard for complex dtypes

20ef068

johannahaffner force-pushed the lsmr-init-overflow branch from 0aaa51f to 20ef068 Compare December 13, 2025 11:18

Johanna Haffner added 2 commits December 13, 2025 12:37

fix typo

08ceabe

reset condition number check

9891748

patrick-kidger approved these changes Dec 13, 2025

simplify dtype conversion: reuse function from our misc module

d297209

johannahaffner merged commit dddc513 into patrick-kidger:dev Dec 13, 2025
1 check passed

johannahaffner deleted the lsmr-init-overflow branch December 13, 2025 16:22
