-
-
Notifications
You must be signed in to change notification settings - Fork 768
Remove special 32-bit handling in doctests #40238
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Documentation preview for this PR (built with commit 48a1e76; changes) is ready! 🎉 |
|
The problem is… "very old" means "what @vbraun is running right now". And I don't think 30ish (rough estimation) counts as "very few". |
|
We still do have a 32-bit buildbot, running Debian 12. Imho the reason for supporting is not that there are a lot of users, but its another platform that helps shake out bugs / random / undefined behavour. |
|
I can see that testing it on a variety of systems gives greater confidence - but why does it need to run on 32-bit? Can you switch it to 64-bit Debian? We also have it in the github CI https://github.com/sagemath/sage/actions/runs/15379005941/job/43267362166, so disabling that buildbot wouldn't be such a huge loss. |
|
We also do have a 64-bit buildbot running Debian 12, thats not the issue. I repeat: the reason for supporting 32-bit is not that there are a lot of users, but its another platform that helps shake out bugs / random / undefined behavour. |
I'm not sure if I follow. Do you mean that testing 32-bit on CI will serve as a proxy for other platforms (say arm)? Is there anything particular to 32-bit Debian over using another 64-bit Linux for the purpose of shaking out bugs? For the background: In #39207, I tried to implement the bit-parsing of doctests in a way that works with pytest - however this is rather tricky to get right etc. So having these 32-bit tests around does incur maintenance overhead - that's of course okay with me as long as these tests have some real value. |
|
Its not either/or, I'm testing 64bit x86, 32bit x86, and 64bit arm on the buildbot. IMHO having three platforms does provide value, having some architectural variety does catch bugs as I said. |
|
We support 32 bit on void Linux, so I also regularly check. |
|
Is there a "generic" way to support output differences in pytest? Besides 32/64-bit differences, it might also be useful to support different outputs on other conditions, e.g gap, singular, pari versions, availability of a feature flag, etc. |
I'm not doubting that running the tests on 32bit does catch other bugs than on 64bit - what I was questioning/not understanding is if such tests literally just serve the purpose of testing 32bit systems or if they also serve as a proxy for other (less accessible) systems. In the former case, they would only provide value for the few people that actually run on 32bit.
That's good to know! Thanks!
Pytest has very good support for different results on different systems (in terms of fixtures and markers) - but these features are limited to true pytests, not doctests. For doctests, pytest just uses the normal built-in doctest module which doesn't provide such facilities (if I'm not mistaken). I've now used the "known test failures" mechanism to not report these few known differences between 32bit/64bit output. I've no way to test if this works though as I have no access to a 32bit system. |
|
Could we please get this in? CI is green, and I hope the tests are passing on 32bit systems as well (but I have no means to test this) |
|
can this wait until we have an equivalent way of supporting this in pytest? we can also think of maintaining a special patch one needs to apply to get doctests pass on a 32-bit system |
|
I don't understand what the upside of this PR is, I see only reduced test coverage. Obligatory meme: Going for a more standard pytest is useful, of course. But it can be made to understand custom doctests flags, for example https://github.com/scientific-python/pytest-doctestplus registers a We also have custom floating point tolerance markers, those need to be supported as well right? |
|
Pytest can understand other standard doctest flags, and actually already supports all other sage's custom flags. The only exception is the 32bit flag, which is implemented completely different by changing the doctest source. This, however, is not supported by pytest. In #39207, I've tried to migrate the bitness flag to the output checker (similar to how the other flags are implemented). But this was not easily possible and broke a few other doctests. So instead of investing more time into supporting what appears to be a very niche configuration, I've went with the approach that decreases the test coverage very minimally (by something in the order of 1-2%) on those systems. I hope the senior dev agrees that this is a good usage of dev time ;-) |
Those 32/64 bit differences is the only remaining systematic issue preventing a more widespread usage of pytest to run doctests.
That could work as well. Eg just revert this PR on those systems ;-) |
|
I meant to say - go ahead with this PR, but have a patch to apply on the 32-bit systems to substitute the correct on 64-bit values with the correct on 32-bit values. This patch could be applied automatically by bootstrap or configure. |
|
how are sage preparsing and handling of |
this is working - I think the biggest issue is to test Cython files. Although perhaps this is now also taken care of? @tobiasdiez ? |
|
no, what I mean to ask is… in order to implement preparsing and custom prompt, surely you need the ability to rewrite doctest? Why does the same ability not work to rewrite the 32/64 bit? |
|
I also don't understand what the issue is, is there a technical reason or is it an "I don't want to do it"? There doesn't seem to be anything fundamentally different to parsing floating point precision. IMHO the 32-bit differences are usually quite interesting and do tell you something about limitations of the code. |
yes, but given that basically noone develops on a 32-system nowadays, getting data from these 32-bit runs is getting more and more difficult. I believe that having it in regular doctests is asking for too much. The workflow now is
So what? |
|
There are a few fundamental differences between those 32bit flags and normal doctest flags:
Currently the 32bit flags are implemented by removing the non-matching line from the source, before they are even passed to the doctest parser (this is actually similar to Dima's proposed patch mechanism). This approach is not available for pytest. In principle, one should be able to move the comparison logic to the output checker, but I was not able to handle a few edge cases in #39207, most notable interactions with warnings/exceptions are tricky to handle correctly. In addition, it is indeed very hard for devs on more recent systems to check these architecture dependent tests. For example, I don't have any means to test if my changes here are indeed working on 32bit systems... Neither are there any 32bit github actions to give more direct feedback.
Can you give examples for this? It appears to me that the output on 64bit systems is correct, while on 32bit systems you get a wrong result; or sometimes (like with hashes) you just get a different result on 32bit systems but the difference is not really important. |
|
If its the difference between 42.42 and 42.41, then its likely that a different compiler / os will sooner or later also get that on 64 bit. So IMHO catching it early that the tolerance needs to be increased is a plus. Also you should have used |
alright, so what approach are available for pytest? in particular, how are you currently implementing the parsing of (are the pytest custom doctester logic currently in the code base?) |
A more representative example would be: What do you we learn from this, except that 32bit is not handling large exponents? And why is it important to check that the error message looks like it does on 32bit systems? What does this failure tells us about other systems? It seems very unlikely that any other 64 bit system will encounter the same limitation as a 32bit system and also throws an exponent overflow in this case.
It's reusing the same parser as the sage doctester: Line 54 in 32d441a
But the bitness handler is not in the parser, but in "sources.py". |
|
Just a quick comment: the last example is not representative at all. A lot of mathematical software gives different (correct) answers depending on bitness, and we still want checks for those. Not only 32/64 bit make differences in computations. Oftentimes, a different version of e.g. gap, pari, singular, etc, makes a different output, and we had to resort to ugly workarounds to handle more than one version of those libraries. I think instead of removing the 32/64 bit output differences support, we should think about how to support other differences of this kind. This is IMO the only way to properly support a (reasonable) range of dependencies provided by the system (we cannot expect to have too strict version requirements, otherwise it means giving the back to distros) At some point I wrote a proof of concept for this to support labels like "gap<4.12", etc. using the same approach as the 32/64 bitness labels (I never made a PR). I completely agree with moving all doctesting to pytest (the more standard python tools we use, the better for maintainability), but NOT at the expense of losing important features of sage doctesting like this. The fact that you don't have a 32 bit system to test, is not different from the fact that you may have gap 4.12 and not gap 4.11. Also, anybody with a x86_64 box also has potentially a 32 bit system (e.g. in a chroot or container). |
|
A lot of these are easy to fix so that they work on all systems:
If there are any truly difficult examples remaining after fixing those, we can replace the doctest with a (pytest) unit test that checks the 32-bit and 64-bit answers separately. |
Eventually we should do these sorts of tests in pytest where it is easy, instead of in doctests where we are comparing strings and writing our own custom tooling to work around the fact that we can't just write |
You can. In fact, the following procedure "always" work:
The tags were invented specifically because they're shorter and more descriptive than copy paste the checking code every time. Imagine if instead of |
I guess you can just modify |
Ok, but for the doctest to test anything, you have to print (the same) string from each branch. Whereas in pytest you can easily
The There are good reasons for some tags, but others (like 32-bit vs 64-bit) can be done easily in code. The main obstacle is that you have to print matching strings to appease the doctest framework. But this goes away if you convert the doctest to a unit test. I don't think this, is any more clear or efficient than ...especially if we are not expecting strings as the answer, say if one of them raises an error. |
Pytest: Doctest: The only difference is the Not quite. The current code is which avoids the duplication of But my discussion above was specifically when you say if/then/else. |
I like this, but lack the knowledge of the underlying mathematics in some cases to decide eg if the higher precision on 64bit should really be tested (or if a tolerance tag is appropriate). These steps can also easily done after this PR: once a test is changed to pass on 64+32 bit, one can simply remove the "known failure on 32bit systems" entry. Or do you think this should really be done as part of this PR? Concerning the pytest implementation, (Sidenote: pytest upgrades the "assert" command to show better error messages in case of a failure. So all other things being equal, this would already favor the pytest-version) |
That is one of the cases that I would call super-interesting, and just from looking at a single doctest I can infer a ton of information:
|
I just think that, since many of these can be improved to avoid the issue entirely (particularly the If there are only a few left that need special handling, it becomes a lot easier to think about, or justify by punting it to another ticket with clear scope (e.g. "do something about these 5 tests"). |
|
Superseded by #41540 |

Remove the
# 32/64-bitannotations in doctests, only leaving the output for 64-bit. To make the tests pass on 32-bit systems, the "known failure" mechanism is used to automatically hide the different output in this case.This is needed for compatibility with pytest which doesn't support such doctest source manipulations.
📝 Checklist
⌛ Dependencies