Remove special 32-bit handling in doctests #40238

tobiasdiez · 2025-06-09T18:37:39Z

Remove the # 32/64-bit annotations in doctests, only leaving the output for 64-bit. To make the tests pass on 32-bit systems, the "known failure" mechanism is used to automatically hide the different output in this case.

This is needed for compatibility with pytest which doesn't support such doctest source manipulations.

📝 Checklist

The title is concise and informative.
The description explains in detail what this PR is about.
I have linked a relevant issue or discussion.
I have created tests covering the changes.
I have updated the documentation and checked the documentation preview.

⌛ Dependencies

github-actions · 2025-06-09T19:27:15Z

Documentation preview for this PR (built with commit 48a1e76; changes) is ready! 🎉
This preview will update shortly after each push to this PR.

user202729 · 2025-06-09T22:42:49Z

The problem is… "very old" means "what @vbraun is running right now". And I don't think 30ish (rough estimation) counts as "very few".

vbraun · 2025-06-09T22:49:48Z

We still do have a 32-bit buildbot, running Debian 12.

IMHO the reason for supporting it is not that there are a lot of users, but that it's another platform that helps shake out bugs / random / undefined behaviour.

tobiasdiez · 2025-06-10T07:37:18Z

I can see that testing it on a variety of systems gives greater confidence - but why does it need to run on 32-bit? Can you switch it to 64-bit Debian?

We also have it in the github CI https://github.com/sagemath/sage/actions/runs/15379005941/job/43267362166, so disabling that buildbot wouldn't be such a huge loss.

vbraun · 2025-06-10T07:40:58Z

We also do have a 64-bit buildbot running Debian 12, that's not the issue.

I repeat: the reason for supporting 32-bit is not that there are a lot of users, but that it's another platform that helps shake out bugs / random / undefined behaviour.

tobiasdiez · 2025-06-10T09:55:01Z

I repeat: the reason for supporting 32-bit is not that there are a lot of users, but that it's another platform that helps shake out bugs / random / undefined behaviour.

I'm not sure if I follow. Do you mean that testing 32-bit on CI will serve as a proxy for other platforms (say arm)? Is there anything particular to 32-bit Debian over using another 64-bit Linux for the purpose of shaking out bugs?

For the background: In #39207, I tried to implement the bit-parsing of doctests in a way that works with pytest - however this is rather tricky to get right etc. So having these 32-bit tests around does incur maintenance overhead - that's of course okay with me as long as these tests have some real value.

vbraun · 2025-06-10T17:01:24Z

It's not either/or: I'm testing 64-bit x86, 32-bit x86, and 64-bit ARM on the buildbot. IMHO having three platforms does provide value; having some architectural variety does catch bugs, as I said.

tornaria · 2025-06-11T04:33:57Z

We support 32 bit on void Linux, so I also regularly check.

tornaria · 2025-06-11T04:37:37Z

Is there a "generic" way to support output differences in pytest? Besides 32/64-bit differences, it might also be useful to support different outputs under other conditions, e.g. gap, singular, pari versions, availability of a feature flag, etc.

tobiasdiez · 2025-06-12T11:37:35Z

It's not either/or: I'm testing 64-bit x86, 32-bit x86, and 64-bit ARM on the buildbot. IMHO having three platforms does provide value; having some architectural variety does catch bugs, as I said.

I'm not doubting that running the tests on 32bit does catch other bugs than on 64bit - what I was questioning/not understanding is if such tests literally just serve the purpose of testing 32bit systems or if they also serve as a proxy for other (less accessible) systems. In the former case, they would only provide value for the few people that actually run on 32bit.

We support 32 bit on void Linux, so I also regularly check.

That's good to know! Thanks!

Is there a "generic" way to support output differences in pytest? Besides 32/64-bit differences, it might also be useful to support different outputs under other conditions, e.g. gap, singular, pari versions, availability of a feature flag, etc.

Pytest has very good support for different results on different systems (in terms of fixtures and markers) - but these features are limited to true pytests, not doctests. For doctests, pytest just uses the normal built-in doctest module which doesn't provide such facilities (if I'm not mistaken).

I've now used the "known test failures" mechanism to not report these few known differences between 32bit/64bit output. I've no way to test if this works though as I have no access to a 32bit system.
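As a rough illustration of the idea (not Sage's actual implementation), a known-failures baseline can be thought of as a lookup that is consulted before a failure is reported. The JSON layout and the names `load_baseline` and `unexpected_failures` are assumptions made for this sketch.

```python
import json


def load_baseline(text):
    """Parse a JSON baseline of the (assumed) form {"module": {"failed": true}}."""
    return json.loads(text)


def unexpected_failures(failed_modules, baseline):
    """Return the failed modules that are NOT listed as known failures."""
    return sorted(
        module
        for module in failed_modules
        if not baseline.get(module, {}).get("failed")
    )


# Hypothetical baseline for a 32-bit system: pkg.hashing is known to differ.
baseline_32bit = load_baseline('{"pkg.hashing": {"failed": true}}')
report = unexpected_failures(["pkg.hashing", "pkg.rings"], baseline_32bit)
```

With this baseline only `pkg.rings` would be reported; once a test is made portable, its entry can simply be deleted from the baseline.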

tobiasdiez · 2025-07-28T11:38:08Z

Could we please get this in? CI is green, and I hope the tests are passing on 32bit systems as well (but I have no means to test this)

dimpase · 2025-07-28T14:52:21Z

can this wait until we have an equivalent way of supporting this in pytest?

we can also think of maintaining a special patch one needs to apply to get doctests pass on a 32-bit system

vbraun · 2025-07-28T15:21:00Z

I don't understand what the upside of this PR is, I see only reduced test coverage. Obligatory meme:

Going for a more standard pytest is useful, of course. But it can be made to understand custom doctest flags; for example, https://github.com/scientific-python/pytest-doctestplus registers a FLOAT_CMP flag:

>>> 1.0 / 3.0  # doctest: +FLOAT_CMP
0.333333333333333311

We also have custom floating point tolerance markers, those need to be supported as well right?

tobiasdiez · 2025-07-29T13:48:17Z

Pytest can understand other standard doctest flags, and actually already supports all of Sage's other custom flags. The only exception is the 32-bit flag, which is implemented completely differently, by changing the doctest source. This, however, is not supported by pytest.

In #39207, I tried to migrate the bitness flag to the output checker (similar to how the other flags are implemented). But this was not easily possible and broke a few other doctests. So instead of investing more time into supporting what appears to be a very niche configuration, I went with the approach that decreases the test coverage only minimally (by something on the order of 1-2%) on those systems. I hope the senior dev agrees that this is a good use of dev time ;-)

tobiasdiez · 2025-07-29T13:50:58Z

can this wait until we have an equivalent way of supporting this in pytest?

Those 32/64-bit differences are the only remaining systematic issue preventing more widespread use of pytest to run doctests.

we can also think of maintaining a special patch one needs to apply to get doctests pass on a 32-bit system

That could work as well. Eg just revert this PR on those systems ;-)

dimpase · 2025-07-29T16:20:26Z

I meant to say: go ahead with this PR, but have a patch to apply on the 32-bit systems that replaces the values that are correct on 64-bit with the values that are correct on 32-bit.

This patch could be applied automatically by bootstrap or configure.

user202729 · 2025-07-29T20:31:16Z

how are sage preparsing and handling of sage: custom prompt implemented in pytest then?

dimpase · 2025-07-29T20:42:03Z

how are sage preparsing and handling of sage: custom prompt implemented in pytest then?

this is working - I think the biggest issue is to test Cython files. Although perhaps this is now also taken care of? @tobiasdiez ?

user202729 · 2025-07-29T22:17:20Z

no, what I mean to ask is… in order to implement preparsing and custom prompt, surely you need the ability to rewrite doctest? Why does the same ability not work to rewrite the 32/64 bit?

vbraun · 2025-07-29T23:13:12Z

I also don't understand what the issue is, is there a technical reason or is it an "I don't want to do it"? There doesn't seem to be anything fundamentally different to parsing floating point precision.

IMHO the 32-bit differences are usually quite interesting and do tell you something about limitations of the code.

dimpase · 2025-07-30T06:40:39Z

I also don't understand what the issue is, is there a technical reason or is it an "I don't want to do it"? There doesn't seem to be anything fundamentally different to parsing floating point precision.

IMHO the 32-bit differences are usually quite interesting and do tell you something about limitations of the code.

yes, but given that basically no one develops on a 32-bit system nowadays, getting data from these 32-bit runs is getting more and more difficult.

I believe that having it in regular doctests is asking for too much.
It's fine to maintain a special test suite with these 32-bit doctests, but it should not get in the way of the normal development process, because it is of mostly academic interest anno 2025.

The workflow now is

Volker puts the PR into "needs work" state due to a 32-bit doctest failure.
Developer: now what!?
Volker: on 32-bit it's not 42.42, it's 42.41
Developer adjusts the doctest to print 42.41 on 32-bit.

So what?

tobiasdiez · 2025-07-30T10:01:07Z

There are a few fundamental differences between those 32bit flags and normal doctest flags:

Normal flags appear on the line that is evaluated, while 32bit flags appear in the output
Normal flags (usually) only need to handle valid output, while 32bit flags may also appear in exceptions and warnings

Currently the 32bit flags are implemented by removing the non-matching lines from the source before it is even passed to the doctest parser (this is actually similar to Dima's proposed patch mechanism). This approach is not available for pytest. In principle, one should be able to move the comparison logic to the output checker, but I was not able to handle a few edge cases in #39207; most notably, interactions with warnings/exceptions are tricky to handle correctly.

In addition, it is indeed very hard for devs on more recent systems to check these architecture-dependent tests. For example, I don't have any means to test whether my changes here actually work on 32bit systems... Neither are there any 32bit GitHub actions to give more direct feedback.
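To make the "output checker" idea concrete, here is a minimal sketch built on the standard library's `doctest.OutputChecker`. It only covers plain output lines and deliberately does not attempt the warning/exception cases described as tricky; the tag regex and the class name `BitnessOutputChecker` are assumptions for illustration, not code from #39207.

```python
import doctest
import re
import sys

# Assumed tag syntax, modelled on the thread: a trailing "# 32-bit" or "# 64-bit".
BIT_TAG = re.compile(r"\s*#\s*(32|64)-bit\s*$")


def select_for_bitness(want, bits):
    """Keep untagged lines and lines tagged for `bits`, stripping the tag."""
    kept = []
    for line in want.splitlines():
        match = BIT_TAG.search(line)
        if match is None:
            kept.append(line)
        elif int(match.group(1)) == bits:
            kept.append(line[:match.start()])
    # Re-add the newline doctest expects after every expected-output line.
    return "".join(line + "\n" for line in kept)


class BitnessOutputChecker(doctest.OutputChecker):
    """Compare actual output only against the lines meant for this platform."""

    def __init__(self, bits=None):
        # Default to the bitness of the running interpreter.
        self.bits = bits if bits is not None else (64 if sys.maxsize > 2**32 else 32)

    def check_output(self, want, got, optionflags):
        filtered = select_for_bitness(want, self.bits)
        return super().check_output(filtered, got, optionflags)


want = "32-bit answer  # 32-bit\n64-bit answer  # 64-bit\n"
ok_on_64 = BitnessOutputChecker(bits=64).check_output(want, "64-bit answer\n", 0)
```

Because the checker only sees the expected text after the parser has already split it into examples, anything the parser treats specially (such as tracebacks) would need extra handling that this sketch omits.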

IMHO the 32-bit differences are usually quite interesting and do tell you something about limitations of the code.

Can you give examples for this? It appears to me that the output on 64bit systems is correct, while on 32bit systems you get a wrong result; or sometimes (like with hashes) you just get a different result on 32bit systems but the difference is not really important.

vbraun · 2025-07-30T10:12:45Z

If it's the difference between 42.42 and 42.41, then it's likely that a different compiler / OS will sooner or later also get that on 64-bit. So IMHO catching early that the tolerance needs to be increased is a plus. Also, you should have used # rel tol in that case, not # 32-bit.

user202729 · 2025-07-30T10:25:28Z

Currently the 32bit flags are implemented by removing the non-matching lines from the source before it is even passed to the doctest parser (this is actually similar to Dima's proposed patch mechanism). This approach is not available for pytest.

alright, so what approaches are available for pytest? in particular, how are you currently implementing the parsing of sage: / ....: and preparsing? (preparsing definitely requires modification of the source?)

(is the pytest custom doctester logic currently in the code base?)

tobiasdiez · 2025-07-30T11:01:58Z

If it's the difference between 42.42 and 42.41, then it's likely that a different compiler / OS will sooner or later also get that on 64-bit. So IMHO catching early that the tolerance needs to be increased is a plus. Also, you should have used # rel tol in that case, not # 32-bit.

A more representative example would be:

        sage: y^(2^30)
        Traceback (most recent call last):             # 32-bit
        OverflowError: exponent overflow (1073741824)  # 32-bit
        y^1073741824  # 64-bit

        sage: y^2^32
        OverflowError: Python int too large to convert to C unsigned long  # 32-bit
        OverflowError: exponent overflow (4294967296)  # 64-bit

What do we learn from this, except that 32bit is not handling large exponents? And why is it important to check that the error message looks like it does on 32bit systems? What does this failure tell us about other systems? It seems very unlikely that any other 64-bit system will encounter the same limitation as a 32bit system and also throw an exponent overflow in this case.

alright, so what approaches are available for pytest? in particular, how are you currently implementing the parsing of sage: / ....: and preparsing? (preparsing definitely requires modification of the source?)

(is the pytest custom doctester logic currently in the code base?)

It's reusing the same parser as the sage doctester:

sage/conftest.py

Line 54 in 32d441a

super().__init__(parser=SageDocTestParser(set(["sage"])))

But the bitness handler is not in the parser, but in "sources.py".

tornaria · 2025-07-30T13:52:12Z

Just a quick comment: the last example is not representative at all. A lot of mathematical software gives different (correct) answers depending on bitness, and we still want checks for those.

It is not only 32/64-bit that makes a difference in computations. Oftentimes, a different version of e.g. gap, pari, singular, etc. produces different output, and we have had to resort to ugly workarounds to handle more than one version of those libraries.

I think instead of removing the 32/64 bit output differences support, we should think about how to support other differences of this kind. This is IMO the only way to properly support a (reasonable) range of dependencies provided by the system (we cannot expect to have too strict version requirements, otherwise it means turning our back on distros)

At some point I wrote a proof of concept for this to support labels like "gap<4.12", etc. using the same approach as the 32/64 bitness labels (I never made a PR).
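A label such as "gap<4.12" could be evaluated along the following lines. This is only a guess at what such a proof of concept might look like: the label grammar, the function names, and the `installed` mapping are all hypothetical.

```python
import operator
import re

# Hypothetical label grammar: <package><operator><dotted version>, e.g. "gap<4.12".
LABEL = re.compile(r"^\s*([A-Za-z_]\w*)\s*(<=|>=|==|<|>)\s*([0-9]+(?:\.[0-9]+)*)\s*$")
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    ">=": operator.ge,
    ">": operator.gt,
}


def parse_version(text):
    """Turn "4.11.1" into the comparable tuple (4, 11, 1)."""
    return tuple(int(part) for part in text.split("."))


def label_applies(label, installed):
    """Decide whether an output line tagged with `label` applies.

    `installed` maps package names to version strings; a package that is
    not installed never satisfies a label.
    """
    match = LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognised label: {label!r}")
    package, op, version = match.groups()
    if package not in installed:
        return False
    return OPERATORS[op](parse_version(installed[package]), parse_version(version))


old_gap = label_applies("gap<4.12", {"gap": "4.11.1"})
new_gap = label_applies("gap<4.12", {"gap": "4.12.2"})
```

Here `old_gap` is true and `new_gap` is false, so a line tagged "gap<4.12" would only be kept when the older GAP is installed.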

I completely agree with moving all doctesting to pytest (the more standard python tools we use, the better for maintainability), but NOT at the expense of losing important features of sage doctesting like this.

The fact that you don't have a 32-bit system to test on is no different from the fact that you may have gap 4.12 and not gap 4.11. Also, anybody with an x86_64 box also potentially has a 32-bit system (e.g. in a chroot or container).

orlitzky · 2025-07-30T18:27:10Z

A lot of these are easy to fix so that they work on all systems:

We never need to know the precise hash of an object. Instead of printing the hash of foo, half of these can be eliminated by changing the test to hash(foo) in ZZ.
Where there are two distinct integer outputs, we can do foo in [32bit_value, 64bit_value]
If we get some extra precision on 64-bit systems, stick a ... on the end of the 32-bit answer
We already have abs tol if the answer is going to be essentially correct but differ slightly on 32-bit.
If an exception will be raised on 32-bit, we can try to catch it and print the 64-bit answer instead so that there's only one expected output

If there are any truly difficult examples remaining after fixing those, we can replace the doctest with a (pytest) unit test that checks the 32-bit and 64-bit answers separately.
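The rewrites listed above can be tried out with plain Python doctests. The sketch below uses only the standard `doctest` module rather than Sage's doctester, so `isinstance(..., int)` stands in for `in ZZ`, and the `try`/`except` example is contrived purely to show the shape of the pattern.

```python
import doctest

PORTABLE_EXAMPLES = '''
>>> isinstance(hash("foo"), int)        # check the type, not the value
True
>>> import sys
>>> sys.maxsize in [2**31 - 1, 2**63 - 1]
True
>>> print(1 / 3)                        # doctest: +ELLIPSIS
0.3333333333...
>>> try:                                # fold a platform error into one output
...     value = 2 ** 10
... except OverflowError:
...     value = 1024
>>> value
1024
'''


def run_examples(text):
    """Run doctest examples from a string and return the (failed, attempted) summary."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(text, {}, "portable", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    runner.run(test)
    return runner.summarize(verbose=False)


results = run_examples(PORTABLE_EXAMPLES)
```

Each example has a single expected output that holds on both 32-bit and 64-bit interpreters, so no bitness tag is needed.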

orlitzky · 2025-07-30T18:33:36Z

I think instead of removing the 32/64 bit output differences support, we should think about how to support other differences of this kind. This is IMO the only way to properly support a (reasonable) range of dependencies provided by the system (we cannot expect to have too strict version requirements, otherwise it means turning our back on distros)

At some point I wrote a proof of concept for this to support labels like "gap<4.12", etc. using the same approach as the 32/64 bitness labels (I never made a PR).

I completely agree with moving all doctesting to pytest (the more standard python tools we use, the better for maintainability), but NOT at the expense of losing important features of sage doctesting like this.

The fact that you don't have a 32-bit system to test on is no different from the fact that you may have gap 4.12 and not gap 4.11. Also, anybody with an x86_64 box also potentially has a 32-bit system (e.g. in a chroot or container).

Eventually we should do these sorts of tests in pytest where it is easy, instead of in doctests where we are comparing strings and writing our own custom tooling to work around the fact that we can't just write if/then/else. This may hurt the documentation a tiny bit (if we are talking about EXAMPLES and not TESTS), but this is mitigated by the fact that examples with tons of mysterious # tags are confusing as hell anyway.

user202729 · 2025-07-30T19:09:21Z

we can't just write if/then/else.

You can. In fact, the following procedure "always" works:

Write whatever you want in pytest
copy that to doctest, prepending sage: to the first line and ....: to the remaining lines.

The tags were invented specifically because they're shorter and more descriptive than copy-pasting the checking code every time. Imagine if instead of # rel tol 1e-13 every place you want it you need to write expected = 123; assert abs(… - expected) / expected <= 1e-13 (you can refactor it into assert_equal(…, expected, rel_tol=1e-13), but then it's roughly exactly the same as the status quo, modulo the fact that now you have to type the extra assert_equal)

user202729 · 2025-07-30T19:12:07Z

It's reusing the same parser as the sage doctester

I guess you can just modify def parse(...) to delete the 32/64 bits tag accordingly then?
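A parser-level version of that suggestion might look like the sketch below, which subclasses the standard `doctest.DocTestParser` rather than Sage's `SageDocTestParser`. The tag regex and class name are assumptions, and dropping lines this way shifts the line numbers reported for later examples.

```python
import doctest
import re
import sys

# Assumed tag syntax: a trailing "# 32-bit" or "# 64-bit" comment.
BIT_TAG = re.compile(r"[ \t]*#[ \t]*(32|64)-bit[ \t]*$")


class BitnessDocTestParser(doctest.DocTestParser):
    """Doctest parser that drops lines tagged for the other bitness."""

    def __init__(self, bits=None):
        # Default to the bitness of the running interpreter.
        self.bits = bits if bits is not None else (64 if sys.maxsize > 2**32 else 32)

    def parse(self, string, name="<string>"):
        kept = []
        for line in string.splitlines(keepends=True):
            body = line.rstrip("\r\n")
            match = BIT_TAG.search(body)
            if match is None:
                kept.append(line)
            elif int(match.group(1)) == self.bits:
                # Keep the line but remove the tag itself.
                kept.append(body[:match.start()] + "\n")
            # Lines tagged for the other bitness are dropped entirely.
        return super().parse("".join(kept), name)


text = ">>> print('answer')\nwrong  # 32-bit\nanswer  # 64-bit\n"
examples_64 = BitnessDocTestParser(bits=64).get_examples(text)
examples_32 = BitnessDocTestParser(bits=32).get_examples(text)
```

Since the filtering happens before the base parser runs, tagged traceback lines are handled the same way as ordinary output lines.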

orlitzky · 2025-07-30T20:05:11Z

we can't just write if/then/else.

You can. In fact, the following procedure "always" works:
1. Write whatever you want in pytest
2. copy that to doctest, prepending `sage:` to the first line and `....:` to the remaining lines.

Ok, but for the doctest to test anything, you have to print (the same) string from each branch. Whereas in pytest you can easily assert(actual == expected) for two different values of expected. The doctest becomes ugly and confusing if you try to do it.

The tags were invented specifically because they're shorter and more descriptive than copy-pasting the checking code every time. Imagine if instead of # rel tol 1e-13 every place you want it you need to write expected = 123; assert abs(… - expected) / expected <= 1e-13 (you can refactor it into assert_equal(…, expected, rel_tol=1e-13), but then it's roughly exactly the same as the status quo, modulo the fact that now you have to type the extra assert_equal)

The # long time tag affects the test framework and cannot easily be implemented any other way. # abs tol and # rel tol are regular expressions that affect every number in the output---they do something a lot more complicated than assert(abs(...)).
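To illustrate why a tolerance tag is more than a single `assert abs(...)`, here is a simplified sketch of a comparison that applies a relative tolerance to every number found in the output. It is not Sage's implementation: the regex is a rough approximation, and `math.isclose` measures the tolerance relative to the larger of the two values.

```python
import math
import re

# Rough pattern for decimal and scientific-notation numbers.
FLOAT = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?")


def matches_with_rel_tol(want, got, rel_tol):
    """Compare two output strings, allowing every number to differ by rel_tol."""
    # The non-numeric text must be identical once numbers are masked out.
    if FLOAT.sub("#", want) != FLOAT.sub("#", got):
        return False
    want_numbers = [float(x) for x in FLOAT.findall(want)]
    got_numbers = [float(x) for x in FLOAT.findall(got)]
    return all(
        math.isclose(w, g, rel_tol=rel_tol)
        for w, g in zip(want_numbers, got_numbers)
    )


loose = matches_with_rel_tol("(42.42, 1.0)", "(42.41, 1.0)", 1e-3)
strict = matches_with_rel_tol("(42.42, 1.0)", "(42.41, 1.0)", 1e-6)
```

With a tolerance of 1e-3 the two outputs are accepted (`loose` is true), while 1e-6 rejects them (`strict` is false).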

There are good reasons for some tags, but others (like 32-bit vs 64-bit) can be done easily in code. The main obstacle is that you have to print matching strings to appease the doctest framework. But this goes away if you convert the doctest to a unit test.

I don't think this,

sage: foo  # 32-bit
32-bit answer
sage: foo  # 64-bit
64-bit answer

is any more clear or efficient than

if is_32_bit():
    assert foo == "32-bit answer"
if is_64_bit():
    assert foo == "64-bit answer"

...especially if we are not expecting strings as the answer, say if one of them raises an error.

user202729 · 2025-07-30T20:16:00Z

Ok, but for the doctest to test anything, you have to print (the same) string from each branch. Whereas in pytest you can easily assert(actual == expected) for two different values of expected. The doctest becomes ugly and confusing if you try to do it.

Pytest:

def test_f():
    if condition1:
        assert f() == 1
    else:
        assert f() == 2

Doctest:

TESTS::

    sage: if condition1:
    ....:     assert f() == 1
    ....: else:
    ....:     assert f() == 2

The only difference is the sage: and ....:.

sage: foo  # 32-bit
32-bit answer
sage: foo  # 64-bit
64-bit answer

Not quite. The current code is

sage: foo
32-bit answer  # 32-bit
64-bit answer  # 64-bit

which avoids the duplication of foo.

But my discussion above was specifically when you say if/then/else.

tobiasdiez · 2025-07-31T11:43:00Z

A lot of these are easy to fix so that they work on all systems: ...

I like this, but in some cases I lack the knowledge of the underlying mathematics to decide e.g. whether the higher precision on 64bit should really be tested (or whether a tolerance tag is appropriate). These steps can also easily be done after this PR: once a test is changed to pass on 64+32 bit, one can simply remove the "known failure on 32bit systems" entry. Or do you think this should really be done as part of this PR?

Concerning the pytest implementation, @pytest.mark.skipif(is_32_bit) is nicer than those if/then/else constructions for pytest, see eg https://github.com/numpy/numpy/blob/a80542541130a2dc47eab2a6972019453cc0c518/numpy/random/tests/test_extending.py#L51-L54 for a "real life" example. They also easily generalize to other features like @tornaria mentions. In this case, one could have something like skipif(has_gap('<4')) to not run a test for older gap versions. Thus, I don't think we need a special doctest-solution for those feature/bitness tags but instead just use pytest a bit more.
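The same skip-by-condition idea can be tried without installing pytest: the standard library's `unittest.skipIf` and `unittest.skipUnless` play the role of `pytest.mark.skipif` in the sketch below. `IS_32_BIT` is a hypothetical helper standing in for `is_32_bit`, and the test bodies are placeholders.

```python
import io
import sys
import unittest

# Hypothetical helper: treat the interpreter as 32-bit if its native
# index type cannot exceed 2**32.
IS_32_BIT = sys.maxsize <= 2**32


class BitnessDependentTests(unittest.TestCase):
    @unittest.skipIf(IS_32_BIT, "expected value only holds on 64-bit")
    def test_64_bit_answer(self):
        self.assertEqual(sys.maxsize, 2**63 - 1)

    @unittest.skipUnless(IS_32_BIT, "expected value only holds on 32-bit")
    def test_32_bit_answer(self):
        self.assertEqual(sys.maxsize, 2**31 - 1)


def run_suite():
    """Run the tests quietly and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(BitnessDependentTests)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)


result = run_suite()
```

On either kind of platform exactly one test runs and the other is reported as skipped, which keeps both expected values in the test suite without custom source rewriting.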

(Sidenote: pytest upgrades the "assert" command to show better error messages in case of a failure. So all other things being equal, this would already favor the pytest-version)

vbraun · 2025-07-31T12:11:23Z

        sage: y^(2^30)
        Traceback (most recent call last):             # 32-bit
        OverflowError: exponent overflow (1073741824)  # 32-bit
        y^1073741824  # 64-bit

What do we learn from this, except that 32bit is not handling large exponents?

That is one of the cases that I would call super-interesting, and just from looking at a single doctest I can infer a ton of information:

exponents are machine integers (also on 64-bit)
I now roughly know the exponent limit on my 64-bit machine
overflows are actually checked and don't wrap (naive unsigned machine int) or are undefined behavior (naive signed machine int)
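The distinction in the last point can be shown with a toy model of converting a Python integer to a fixed-width unsigned machine integer. The functions are illustrative only; they do not reproduce Sage's actual exponent limits, which the doctest shows to be lower than the full machine-word range.

```python
def checked_conversion(value, bits):
    """Convert to a `bits`-wide unsigned integer, raising on overflow."""
    if not 0 <= value < 2**bits:
        raise OverflowError(f"exponent overflow ({value})")
    return value


def wrapping_conversion(value, bits):
    """Convert the naive way: silently keep only the low `bits` bits."""
    return value % 2**bits


# 2**32 does not fit in 32 bits: the checked version refuses it,
# while the wrapping version silently turns it into 0.
wrapped = wrapping_conversion(2**32, 32)
try:
    checked_conversion(2**32, 32)
    overflow_detected = False
except OverflowError:
    overflow_detected = True
```

A doctest that records the `OverflowError` therefore documents that the checked behaviour is in place, rather than a silent wrap to a small exponent.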

orlitzky · 2025-07-31T12:28:08Z

A lot of these are easy to fix so that they work on all systems: ...

I like this, but in some cases I lack the knowledge of the underlying mathematics to decide e.g. whether the higher precision on 64bit should really be tested (or whether a tolerance tag is appropriate). These steps can also easily be done after this PR: once a test is changed to pass on 64+32 bit, one can simply remove the "known failure on 32bit systems" entry. Or do you think this should really be done as part of this PR?

I just think that, since many of these can be improved to avoid the issue entirely (particularly the hash() examples), it's an easy way to address the objection that useful doctests are being deleted.

If there are only a few left that need special handling, it becomes a lot easier to think about, or justify by punting it to another ticket with clear scope (e.g. "do something about these 5 tests").

tobiasdiez · 2026-01-31T23:00:24Z

Superseded by #41540

Remove special 32-bit handling in doctests

78a32cb

This was referenced Jun 9, 2025

Move bitness handling in doctest to the output checker #39207

Closed

Replace doctest runner by pytest in ci #36981

Draft

Fix doctests

0f1b9d4

tornaria closed this Jun 11, 2025

tornaria reopened this Jun 11, 2025

Ignore known test failures on 32bit systems

cd60f86

Merge branch 'develop' into bitness-remove

46e143c

github-actions bot added the s: needs review label Jun 26, 2025

tobiasdiez and others added 3 commits July 26, 2025 10:49

Merge branch 'develop' into bitness-remove

4758a62

Fix linter

5796bb2

Fix tests

7f3c83d

tobiasdiez requested review from dimpase and tornaria July 28, 2025 11:38

tobiasdiez and others added 5 commits August 2, 2025 19:24

Merge branch 'develop' into bitness-remove

5009707

Merge remote-tracking branch 'upstream/develop' into bitness-remove

5f3a41d

Merge branch 'develop' into bitness-remove

59cec82

Merge branch 'develop' into bitness-remove

4bc8617

Merge remote-tracking branch 'origin/develop' into bitness-remove

48a1e76

orlitzky mentioned this pull request Jan 17, 2026

Consolidate most 32- and 64-bit special cases in the doctests #41468

Merged

orlitzky mentioned this pull request Jan 27, 2026

Eliminate remaining 32- and 64-bit doctest output tags #41540

Open

tobiasdiez closed this Jan 31, 2026

github-actions bot removed the s: needs review label Jan 31, 2026
