Skip to content

Conversation

@tobiasdiez
Copy link
Contributor

@tobiasdiez tobiasdiez commented Jun 9, 2025

Remove the # 32/64-bit annotations in doctests, only leaving the output for 64-bit. To make the tests pass on 32-bit systems, the "known failure" mechanism is used to automatically hide the different output in this case.

This is needed for compatibility with pytest which doesn't support such doctest source manipulations.

📝 Checklist

  • The title is concise and informative.
  • The description explains in detail what this PR is about.
  • I have linked a relevant issue or discussion.
  • I have created tests covering the changes.
  • I have updated the documentation and checked the documentation preview.

⌛ Dependencies

@github-actions
Copy link

github-actions bot commented Jun 9, 2025

Documentation preview for this PR (built with commit 48a1e76; changes) is ready! 🎉
This preview will update shortly after each push to this PR.

@user202729
Copy link
Contributor

user202729 commented Jun 9, 2025

The problem is… "very old" means "what @vbraun is running right now". And I don't think 30ish (rough estimation) counts as "very few".

@vbraun
Copy link
Member

vbraun commented Jun 9, 2025

We still do have a 32-bit buildbot, running Debian 12.

Imho the reason for supporting is not that there are a lot of users, but its another platform that helps shake out bugs / random / undefined behavour.

@tobiasdiez
Copy link
Contributor Author

I can see that testing it on a variety of systems gives greater confidence - but why does it need to run on 32-bit? Can you switch it to 64-bit Debian?

We also have it in the github CI https://github.com/sagemath/sage/actions/runs/15379005941/job/43267362166, so disabling that buildbot wouldn't be such a huge loss.

@vbraun
Copy link
Member

vbraun commented Jun 10, 2025

We also do have a 64-bit buildbot running Debian 12, thats not the issue.

I repeat: the reason for supporting 32-bit is not that there are a lot of users, but its another platform that helps shake out bugs / random / undefined behavour.

@tobiasdiez
Copy link
Contributor Author

I repeat: the reason for supporting 32-bit is not that there are a lot of users, but its another platform that helps shake out bugs / random / undefined behavour.

I'm not sure if I follow. Do you mean that testing 32-bit on CI will serve as a proxy for other platforms (say arm)? Is there anything particular to 32-bit Debian over using another 64-bit Linux for the purpose of shaking out bugs?

For the background: In #39207, I tried to implement the bit-parsing of doctests in a way that works with pytest - however this is rather tricky to get right etc. So having these 32-bit tests around does incur maintenance overhead - that's of course okay with me as long as these tests have some real value.

@vbraun
Copy link
Member

vbraun commented Jun 10, 2025

Its not either/or, I'm testing 64bit x86, 32bit x86, and 64bit arm on the buildbot. IMHO having three platforms does provide value, having some architectural variety does catch bugs as I said.

@tornaria
Copy link
Member

We support 32 bit on void Linux, so I also regularly check.

@tornaria tornaria closed this Jun 11, 2025
@tornaria tornaria reopened this Jun 11, 2025
@tornaria
Copy link
Member

Is there a "generic" way to support output differences in pytest? Besides 32/64-bit differences, it might also be useful to support different outputs on other conditions, e.g gap, singular, pari versions, availability of a feature flag, etc.

@tobiasdiez
Copy link
Contributor Author

Its not either/or, I'm testing 64bit x86, 32bit x86, and 64bit arm on the buildbot. IMHO having three platforms does provide value, having some architectural variety does catch bugs as I said.

I'm not doubting that running the tests on 32bit does catch other bugs than on 64bit - what I was questioning/not understanding is if such tests literally just serve the purpose of testing 32bit systems or if they also serve as a proxy for other (less accessible) systems. In the former case, they would only provide value for the few people that actually run on 32bit.

We support 32 bit on void Linux, so I also regularly check.

That's good to know! Thanks!

Is there a "generic" way to support output differences in pytest? Besides 32/64-bit differences, it might also be useful to support different outputs on other conditions, e.g gap, singular, pari versions, availability of a feature flag, etc.

Pytest has very good support for different results on different systems (in terms of fixtures and markers) - but these features are limited to true pytests, not doctests. For doctests, pytest just uses the normal built-in doctest module which doesn't provide such facilities (if I'm not mistaken).


I've now used the "known test failures" mechanism to not report these few known differences between 32bit/64bit output. I've no way to test if this works though as I have no access to a 32bit system.

@tobiasdiez
Copy link
Contributor Author

Could we please get this in? CI is green, and I hope the tests are passing on 32bit systems as well (but I have no means to test this)

@tobiasdiez tobiasdiez requested review from dimpase and tornaria July 28, 2025 11:38
@dimpase
Copy link
Member

dimpase commented Jul 28, 2025

can this wait until we have an equivalent way of supporting this in pytest?

we can also think of maintaining a special patch one needs to apply to get doctests pass on a 32-bit system

@vbraun
Copy link
Member

vbraun commented Jul 28, 2025

I don't understand what the upside of this PR is, I see only reduced test coverage. Obligatory meme:

hi288bt

Going for a more standard pytest is useful, of course. But it can be made to understand custom doctests flags, for example https://github.com/scientific-python/pytest-doctestplus registers a

>>> 1.0 / 3.0  # doctest: +FLOAT_CMP
0.333333333333333311

We also have custom floating point tolerance markers, those need to be supported as well right?

@tobiasdiez
Copy link
Contributor Author

Pytest can understand other standard doctest flags, and actually already supports all other sage's custom flags. The only exception is the 32bit flag, which is implemented completely different by changing the doctest source. This, however, is not supported by pytest.

In #39207, I've tried to migrate the bitness flag to the output checker (similar to how the other flags are implemented). But this was not easily possible and broke a few other doctests. So instead of investing more time into supporting what appears to be a very niche configuration, I've went with the approach that decreases the test coverage very minimally (by something in the order of 1-2%) on those systems. I hope the senior dev agrees that this is a good usage of dev time ;-)

@tobiasdiez
Copy link
Contributor Author

can this wait until we have an equivalent way of supporting this in pytest?

Those 32/64 bit differences is the only remaining systematic issue preventing a more widespread usage of pytest to run doctests.

we can also think of maintaining a special patch one needs to apply to get doctests pass on a 32-bit system

That could work as well. Eg just revert this PR on those systems ;-)

@dimpase
Copy link
Member

dimpase commented Jul 29, 2025

I meant to say - go ahead with this PR, but have a patch to apply on the 32-bit systems to substitute the correct on 64-bit values with the correct on 32-bit values.

This patch could be applied automatically by bootstrap or configure.

@user202729
Copy link
Contributor

how are sage preparsing and handling of sage: custom prompt implemented in pytest then?

@dimpase
Copy link
Member

dimpase commented Jul 29, 2025

how are sage preparsing and handling of sage: custom prompt implemented in pytest then?

this is working - I think the biggest issue is to test Cython files. Although perhaps this is now also taken care of? @tobiasdiez ?

@user202729
Copy link
Contributor

no, what I mean to ask is… in order to implement preparsing and custom prompt, surely you need the ability to rewrite doctest? Why does the same ability not work to rewrite the 32/64 bit?

@vbraun
Copy link
Member

vbraun commented Jul 29, 2025

I also don't understand what the issue is, is there a technical reason or is it an "I don't want to do it"? There doesn't seem to be anything fundamentally different to parsing floating point precision.

IMHO the 32-bit differences are usually quite interesting and do tell you something about limitations of the code.

@dimpase
Copy link
Member

dimpase commented Jul 30, 2025

I also don't understand what the issue is, is there a technical reason or is it an "I don't want to do it"? There doesn't seem to be anything fundamentally different to parsing floating point precision.

IMHO the 32-bit differences are usually quite interesting and do tell you something about limitations of the code.

yes, but given that basically noone develops on a 32-system nowadays, getting data from these 32-bit runs is getting more and more difficult.

I believe that having it in regular doctests is asking for too much.
It's fine to maintain a special testsuite with these 32-bit doctests, but it should not get in the way of the normal development process - cause it basically has a mostly academic interest anno 2025.

The workflow now is

  • Volker puts the PR into "needs work" state due to a 32-bit doctest failure.
  • Developer: now what!?
  • Volker: on 32-bit it's not 42.42, it's 42.41
  • Developer adjusts the doctest to print 42.41 on 32-bit.

So what?

@tobiasdiez
Copy link
Contributor Author

There are a few fundamental differences between those 32bit flags and normal doctest flags:

  • Normal flags appear on the line that is evaluated, while 32bit flags appear in the output
  • Normal flags (usually) only need to handle valid output, while 32bit flags may also appear in exceptions and warnings

Currently the 32bit flags are implemented by removing the non-matching line from the source, before they are even passed to the doctest parser (this is actually similar to Dima's proposed patch mechanism). This approach is not available for pytest. In principle, one should be able to move the comparison logic to the output checker, but I was not able to handle a few edge cases in #39207, most notable interactions with warnings/exceptions are tricky to handle correctly.

In addition, it is indeed very hard for devs on more recent systems to check these architecture dependent tests. For example, I don't have any means to test if my changes here are indeed working on 32bit systems... Neither are there any 32bit github actions to give more direct feedback.

IMHO the 32-bit differences are usually quite interesting and do tell you something about limitations of the code.

Can you give examples for this? It appears to me that the output on 64bit systems is correct, while on 32bit systems you get a wrong result; or sometimes (like with hashes) you just get a different result on 32bit systems but the difference is not really important.

@vbraun
Copy link
Member

vbraun commented Jul 30, 2025

If its the difference between 42.42 and 42.41, then its likely that a different compiler / os will sooner or later also get that on 64 bit. So IMHO catching it early that the tolerance needs to be increased is a plus. Also you should have used # rel tol in that case, not # 32-bit

@user202729
Copy link
Contributor

user202729 commented Jul 30, 2025

Currently the 32bit flags are implemented by removing the non-matching line from the source, before they are even passed to the doctest parser (this is actually similar to Dima's proposed patch mechanism). This approach is not available for pytest.

alright, so what approach are available for pytest? in particular, how are you currently implementing the parsing of sage: / ....: and preparsing? (preparsing definitely requires modification of the source?)

(are the pytest custom doctester logic currently in the code base?)

@tobiasdiez
Copy link
Contributor Author

If its the difference between 42.42 and 42.41, then its likely that a different compiler / os will sooner or later also get that on 64 bit. So IMHO catching it early that the tolerance needs to be increased is a plus. Also you should have used # rel tol in that case, not # 32-bit

A more representative example would be:

        sage: y^(2^30)
        Traceback (most recent call last):             # 32-bit
        OverflowError: exponent overflow (1073741824)  # 32-bit
        y^1073741824  # 64-bit

        sage: y^2^32
        OverflowError: Python int too large to convert to C unsigned long  # 32-bit
        OverflowError: exponent overflow (4294967296)  # 64-bit

What do you we learn from this, except that 32bit is not handling large exponents? And why is it important to check that the error message looks like it does on 32bit systems? What does this failure tells us about other systems? It seems very unlikely that any other 64 bit system will encounter the same limitation as a 32bit system and also throws an exponent overflow in this case.

alright, so what approach are available for pytest? in particular, how are you currently implementing the parsing of sage: / ....: and preparsing? (preparsing definitely requires modification of the source?)

(are the pytest custom doctester logic currently in the code base?)

It's reusing the same parser as the sage doctester:

super().__init__(parser=SageDocTestParser(set(["sage"])))

But the bitness handler is not in the parser, but in "sources.py".

@tornaria
Copy link
Member

Just a quick comment: the last example is not representative at all. A lot of mathematical software gives different (correct) answers depending on bitness, and we still want checks for those.

Not only 32/64 bit make differences in computations. Oftentimes, a different version of e.g. gap, pari, singular, etc, makes a different output, and we had to resort to ugly workarounds to handle more than one version of those libraries.

I think instead of removing the 32/64 bit output differences support, we should think about how to support other differences of this kind. This is IMO the only way to properly support a (reasonable) range of dependencies provided by the system (we cannot expect to have too strict version requirements, otherwise it means giving the back to distros)

At some point I wrote a proof of concept for this to support labels like "gap<4.12", etc. using the same approach as the 32/64 bitness labels (I never made a PR).

I completely agree with moving all doctesting to pytest (the more standard python tools we use, the better for maintainability), but NOT at the expense of losing important features of sage doctesting like this.

The fact that you don't have a 32 bit system to test, is not different from the fact that you may have gap 4.12 and not gap 4.11. Also, anybody with a x86_64 box also has potentially a 32 bit system (e.g. in a chroot or container).

@orlitzky
Copy link
Contributor

A lot of these are easy to fix so that they work on all systems:

  1. We never need to know the precise hash of an object. Instead of printing the hash of foo, half of these can be eliminated by changing the test to hash(foo) in ZZ.
  2. Where there are two distinct integer outputs, we can do foo in [32bit_value, 64bit_value]
  3. If we get some extra precision on 64-bit systems, stick a ... on the end of the 32-bit answer
  4. We already have abs tol if the answer is going to be essentially correct but differ slightly on 32-bit.
  5. If an exception will be raised on 32-bit, we can try to catch it and print the 64-bit answer instead so that there's only one expected output

If there are any truly difficult examples remaining after fixing those, we can replace the doctest with a (pytest) unit test that checks the 32-bit and 64-bit answers separately.

@orlitzky
Copy link
Contributor

I think instead of removing the 32/64 bit output differences support, we should think about how to support other differences of this kind. This is IMO the only way to properly support a (reasonable) range of dependencies provided by the system (we cannot expect to have too strict version requirements, otherwise it means giving the back to distros)

At some point I wrote a proof of concept for this to support labels like "gap<4.12", etc. using the same approach as the 32/64 bitness labels (I never made a PR).

I completely agree with moving all doctesting to pytest (the more standard python tools we use, the better for maintainability), but NOT at the expense of losing important features of sage doctesting like this.

The fact that you don't have a 32 bit system to test, is not different from the fact that you may have gap 4.12 and not gap 4.11. Also, anybody with a x86_64 box also has potentially a 32 bit system (e.g. in a chroot or container).

Eventually we should do these sorts of tests in pytest where it is easy, instead of in doctests where we are comparing strings and writing our own custom tooling to work around the fact that we can't just write if/then/else. This may hurt the documentation a tiny bit (if we are talking about EXAMPLES and not TESTS), but this is mitigated by the fact that examples with tons of mysterious # tags are confusing as hell anyway.

@user202729
Copy link
Contributor

we can't just write if/then/else.

You can. In fact, the following procedure "always" work:

  1. Write whatever you want in pytest
  2. copy that to doctest, appending sage: to the first line and ....: to the remaining lines.

The tags were invented specifically because they're shorter and more descriptive than copy paste the checking code every time. Imagine if instead of # rel tol 1e-13 every place you want it you need to expected = 123; assert abs(… - expected) / expected <= 1e-13 (you can refactor it into assert_equal(…, expected, rel_tol=1e-13), but then it's roughly exactly the same as the status quo, modulo the fact that now you have to type the extra assert_equal)

@user202729
Copy link
Contributor

It's reusing the same parser as the sage doctester

I guess you can just modify def parse(...) to delete the 32/64 bits tag accordingly then?

@orlitzky
Copy link
Contributor

we can't just write if/then/else.

You can. In fact, the following procedure "always" work:

1. Write whatever you want in pytest
2. copy that to doctest, appending `sage:` to the first line and `....:` to the remaining lines.

Ok, but for the doctest to test anything, you have to print (the same) string from each branch. Whereas in pytest you can easily assert(actual == expected) for two different values of expected. The doctest becomes ugly and confusing if you try to do it.

The tags were invented specifically because they're shorter and more descriptive than copy paste the checking code every time. Imagine if instead of # rel tol 1e-13 every place you want it you need to expected = 123; assert abs(… - expected) / expected <= 1e-13 (you can refactor it into assert_equal(…, expected, rel_tol=1e-13), but then it's roughly exactly the same as the status quo, modulo the fact that now you have to type the extra assert_equal)

The # long time tag affects the test framework and cannot easily be implemented any other way. # abs tol and # rel tol are regular expressions that affect every number in the output---they do something a lot more complicated than assert(abs(...)).

There are good reasons for some tags, but others (like 32-bit vs 64-bit) can be done easily in code. The main obstacle is that you have to print matching strings to appease the doctest framework. But this goes away if you convert the doctest to a unit test.

I don't think this,

sage: foo  # 32-bit
32-bit answer
sage: foo  # 64-bit
64-bit answer

is any more clear or efficient than

if 32_bit():
    assert(foo == "32-bit answer")
if 64_bit():
    assert(foo == "64-bit answer")

...especially if we are not expecting strings as the answer, say if one of them raises an error.

@user202729
Copy link
Contributor

Ok, but for the doctest to test anything, you have to print (the same) string from each branch. Whereas in pytest you can easily assert(actual == expected) for two different values of expected. The doctest becomes ugly and confusing if you try to do it.

Pytest:

def test_f():
    if condition1:
        assert f() == 1
    else:
        assert f() == 2

Doctest:

TESTS::

    sage: if condition1:
    ....:     assert f() == 1
    ....: else:
    ....:     assert f() == 2

The only difference is the sage: and ....:.


sage: foo  # 32-bit
32-bit answer
sage: foo  # 64-bit
64-bit answer

Not quite. The current code is

sage: foo
32-bit answer  # 32-bit
64-bit answer  # 64-bit

which avoids the duplication of foo.

But my discussion above was specifically when you say if/then/else.

@tobiasdiez
Copy link
Contributor Author

tobiasdiez commented Jul 31, 2025

A lot of these are easy to fix so that they work on all systems: ...

I like this, but lack the knowledge of the underlying mathematics in some cases to decide eg if the higher precision on 64bit should really be tested (or if a tolerance tag is appropriate). These steps can also easily done after this PR: once a test is changed to pass on 64+32 bit, one can simply remove the "known failure on 32bit systems" entry. Or do you think this should really be done as part of this PR?

Concerning the pytest implementation, @pytest.mark.skipif(is_32_bit) is nicer than those if/then/else constructions for pytest, see eg https://github.com/numpy/numpy/blob/a80542541130a2dc47eab2a6972019453cc0c518/numpy/random/tests/test_extending.py#L51-L54 for a "real life" example. They also easily generalize to other features like @tornaria mentions. In this case, one could have something like skipif(has_gap('<4')) to not run a test for older gap versions. Thus, I don't think we need a special doctest-solution for those feature/bitness tags but instead just use pytest a bit more.

(Sidenote: pytest upgrades the "assert" command to show better error messages in case of a failure. So all other things being equal, this would already favor the pytest-version)

@vbraun
Copy link
Member

vbraun commented Jul 31, 2025

        sage: y^(2^30)
        Traceback (most recent call last):             # 32-bit
        OverflowError: exponent overflow (1073741824)  # 32-bit
        y^1073741824  # 64-bit

What do you we learn from this, except that 32bit is not handling large exponents?

That is one of the cases that I would call super-interesting, and just from looking at a single doctest I can infer a ton of information:

  • exponents are machine integers (also on 64-bit)
  • I now roughly know the exponent limit on my 64-bit machine
  • overflows are actually checked and don't wrap (naive unsigned machine int) or are undefined behavior (naive signed machine int)

@orlitzky
Copy link
Contributor

A lot of these are easy to fix so that they work on all systems: ...

I like this, but lack the knowledge of the underlying mathematics in some cases to decide eg if the higher precision on 64bit should really be tested (or if a tolerance tag is appropriate). These steps can also easily done after this PR: once a test is changed to pass on 64+32 bit, one can simply remove the "known failure on 32bit systems" entry. Or do you think this should really be done as part of this PR?

I just think that, since many of these can be improved to avoid the issue entirely (particularly the hash() examples), it's an easy way to address the objection that useful doctests are being deleted.

If there are only a few left that need special handling, it becomes a lot easier to think about, or justify by punting it to another ticket with clear scope (e.g. "do something about these 5 tests").

@tobiasdiez
Copy link
Contributor Author

Superseded by #41540

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants