Description
I noticed that our booktests sometimes time out after 2.5 hours (150 minutes), e.g. here for PR #4490, while they usually finish with lots of room to spare, e.g. here in 40 minutes.
In this case, the last message from the test runner in the logs, before a gazillion messages of the form `From worker 7: GC: pause 15.33ms. collected 52.249626MB. incr`, was this:
2025-01-22T23:08:56.8299319Z From worker 7: vinberg_2.jlcon
Not having looked at when this filename is printed, I don't know whether this indicates we are running the tests in `vinberg_2.jlcon`, or perhaps in the file after it.
But anyway, my suspicion is that there are some tests here which "usually" work but for some RNG seed states end up performing much worse than usual.
Perhaps we should reduce fluctuation in this test by forcing a specific seed? Or do we already set a seed? If so, this might be a sign of another source of randomness we do not yet control for (e.g. when Singular is forking to achieve parallelism, I am guessing the order in which child processes finish their tasks may affect the overall result).
Also it may be useful if someone dug into it to figure out which part causes the issue. Since we parse the `.jlcon` files to process them step-by-step, perhaps we could enable a debug mode where it prints out which line (number) it is currently processing?
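To make the debug-mode idea concrete, here is a rough sketch of what such tracing could look like. It is written in Python purely for illustration; the actual booktest runner is Julia code that I have not looked at, so `split_jlcon`, `run_with_trace` and the `evaluate` callback are all hypothetical names. It also assumes the transcript uses the usual `julia> ` prompt with continuation lines indented to the same width.

```python
import sys
import time

PROMPT = "julia> "  # assumption: standard Julia REPL prompt in the transcript


def split_jlcon(text):
    """Split a REPL transcript into (line_number, input_source) pairs.

    Hypothetical helper. Lines starting with the prompt open a new input,
    lines indented by the prompt width continue it, and anything else is
    treated as expected output. Output that happens to start with that much
    indentation directly after an input would be misread as a continuation.
    """
    steps = []
    current = None  # (line number of the prompt, list of source lines)
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.startswith(PROMPT):
            if current is not None:
                steps.append(current)
            current = (lineno, [line[len(PROMPT):]])
        elif current is not None and line.startswith(" " * len(PROMPT)) and line.strip():
            current[1].append(line[len(PROMPT):])
        elif current is not None:
            # Expected output or a blank line ends the current input.
            steps.append(current)
            current = None
    if current is not None:
        steps.append(current)
    return [(n, "\n".join(src)) for n, src in steps]


def run_with_trace(filename, text, evaluate, log=sys.stderr):
    """Run each input through `evaluate`, logging file and line number first.

    The log line is flushed before evaluation, so if a step hangs, the last
    message in the CI log names the input that was being processed.
    """
    for lineno, source in split_jlcon(text):
        first_line = source.splitlines()[0]
        print(f"{filename}:{lineno}: {first_line}", file=log, flush=True)
        start = time.monotonic()
        evaluate(source)
        elapsed = time.monotonic() - start
        print(f"{filename}:{lineno}: done in {elapsed:.2f}s", file=log, flush=True)
```

The important detail is flushing before each step rather than after: with that, a timeout would leave the offending file and line as the last trace message instead of just a filename.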