Description
Describe the issue
Something is causing our pytest runs to crash with unhelpful errors, e.g. "The operation was canceled." and "The hosted runner: GitHub Actions 12 lost communication with the server. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error."
As the second error suggests, this is likely an out-of-memory (OOM) issue, since adding 5GB of swap space in #2413 resolved these crashes.
This only appears under certain circumstances, though:
- Periodic tests (https://github.com/aeon-toolkit/aeon/actions/runs/12071259094/job/33662613284) - PASS
- Regular PR testing (https://github.com/aeon-toolkit/aeon/actions/runs/12053699859/job/33610809333) - PASS
- Testing with all tests and numba cache (https://github.com/aeon-toolkit/aeon/actions/runs/12072412056/job/33666335579) - PASS
- Testing with all tests and cache disabled (same as release testing) (https://github.com/aeon-toolkit/aeon/actions/runs/12066354265) - FAIL
There seems to be some strange interaction with the numba cache action: despite loading nothing in the periodic test, running it causes the tests to pass, while not running it at all causes a failure.
Just increasing the swap space is not a great solution to this, but it works temporarily.
Suggest a potential alternative/fix
Investigate the differences between the testing setups and figure out what is causing the OOM error, if that is indeed the cause.
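One possible way to start narrowing this down is to log the peak memory of the test process after every test, so that the last lines written before a crash show which tests were running as memory climbed. The snippet below is only a sketch of that idea, not something currently in the repository: it assumes it would be placed in a `conftest.py`, the log file name `peak_rss.log` is made up for illustration, and it relies on the standard library `resource` module, which is only available on Unix-like systems (which matches where the crashes are seen).

```python
import resource
import sys

# Hypothetical log file name, chosen for illustration only.
LOG_PATH = "peak_rss.log"


def peak_rss_mb():
    """Return the peak resident set size of this process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def pytest_runtest_teardown(item):
    """pytest hook: append the peak RSS so far and the test id to the log."""
    # Opening in append mode and closing after each test means the data
    # written so far should survive if the runner is killed later on.
    with open(LOG_PATH, "a") as f:
        f.write(f"{peak_rss_mb():.1f} MiB\t{item.nodeid}\n")
```

Note that the value is a high-water mark for a single process, so it never decreases; the tests of interest are the ones where it jumps. If the tests are run in several worker processes, each worker would record its own peak, and the total memory in use on the runner could be considerably higher than any single line in the log.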
Additional context
I have only seen this on Linux runners, not macOS or Windows. ubuntu-20.04 seems to produce this less often. Possibly related to #1162.