Description
https://github.com/datalad/datalad-fuse/actions/runs/18932892875/job/54053107943
datalad_fuse/tests/test_fsspec_head.py ................................. [ 37%]
... [ 41%]
datalad_fuse/tests/test_fuse.py ............................F [ 74%]
datalad_fuse/tests/test_util.py ...................... [100%]
=================================== FAILURES ===================================
_____________________________ test_parallel_access _____________________________
tmp_path = PosixPath('/tmp/pytest-of-runner/pytest-0/test_parallel_access0')
big_url_dataset = (Dataset('/tmp/pytest-of-runner/pytest-0/big_url_dataset0/ds'), {'APL.pdf': 'c65ccc2a97cdb6042641847112dc6e4d4d6e75fde...ff9197', 'libpython3.10-stdlib_3.10.4-3_i386.deb': 'e79c1416ec792b61ad9770f855bf6889e57be5f6511ea814d81ef5f9b1a3eec9'})
    def test_parallel_access(tmp_path, big_url_dataset):
        ds, data_files = big_url_dataset
        with fusing(ds.path, tmp_path) as mount:
            with ThreadPoolExecutor() as pool:
                futures = {
                    pool.submit(sha256_file, mount / path): dgst
                    for path, dgst in data_files.items()
                }
                for fut in as_completed(futures.keys()):
>                   assert fut.result() == futures[fut]
E                   AssertionError: assert '649df10c228a...91e625af094be' == 'c65ccc2a97cd...1f855711bbaf4'
E
E                   - c65ccc2a97cdb6042641847112dc6e4d4d6e75fdedcd476fdd61f855711bbaf4
E                   + 649df10c228aa245614ebf8a76a64ebd09c91f90e8435c68cb591e625af094be

datalad_fuse/tests/test_fuse.py:253: AssertionError
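For local reproduction outside the test suite, the failing access pattern can be sketched as a standalone script. The `sha256_file` helper below mirrors what the test presumably does (the helper body and chunk size are assumptions; only the `ThreadPoolExecutor`/`as_completed` pattern comes from the traceback):

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor, as_completed


def sha256_file(path, chunk_size=65536):
    """Stream a file through SHA-256 and return its hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_mismatches(data_files):
    """Hash every file concurrently, as the test does over the FUSE mount,
    and return the paths whose digest differs from the expected value."""
    bad = []
    with ThreadPoolExecutor() as pool:
        futures = {
            pool.submit(sha256_file, path): (path, dgst)
            for path, dgst in data_files.items()
        }
        for fut in as_completed(futures):
            path, expected = futures[fut]
            if fut.result() != expected:
                bad.append(path)
    return bad
```

Pointing `find_mismatches` at files under a datalad-fuse mount (with known-good digests) should reproduce the failure if the mount still misbehaves under concurrent reads.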
Apparently we did not have a con/tinuous setup yet, but now there is one ... and apparently it might not be a flake but rather some change in git-annex and/or datalad that triggered this:
(tinuous-dev) datalad@smaug:/mnt/datasets/datalad/ci/datalad-fuse/2025$ datalad foreach-dataset -s --o-s relpath -r git grep FAILED
10/24/cron/20251024T071607/67e7b21/github-FUSE Tests-737-failed/0_test.txt:2025-10-24T07:22:02.2907606Z FAILED datalad_fuse/tests/test_fuse.py::test_parallel_access - AssertionError...
10/25/cron/20251025T071431/67e7b21/github-FUSE Tests-738-failed/0_test.txt:2025-10-25T07:20:45.8950335Z FAILED datalad_fuse/tests/test_fuse.py::test_parallel_access - AssertionError...
10/26/cron/20251026T071407/67e7b21/github-FUSE Tests-739-failed/0_test.txt:2025-10-26T07:20:16.6977172Z FAILED datalad_fuse/tests/test_fuse.py::test_parallel_access - AssertionError...
10/27/cron/20251027T071713/67e7b21/github-FUSE Tests-740-failed/0_test.txt:2025-10-27T07:31:28.1020333Z FAILED datalad_fuse/tests/test_fuse.py::test_parallel_access - AssertionError...
10/28/cron/20251028T071634/67e7b21/github-FUSE Tests-741-failed/0_test.txt:2025-10-28T07:24:39.1137189Z FAILED datalad_fuse/tests/test_fuse.py::test_parallel_access - AssertionError...
10/29/cron/20251029T071702/67e7b21/github-FUSE Tests-742-failed/0_test.txt:2025-10-29T07:23:55.4836595Z FAILED datalad_fuse/tests/test_fuse.py::test_parallel_access - AssertionError...
10/30/cron/20251030T071602/67e7b21/github-FUSE Tests-743-failed/0_test.txt:2025-10-30T07:25:52.3065131Z FAILED datalad_fuse/tests/test_fuse.py::test_parallel_access - AssertionError...
FWIW -- here is an analysis by Claude, which insisted that there were no changes in the environment:
● Analysis Complete: No Prior Failures Found
The test was NOT failing before the last successful pass. Here's the complete pattern:
Historical Test Results
September 2025 (entire month):
- All runs: 87 passed, 0 failed
- test_parallel_access: PASSING consistently
- Zero failures of any kind
October 2025 (1st - 23rd):
- All runs: 87 passed, 0 failed
- Sample dates checked:
- 10/01: 87 passed ✓
- 10/10: 87 passed ✓
- 10/20: 87 passed ✓
- 10/22: 87 passed ✓
- 10/23: 87 passed ✓
October 2025 (24th - 30th):
- All runs: 86 passed, 1 failed
- Failing test: test_parallel_access ONLY
- Dates with failures: 10/24, 10/25, 10/26, 10/27, 10/28, 10/29, 10/30
Updated Regression Assessment
This is a clean, sudden regression that started precisely on 2025-10-24:
- Not a flaky test - The test passed consistently for at least 2+ months (September + early October)
- No intermittent failures - Zero failures before 10/24, 100% failure rate after
- Binary transition - Went from 87/87 passing to 86/87 on a specific date
- Persistent failure - Has failed every single day for 7 consecutive days
Implications
This pattern strongly suggests an external change on 2025-10-24:
Most likely causes (in order of probability):
- GitHub Actions infrastructure change - Azure VM image update, kernel patch, or scheduler change not reflected in version numbers
- Network/CDN behavior change - The test downloads files via HTTP; CDN caching or routing may have changed
- Timing threshold crossed - Infrastructure became slightly slower, exposing a latent race condition that now triggers consistently
- System library update - FUSE kernel module, filesystem drivers, or threading libraries updated
Least likely:
- Code regression (commit 67e7b21 unchanged)
- Dependency update (all package versions identical)
- Test flakiness (100% failure rate contradicts this)
Recommended Next Steps
Given this is clearly an infrastructure-triggered issue:
- Contact GitHub Support - Ask if Azure runner infrastructure changed on 2025-10-24
- Add defensive code - Implement proper locking/synchronization in FUSE parallel access regardless of root cause
- Bisect time, not code - The trigger is temporal, not code-based
- Check GitHub Actions status - Look at https://www.githubstatus.com/history for incidents around Oct 24
- Consider workaround - Pin to specific runner image version or add retry logic
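The "retry logic" workaround from the list above can be sketched minimally as follows. The helper names are hypothetical, and this only papers over a transient bad read in the test; the real fix for any race would belong in datalad-fuse's read path, not here:

```python
import hashlib
import time


def sha256_file(path, chunk_size=65536):
    """Stream a file through SHA-256 and return its hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def sha256_with_retry(path, expected, attempts=3, delay=1.0):
    """Re-hash the file a few times before declaring a mismatch, so a
    one-off bad read through the mount does not fail the whole run.
    Raises AssertionError only if every attempt disagrees."""
    digest = None
    for _ in range(attempts):
        digest = sha256_file(path)
        if digest == expected:
            return digest
        time.sleep(delay)
    raise AssertionError(f"{path}: got {digest}, expected {expected}")
```

Note that if the retries succeed, that would itself be diagnostic: it would confirm the mismatch is a transient read error rather than persistently corrupted content.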
The skill worked perfectly to identify this! The clean regression pattern makes investigation much easier.