test_parallel_access started to fail #124

@yarikoptic

https://github.com/datalad/datalad-fuse/actions/runs/18932892875/job/54053107943

datalad_fuse/tests/test_fsspec_head.py ................................. [ 37%]
...                                                                      [ 41%]
datalad_fuse/tests/test_fuse.py ............................F            [ 74%]
datalad_fuse/tests/test_util.py ......................                   [100%]

=================================== FAILURES ===================================
_____________________________ test_parallel_access _____________________________

tmp_path = PosixPath('/tmp/pytest-of-runner/pytest-0/test_parallel_access0')
big_url_dataset = (Dataset('/tmp/pytest-of-runner/pytest-0/big_url_dataset0/ds'), {'APL.pdf': 'c65ccc2a97cdb6042641847112dc6e4d4d6e75fde...ff9197', 'libpython3.10-stdlib_3.10.4-3_i386.deb': 'e79c1416ec792b61ad9770f855bf6889e57be5f6511ea814d81ef5f9b1a3eec9'})

    def test_parallel_access(tmp_path, big_url_dataset):
        ds, data_files = big_url_dataset
        with fusing(ds.path, tmp_path) as mount:
            with ThreadPoolExecutor() as pool:
                futures = {
                    pool.submit(sha256_file, mount / path): dgst
                    for path, dgst in data_files.items()
                }
                for fut in as_completed(futures.keys()):
>                   assert fut.result() == futures[fut]
E                   AssertionError: assert '649df10c228a...91e625af094be' == 'c65ccc2a97cd...1f855711bbaf4'
E                     
E                     - c65ccc2a97cdb6042641847112dc6e4d4d6e75fdedcd476fdd61f855711bbaf4
E                     + 649df10c228aa245614ebf8a76a64ebd09c91f90e8435c68cb591e625af094be

datalad_fuse/tests/test_fuse.py:253: AssertionError
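
One way to narrow this down might be to check whether the wrong digest also shows up when the same files are read one at a time through the mount. The following is a minimal sketch, not the test suite's code: `sha256_file` here is a stand-in for the helper the test calls (its real implementation is not shown in this report), and `compare_sequential_and_parallel` is a hypothetical name introduced for illustration.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Stand-in for the test helper: hex SHA-256 of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_sequential_and_parallel(root, expected):
    """Hash each file one at a time, then all at once, and report mismatches.

    `expected` maps relative paths to expected hex digests, like `data_files`
    in the test. Returns {path: (expected, sequential, parallel)} for every
    path where either read disagrees with the expected digest.
    """
    root = Path(root)
    # Pass 1: one file at a time, no concurrency.
    sequential = {p: sha256_file(root / p) for p in expected}
    # Pass 2: all files submitted to a thread pool, as in the failing test.
    with ThreadPoolExecutor() as pool:
        futures = {p: pool.submit(sha256_file, root / p) for p in expected}
        parallel = {p: fut.result() for p, fut in futures.items()}
    return {
        p: (want, sequential[p], parallel[p])
        for p, want in expected.items()
        if sequential[p] != want or parallel[p] != want
    }
```

Called as `compare_sequential_and_parallel(mount, data_files)` inside the `fusing(...)` block, a mismatch that already appears in the sequential pass would suggest the problem is not specific to concurrent access (for example, the content behind the URL changed), while a mismatch only in the parallel pass would point toward a race. Note that the sequential pass may warm a cache and change how the parallel pass behaves, so running the two passes on separate fresh mounts is probably safer.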

Apparently we did not have con/tinuous set up for this repository before; now we do, and it suggests this might not be a flake but rather some change in git-annex and/or datalad that triggered it:

(tinuous-dev) datalad@smaug:/mnt/datasets/datalad/ci/datalad-fuse/2025$ datalad foreach-dataset -s --o-s relpath -r git grep FAILED
10/24/cron/20251024T071607/67e7b21/github-FUSE Tests-737-failed/0_test.txt:2025-10-24T07:22:02.2907606Z FAILED datalad_fuse/tests/test_fuse.py::test_parallel_access - AssertionError...                                     
10/25/cron/20251025T071431/67e7b21/github-FUSE Tests-738-failed/0_test.txt:2025-10-25T07:20:45.8950335Z FAILED datalad_fuse/tests/test_fuse.py::test_parallel_access - AssertionError...
10/26/cron/20251026T071407/67e7b21/github-FUSE Tests-739-failed/0_test.txt:2025-10-26T07:20:16.6977172Z FAILED datalad_fuse/tests/test_fuse.py::test_parallel_access - AssertionError...
10/27/cron/20251027T071713/67e7b21/github-FUSE Tests-740-failed/0_test.txt:2025-10-27T07:31:28.1020333Z FAILED datalad_fuse/tests/test_fuse.py::test_parallel_access - AssertionError...
10/28/cron/20251028T071634/67e7b21/github-FUSE Tests-741-failed/0_test.txt:2025-10-28T07:24:39.1137189Z FAILED datalad_fuse/tests/test_fuse.py::test_parallel_access - AssertionError...
10/29/cron/20251029T071702/67e7b21/github-FUSE Tests-742-failed/0_test.txt:2025-10-29T07:23:55.4836595Z FAILED datalad_fuse/tests/test_fuse.py::test_parallel_access - AssertionError...
10/30/cron/20251030T071602/67e7b21/github-FUSE Tests-743-failed/0_test.txt:2025-10-30T07:25:52.3065131Z FAILED datalad_fuse/tests/test_fuse.py::test_parallel_access - AssertionError...
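
The `git grep` hits above can be parsed to pick out the first failing run programmatically. This is a rough sketch that assumes the path layout visible in these lines (`MM/DD/cron/<timestamp>/<commit>/github-<workflow>-<run number>-<status>/...`); the regular expression and the names `parse_failed_line` and `first_failure` are illustrative and not part of con/tinuous.

```python
import re
from datetime import datetime

# Matches lines shaped like the grep output above; the layout is inferred
# from those lines only and may not hold for other archives.
FAILED_RE = re.compile(
    r"^(?P<month>\d{2})/(?P<day>\d{2})/cron/(?P<stamp>\d{8}T\d{6})/"
    r"(?P<commit>[0-9a-f]+)/"
    r"github-(?P<workflow>.+)-(?P<run>\d+)-(?P<status>\w+)/[^:]+:"
    r"\S+ FAILED (?P<test>\S+)"
)


def parse_failed_line(line):
    """Return a dict describing one FAILED hit, or None if the line differs."""
    m = FAILED_RE.match(line.strip())
    if m is None:
        return None
    return {
        "started": datetime.strptime(m["stamp"], "%Y%m%dT%H%M%S"),
        "commit": m["commit"],
        "workflow": m["workflow"],
        "run": int(m["run"]),
        "test": m["test"],
    }


def first_failure(lines, test_name):
    """Earliest parsed failure of `test_name`, or None if there is none."""
    hits = [
        rec
        for rec in map(parse_failed_line, lines)
        if rec is not None and rec["test"].endswith("::" + test_name)
    ]
    return min(hits, key=lambda rec: rec["started"], default=None)
```

Feeding it the lines above with `test_name="test_parallel_access"` should yield the 10/24 run, whose logs can then be compared against the last passing run from 10/23.
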

FWIW, here is the analysis by Claude, which insisted that there were no changes in the environment:

● Analysis Complete: No Prior Failures Found

The test was NOT failing before the last successful pass. Here's the complete pattern:

Historical Test Results

September 2025 (entire month):

  • All runs: 87 passed, 0 failed
  • test_parallel_access: PASSING consistently
  • Zero failures of any kind

October 2025 (1st - 23rd):

  • All runs: 87 passed, 0 failed
  • Sample dates checked:
    • 10/01: 87 passed ✓
    • 10/10: 87 passed ✓
    • 10/20: 87 passed ✓
    • 10/22: 87 passed ✓
    • 10/23: 87 passed ✓

October 2025 (24th - 30th):

  • All runs: 86 passed, 1 failed
  • Failing test: test_parallel_access ONLY
  • Dates with failures: 10/24, 10/25, 10/26, 10/27, 10/28, 10/29, 10/30

Updated Regression Assessment

This is a clean, sudden regression that started precisely on 2025-10-24:

  1. Not a flaky test - The test passed consistently for nearly two months (September through October 23)
  2. No intermittent failures - Zero failures before 10/24, 100% failure rate after
  3. Binary transition - Went from 87/87 passing to 86/87 on a specific date
  4. Persistent failure - Has failed every single day for 7 consecutive days

Implications

This pattern strongly suggests an external change on 2025-10-24:

Most likely causes (in order of probability):

  1. GitHub Actions infrastructure change - Azure VM image update, kernel patch, or scheduler change not reflected in version numbers
  2. Network/CDN behavior change - The test downloads files via HTTP; CDN caching or routing may have changed
  3. Timing threshold crossed - Infrastructure became slightly slower, exposing a latent race condition that now triggers consistently
  4. System library update - FUSE kernel module, filesystem drivers, or threading libraries updated

Least likely:

  • Code regression (commit 67e7b21 unchanged)
  • Dependency update (all package versions identical)
  • Test flakiness (100% failure rate contradicts this)

Recommended Next Steps

Given this is clearly an infrastructure-triggered issue:

  1. Contact GitHub Support - Ask if Azure runner infrastructure changed on 2025-10-24
  2. Add defensive code - Implement proper locking/synchronization in FUSE parallel access regardless of root cause
  3. Bisect time, not code - The trigger is temporal, not code-based
  4. Check GitHub Actions status - Look at https://www.githubstatus.com/history for incidents around Oct 24
  5. Consider workaround - Pin to specific runner image version or add retry logic

The skill worked perfectly to identify this! The clean regression pattern makes investigation much easier.
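
Since the analysis above rests on the claim that all package versions are identical, it may be worth checking that directly by diffing the package lists of the last passing and first failing runs. The sketch below assumes both lists are available as `pip freeze`-style text (`name==version` lines), which the CI logs may or may not contain; `parse_freeze` and `diff_versions` are hypothetical helper names. It would not catch tools installed outside pip, such as git-annex.

```python
def parse_freeze(text):
    """Parse `name==version` lines into a {lowercased name: version} dict.

    Lines without `==` (editable installs, URLs, comments) are skipped.
    """
    versions = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, _, version = line.partition("==")
        versions[name.lower()] = version
    return versions


def diff_versions(before, after):
    """Return {name: (old, new)} for packages whose version differs.

    A package present on only one side shows up with None on the other.
    """
    old, new = parse_freeze(before), parse_freeze(after)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }
```

An empty result would support the "no dependency change" claim for pip-installed packages only; anything installed through other channels would need a separate comparison.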
