Fix race condition of un-expired cache in local workers #16256

AlanCoding · 2026-01-29T14:23:44Z

SUMMARY

Still seeing in live PRs:

____________________________ test_git_file_project _____________________________

live_tmp_folder = '/tmp/live_tests'
run_job_from_playbook = <function run_job_from_playbook.<locals>._rf at 0x7f5637c34860>

    def test_git_file_project(live_tmp_folder, run_job_from_playbook):
>       run_job_from_playbook('test_git_file_project', 'debug.yml', scm_url=f'file://{live_tmp_folder}/debug')

tests/projects/test_file_projects.py:10: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/conftest.py:205: in _rf
    wait_for_job(proj.current_job)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

job = <ProjectUpdate: 2026-01-29 13:58:05.476070+00:00-4-failed>
final_status = 'successful', running_timeout = 800

    def wait_for_job(job, final_status='successful', running_timeout=800):
        wait_to_leave_status(job, 'pending')
        wait_to_leave_status(job, 'waiting')
        wait_to_leave_status(job, 'running', timeout=running_timeout)
    
>       assert job.status == final_status, f'Job was not successful id={job.id} status={job.status} tb={job.result_traceback} output=\n{unified_job_stdout(job)}'
E       AssertionError: Job was not successful id=4 status=failed tb= output=
E         
E         
E         PLAY [Update source tree if necessary] *****************************************
E         
E         TASK [Update project using git] ************************************************
E         
E         fatal: [localhost]: FAILED! => {"changed": false, "cmd": "/usr/bin/git ls-remote file:///tmp/live_tests/debug -h refs/heads/HEAD", "msg": "fatal: '/tmp/live_tests/debug' does not appear to be a git repository\nfatal: Could not read from remote repository.\n\nPlease make sure you have the correct access rights\nand the repository exists.", "rc": 128, "stderr": "fatal: '/tmp/live_tests/debug' does not appear to be a git repository\nfatal: Could not read from remote repository.\n\nPlease make sure you have the correct access rights\nand the repository exists.\n", "stderr_lines": ["fatal: '/tmp/live_tests/debug' does not appear to be a git repository", "fatal: Could not read from remote repository.", "", "Please make sure you have the correct access rights", "and the repository exists."], "stdout": "", "stdout_lines": []}
E         
E         PLAY RECAP *********************************************************************
E         localhost                  : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   
E       assert 'failed' == 'successful'
E         
E         - successful
E         + failed

tests/conftest.py:112: AssertionError

I'm fairly sure I know what's happening in the case of the GitHub runner.

The clear_setting_cache step clears the setting value in both forms of cache: in Redis and in the local memory.

The problem, however, is that the project update runs in a worker, of which there are 4 by default. The in-memory cache expires after 5 seconds, but I expect the first project update to start less than 5 seconds after the clear in most cases. That means that in some cases the worker will pick up the stale value of AWX_ISOLATION_SHOW_PATHS.

That would predict a 3 out of 4 chance of hitting the failure, but we have been seeing it much less frequently than that. So what gives? Well, out of the 4 workers, it's possible (likely, actually) that many of them won't have a pre-existing value. That is to say, some workers have run a job in the last 5 seconds, but others have not. This matches the observed frequency well and suggests a way to reproduce the failure: if you saturate the workers with unrelated jobs first, you populate the in-memory caches, and you could then see a 3 of 4 or 4 of 4 chance of hitting the failure due to the in-memory cache. Specifically, that in-memory cache is:

awx/awx/conf/settings.py

Line 36 in 99dce79

SETTING_MEMORY_TTL = 5

and

awx/awx/conf/settings.py

Line 260 in 99dce79

self.__dict__['_awx_conf_memoizedcache'] = cachetools.TTLCache(maxsize=2048, ttl=SETTING_MEMORY_TTL)
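To make the mechanism concrete, here is a minimal sketch of it in plain Python. It is not the AWX code: WorkerSettingsCache, FakeClock, and the dict standing in for the shared (Redis/database) value are all hypothetical names, and the only figures taken from above are the 5 second TTL and the 4 workers. It assumes 3 workers cached the old value just before the change and 1 worker was idle.

```python
class FakeClock:
    """Manually advanced clock so the sketch runs instantly."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


class WorkerSettingsCache:
    """Hypothetical per-process memo in front of a shared store, with a fixed TTL."""

    def __init__(self, shared_store, ttl, clock):
        self._shared = shared_store
        self._ttl = ttl
        self._clock = clock
        self._memo = {}  # name -> (value, time it was cached)

    def get(self, name):
        now = self._clock()
        hit = self._memo.get(name)
        if hit is not None and now - hit[1] < self._ttl:
            return hit[0]  # served from local memory, possibly stale
        value = self._shared[name]
        self._memo[name] = (value, now)
        return value


SETTING_MEMORY_TTL = 5  # seconds, the value quoted above
NAME = "AWX_ISOLATION_SHOW_PATHS"

clock = FakeClock()
shared = {NAME: []}  # stand-in for the value behind the shared cache
workers = [WorkerSettingsCache(shared, SETTING_MEMORY_TTL, clock) for _ in range(4)]

# Assumption: workers 0-2 just ran a job and memoized the old value; worker 3 is idle.
for worker in workers[:3]:
    worker.get(NAME)

# The test setup changes the setting; the worker-local memos are untouched.
shared[NAME] = ["/tmp/live_tests"]

clock.advance(0.2)  # a short wait, well inside the TTL
seen_after_short_wait = [w.get(NAME) for w in workers]
stale_fraction = sum(v == [] for v in seen_after_short_wait) / len(workers)

clock.advance(SETTING_MEMORY_TTL)  # now every memo entry has outlived the TTL
seen_after_ttl = [w.get(NAME) for w in workers]
```

In this model only the workers that happened to have a warm memo serve the old value, which is why the failure rate tracks how busy the workers were just before the change.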

So I hope I've convinced you of the mechanism now. Accepting this theory, the solution is obvious: the sleep must equal (or exceed) the in-memory cache expiration time, absent some better way of telling workers to clear their cache.
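The arithmetic behind that can be sketched as follows. This is only an illustration under a simplified model: still_stale is a hypothetical helper, an entry is assumed to expire once its age reaches the TTL, and the sample ages are made up. The 0.2 and 5.0 second waits are the before and after values of the delay in this change.

```python
def still_stale(age_at_clear, wait, ttl):
    """True if an entry cached `age_at_clear` seconds before the setting
    change is still being served `wait` seconds after the change."""
    return age_at_clear + wait < ttl


TTL = 5.0  # SETTING_MEMORY_TTL
ages = [0.0, 1.0, 4.0, 4.9]  # made-up ages of worker cache entries at the time of the change

# With the old 0.2s wait, every entry younger than 4.8s is still served.
stale_with_short_wait = [a for a in ages if still_stale(a, 0.2, TTL)]

# With a wait equal to the TTL, even a brand-new entry (age 0) has expired.
stale_with_ttl_wait = [a for a in ages if still_stale(a, TTL, TTL)]
```

Because an entry's age at the time of the change is never negative, a wait equal to the TTL is the smallest delay that covers the worst case in this model.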

This looks very bad for test performance, since this fixture is used in many tests. However, the if block containing the sleep should only be hit upon the first application of the fixture, that is, once per test run. So I'll accept the 5 additional seconds to eliminate this failure we are still seeing.
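A rough sketch of why the cost is paid only once, using stand-ins rather than the real fixture: ensure_path_shown, its parameters, and the list used as the current setting value are all hypothetical, and the sleep is injectable so the example runs instantly.

```python
import time

SETTING_MEMORY_TTL = 5  # seconds, the value quoted above


def ensure_path_shown(current_paths, path, apply_change, sleep=time.sleep):
    """Add `path` if it is missing, then wait out the worker caches.

    Returns the number of seconds waited (0 when nothing had to change).
    """
    if path in current_paths:
        return 0  # already configured: no change, so no wait
    apply_change(path)
    sleep(SETTING_MEMORY_TTL)
    return SETTING_MEMORY_TTL


# Stand-ins: a list plays the role of the setting, and the "sleep" just records its argument.
paths = []
waits = []
first = ensure_path_shown(paths, "/tmp/live_tests", paths.append, sleep=waits.append)
second = ensure_path_shown(paths, "/tmp/live_tests", paths.append, sleep=waits.append)
```

Only the first call takes the branch with the wait; later calls see the path already present and return immediately.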

ISSUE TYPE

Bug, Docs Fix or other nominal change

COMPONENT NAME

API

Note

Low Risk
Low risk: change is limited to live test setup timing; primary impact is slightly longer test runtime and potential for added flakiness if 5s is still insufficient on slow CI.

Overview
Reduces live-test flakiness caused by stale per-worker in-memory settings cache for AWX_ISOLATION_SHOW_PATHS by increasing the post-clear_setting_cache delay from 0.2s to 5.0s in the live_tmp_folder session fixture, ensuring worker-local TTL caches have time to expire before project updates start.

Written by Cursor Bugbot for commit 9f365bc. This will update automatically on new commits.

sonarqubecloud · 2026-01-29T14:40:42Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

Fix race condition of un-expired cache in local workers

9f365bc

github-actions bot added the component:api label Jan 29, 2026

pb82 approved these changes Jan 29, 2026

thedoubl3j approved these changes Jan 29, 2026

AlanCoding merged commit 3d68ca8 into ansible:devel Jan 29, 2026
22 checks passed

3 participants
