
Conversation


@AlanCoding AlanCoding commented Jan 29, 2026

SUMMARY

Still seeing in live PRs:

#16255

____________________________ test_git_file_project _____________________________

live_tmp_folder = '/tmp/live_tests'
run_job_from_playbook = <function run_job_from_playbook.<locals>._rf at 0x7f5637c34860>

    def test_git_file_project(live_tmp_folder, run_job_from_playbook):
>       run_job_from_playbook('test_git_file_project', 'debug.yml', scm_url=f'file://{live_tmp_folder}/debug')

tests/projects/test_file_projects.py:10: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests/conftest.py:205: in _rf
    wait_for_job(proj.current_job)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

job = <ProjectUpdate: 2026-01-29 13:58:05.476070+00:00-4-failed>
final_status = 'successful', running_timeout = 800

    def wait_for_job(job, final_status='successful', running_timeout=800):
        wait_to_leave_status(job, 'pending')
        wait_to_leave_status(job, 'waiting')
        wait_to_leave_status(job, 'running', timeout=running_timeout)
    
>       assert job.status == final_status, f'Job was not successful id={job.id} status={job.status} tb={job.result_traceback} output=\n{unified_job_stdout(job)}'
E       AssertionError: Job was not successful id=4 status=failed tb= output=
E         
E         
E         PLAY [Update source tree if necessary] *****************************************
E         
E         TASK [Update project using git] ************************************************
E         
E         fatal: [localhost]: FAILED! => {"changed": false, "cmd": "/usr/bin/git ls-remote file:///tmp/live_tests/debug -h refs/heads/HEAD", "msg": "fatal: '/tmp/live_tests/debug' does not appear to be a git repository\nfatal: Could not read from remote repository.\n\nPlease make sure you have the correct access rights\nand the repository exists.", "rc": 128, "stderr": "fatal: '/tmp/live_tests/debug' does not appear to be a git repository\nfatal: Could not read from remote repository.\n\nPlease make sure you have the correct access rights\nand the repository exists.\n", "stderr_lines": ["fatal: '/tmp/live_tests/debug' does not appear to be a git repository", "fatal: Could not read from remote repository.", "", "Please make sure you have the correct access rights", "and the repository exists."], "stdout": "", "stdout_lines": []}
E         
E         PLAY RECAP *********************************************************************
E         localhost                  : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   
E       assert 'failed' == 'successful'
E         
E         - successful
E         + failed

tests/conftest.py:112: AssertionError

I'm fairly sure I know what's happening in the case of the GitHub runner.

This clears the setting value in both forms of cache (see the sketch after this list):

  • in redis
  • in the local memory
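
A rough sketch of what that clearing step looks like. The real helper is the clear_setting_cache call in the live-test conftest; the body below is illustrative only, not the actual AWX implementation, apart from the _awx_conf_memoizedcache attribute quoted later:

    # Hypothetical sketch - shows the two cache layers, not the real AWX helper.
    from django.core.cache import cache   # backed by redis in AWX deployments
    from django.conf import settings

    def clear_setting_cache(setting_names):
        for name in setting_names:
            # shared cache (redis): deleting here is visible to every worker
            cache.delete(name)
        # process-local memoized cache: this only clears the cache of the
        # process running this code; each dispatcher worker holds its own copy
        memo = settings.__dict__.get('_awx_conf_memoizedcache')
        if memo is not None:
            memo.clear()

The important part is the last comment: deleting from redis fixes the shared layer, but the dispatcher workers each keep their own in-memory copy until it expires.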

The problem, however, is that the project update runs in a dispatcher worker, of which there are 4 by default. Each worker's in-memory cache expires after 5 seconds, but in most cases the first project update starts in less than 5 seconds. That means it will sometimes pick up the stale value of AWX_ISOLATION_SHOW_PATHS.

That would predict a 3 out of 4 chance of hitting the failure, but we have been seeing it much less frequently than that. So what gives? Well, out of the 4 workers, it's possible (likely, actually) that many of them won't have a pre-existing cached value. In other words, some workers have run a job in the last 5 seconds, but others have not. This matches the observed frequency well and points to a real recipe for reproduction: if you saturate the workers with unrelated jobs first, you populate each worker's in-memory cache, and then you could push the odds of hitting the failure to 3 of 4 or even 4 of 4. Specifically, that in-memory cache is:

SETTING_MEMORY_TTL = 5

and

self.__dict__['_awx_conf_memoizedcache'] = cachetools.TTLCache(maxsize=2048, ttl=SETTING_MEMORY_TTL)
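
To make the mechanic concrete, here is a small standalone illustration (plain cachetools, not AWX code) of a worker holding a stale entry until the TTL elapses:

    # Standalone illustration of the stale-read window (not AWX code).
    import time
    import cachetools

    SETTING_MEMORY_TTL = 5
    memo = cachetools.TTLCache(maxsize=2048, ttl=SETTING_MEMORY_TTL)

    memo['AWX_ISOLATION_SHOW_PATHS'] = []      # worker cached the old value recently
    # ... the fixture now updates the setting and deletes the redis key ...
    print(memo['AWX_ISOLATION_SHOW_PATHS'])    # still the stale [] for up to 5 seconds
    time.sleep(SETTING_MEMORY_TTL)
    print('AWX_ISOLATION_SHOW_PATHS' in memo)  # False - expired, next lookup refetches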

So I hope I've convinced you of the mechanic now. Accepting this theory, the solution is obvious: the sleep must be at least the in-memory cache expiration time (absent some better way of telling workers to clear their cache).

This looks very bad for test performance, since this fixture is used in many tests. However, the branch we are in should only be hit upon the first application of the fixture - that is, once per test run. So I'll accept the 5 additional seconds to eliminate this failure we are still seeing.
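
For reference, the change amounts to something like the following in the live_tmp_folder session fixture. This is a sketch: the surrounding setup steps are placeholders, and only the 0.2s -> 5.0s sleep change is the point.

    # Sketch of the timing change in the live_tmp_folder fixture.
    import time
    import pytest

    SETTING_MEMORY_TTL = 5  # matches the ttl of the worker-local TTLCache

    @pytest.fixture(scope='session')
    def live_tmp_folder():
        path = '/tmp/live_tests'
        # ... create test git repos under path ...
        # ... add path to AWX_ISOLATION_SHOW_PATHS and clear_setting_cache(...) ...
        # Was time.sleep(0.2): shorter than the worker-local cache TTL, so a busy
        # dispatcher worker could still run the project update with the stale value.
        time.sleep(SETTING_MEMORY_TTL)
        return path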

ISSUE TYPE
  • Bug, Docs Fix or other nominal change
COMPONENT NAME
  • API

Note

Low Risk
Low risk: change is limited to live test setup timing; primary impact is slightly longer test runtime and potential for added flakiness if 5s is still insufficient on slow CI.

Overview
Reduces live-test flakiness caused by stale per-worker in-memory settings cache for AWX_ISOLATION_SHOW_PATHS by increasing the post-clear_setting_cache delay from 0.2s to 5.0s in the live_tmp_folder session fixture, ensuring worker-local TTL caches have time to expire before project updates start.

Written by Cursor Bugbot for commit 9f365bc.

@AlanCoding AlanCoding merged commit 3d68ca8 into ansible:devel Jan 29, 2026
22 checks passed
