@saintstack

Previously, rhel9 jobs were shut out whenever joshua-agent jobs were running, until the queued ensemble count was less than the running job count. Now jobs are scheduled while joshua-agent jobs are running, as long as the total stays under MAX_JOBS.
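The rule described above, "schedule as long as we stay under MAX_JOBS", can be sketched as a small bash function. This is only an illustration and not the code in agent-scaler.sh: the function name `jobs_to_start`, its arguments, and the assumption of one job per queued ensemble are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch. Apart from MAX_JOBS, every name here is an
# assumption and is not taken from agent-scaler.sh.

# Print how many new jobs may be started.
#   $1 = number of queued ensembles
#   $2 = total jobs currently running (all agent types combined)
#   $3 = MAX_JOBS cap
jobs_to_start() {
    local queued=$1 running=$2 max_jobs=$3
    local headroom=$(( max_jobs - running ))

    # Nothing to do if the cap is reached or nothing is queued.
    if (( headroom <= 0 || queued <= 0 )); then
        echo 0
        return
    fi

    # Assumption: one job per queued ensemble, limited by the headroom.
    if (( queued < headroom )); then
        echo "$queued"
    else
        echo "$headroom"
    fi
}
```

In this sketch, the number of running jobs limits scheduling only through the cap, so two queued ensembles with five jobs running and a cap of ten would still get two new jobs.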

Made the agent-scaler.sh script generic so that it works for both the main and rockylinux9 branches (it uses the AGENT_NAME environment variable).
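One way a single script can serve both branches is to derive every job name from AGENT_NAME. The sketch below is an assumption about how that might look; the default value, the name pattern, and the helper functions `job_prefix` and `is_own_job` are hypothetical and do not come from agent-scaler.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch. The default value and the "<agent name>-" job
# naming pattern are assumptions, not taken from agent-scaler.sh.
AGENT_NAME="${AGENT_NAME:-joshua-agent}"

# Print the job-name prefix for the agent this script instance manages.
job_prefix() {
    echo "${AGENT_NAME}-"
}

# Succeed only if the given job name belongs to this agent, so that
# cleanup in one scaler never touches the other scaler's jobs.
is_own_job() {
    [[ "$1" == "$(job_prefix)"* ]]
}
```

With AGENT_NAME set to a hypothetical `joshua-rhel9-agent`, `is_own_job joshua-rhel9-agent-1` succeeds and `is_own_job joshua-agent-1` fails, which is the kind of separation needed when each scaler cleans up only its own jobs.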

To debug, I logged into the joshua agent-scaler pods for rhel9 and for centos and ran agent-scaler.sh with bash -x. I saw that when an ensemble was queued, the script would not run jobs until the count of jobs fell below the count of queued ensembles (a rare event). I then ran the patched agent-scaler.sh, and it runs jobs more freely, with a mix of rhel9 and centos jobs.

Fix cleanup so we do rhel9 job cleanup only in this script --
make general use of the AGENT_NAME variable.

@jzhou77 left a comment

Thanks!

@saintstack saintstack merged commit af1d525 into FoundationDB:main May 30, 2025
2 checks passed
saintstack pushed a commit to saintstack/fdb-joshua that referenced this pull request Jun 4, 2025