@saintstack
Contributor

@saintstack saintstack commented May 30, 2025

… are running (up to MAX_JOBS total). Previously, rhel9 jobs were shut out while joshua-agent jobs were running, until the queued ensemble count was < the running job count. Now jobs are scheduled as long as we stay under the MAX_JOBS count.

Fix cleanup so we do rhel9 job cleanup only in this script -- make general use of the AGENT_NAME variable.
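
As a rough illustration of scoping cleanup by AGENT_NAME, here is a minimal sketch. It is not the code in the patch: the `filter_own_jobs` helper and the assumption that job names start with `${AGENT_NAME}-` are both hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from agent-scheduler.sh.
# Reads job names on stdin (one per line) and prints only those that
# belong to the given agent type.
# ASSUMPTION: job names are prefixed with "<AGENT_NAME>-".
filter_own_jobs() {
    local agent_name=$1
    local name
    while IFS= read -r name; do
        case "$name" in
            "${agent_name}-"*) printf '%s\n' "$name" ;;
        esac
    done
}
```

With AGENT_NAME set to joshua-rhel9-agent, a job named joshua-agent-1 would be skipped under this naming assumption, so each scheduler instance would only clean up its own jobs.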

Patch is against the rockylinux9 branch but this script has been tested on joshua-agent (centos) and joshua-rhel9-agent.

To debug, I logged into the joshua agent-scaler pod for rhel9 and ran agent-scheduler.sh with bash -x. I saw that if there was a queued ensemble, it would not run jobs until the count of jobs fell below the count of queued ensembles (a rare event). I then ran the patched agent-scheduler.sh, and it runs jobs more freely, with a mix of rhel9 and centos jobs.
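
The intended behavior (schedule while the total stays under MAX_JOBS) can be sketched roughly as below. This is a simplified sketch, not the patched script: the `jobs_to_start` function, its arguments, and the "one new job per queued ensemble" rule are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from agent-scheduler.sh.
# Args: running  - jobs currently running across all agent types
#       queued   - queued ensembles
#       max_jobs - the MAX_JOBS cap
# Prints how many new jobs may be started.
jobs_to_start() {
    local running=$1 queued=$2 max_jobs=$3
    local headroom=$(( max_jobs - running ))
    # Nothing to do if we are at the cap or nothing is queued.
    if (( headroom <= 0 || queued <= 0 )); then
        echo 0
        return
    fi
    # ASSUMPTION: one new job per queued ensemble, capped by headroom.
    if (( queued < headroom )); then
        echo "$queued"
    else
        echo "$headroom"
    fi
}
```

For example, `jobs_to_start 8 3 10` prints 2: three ensembles are queued, but only two slots remain under the cap. The point is that the decision depends on headroom under MAX_JOBS, not on comparing the running count with the queued count.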

michael stack added 3 commits May 30, 2025 12:55
… are running (up to MAX_JOBS total). rhel9 jobs were being shut out if joshua-agent jobs were running until queued ensemble count was < jobs running count: now jobs will be scheduled as long as we stay under MAX_JOBS count.

Fix cleanup so we do rhel9 job cleanup only in this script --
make general use of the AGENT_NAME variable.
@saintstack
Contributor Author

Closing this in favor of a PR against the main branch instead: #112

@saintstack saintstack closed this May 30, 2025