Summary
When multiple control-plane instances have equal remaining capacity (the common case for container-group workloads where AWX_CONTROL_NODE_TASK_IMPACT = 1), fit_task_to_most_remaining_capacity_instance always selects the same node due to deterministic iteration order and strict > tie-breaking.
This causes burst workloads (many jobs submitted concurrently) to concentrate all job management overhead (event processing, callbacks, output streaming) on a single controller pod, while other controller pods remain idle.
Steps to reproduce
- Deploy AWX with 3 controller task replicas (e.g. task_replicas: 3 in the CR)
- Use a container group instance group for job execution (the default setup)
- Submit 20+ jobs concurrently via the API:
for i in $(seq 1 20); do
curl -sk -u admin:password -X POST "$URL/api/v2/job_templates/7/launch/" \
-H "Content-Type: application/json" &
done
wait
- Check controller_node on the completed jobs:
curl -sk -u admin:password "$URL/api/v2/jobs/?id__gte=<first_job_id>&order_by=id" | \
python3 -c "import sys,json; [print(j['controller_node']) for j in json.load(sys.stdin)['results']]"
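To make the skew easier to see than a raw list of hostnames, the check above can be reduced to a per-node count. The following is a minimal sketch, not part of AWX: the helper name tally_controller_nodes and the sample hostnames are made up for illustration, and it assumes only the results / controller_node fields already used in the one-liner above.

```python
import json
from collections import Counter


def tally_controller_nodes(jobs_response):
    """Count jobs per controller_node in a parsed /api/v2/jobs/ response."""
    return Counter(
        # Jobs without a controller_node are grouped under a placeholder key.
        job.get("controller_node") or "(none)"
        for job in jobs_response["results"]
    )


# Hypothetical sample payload; the hostnames are invented for illustration.
sample = json.loads(
    '{"results": ['
    '{"id": 101, "controller_node": "awx-task-0"},'
    '{"id": 102, "controller_node": "awx-task-0"},'
    '{"id": 103, "controller_node": "awx-task-0"}'
    ']}'
)

counts = tally_controller_nodes(sample)
for node, n in counts.most_common():
    print(f"{node}: {n}")
```

Against a live instance, the same function could be fed json.load(sys.stdin) from the curl command above. A single key in the output corresponds to the behavior reported in this issue.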
Expected result
Jobs should be distributed across available controller nodes when all have equal (or near-equal) remaining capacity.
Actual result
100% of jobs are assigned to a single controller node. Other controller pods manage zero jobs during the burst.
Root cause
In awx/main/scheduler/task_manager_models.py, the selection logic:
if would_be_remaining >= 0 and (instance_most_capacity is None or would_be_remaining > most_remaining_capacity):
For container-group jobs, the control impact is only 1 unit (AWX_CONTROL_NODE_TASK_IMPACT = 1) out of a typical capacity of ~640. Two further factors keep the nodes tied:
- Sequential task manager cycles (each processing approximately 1 job due to advisory lock timing)
- Capacity being reset between cycles (completed jobs free their impact)
As a result, all nodes appear equally viable on every cycle. Because the strict > comparison never replaces the current pick on a tie, the first node in iteration order always wins.
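The tie behavior can be reproduced outside AWX with a small simulation. This is a simplified sketch under stated assumptions, not the actual AWX code: Node, pick_strict, pick_random_tie and simulate are hypothetical names, each cycle is assumed to place exactly one job against freshly reset capacity, and the random tie-break is shown only for contrast as one possible approach.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a control-plane instance."""
    hostname: str
    remaining_capacity: int


def pick_strict(nodes, impact):
    """Mirror the quoted condition: only a strictly larger value replaces the pick."""
    best, most_remaining = None, -1
    for node in nodes:
        would_be_remaining = node.remaining_capacity - impact
        if would_be_remaining >= 0 and (best is None or would_be_remaining > most_remaining):
            best, most_remaining = node, would_be_remaining
    return best


def pick_random_tie(nodes, impact, rng):
    """Illustrative alternative: choose at random among nodes tied for most capacity."""
    fits = [n for n in nodes if n.remaining_capacity - impact >= 0]
    if not fits:
        return None
    top = max(n.remaining_capacity for n in fits)
    return rng.choice([n for n in fits if n.remaining_capacity == top])


def simulate(pick, cycles=20):
    """One job per cycle; capacity is rebuilt each cycle (assumption from the text)."""
    tally = Counter()
    for _ in range(cycles):
        nodes = [Node(f"task-{i}", 640) for i in range(3)]
        tally[pick(nodes, 1).hostname] += 1
    return tally


strict_tally = simulate(pick_strict)
rng = random.Random(0)
random_tally = simulate(lambda nodes, impact: pick_random_tie(nodes, impact, rng))

print("strict > :", dict(strict_tally))
print("random tie-break:", dict(random_tally))
```

With the strict comparison, all 20 simulated jobs land on task-0, matching the observed result. The random variant spreads them across the tied nodes; whether that, a round-robin, or a least-recently-used rule is the right fix is left open here.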
Impact
The controller node handles job lifecycle management: event processing, callback receiver, output streaming to the database and websocket consumers. Concentrating 40+ concurrent jobs on one pod creates resource pressure on that pod while others remain idle, potentially causing job failures under memory/CPU constraints.
Environment
- AWX 24.x / AAP 2.5+ (any version with multi-replica controller support)
- Container group execution (Kubernetes/EKS/OpenShift)
- Multiple controller task replicas