17 changes: 13 additions & 4 deletions k8s/agent-scaler/agent-scaler.sh
@@ -58,10 +58,19 @@ while true; do
# The number of fields to cut for the job prefix depends on the number of hyphens in AGENT_NAME itself, plus one for the timestamp part.
num_hyphen_fields_in_agent_name=$(echo "${AGENT_NAME}" | awk -F'-' '{print NF}')
job_prefix_fields=$((num_hyphen_fields_in_agent_name + 1))
-for job_prefix in $(kubectl get pods -n "${namespace}" --no-headers | { grep -E "^${AGENT_NAME}-[0-9]+(-[0-9]+)?-" || true; } | { grep -E -e "Completed" -e "Error" || true; } | cut -f 1-${job_prefix_fields} -d '-'); do
-if [ -n "$job_prefix" ]; then
-echo "=== Deleting Job based on pod status: $job_prefix === (AGENT_NAME: ${AGENT_NAME})"
-kubectl delete job "$job_prefix" -n "${namespace}" --ignore-not-found=true
+for job_prefix_from_pod in $(kubectl get pods -n "${namespace}" --no-headers | { grep -E "^${AGENT_NAME}-[0-9]+(-[0-9]+)?-" || true; } | { grep -E -e "Completed" -e "Error" || true; } | cut -f 1-${job_prefix_fields} -d '-'); do
+if [ -n "$job_prefix_from_pod" ]; then
+# Validate that the derived job_prefix_from_pod actually matches the expected format for this agent's jobs
+if [[ "${job_prefix_from_pod}" =~ ^${AGENT_NAME}-[0-9]+(-[0-9]+)?$ ]]; then
+echo "=== Deleting Job based on pod status: $job_prefix_from_pod === (AGENT_NAME: ${AGENT_NAME})"
+kubectl delete job "$job_prefix_from_pod" -n "${namespace}" --ignore-not-found=true
+else
+# This case can occur if AGENT_NAME is unusual (e.g., 'foo-bar' and a pod 'foo-bar-baz-TIMESTAMP-...' exists)

@kakaiu (Jun 4, 2025):

Why would we have an unusual AGENT_NAME? Is this common?

Thanks!

Contributor Author:

It should not happen. The only AGENT_NAME values we have are Joshua-agent and Joshua-rhel9-agent.

+# or if the pod naming doesn't strictly follow AGENT_NAME-TIMESTAMP-SUFFIX.
+# The initial grep on pods already ensures it starts with AGENT_NAME-TIMESTAMP_LIKE_PATTERN-,
+# so this condition means the 'cut' command resulted in a prefix not matching AGENT_NAME-TIMESTAMP.
+echo "=== WARNING: Pod for AGENT_NAME ${AGENT_NAME} yielded job prefix candidate '${job_prefix_from_pod}' that does not match expected pattern '^${AGENT_NAME}-[0-9]+(-[0-9]+)?$'. Skipping delete. ==="
+fi
fi
done
fi
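
For readers who want to try the prefix derivation and validation from this hunk without a cluster, here is a minimal sketch. It assumes hard-coded sample pod names in place of `kubectl get pods` output and skips the initial `grep` filter, so it exercises only the `cut` and regex steps. The helper name `derive_job_prefix` and the sample pod names are hypothetical and do not appear in `agent-scaler.sh`.

```bash
#!/usr/bin/env bash
# Sketch only: mirrors the field counting, cut, and regex check from the hunk above.
# derive_job_prefix is a hypothetical helper; the real script inlines this logic.

# Usage: derive_job_prefix AGENT_NAME POD_NAME
# Prints the candidate job prefix and returns 0 if it matches AGENT_NAME-TIMESTAMP,
# otherwise prints nothing and returns 1.
derive_job_prefix() {
  local agent_name="$1" pod_name="$2"
  local num_fields prefix
  # Hyphen-separated fields in the agent name; one more field holds the timestamp.
  num_fields=$(echo "${agent_name}" | awk -F'-' '{print NF}')
  prefix=$(echo "${pod_name}" | cut -f "1-$((num_fields + 1))" -d '-')
  # Same pattern as the validation added in this change.
  if [[ "${prefix}" =~ ^${agent_name}-[0-9]+(-[0-9]+)?$ ]]; then
    echo "${prefix}"
    return 0
  fi
  return 1
}

# Sample pod names (made up for illustration).
derive_job_prefix "joshua-agent" "joshua-agent-20250604-abcde"
derive_job_prefix "joshua-rhel9-agent" "joshua-rhel9-agent-20250604-xyz12"
# The 'foo-bar' case from the code comment: the candidate is 'foo-bar-baz', which fails validation.
derive_job_prefix "foo-bar" "foo-bar-baz-20250604-abcde" || echo "skipped: prefix did not validate"
```

In the actual script the initial `grep` on `^${AGENT_NAME}-[0-9]+(-[0-9]+)?-` runs before `cut`, so a pod such as `foo-bar-baz-...` would normally be filtered out earlier; the regex check acts as a second guard before `kubectl delete job`.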