Closed
Labels
- area/general: Not tied to a specific area
- complexity/single-task: Regular task; should be done within days
- gain/high: Brings a lot of value to users
- impact/high: Affects a lot of users
- kind/bug: An unexpected problem or behavior
Description
Since last week's redeployment (6th January), we have been hitting issues with job processing: tasks go unprocessed for some time, which causes delays:
- we could see restarts of the worker pods caused by hitting CPU limits; we tried to mitigate this by increasing the CPU limits (Increase cpu limit to handle spikes better deployment#631). The limit was also increased for postgres (Increase cpu limits for postgres deployment#636), where metrics also showed usage going above the limit
- sometimes the workers do not process tasks at all, even though no task is blocking them
- we could see in logs messages like:
  Substantial drift from celery@packit-worker-long-running-0 may mean clocks are out of sync. Current drift is 1799 seconds. [orig: 2025-01-14 14:51:59.656603 recv: 2025-01-14 14:22:00.484181]
  consumer: Connection to broker lost. Trying to re-establish the connection...
  followed by a restart
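To sanity-check the drift value in the log message above, the two bracketed timestamps can be subtracted directly. The snippet below is a minimal sketch, not part of the deployment: the helper name `drift_seconds` and the regular expression are illustrative, and it assumes the reported drift is simply the absolute difference between `orig` and `recv`.

```python
import re
from datetime import datetime

# Log line copied from the worker output quoted in this issue.
LOG_LINE = (
    "Substantial drift from celery@packit-worker-long-running-0 may mean "
    "clocks are out of sync. Current drift is 1799 seconds. "
    "[orig: 2025-01-14 14:51:59.656603 recv: 2025-01-14 14:22:00.484181]"
)

# Hypothetical pattern for the "[orig: ... recv: ...]" suffix of the message.
TIMESTAMPS = re.compile(r"\[orig: (?P<orig>.+?) recv: (?P<recv>.+?)\]")
TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def drift_seconds(line: str) -> float:
    """Return the absolute difference between the orig and recv timestamps."""
    match = TIMESTAMPS.search(line)
    if match is None:
        raise ValueError("no [orig: ... recv: ...] pair found in the line")
    orig = datetime.strptime(match["orig"], TIME_FORMAT)
    recv = datetime.strptime(match["recv"], TIME_FORMAT)
    return abs((orig - recv).total_seconds())


drift = drift_seconds(LOG_LINE)
print(f"{drift:.3f} s (~{drift / 60:.1f} min)")  # 1799.172 s (~30.0 min)
```

Under that assumption, the two timestamps are about 30 minutes apart, which matches the 1799 seconds reported in the message.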
Status
done