Problem
There are 30+ orphaned check_items processes on doab-check.ebookfoundation.org, some dating back to 2025. The cron job launches a new check_items every hour, but the old ones never exit.
ubuntu 37287 0.0 2.9 88576 59896 ? S 2025 0:07 ...check_items
ubuntu 138774 0.0 1.9 64720 38828 ? S 2025 0:05 ...check_items
ubuntu 145906 0.0 3.2 92088 65572 ? S 2025 0:05 ...check_items
...
ubuntu 2858393 0.0 1.6 62032 33252 ? S Jan20 0:10 ...check_items
ubuntu 2859463 0.0 1.9 68288 39204 ? S Jan20 0:11 ...check_items
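Each process uses 30-65MB of memory. Strictly speaking these are hung processes rather than zombies: ps shows them in state S (sleeping), not Z. On a small droplet, 30+ of them hold roughly 1-2GB, which caused an OOM kill when we tried to run a new check interactively.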
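As a rough way to reproduce the memory estimate, the ps listing can be piped through awk. This is only a sketch: the helper name sum_rss_mb is hypothetical, and it assumes the standard `ps aux` column layout, where column 6 is RSS in KiB.

```shell
# Hypothetical helper: count matching processes and total their resident memory.
# Reads `ps aux` style lines on stdin; column 6 is RSS in KiB.
# The pattern is written as check_item[s] so that the awk process itself,
# whose command line contains the literal text "check_item[s]", is not counted.
sum_rss_mb() {
  awk '/check_item[s]/ { kb += $6; n++ }
       END { printf "%d processes, %d MiB\n", n, kb / 1024 }'
}

# On the host (not run here):
#   ps aux | sum_rss_mb
```

Fed the first two ps lines shown above, this reports 2 processes and 96 MiB.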
Root Cause
The cron job runs every hour:
5 * * * * /home/ubuntu/doab-check/scripts/doab_check.sh >> .../cron_check.log
However, check_items appears to hang on some links: before the timeout fix in PR #5, HTTP requests had no timeouts. A hung process never completes, and cron starts a new one each hour regardless.
PR #5 (now deployed) adds timeouts to all HTTP requests, which should prevent new hangs, but the existing hung processes still need to be cleaned up.
Immediate Fix
Kill the accumulated hung processes:
pkill -f "manage.py check_items"
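Note that the pkill above also terminates a check that started legitimately within the current hour. If that matters, one possible variant filters by age first. The sketch below is an assumption rather than part of the deployed scripts: the helper name old_check_items_pids and the two-hour threshold are hypothetical, and it relies on the etimes (elapsed seconds) column provided by procps ps.

```shell
# Hypothetical helper: print PIDs of check_items processes older than $1 seconds.
# Expects `ps -eo pid=,etimes=,args=` style input: PID, elapsed seconds, command line.
# The [s] in the pattern keeps the awk process from matching its own command line.
old_check_items_pids() {
  awk -v min="$1" '$2 + 0 > min + 0 && /manage\.py check_item[s]/ { print $1 }'
}

# On the host (not run here): send SIGTERM to anything older than two hours.
#   ps -eo pid=,etimes=,args= | old_check_items_pids 7200 | xargs -r kill
```

Processes that ignore SIGTERM could then be handled with a follow-up `kill -9` on whatever the same pipeline still lists.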
Longer-term Fixes
timeout 30m prefix)
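A minimal sketch of how a timeout prefix might look in practice, combined with a lock so that a new run is skipped while an old one is still alive. The flock step, the helper name run_guarded, and the lock path are assumptions, not something this issue specifies.

```shell
# Hypothetical wrapper: run a command under a lock and a hard time limit.
# $1 = lock file, $2 = time limit (e.g. 30m), remaining arguments = command.
run_guarded() {
  lock="$1"; limit="$2"; shift 2
  # flock -n exits immediately (status 1) if another run still holds the lock;
  # timeout sends SIGTERM once the limit is reached and then exits with status 124.
  flock -n "$lock" timeout "$limit" "$@"
}

# Equivalent crontab entry (lock path is illustrative, log path as in the original):
#   5 * * * * flock -n /tmp/doab_check.lock timeout 30m /home/ubuntu/doab-check/scripts/doab_check.sh >> .../cron_check.log
```

With this shape, a hung check is terminated after 30 minutes, and even if termination fails, the lock prevents further copies from piling up.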