Restore celery beat/worker observability — redefine loggers in LOGGING dict (and fix CELERYBEAT_OPTS quoting) #1139

@rdhyee

Problem

Celery's internal loggers (celery.beat, celery.worker) are silenced on prod because regluit/settings/common.py sets:

LOGGING = {
    'version': 1,
    'disable_existing_loggers': True,
    ...
}

…and the loggers block doesn't redefine them, so they are disabled when the config is applied. Result: /var/log/celery/beat.log has been empty since 2024-09-25 even though beat is firing ~4,400 tasks/day (verified via w1.log on 2026-04-30). The misleading silence in beat.log triggered a false-alarm investigation under #1138, which cost time and shook confidence in the operational picture.
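
The effect can be reproduced with the standard library alone. This is a minimal sketch, not the regluit settings: the logger names are stand-ins created by hand (Celery itself is not imported), and the NullHandler is only there to keep the example self-contained.

```python
import logging
import logging.config


def demo_disable_existing():
    # Stand-ins for loggers a library creates at import time, before settings load.
    beat = logging.getLogger("celery.beat")
    worker = logging.getLogger("celery.worker")

    logging.config.dictConfig({
        "version": 1,
        "disable_existing_loggers": True,
        "handlers": {"null": {"class": "logging.NullHandler"}},
        "loggers": {
            # Only celery.beat is redefined; celery.worker is left out on purpose.
            "celery.beat": {"handlers": ["null"], "level": "INFO", "propagate": False},
        },
    })
    # A logger named in the config stays enabled; one that is not gets disabled.
    return beat.disabled, worker.disabled


if __name__ == "__main__":
    print(demo_disable_existing())
```

The logger that is named in the config keeps working, while the one that is left out ends up with disabled set and drops every record, which matches the empty beat.log.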

Why fix this

  • Beat liveness becomes observable in the log of the component actually responsible for it, instead of having to grep the worker log for indirect evidence
  • Future failures will surface promptly rather than hiding behind a quirk of logging config
  • Cheap, low-risk: a pure logging config change that doesn't touch beat/worker behavior

Proposed change

In regluit/settings/common.py, add explicit logger entries:

LOGGING = {
    ...
    'loggers': {
        ...existing...
        'celery': {
            'handlers': ['file'],   # or a dedicated celery handler
            'level': 'INFO',
            'propagate': False,
        },
        'celery.beat': {
            'handlers': ['file'],
            'level': 'INFO',
            'propagate': False,
        },
        'celery.worker': {
            'handlers': ['file'],
            'level': 'INFO',
            'propagate': False,
        },
    },
}

Verify by tailing beat.log after a deploy and service restart: Sending due task entries should appear whenever jobs fire. Scheduler tick lines should show up within the first max_interval (default 5 min), though these may require DEBUG rather than INFO level.
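
To keep this from regressing, a small check could assert that the three logger names stay defined in LOGGING. A minimal sketch: missing_celery_loggers is a hypothetical helper, not something that exists in regluit, and the sample dict only mimics the shape of the real settings.

```python
REQUIRED_CELERY_LOGGERS = ("celery", "celery.beat", "celery.worker")


def missing_celery_loggers(logging_config, required=REQUIRED_CELERY_LOGGERS):
    """Return the required logger names that the LOGGING dict does not define."""
    defined = logging_config.get("loggers", {})
    return [name for name in required if name not in defined]


# Sample config shaped like the LOGGING dict above (illustrative values only).
sample = {
    "version": 1,
    "disable_existing_loggers": True,
    "loggers": {
        "celery": {"handlers": ["file"], "level": "INFO", "propagate": False},
    },
}

if __name__ == "__main__":
    print(missing_celery_loggers(sample))
```

In a settings test, the same helper could be called on the real LOGGING dict and asserted to return an empty list.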

Bundled cleanup: CELERYBEAT_OPTS quoting bug

While we're touching the celery config, fix this in EbookFoundation/regluit-provisioning (/etc/default/celerybeat):

CELERYBEAT_OPTS="--schedule=/var/run/celery/celerybeat-schedule --concurrency=2"

Two issues:

  1. systemd's ExecStart expands this inside double-quotes, so the whole string is passed as one argument. The schedule file ends up literally named celerybeat-schedule --concurrency=2 (with an embedded space). Cosmetic but ugly.
  2. --concurrency=2 is a worker flag, not a beat flag. It currently has no effect because it is absorbed into the schedule filename; once the quoting is fixed, beat may reject it as an unknown option, so it should go either way.

Fix: drop --concurrency=2 so systemd passes --schedule as a single clean arg, or split CELERYBEAT_OPTS into separate vars and unquote them in the unit.
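
The difference between the two argument shapes can be illustrated in a few lines. This sketch uses shlex as a rough stand-in for whitespace splitting; it does not reproduce systemd's own expansion rules, and the variable names are only for illustration.

```python
import shlex

# Value taken from /etc/default/celerybeat as quoted in this issue.
opts = "--schedule=/var/run/celery/celerybeat-schedule --concurrency=2"

# Quoted expansion: the whole string reaches beat as one argument.
as_one_arg = [opts]

# Unquoted expansion (approximated with shlex): two separate arguments.
as_split = shlex.split(opts)

# With a single argument, everything after "=" becomes the schedule path.
schedule_value = as_one_arg[0].split("=", 1)[1]

if __name__ == "__main__":
    print(as_one_arg)
    print(as_split)
    print(repr(schedule_value))
```

With the single-argument form, the schedule path carries the trailing flag text, which is how the oddly named schedule file comes about.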

Liveness watchdog (optional follow-on)

Once beat.log is informative again, the watchdog from #1138 becomes simpler:

# Alert if beat.log hasn't been written to in >10 min (GNU stat; a missing file counts as STALE)
mtime=$(stat -c %Y /var/log/celery/beat.log 2>/dev/null || echo 0)
[ $(( $(date +%s) - mtime )) -lt 600 ] && echo OK || echo STALE

Worth adding as a host-level cron once observability is restored.
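
If the check ends up living in a Python script rather than a shell one-liner, the same staleness test might look like this. A sketch only: log_is_stale is a hypothetical name, and the 600-second threshold simply mirrors the 10 minutes used above.

```python
import os
import time


def log_is_stale(path, max_age_seconds=600, now=None):
    """Return True if path is missing or was last modified max_age_seconds or more ago."""
    now = time.time() if now is None else now
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        # A missing log is treated the same as a silent one.
        return True
    return (now - mtime) >= max_age_seconds


if __name__ == "__main__":
    print("STALE" if log_is_stale("/var/log/celery/beat.log") else "OK")
```

The optional now parameter exists so the function can be tested without waiting for real time to pass.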

Related
