feat: resilient background job retry with exponential backoff and monitoring by promisingcoder · Pull Request #592 · rohitdash08/FinMind

promisingcoder · 2026-03-21T06:12:04Z

Summary

Implements resilient background job retry & monitoring for async job execution.

/claim #130

Changes

Retry Mechanism (Exponential Backoff)

Configurable via JOB_MAX_RETRIES (default 3) and JOB_RETRY_DELAYS (default "5,15,45" minutes)
Backoff: 5 min → 15 min → 45 min; a reminder is marked permanently failed once max retries are exhausted
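
A minimal sketch of how the two settings could drive the schedule. The helper names `load_retry_config` and `next_retry_at` are hypothetical, and it assumes `retry_count` counts failed attempts so far; the code in this PR may differ.

```python
import os
from datetime import datetime, timedelta, timezone


def load_retry_config(env=None):
    """Read JOB_MAX_RETRIES and JOB_RETRY_DELAYS, using the documented defaults."""
    env = os.environ if env is None else env
    max_retries = int(env.get("JOB_MAX_RETRIES", "3"))
    raw_delays = env.get("JOB_RETRY_DELAYS", "5,15,45")
    delays = [int(part) for part in raw_delays.split(",") if part.strip()]
    return max_retries, delays


def next_retry_at(retry_count, now, max_retries, delays):
    """Return when the next attempt is due, or None once retries are exhausted.

    retry_count is the number of failed attempts so far (1 after the first failure).
    """
    if retry_count > max_retries or not delays:
        return None
    # Assumption: reuse the last delay if fewer delays than retries are configured.
    index = min(retry_count, len(delays)) - 1
    return now + timedelta(minutes=delays[index])
```

With the defaults, the first failure schedules a retry 5 minutes out, the third 45 minutes out, and a fourth failure returns `None`, which the caller would treat as permanently failed.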

Job State Tracking

Added retry_count, last_error, next_retry_at, failed, retry_status to Reminder model
Backward-compatible migration with ADD COLUMN IF NOT EXISTS
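
For illustration, the compatibility patch could be generated roughly as below. The column names come from this PR; the SQL types, defaults, the `reminders` table name, and the `build_compat_statements` helper are assumptions, not the actual contents of `_ensure_schema_compatibility`.

```python
# Column names are from the PR description; types and defaults are assumed.
RETRY_COLUMNS = {
    "retry_count": "INTEGER NOT NULL DEFAULT 0",
    "last_error": "TEXT",
    "next_retry_at": "TIMESTAMP",
    "failed": "BOOLEAN NOT NULL DEFAULT FALSE",
    "retry_status": "VARCHAR(20)",
}


def build_compat_statements(table="reminders"):
    """Build idempotent PostgreSQL statements, one per retry-state column."""
    return [
        f"ALTER TABLE {table} ADD COLUMN IF NOT EXISTS {name} {ddl};"
        for name, ddl in RETRY_COLUMNS.items()
    ]
```

Because each statement uses `ADD COLUMN IF NOT EXISTS`, running the list against an already-migrated database is a no-op.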

Pure Dispatch Function

dispatch_reminders(candidates, sender_func, now) — zero DB coupling, fully testable
Separate run_dispatch_cycle() wrapper for DB operations
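
A rough sketch of what a DB-free dispatch function with this signature might look like. `FakeReminder`, the `sent` field, the status strings, and the returned counts are illustrative assumptions; only the signature and the retry-state field names come from this PR.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_RETRIES = 3
RETRY_DELAYS_MINUTES = [5, 15, 45]


@dataclass
class FakeReminder:
    """Stand-in for the Reminder model; only retry-related fields are modelled."""
    id: int
    sent: bool = False  # hypothetical field
    retry_count: int = 0
    last_error: Optional[str] = None
    next_retry_at: Optional[datetime] = None
    failed: bool = False
    retry_status: str = "pending"  # status strings are assumed


def dispatch_reminders(candidates, sender_func, now):
    """Try each candidate once and update its retry state in memory only."""
    counts = {"sent": 0, "retrying": 0, "failed": 0}
    for reminder in candidates:
        try:
            ok = bool(sender_func(reminder))
            error = None if ok else "sender returned a falsy result"
        except Exception as exc:  # capture the error instead of aborting the cycle
            ok, error = False, str(exc)
        if ok:
            reminder.sent = True
            reminder.retry_status = "sent"
            reminder.last_error = None
            reminder.next_retry_at = None
            counts["sent"] += 1
            continue
        reminder.retry_count += 1
        reminder.last_error = error
        if reminder.retry_count > MAX_RETRIES:
            reminder.failed = True
            reminder.retry_status = "failed"
            reminder.next_retry_at = None
            counts["failed"] += 1
        else:
            delay = RETRY_DELAYS_MINUTES[reminder.retry_count - 1]
            reminder.next_retry_at = now + timedelta(minutes=delay)
            reminder.retry_status = "retrying"
            counts["retrying"] += 1
    return counts
```

Since nothing here touches a session or a query, a test can pass plain objects and a stub sender, and a wrapper such as `run_dispatch_cycle()` is left to fetch the candidates and commit the mutated rows.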

Monitoring Endpoints

GET /jobs/status — scheduler health (no auth)
GET /jobs/reminders/stats — counts by status (JWT)
POST /jobs/reminders/run — manual trigger (JWT)
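
As a sketch of the "counts by status" payload, the aggregation behind the stats endpoint could be as simple as the function below. The status names, the `total` key, and the `reminder_stats` helper are assumptions; the real response shape is defined by the route in this PR.

```python
from collections import Counter

# Assumed status values; the PR only states that counts are grouped by status.
KNOWN_STATUSES = ("pending", "retrying", "sent", "failed")


def reminder_stats(statuses):
    """Count reminders per status, reporting zero for statuses with no rows."""
    counts = Counter(statuses)
    stats = {status: counts.get(status, 0) for status in KNOWN_STATUSES}
    # The total also includes any status outside KNOWN_STATUSES.
    stats["total"] = sum(counts.values())
    return stats
```

Reporting every known status, even at zero, keeps the JSON keys stable for dashboards that poll the endpoint.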

Tests

21 new tests (43 total), all passing
Covers: backoff logic, dispatch success/failure/max-retries, endpoints, auth

Documentation

README updated with retry system docs, endpoint reference, env var configuration

Fixes #130

… monitoring endpoints

- Add retry state columns to Reminder model: retry_count, last_error, next_retry_at, failed, retry_status; include PostgreSQL ALTER TABLE compatibility patches in _ensure_schema_compatibility
- Implement dispatch_reminders() in app/services/jobs.py: queries due reminders, attempts send_reminder(), and schedules up to 3 retries with 5/15/45-minute exponential backoff; permanently marks failed after max retries are exhausted
- Wire APScheduler into create_app with a 1-minute interval job; suppress scheduler startup when FLASK_ENV=testing or TESTING=True
- Add /jobs blueprint with GET /jobs/status (scheduler state), GET /jobs/reminders/stats (JWT, aggregate counts), POST /jobs/reminders/run (JWT, on-demand dispatch trigger)
- Add 21 comprehensive tests in test_jobs.py covering success, skip, backoff delays, max-retry failure, exception capture, endpoint auth, and stats aggregation
- Add _FakeRedis in-memory stub to conftest.py (autouse) so all tests run without a live Redis server; fixes pre-existing Redis-related test failures across the suite (42 → 43 passing)
- Update README with retry schedule table, column docs, and monitoring endpoint reference

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- dispatch_reminders(candidates, sender_func, now) is now a pure function with no DB calls, only retry/backoff logic
- Added run_dispatch_cycle() wrapper that handles DB fetch + commit
- Updated routes to call run_dispatch_cycle
- Tests refactored: dispatch logic tests use mock objects directly, DB filtering tests use run_dispatch_cycle
- All 43 tests pass

promisingcoder and others added 4 commits March 20, 2026 22:22

feat: make MAX_RETRIES and RETRY_DELAYS_MINUTES configurable via env …

27d69e1

…vars JOB_MAX_RETRIES (default 3) and JOB_RETRY_DELAYS (default '5,15,45') can now be set as environment variables instead of hardcoded constants. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: add env var configuration to README

1a3626c

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

promisingcoder requested a review from rohitdash08 as a code owner March 21, 2026 06:12

algora-pbc bot added the 🙋 Bounty claim label Mar 21, 2026

algora-pbc bot mentioned this pull request Mar 21, 2026

Resilient background job retry & monitoring #130

Open

promisingcoder wants to merge 4 commits into rohitdash08:main from promisingcoder:bounty/130-resilient-jobs
