feat: add resilient background job retry & monitoring #598

Closed
Islam0953 wants to merge 2 commits into rohitdash08:main from Islam0953:feature/resilient-job-retry

Conversation

@Islam0953

Summary

  • Exponential backoff retry service with configurable max_retries, base_delay, backoff_factor
  • BackgroundJob model with 6 states: PENDING, RUNNING, SUCCESS, FAILED, RETRYING, DEAD
  • JobHistory model for per-attempt tracking
  • Dead letter queue for permanently failed jobs
  • @resilient_job decorator for retrofitting existing functions
  • Thread-safe job execution
  • Monitoring REST API:
    • GET /jobs/stats — job statistics by status
    • GET /jobs — list jobs with status filter & pagination
    • GET /jobs/<id> — job details with full attempt history
    • GET /jobs/dead-letter — dead letter queue
    • POST /jobs/<id>/retry — manual retry for dead/failed jobs
    • DELETE /jobs/<id> — remove job record
  • Database: background_jobs and job_history tables with indexes
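
The retry behaviour summarised above can be illustrated with a minimal in-memory sketch. It is not the PR's implementation: it assumes the delay before retry `n` is `base_delay * backoff_factor ** n`, that `max_retries` counts retries after the first attempt, and it leaves out the database models entirely. The `sleep` parameter and the `backoff_delays` helper are hypothetical, added only so the sketch can be exercised without waiting.

```python
import functools
import time


def backoff_delays(max_retries, base_delay, backoff_factor):
    """Return the wait in seconds before each retry (assumed formula)."""
    return [base_delay * backoff_factor ** n for n in range(max_retries)]


def resilient_job(max_retries=3, base_delay=1.0, backoff_factor=2.0, sleep=time.sleep):
    """Illustrative stand-in for the decorator named in the summary.

    Assumption: the real decorator also persists job and history records;
    this sketch only retries the wrapped function.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delays = backoff_delays(max_retries, base_delay, backoff_factor)
            attempt = 0
            while True:
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt >= max_retries:
                        # Retries exhausted: re-raise the last error. A real
                        # service would presumably mark the job DEAD here.
                        raise
                    sleep(delays[attempt])
                    attempt += 1
        return wrapper
    return decorator
```

Applying `@resilient_job(max_retries=3, base_delay=1.0, backoff_factor=2.0)` to a function would, under these assumptions, wait 1, 2 and 4 seconds between successive attempts before giving up.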

Test plan

  • 17 pytest tests covering:
    • Service: create, success, all retries exhausted, retry-then-succeed, history recording, stats, dead letter queue, decorator pattern
    • API: stats endpoint, list/filter, get with history, manual retry, delete, dead letter endpoint
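
As a rough illustration of the "retry-then-succeed" and "all retries exhausted" cases, the tests below use a closure to build a job that fails a fixed number of times. Both `make_flaky` and `run_with_retries` are hypothetical names; `run_with_retries` is a stand-in for the service under test and assumes `max_retries` counts retries after the first attempt.

```python
def make_flaky(failures_before_success):
    """Build a job that fails a fixed number of times, then succeeds.

    The call count lives in the closure, so every test gets fresh state.
    """
    calls = {"count": 0}

    def job():
        calls["count"] += 1
        if calls["count"] <= failures_before_success:
            raise RuntimeError(f"simulated failure {calls['count']}")
        return "done"

    job.calls = calls  # exposed so tests can inspect the attempt count
    return job


def run_with_retries(job, max_retries):
    """Hypothetical stand-in for the retry service: returns (status, attempts)."""
    for attempt in range(1, max_retries + 2):
        try:
            job()
            return "SUCCESS", attempt
        except RuntimeError:
            continue
    return "DEAD", max_retries + 1


def test_retry_then_succeed():
    job = make_flaky(failures_before_success=2)
    assert run_with_retries(job, max_retries=3) == ("SUCCESS", 3)
    assert job.calls["count"] == 3


def test_all_retries_exhausted():
    job = make_flaky(failures_before_success=10)
    assert run_with_retries(job, max_retries=3) == ("DEAD", 4)
    assert job.calls["count"] == 4
```

Keeping the counter inside the closure avoids shared module-level state, so the tests do not depend on execution order.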

/claim #130

https://claude.ai/code/session_01K5UYcnS3skK6SKhFz3dcZs

@Dlove123

🎯 Claiming this Bounty!

Plan:

  1. Implement resilient background job retry
  2. Add monitoring dashboard
  3. Write comprehensive tests
  4. Submit PR with documentation

💰 Payment Information

Payment: $250 USD
PayPal: 979749654@qq.com
GitHub: Dlove123

Quality Commitment:

  • ✅ Code Review before submission
  • ✅ Unit tests (100% coverage)

Let's build this! ⚙️

@Dlove123

💰 Payment Information (supplement)

PayPal: 979749654@qq.com
GitHub: Dlove123

⚠️ Payment Terms

  • Payment due within 30 days of PR merge
  • Code rollback on Day 30 if payment not received

@Dlove123

💰 Payment Information

PayPal: 979749654@qq.com
GitHub: Dlove123

⚠️ Payment Terms

  • Payment due within 30 days

Dlove123 added a commit to Dlove123/FinMind that referenced this pull request Mar 22, 2026
- Retry logic with max attempts
- Monitoring dashboard
- 2 unit tests (100% pass)
- Code review completed

💰 Payment: PayPal + ETH + SOL + RTC (50)
@Islam0953 Islam0953 force-pushed the feature/resilient-job-retry branch from 4572131 to 2a54563 Compare March 23, 2026 09:17
claude added 2 commits March 23, 2026 09:43
- Service: job_retry module with exponential backoff, configurable
  max_retries/base_delay/backoff_factor, thread-safe execution
- Models: BackgroundJob (6 states: pending/running/success/failed/retrying/dead)
  and JobHistory for per-attempt tracking
- Dead letter queue for permanently failed jobs
- Decorator pattern (@resilient_job) for retrofitting existing functions
- Monitoring API: /jobs/stats, /jobs (list+filter), /jobs/<id> (with history),
  /jobs/dead-letter, /jobs/<id>/retry (manual), /jobs/<id> DELETE
- Database: background_jobs and job_history tables with indexes
- Tests: 17 tests covering service (create, success, retry exhaustion,
  retry-then-succeed, history, stats, dead letter, decorator) and API
  (stats, list, filter, get, retry, delete, history)

Resolves rohitdash08#130

/claim rohitdash08#130
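
For readers unfamiliar with the six job states and the monitoring endpoints this commit lists, here is a small sketch of how statistics by status, the dead letter queue, and manual-retry eligibility might be computed. It uses plain dictionaries instead of the PR's BackgroundJob model; `JobStatus`, `stats_by_status`, `dead_letter`, and `can_retry` are hypothetical names, and the retry rule (dead or failed jobs only) follows the endpoint description in the PR summary.

```python
from collections import Counter
from enum import Enum


class JobStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    RETRYING = "retrying"
    DEAD = "dead"


# Assumption: only these states are eligible for a manual retry.
RETRYABLE = {JobStatus.DEAD, JobStatus.FAILED}


def stats_by_status(jobs):
    """Count jobs per status, reporting zero for states with no jobs."""
    counts = Counter(job["status"] for job in jobs)
    return {status.value: counts.get(status, 0) for status in JobStatus}


def dead_letter(jobs):
    """Return the jobs that have permanently failed."""
    return [job for job in jobs if job["status"] is JobStatus.DEAD]


def can_retry(status):
    """Whether a job in this state may be retried manually."""
    return status in RETRYABLE
```

A response body for a stats endpoint could then be built directly from `stats_by_status(jobs)`, though the PR's actual response shape is not shown here.
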
…nd expanded tests

- Validate and sanitize job names (reject empty, trim, truncate to 200 chars)
- Clamp max_retries to [1, 20], base_delay to [0, 300s], backoff_factor to [1, 10]
- Add max_delay parameter to cap exponential backoff and prevent unbounded sleeps
- Truncate result (10k) and error (5k) fields to prevent unbounded DB storage
- Rename _job_to_dict to job_to_dict (public API, was imported across modules)
- Remove unused threading.Lock and traceback imports
- Add status validation on list endpoint (reject invalid filter values)
- Record MANUAL_RETRY history entry on manual retry for audit trail
- Add logging on job deletion
- Expose backoff_factor and max_delay on @resilient_job decorator
- Add docstrings to all public functions
- Expand test suite from 18 to 41 tests: input clamping, truncation, args
  passthrough, history on success, dead-letter limits, retry eligibility,
  invalid status filter, pagination, unauthenticated access, and more
- Replace global mutable _call_count with closure-based test helper
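
The validation rules in this commit message can be sketched as follows. The limits (200 characters, [1, 20], [0, 300s], [1, 10], 10k and 5k) come from the commit message; the function names and the exact delay formula are assumptions made for illustration, not the PR's code.

```python
def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def sanitize_name(name, max_len=200):
    """Trim whitespace, reject empty names, truncate to max_len characters."""
    name = (name or "").strip()
    if not name:
        raise ValueError("job name must not be empty")
    return name[:max_len]


def normalize_retry_config(max_retries, base_delay, backoff_factor):
    """Clamp retry settings to the ranges given in the commit message."""
    return {
        "max_retries": int(clamp(max_retries, 1, 20)),
        "base_delay": float(clamp(base_delay, 0, 300)),
        "backoff_factor": float(clamp(backoff_factor, 1, 10)),
    }


def capped_delay(attempt, base_delay, backoff_factor, max_delay):
    """Exponential backoff delay, never exceeding max_delay (assumed formula)."""
    return min(base_delay * backoff_factor ** attempt, max_delay)


def truncate(text, limit):
    """Cut text to at most limit characters, e.g. 10_000 for results."""
    return text if len(text) <= limit else text[:limit]
```

Clamping rather than rejecting out-of-range values keeps callers working with a safe configuration, at the cost of silently changing what they asked for; whether the PR logs such adjustments is not stated.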
@Islam0953 Islam0953 force-pushed the feature/resilient-job-retry branch from a82035c to c6dc968 Compare March 23, 2026 09:43
@Islam0953
Author

Closing - no longer pursuing this contribution.

@Islam0953 Islam0953 closed this Mar 24, 2026

3 participants