These are some ideas we have discussed in the past for improving background job processing:
- Evaluate Sidekiq as a replacement for Resque.
- Guarantee that at least a small percentage of non-priority jobs are processed even while priority jobs are pending, so non-priority queues are never completely starved.
- Break up big report jobs into smaller ones.
- Improve the way we reschedule failed jobs so that retries don't make queues grow even faster under heavy load. When an incident happens, the queues start growing, and the priority queue in particular grows very fast. Pushing retried jobs on top of an already growing queue only makes things worse, since the system clearly can't cope with the existing workload.
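The idea of always processing a small share of non-priority jobs can be illustrated with a minimal sketch. It assumes two in-memory queues named "priority" and "default" and a hypothetical `FairQueuePicker` class with an `every_n` parameter. None of these names come from Resque or Sidekiq. Normally the priority queue is served first. On every `every_n`-th pick, the default queue is tried first instead, so 1 in every `every_n` picks goes to non-priority work while both queues have jobs.

```python
from collections import deque
from typing import Deque, Optional


class FairQueuePicker:
    """Hypothetical job picker for two queues: "priority" and "default".

    Normally the priority queue is always served first (strict priority).
    On every `every_n`-th pick the default queue is tried first instead,
    reserving 1 in every `every_n` picks for non-priority work.
    """

    def __init__(self, every_n: int = 10) -> None:
        if every_n < 1:
            raise ValueError("every_n must be >= 1")
        self.every_n = every_n
        self.priority: Deque[str] = deque()
        self.default: Deque[str] = deque()
        self._picks = 0

    def next_job(self) -> Optional[str]:
        self._picks += 1
        if self._picks % self.every_n == 0:
            order = (self.default, self.priority)  # reserved slot for non-priority work
        else:
            order = (self.priority, self.default)  # usual strict-priority order
        # Fall back to the other queue if the preferred one is empty,
        # so no pick is wasted.
        for queue in order:
            if queue:
                return queue.popleft()
        return None  # both queues are empty


# Usage: 100 priority jobs and 5 default jobs waiting.
picker = FairQueuePicker(every_n=10)
picker.priority.extend(f"p{i}" for i in range(100))
picker.default.extend(f"d{i}" for i in range(5))
picked = [picker.next_job() for _ in range(20)]
print(picked[9], picked[19])  # d0 d1
```

A counter keeps the sketch deterministic. Sidekiq's weighted queue configuration addresses the same starvation problem probabilistically, which may be relevant when evaluating it as a Resque replacement.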
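One possible direction for rescheduling failed jobs is sketched below. The notes do not settle on an approach, so this is only an illustration. It combines exponential backoff with a check on the current queue depth: when the target queue is already past a stress threshold, the retry is pushed further out instead of landing on the growing backlog. `RetryPolicy`, its fields, and all thresholds are hypothetical and illustrative. The queue depth is passed in, and reading it from the job backend is left out.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetryPolicy:
    """Hypothetical retry policy; every threshold here is illustrative."""
    base_delay: float = 5.0         # seconds before the first retry
    max_delay: float = 600.0        # cap on the normal exponential delay
    max_attempts: int = 8           # stop retrying after this many attempts
    stress_depth: int = 10_000      # queue depth treated as "under stress"
    stress_multiplier: float = 4.0  # extra back-off applied under stress

    def retry_delay(self, attempt: int, queue_depth: int,
                    rng: Optional[random.Random] = None) -> Optional[float]:
        """Seconds to wait before re-enqueueing a failed job, or None to give up.

        `attempt` is 0 for the first retry; `queue_depth` is the current
        length of the queue the job would be pushed back onto.
        """
        if attempt >= self.max_attempts:
            return None  # e.g. leave the job in a failed/dead-letter list instead
        delay = min(self.base_delay * 2 ** attempt, self.max_delay)
        if queue_depth > self.stress_depth:
            # The queue is already growing: push the retry further out rather
            # than adding to a backlog the system can't keep up with.
            delay *= self.stress_multiplier
        if rng is not None:
            # Randomize the upper half of the delay so jobs that failed together
            # during an incident don't all come back at the same moment.
            delay = delay / 2 + rng.uniform(0, delay / 2)
        return delay


policy = RetryPolicy()
for attempt in (0, 1, 3, 7, 8):
    print(attempt, policy.retry_delay(attempt, queue_depth=0))  # 5.0, 10.0, 40.0, 600.0, None
print("stressed:", policy.retry_delay(0, queue_depth=50_000))   # 20.0
```

This sketch delays jobs under stress. Sending them straight to a failed list instead is an equally valid choice, and the notes leave that open.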