+Being able to see the log events and metrics associated with these tasks let us spot two problems fairly quickly. First, some jobs were queuing up, causing longer-than-usual wait times for users. Second, a long-running job would occasionally fail overnight, so [Sidekiq](https://sidekiq.org/) retried it and it finished later than expected. We ended the week starting work on fixes for both issues, in particular by leaning on the principle of making background jobs as simple as they can be (even if that looks "inefficient") and [relying on Sidekiq to handle horizontal scaling](https://github.com/sidekiq/sidekiq/wiki/Best-Practices#3-embrace-concurrency).