As of #422, the broker uses asynchronous operations when creating, modifying, and deleting RDS instances. The broker also uses an asynchronous operation to delete Elasticsearch instances.
The problem with running these operations in goroutines is that the goroutines are killed if the application unexpectedly crashes or restarts. Any in-progress operation is abandoned, which can leave the brokered resources in an unpredictable state.
To mitigate the risk of asynchronous job failures, we should implement a real, persistent job queueing system that can restart incomplete jobs after an unexpected app crash or restart.
To do
- Explore options for a job queueing system that will restart jobs on application failure
  - River is worth exploring, since it is backed by PostgreSQL, which we are already using: https://riverqueue.com/
- Implement job queueing system
- Test that jobs are not lost on restart