Description
What would you improve?
Currently, if a restart is necessary, we restart the container at a time of low workload.
This works but isn't enough for end users.
An option to optimize the logic would be:
- A restart note displayed in the app ("App will be restarted in XX min. Ensure that your progress is saved")
- Stop all running containers
- Start all containers
- For jobs that users can rerun themselves (e.g. heuristic runs / embedding creation): set every job still listed as RUNNING in the db to ERROR/FAILED
- For jobs that users can't rerun themselves (e.g. tokenization): restart them automatically
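
The job-handling steps in the list above could be sketched roughly as follows. This is only an illustration under assumptions, not the project's actual code: the `Job` class, the job type names, and the function `reconcile_jobs_after_restart` are hypothetical, and a real implementation would read and update the job records in the db instead of an in-memory list.

```python
from dataclasses import dataclass

# Hypothetical job type names; the split mirrors the examples in this issue.
AUTO_RESTART_TYPES = {"tokenization"}  # users can't rerun these themselves


@dataclass
class Job:
    id: int
    type: str
    state: str  # assumed values: "RUNNING", "FINISHED", "FAILED"


def reconcile_jobs_after_restart(jobs):
    """Handle jobs that were interrupted by the container restart.

    Jobs still marked RUNNING were cut off by the restart. Those of a type
    users cannot rerun are returned so the caller can restart them; all
    other interrupted jobs are marked FAILED so users can rerun them.
    """
    to_restart = []
    for job in jobs:
        if job.state != "RUNNING":
            continue  # finished or already failed jobs are left untouched
        if job.type in AUTO_RESTART_TYPES:
            to_restart.append(job)
        else:
            job.state = "FAILED"
    return to_restart


# Example: one rerunnable job, one tokenization job, one finished job.
example_jobs = [
    Job(1, "heuristic_run", "RUNNING"),
    Job(2, "tokenization", "RUNNING"),
    Job(3, "embedding_creation", "FINISHED"),
]
example_restart = reconcile_jobs_after_restart(example_jobs)
```

In this sketch an interrupted job of an unknown type is marked FAILED by default, on the assumption that surfacing a failure to the user is safer than silently restarting work.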