When a project comes to an end, we want to turn off (destroy) its workspace to stop paying for it. However, we want to archive the data and other resources so they remain available in case a researcher needs to go back and verify something. There are a few reasons for this:
- Some of our projects want the SDE so they can crunch input data, which they then export for external analysis. At that point the workspace is just sitting there costing money, so it would be good to destroy it. However, downstream analysis may reveal something that needs closer investigation. Unless the user has been diligent in recording their activity (!), they may have trouble reproducing their environment. Ideally, we would archive it at minimal cost and bring it back to life later.
- We may also be required to keep artifacts for regulatory purposes. In fact, that's very likely, depending on the nature and origin of the project.
Resources that we might want to archive include:
- Shared storage
- Any SQL/NoSQL databases
- Any Gitea repositories they've created
- (possibly) VM images, if they've been customised for analysis
- AzureML models, etc.?
We would not need to preserve user access, the workspace itself, or anything else; just code and data.
Has anyone put any thought into this? We wouldn't need a complex solution: something as simple as dumping DB/git/image archives to disk could suffice, which reduces it to a problem of archiving the storage. As long as there's a feasible route for resurrection, that's enough; it doesn't have to be particularly user-friendly.
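A minimal sketch of the "dump everything to disk, then archive the storage" idea, in Python. It assumes the DB dumps, git bundles and image exports have already been written into one local staging directory; the function names and directory layout are hypothetical, not part of any existing TRE tooling. It packs the directory into a single compressed tarball together with a SHA-256 manifest, so a later restore can confirm nothing was lost or corrupted.

```python
import hashlib
import json
import tarfile
from pathlib import Path

MANIFEST = "MANIFEST.json"


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_workspace(staging_dir: Path, archive_path: Path) -> dict:
    """Hypothetical helper: checksum every file in staging_dir, write the
    checksums to MANIFEST.json, and pack files + manifest into a .tar.gz."""
    files = sorted(
        p for p in staging_dir.rglob("*") if p.is_file() and p.name != MANIFEST
    )
    manifest = {p.relative_to(staging_dir).as_posix(): sha256_of(p) for p in files}
    manifest_path = staging_dir / MANIFEST
    manifest_path.write_text(json.dumps(manifest, indent=2))
    with tarfile.open(archive_path, "w:gz") as tar:
        # Add files one by one with relative names so extraction stays
        # inside the target directory.
        for p in files + [manifest_path]:
            tar.add(p, arcname=p.relative_to(staging_dir).as_posix())
    return manifest


def restore_workspace(archive_path: Path, target_dir: Path) -> None:
    """Hypothetical helper: unpack an archive and verify every file
    against its manifest entry, raising on any mismatch."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        # Use the safer "data" extraction filter where this Python has it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(target_dir, filter="data")
        else:
            tar.extractall(target_dir)
    manifest = json.loads((target_dir / MANIFEST).read_text())
    for rel, expected in manifest.items():
        if sha256_of(target_dir / rel) != expected:
            raise ValueError(f"checksum mismatch for {rel}")
```

Keeping the manifest inside the archive means the tarball alone is enough for resurrection; where it is then stored (for example, a low-cost archive storage tier) is a separate decision. Write the archive outside the staging directory so it is not mistaken for workspace data on a later run.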