Ability to archive/re-instantiate workspaces #4307

@TonyWildish-BH

Description

When a project comes to an end, we want to turn off (destroy) a workspace to stop paying for it. However, we want to archive the data and other resources so they remain available in case a researcher needs to go back to verify something. There can be a few reasons for this:

  • Some of our projects want the SDE so they can crunch input data, which they then export for external analysis. At that point, the workspace is just sitting there costing money, so it would be good to destroy it. However, downstream analysis may reveal something that needs closer investigation. Unless the user has been diligent in recording their activity (!), they may have trouble reproducing their environment. Ideally, we could archive the workspace in some minimal-cost manner and bring it back to life later.
  • We may also be required to keep artifacts for regulatory purposes. In fact, that's very likely, depending on the nature and origin of the project.

Resources that we might want to archive include:

  • Shared storage
  • Any SQL/NoSQL databases
  • Any Gitea repositories they've created
  • (Possibly) VM images, if they've been customised for analysis
  • AzureML models, etc.?

We would not need to preserve the user access, the actual workspace itself, or anything else. Just code & data.

Has anyone put any thought into this? We wouldn't need a complex solution: something as simple as dumping DB/git/image archives to disk could suffice, reducing it to a problem of archiving the storage. As long as there's a feasible route for resurrection, that's enough; it doesn't have to be overly user-friendly.
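
To make the "dump everything to disk, then archive the storage" idea concrete, here is a minimal Python sketch using only the standard library. It assumes the per-resource dumps (for example a SQL dump file or a `git bundle` of each Gitea repository) have already been written into a single directory; producing those dumps is out of scope. The function names `archive_workspace` and `restore_workspace`, the `MANIFEST.json` file name, and the directory layout are all hypothetical, chosen for illustration, and are not part of the project.

```python
import hashlib
import io
import json
import tarfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_workspace(dump_dir: Path, archive_path: Path) -> dict:
    """Bundle every file under dump_dir into one .tar.gz with a checksum manifest.

    Assumption: archive_path lives outside dump_dir, and dump_dir does not
    already contain a file called MANIFEST.json (a hypothetical name).
    """
    files = sorted(p for p in dump_dir.rglob("*") if p.is_file())
    manifest = {p.relative_to(dump_dir).as_posix(): sha256_of(p) for p in files}
    payload = json.dumps(manifest, indent=2, sort_keys=True).encode("utf-8")

    with tarfile.open(archive_path, "w:gz") as tar:
        for p in files:
            tar.add(p, arcname=p.relative_to(dump_dir).as_posix())
        # Store the manifest inside the archive so it travels with the data.
        info = tarfile.TarInfo(name="MANIFEST.json")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return manifest


def restore_workspace(archive_path: Path, target_dir: Path) -> list:
    """Unpack the archive and return the names of files that fail verification."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(target_dir)
    manifest = json.loads((target_dir / "MANIFEST.json").read_text())
    return [
        name
        for name, digest in manifest.items()
        if sha256_of(target_dir / name) != digest
    ]
```

The resulting single file could then be moved to whatever low-cost storage is available. The checksum manifest is a design choice aimed at the regulatory use case: on resurrection, an empty list from `restore_workspace` indicates the restored files match what was archived.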
