Reuse AWS instances #171

Open
@PGijsbers

This is a suggestion I found in the TODO:

We can reuse an AWS instance instead of shutting them down after each job.
At least during a single benchmark, we could limit #instances = #parallel jobs.

I am not sure how this would actually work, but in principle I see a speed-up:

  • only have to create the instance once
  • only have to download a dataset once (if subsequent jobs use the same dataset)
  • only have to install the automl framework once (if subsequent jobs use the same framework)
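To make the potential savings concrete, here is a minimal sketch that counts the setup steps with and without instance reuse. The `Job` tuple, `count_setup_steps`, and the framework and dataset names are hypothetical and not part of the benchmark code. The sketch assumes that downloaded datasets and installed frameworks persist on an instance between jobs.

```python
from collections import namedtuple

# Hypothetical job description, for illustration only.
Job = namedtuple("Job", ["framework", "dataset", "fold"])


def count_setup_steps(jobs_per_instance):
    """Count instance creations, dataset downloads and framework installs.

    jobs_per_instance: list of lists, one inner list of jobs per instance.
    Assumes datasets and installations stay cached on an instance.
    """
    creations = downloads = installs = 0
    for jobs in jobs_per_instance:
        creations += 1  # each instance is created exactly once
        seen_datasets, seen_frameworks = set(), set()
        for job in jobs:
            if job.dataset not in seen_datasets:
                downloads += 1
                seen_datasets.add(job.dataset)
            if job.framework not in seen_frameworks:
                installs += 1
                seen_frameworks.add(job.framework)
    return {"creations": creations, "downloads": downloads, "installs": installs}


# Ten folds of one framework on one dataset.
jobs = [Job("frameworkA", "dataset1", fold) for fold in range(10)]

no_reuse = count_setup_steps([[job] for job in jobs])  # one instance per job
full_reuse = count_setup_steps([jobs])                 # one instance for all jobs
```

In this toy case, reuse reduces each kind of setup step from ten to one. The real gain depends on how long each step takes relative to the job itself.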

I also see some pitfalls:

  • When re-using an instance for multiple frameworks, the second framework might be affected by leftovers from the installation of the first.
  • To provide the same disk space to every job, do we have to transfer or delete all (cache) files of the previous run?
  • Does this require additional communication? The most naive approach I can imagine is to run all folds of a (framework, task) tuple on a single instance, which should not require additional communication (though it does need some extra cleanup, see above). This wouldn't allow perfect re-use, but it's probably pretty good.
  • (For the future:) Is there an increased risk of interruption when using spot instances? Is the effect of an interruption greater? I assume the answer is no to both (afaik interruption is just based on bid price, and if results are transferred after each part of the job (e.g. a fold), no more than one part is lost), but I am not sure.
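The naive scheme from the third point could look roughly like the sketch below. All names (`group_jobs`, `run_group`, and the `run_fold`, `upload_result` and `clean_up` callbacks) are hypothetical placeholders, not existing functions of the benchmark. The sketch assumes each group runs on its own instance and that results are uploaded after every fold.

```python
from collections import defaultdict
from itertools import product


def group_jobs(frameworks, tasks, folds):
    """Group all folds by (framework, task); each group maps to one instance."""
    groups = defaultdict(list)
    for framework, task, fold in product(frameworks, tasks, folds):
        groups[(framework, task)].append(fold)
    return dict(groups)


def run_group(framework, task, folds, run_fold, upload_result, clean_up):
    """Run all folds of one (framework, task) tuple sequentially on one instance.

    The three callbacks are placeholders for the real work.
    """
    completed = []
    for fold in folds:
        result = run_fold(framework, task, fold)
        # Upload right away so an interruption loses at most the current fold.
        upload_result(framework, task, fold, result)
        # Remove cache and temporary files so every fold starts with the same disk space.
        clean_up()
        completed.append(fold)
    return completed


# Usage with stub callbacks, for illustration only.
groups = group_jobs(["frameworkA", "frameworkB"], ["task1"], range(3))
uploaded = []
for (framework, task), folds in groups.items():
    run_group(
        framework, task, folds,
        run_fold=lambda fw, t, f: f"{fw}/{t}/{f}",
        upload_result=lambda fw, t, f, res: uploaded.append(res),
        clean_up=lambda: None,
    )
```

Because an instance only ever sees one framework, leftovers from another framework's installation cannot interfere. The cost is that the dataset is downloaded once per framework rather than once overall.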

Labels: aws (AWS support), enhancement (New feature or request)
