[Spot] An option for keeping failed spot job for a while before termination #1163

Open
@Michaelvll

Description

When something goes wrong with a spot job, it would be nice to be able to log into the spot cluster and look at the problem. As proposed by @lhqing, an option like --keep-minutes-after-error 60 for the spot launch would be useful for debugging: the failed cluster would stay up for that many minutes before being terminated.
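A rough sketch of the intended behavior, not an actual implementation. The function `run_with_grace_period` and its parameters (`run_job`, `terminate_cluster`, `keep_minutes_after_error`, `sleep`) are hypothetical names made up for illustration and are not part of any existing API. The only assumption is that the controller can run the job, observe its exit code, and tear down the cluster afterwards.

```python
import time
from typing import Callable


def run_with_grace_period(
    run_job: Callable[[], int],
    terminate_cluster: Callable[[], None],
    keep_minutes_after_error: float = 0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run a job, then terminate the cluster; on failure, wait first.

    All names here are hypothetical. `run_job` returns the job's exit
    code, and `terminate_cluster` tears the cluster down.
    """
    returncode = run_job()
    try:
        if returncode != 0 and keep_minutes_after_error > 0:
            # The job failed: keep the cluster up for a while so that
            # someone can log in and inspect it before it goes away.
            sleep(keep_minutes_after_error * 60)
    finally:
        # Always terminate, even if the wait is interrupted, so a
        # failed job never leaves a cluster running indefinitely.
        terminate_cluster()
    return returncode
```

With `keep_minutes_after_error=60`, a failing job would delay termination by 3600 seconds, while a successful job would be cleaned up immediately. Defaulting the option to 0 keeps the current behavior unless the user opts in.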
