Task executors that support specific roles are restarted when they fail #620

@zuston

Description

Why

TonY now provides a sidecar TensorBoard, but it sometimes fails because of hardware problems or unstable HDFS. For users, it would be better if it were restarted automatically, without them having to notice or intervene.

So we need to introduce a general mechanism that restarts failed task executors for specific roles, to meet the requirement above.

Metadata

Labels

enhancement (New feature or request)
