Description
Right now we don't terminate failed repair jobs by default. The problem is that a job might have "failed" only because of a timeout on our side, while it is in fact still running on the node. This causes two problems:
- in case of a timeout, SM believes that the job has failed and stopped running, so it schedules new repair jobs on the "released" hosts. This can break the one-job-per-host rule.
- repair jobs that were not terminated and keep running after the SM task has ended might make it impossible to retry the SM task until they finish (see https://github.com/scylladb/scylla-enterprise/issues/4055)
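
A minimal sketch of the intended flow, assuming a hypothetical per-node client (`jobClient`, `WaitForJob`, `TerminateJob` are placeholders, not the actual SM API): when waiting for a job times out on our side, explicitly ask the node to terminate it before the host is released for new jobs.

```go
package repair

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// jobClient abstracts the per-node repair API used in this sketch.
type jobClient interface {
	// WaitForJob blocks until the job finishes or ctx is done.
	WaitForJob(ctx context.Context, host string, jobID int64) error
	// TerminateJob asks the node to stop the job (hypothetical call).
	TerminateJob(ctx context.Context, host string, jobID int64) error
}

// runJob waits for a repair job and, on timeout, terminates it before
// returning, so the host is not "released" while the job may still be running.
func runJob(ctx context.Context, c jobClient, host string, jobID int64, timeout time.Duration) error {
	waitCtx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	err := c.WaitForJob(waitCtx, host, jobID)
	if err == nil {
		return nil
	}
	if errors.Is(err, context.DeadlineExceeded) {
		// The job may still be running on the node: terminate it so that
		// scheduling another job on this host cannot break the
		// one-job-per-host rule, and so a later retry is not blocked.
		termCtx, termCancel := context.WithTimeout(context.Background(), 30*time.Second)
		defer termCancel()
		if termErr := c.TerminateJob(termCtx, host, jobID); termErr != nil {
			return fmt.Errorf("wait timed out and terminate failed: %w", termErr)
		}
		return fmt.Errorf("repair job %d on %s timed out and was terminated: %w", jobID, host, err)
	}
	return err
}
```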