Cluster failover does not work as expected with MultiplePodTemplatesScheduling #7065

@mszacillo

What happened:

We were testing our internal fork, which was recently rebased onto the 1.16 release. As part of testing, we performed several cluster failovers with the MultiplePodTemplatesScheduling feature set to true and noticed that failover does not work as expected: once workloads are evicted from their cluster, they are not rescheduled.

The root cause appears to be the existing implementation of IsBindingReplicasChanged, which only takes spec.replicas into account. It also needs to account for components.
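
To illustrate the idea, here is a minimal sketch of a replica-change check that also looks at components. It is not the actual Karmada code. The types `bindingSpec`, `component`, and `targetCluster` and the functions `desiredReplicas`, `assignedReplicas`, and `replicasChanged` are hypothetical stand-ins, and the sketch only covers the case where replicas are divided across clusters. It also assumes that the desired total for a multi-component workload is the sum of the per-component replicas.

```go
package main

// component is a hypothetical stand-in for one pod template of a workload.
type component struct {
	Name     string
	Replicas int32
}

// targetCluster is a hypothetical stand-in for one entry in the scheduling result.
type targetCluster struct {
	Name     string
	Replicas int32
}

// bindingSpec is a simplified, hypothetical model of a binding spec.
type bindingSpec struct {
	Replicas   int32
	Components []component
	Clusters   []targetCluster
}

// desiredReplicas returns the total the scheduler should place.
// Assumption: with components present, the total is the sum of their replicas;
// otherwise it falls back to spec.replicas.
func desiredReplicas(spec bindingSpec) int32 {
	if len(spec.Components) == 0 {
		return spec.Replicas
	}
	var total int32
	for _, c := range spec.Components {
		total += c.Replicas
	}
	return total
}

// assignedReplicas sums the replicas currently assigned to clusters.
func assignedReplicas(spec bindingSpec) int32 {
	var total int32
	for _, tc := range spec.Clusters {
		total += tc.Replicas
	}
	return total
}

// replicasChanged reports whether the assigned total differs from the desired total.
func replicasChanged(spec bindingSpec) bool {
	return assignedReplicas(spec) != desiredReplicas(spec)
}
```

With a check like this, a binding whose clusters were all evicted would compare an assigned total of 0 against the component total and be flagged for rescheduling. A check that reads only spec.replicas would miss this if spec.replicas is not populated for multi-component workloads, which is an assumption in this sketch.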

What you expected to happen:

The scheduler should correctly detect when replicas have been scaled up or down when multiple-component scheduling is in use.

How to reproduce it (as minimally and precisely as possible):

Attempt a cluster failover for a workload that has multiple components, with the MultiplePodTemplatesScheduling feature set to true. The problem reproduces every time.

Labels

kind/bug: Categorizes issue or PR as related to a bug.
