
Conversation


@tmonty12 tmonty12 commented Nov 14, 2025

What type of PR is this?

/kind documentation

What this PR does / why we need it:

This PR creates a design proposal for allowing configurability over the default Grove RollingUpdate strategy for PodCliqueSets, PodCliqueScalingGroups and PodCliques.

  • Introduces a ReplicaRecreate strategy at the PodCliqueSet level to atomically recreate PCS replicas in the case where application-level version compatibility is not possible. Also introduces the notion of maxUnavailable/maxSurge for ReplicaRecreate (a rough API sketch follows this list).
  • For the PCS RollingUpdate strategy, introduces maxSurge/maxUnavailable at the PC and PCSG levels.
  • Includes considerations for the following:
    • Gang Scheduling
    • Index management when surging
    • Use cases and examples
    • Updates to webhook validation
    • Updates to APIs
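
To make the proposed shape easier to picture, here is a rough sketch of what the PCS-level strategy API could look like. This is illustrative only; the type, constant, and field names (e.g. `PodCliqueSetUpdateStrategy`, `RollingUpdateConfig`, the JSON fields) are assumptions for discussion, not the final API.

```go
package v1alpha1

import "k8s.io/apimachinery/pkg/util/intstr"

// Illustrative sketch only - names and shapes are assumptions for discussion,
// not the final API proposed by this design.
type PodCliqueSetUpdateStrategyType string

const (
	// RollingUpdate updates PCS replicas one at a time, delegating how pods and
	// PCSG replicas roll within each PCS replica to the PCLQ/PCSG update strategies.
	RollingUpdate PodCliqueSetUpdateStrategyType = "RollingUpdate"
	// ReplicaRecreate atomically recreates whole PCS replicas, for cases where
	// application-level version compatibility between old and new pods is not possible.
	ReplicaRecreate PodCliqueSetUpdateStrategyType = "ReplicaRecreate"
)

// PodCliqueSetUpdateStrategy configures how a PCS rolls out spec changes.
type PodCliqueSetUpdateStrategy struct {
	Type PodCliqueSetUpdateStrategyType `json:"type,omitempty"`
	// maxUnavailable/maxSurge knobs. Per this proposal they are honored at the PCS
	// level only for ReplicaRecreate; for RollingUpdate the equivalent knobs live on
	// the PCLQ and PCSG update strategies instead.
	RollingUpdateConfig *RollingUpdateConfig `json:"rollingUpdateConfig,omitempty"`
}

// RollingUpdateConfig mirrors the familiar Deployment-style knobs.
type RollingUpdateConfig struct {
	// Maximum number of replicas that may be unavailable during the update.
	MaxUnavailable *intstr.IntOrString `json:"maxUnavailable,omitempty"`
	// Maximum number of extra (surge) replicas created above the desired count.
	MaxSurge *intstr.IntOrString `json:"maxSurge,omitempty"`
}
```

Webhook validation would then reject combinations that don't make sense (for example, PCS-level maxUnavailable/maxSurge together with the RollingUpdate strategy), per the validation updates listed above.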

- Pod selection: oldest pod first (by creation timestamp)
- Individual pods are deleted and recreated

This default behavior provides safe, conservative updates but lacks user configurability. At the PCSG and standalone PC levels, the update corresponds to maxUnavailable=1 and maxSurge=0, where a single old replica is deleted and a new one is created.
Contributor

PC -> PCLQ



type PodCliqueSetUpdateStrategyType string
const (
RollingUpdate // Update replicas sequentially
@Ronkahn21 Ronkahn21 commented Nov 17, 2025

The name does not match the description: it does not update replicas sequentially, it updates the replica components sequentially. A better name would be ReplicaUpdate or something that refers to the replica.

Contributor

Agreed, ReplicaUpdate is better but it's still a bit confusing. Not sure what a better name would be though. Unless we can find one, we should document this very well.

@ZYecho11

Perhaps we can consider introducing a partition field to control the upgrade process? The partition field would be meaningful for the PD association upgrade scenario.
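
For illustration, a partition along the lines of StatefulSet's `spec.updateStrategy.rollingUpdate.partition` could look roughly like the sketch below; the names and placement here are hypothetical, not something the proposal currently defines.

```go
package sketch

// Hypothetical StatefulSet-style partition knob; it would sit alongside
// maxUnavailable/maxSurge in the rolling update configuration. The name and
// semantics here are assumptions, not part of the current proposal.
type rollingUpdatePartition struct {
	// partition is the ordinal below which replicas stay on the old revision.
	partition int32
}

// shouldUpdate reports whether the replica at replicaIndex is rolled to the new
// revision: indices >= partition are updated first, indices below stay on the old
// revision until partition is lowered (useful for a staged prefill/decode rollout).
func (p rollingUpdatePartition) shouldUpdate(replicaIndex int32) bool {
	return replicaIndex >= p.partition
}
```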

- **Frontend** standalone PC: 3 replicas
- **Prefill** PCSG: 2 replicas (prefill-leader PC: 1 replica, prefill-worker PC: 2 replicas)
- **Decode** PCSG: 2 replicas (decode-leader PC: 1 replica, decode-worker PC: 2 replicas)

@gflarity gflarity commented Nov 26, 2025

Suggest you include the YAML as well; it helps those who "see the matrix" ;)

```yaml
apiVersion: grove.io/v1alpha1
kind: PodCliqueSet
metadata:
  name: multinode-disaggregated-inference
  namespace: default
spec:
  # 2 PCS replicas - each replica contains the full inference pipeline
  replicas: 2
  template:
    podcliques:
    # Frontend - standalone PodClique (not part of any PCSG)
    - name: frontend
      spec:
        roleName: frontend
        replicas: 3
        podSpec:
          containers:
          - name: frontend
            # ...

    # Prefill Leader - part of prefill PCSG
    - name: prefill-leader
      spec:
        roleName: prefill-leader
        replicas: 1
        podSpec:
          containers:
          - name: prefill-leader
          # ...


    # Prefill Worker - part of prefill PCSG
    - name: prefill-worker
      spec:
        roleName: prefill-worker
        replicas: 2
        podSpec:
          containers:
          - name: prefill-worker
          # ... 

    # Decode Leader - part of decode PCSG
    - name: decode-leader
      spec:
        roleName: decode-leader
        replicas: 1
        podSpec:
          containers:
          - name: decode-leader
          # ... 

    # Decode Worker - part of decode PCSG
    - name: decode-worker
      spec:
        roleName: decode-worker
        replicas: 2
        podSpec:
          containers:
          - name: decode-worker
          # ... 

    # PodCliqueScalingGroups - groups of cliques that scale together
    podCliqueScalingGroups:
    # Prefill PCSG: 2 replicas of (1 leader + 2 workers)
    - name: prefill
      cliqueNames: [prefill-leader, prefill-worker]
      replicas: 2

    # Decode PCSG: 2 replicas of (1 leader + 2 workers)
    - name: decode
      cliqueNames: [decode-leader, decode-worker]
      replicas: 2
```

@gflarity gflarity left a comment

Thanks for the design. See comments. Ping me, happy to talk through any of them in a video chat.

Recommendation

Given that we live in a world of supply constraints around GPUs, I doubt there are going to be clusters out there with spare capacity to use for maxSurge. So I'd recommend we narrow the scope of this document to just cover ReplicaRecreate and maxUnavailable. I suspect those are the two knobs the real world actually cares about. Should maxSurge get requested by an organization with real use cases for it, we can dig in and understand why, and whether the approach of scale-then-roll is sufficient, as that seems a lot cleaner, though slower.

Consider a multinode aggregated inference serving deployment with 2 PCS replicas, where each replica contains:

- **Aggregated Workers** PCSG: 3 replicas (each replica includes inference workers with a frontend capable of tokenization that accepts OpenAI Chat Completion requests)

Contributor

So I think your example would look like the following.

Sorry to nitpick, I know this is a toy example, but in this situation you could also just do 1 replica of 6 pcsg, right? Or 6 pcs with 1 pcsg. Which makes the most sense?

```yaml
  apiVersion: grove.io/v1alpha1
  kind: PodCliqueSet
  metadata:
    name: multinode-aggregated-inference
    namespace: default
  spec:
    # 2 PCS replicas - each replica contains the full aggregated inference deployment
    replicas: 2
    template:
      podcliques:
      # Aggregated Worker - combines inference + frontend/tokenization in a single pod
      # Each worker can handle OpenAI Chat Completion requests directly
      - name: aggregated-worker
        spec:
          roleName: aggregated-worker
          replicas: 1
          podSpec: #... 

      # PodCliqueScalingGroup - groups the aggregated workers that scale together
      podCliqueScalingGroups:
      # Aggregated Workers PCSG: 3 replicas of aggregated inference workers
      - name: aggregated-workers
        cliqueNames: [aggregated-worker]
        replicas: 3
```

```
PodCliqueSet (Top Level)
├─ UpdateStrategy (controls PCS replica updates)
│ ├─ RollingUpdate: one replica at a time
```
Contributor

I'm a bit confused by this. Even if we were to ReplicaRecreate, you'd still do it one at a time, as there's no maxSurge or maxUnavailable on PCS?

Contributor

Oh, I see below there is? This diagram is a bit confusing.


Controls **how pods update within a standalone PodClique**.

```go
type ComponentUpdateStrategy struct {
```
Contributor

Not sure the Component is necessary here. Just call it UpdateStrategy. Though if recreate is set on the PCS, then this doesn't matter and we probably need to warn the user.


## Update Behavior

### RollingUpdate (Default)
Contributor

ReplicaUpdate

Contributor

Actually, another way to frame it: DelegateChildren or something like that. I.e., follow the strategy of the children, vs. ReplicaRecreate which ignores their strategies and just recreates.

- Need to clear all state at once within a replica
- Coordinated recreation of interdependent components to prevent cross-version communication issues

## MaxSurge Considerations
Contributor

I found this section a bit hard to follow. Leaving some suggestions below to make it a bit easier.

Comment on lines +234 to +365
## MaxSurge Considerations

### PodClique MaxSurge

**Indexing Strategy:**

PodClique uses an index tracker that extracts pod indices from hostnames and fills holes automatically. When surge pods are created:

1. **Surge pods get indices above replica count**: With `replicas=3` and `maxSurge=1`, surge pod gets index 3 (or higher if holes exist)
2. **Index tracker fills holes**: When old pods are deleted, their indices become available. The tracker fills holes from lowest to highest (starting from 0)
3. **No holes at end of update**: As old pods are deleted and recreated, new pods fill the lowest available indices, ensuring sequential indices `[0, replicas-1]` at completion

**Example with `replicas=3`, `maxSurge=1`, `maxUnavailable=0`:**

1. **Initial:** Pods with indices 0, 1, 2 (old spec)
2. **Create surge:** Pod with index 3 (surge, new spec) - now have [0, 1, 2, 3]
3. **Delete pod 0:** Index 0 becomes available
4. **Recreate pod 0:** New pod fills index 0 (new spec) - now have [0, 1, 2, 3]
5. **Delete pod 1:** Index 1 becomes available
6. **Recreate pod 1:** New pod fills index 1 (new spec) - now have [0, 1, 2, 3]
7. **Delete pod 2:** Index 2 becomes available
8. **Recreate pod 2:** New pod fills index 2 (new spec) - now have [0, 1, 2, 3]
9. **Delete surge pod 3:** Final state [0, 1, 2] - no holes
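
To make the hole-filling behavior concrete, here is a minimal illustrative sketch of an index tracker that always hands out the lowest free index. It is a simplification of the behavior described above, not Grove's actual implementation.

```go
package main

import "fmt"

// indexTracker hands out the lowest free pod index and reclaims indices when pods
// are deleted, so indices end up contiguous at [0, replicas-1] after an update.
type indexTracker struct {
	used map[int]bool
}

func newIndexTracker() *indexTracker { return &indexTracker{used: map[int]bool{}} }

// acquire returns the lowest index not currently in use; holes are filled first,
// so a surge pod lands above the current highest index only when no holes exist.
func (t *indexTracker) acquire() int {
	for i := 0; ; i++ {
		if !t.used[i] {
			t.used[i] = true
			return i
		}
	}
}

// release frees an index when its pod is deleted.
func (t *indexTracker) release(i int) { delete(t.used, i) }

func main() {
	t := newIndexTracker()
	for i := 0; i < 3; i++ {
		t.acquire() // pods 0, 1, 2 (old spec)
	}
	surge := t.acquire()    // surge pod gets index 3
	t.release(0)            // old pod 0 is deleted
	refilled := t.acquire() // recreated pod fills the hole at index 0
	fmt.Println(refilled, surge) // prints: 0 3
}
```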

**Gang Scheduling Impact:**

- Surge pods are added to the same PodGroup as existing pods
- PodGroup's `PodReferences` list includes all pods (old + surge)
- Gang scheduling requires the PodGroup to meet `MinReplicas` (from PodClique's `MinAvailable`)
- All pods in the PodGroup (including surge) must be scheduled together as part of the gang
- If surge pod cannot be scheduled, the entire gang is blocked

**PodGang/PodGroup Construction:**

- PodGroup contains pod references from the PodClique
- During surge, PodGroup temporarily has more pod references than `replicas` count
- PodGroup's `MinReplicas` is set to PodClique's `MinAvailable` (not affected by surge)
- Gang scheduling ensures at least `MinReplicas` pods are scheduled together

**Stuck Scenarios:**

- **Surge pod cannot be scheduled**: Gang scheduling blocks until surge pod can be scheduled, update stuck
- **Surge pod scheduled but not ready**: Update cannot proceed if `maxUnavailable=0` requires surge pod to be ready before deleting old pods

### PodCliqueScalingGroup MaxSurge

**Indexing Strategy:**

PodCliqueScalingGroup replicas use replica indices (0, 1, 2, ...). When surge replicas are created:

1. **Surge replicas get indices above replica count**: With `replicas=3` and `maxSurge=1`, surge replica gets index 3
2. **Replica placement depends on minAvailable**:
- Replica indices below `minAvailable` go into the base PodGang
- Replica indices at or above `minAvailable` go into scaled PodGangs; surge replicas always fall in this range, since their indices start at `replicas >= minAvailable`
3. **No holes at end of update**: Original replica indices `[0, replicas-1]` are maintained, surge replicas at `[replicas, replicas+maxSurge-1]` are deleted after update completes
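
The placement rule can be illustrated with a small sketch. The PodGang names used here are purely illustrative, not Grove's actual naming scheme.

```go
package main

import "fmt"

// podGangFor illustrates which PodGang a PCSG replica index lands in: indices below
// minAvailable belong to the base PodGang, everything else (including surge indices,
// which start at replicas >= minAvailable) gets its own scaled PodGang.
func podGangFor(pcsgName string, replicaIndex, minAvailable int) string {
	if replicaIndex < minAvailable {
		return pcsgName + "-base"
	}
	return fmt.Sprintf("%s-scaled-%d", pcsgName, replicaIndex-minAvailable)
}

func main() {
	// replicas=3, minAvailable=2, maxSurge=1: surge replica 3 lands in a scaled PodGang.
	for idx := 0; idx <= 3; idx++ {
		fmt.Println(idx, podGangFor("prefill", idx, 2))
	}
}
```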

**Example with `replicas=3`, `minAvailable=3`, `maxSurge=1`, `maxUnavailable=0`:**

1. **Initial:** Replicas 0, 1, 2 in base PodGang (old spec)
2. **Create surge:** Replica 3 in scaled PodGang (surge, new spec) - replicas 0, 1, 2 in base PodGang; replica 3 in scaled PodGang
3. **Wait for surge available:** Replica 3 becomes available (scaled PodGang gated by base PodGang readiness)
4. **Delete and recreate replica 0:** Replica 0 (new spec) in base PodGang
5. **Wait for replica 0 available:** Replica 0 becomes available
6. **Delete and recreate replica 1:** Replica 1 (new spec) in base PodGang
7. **Wait for replica 1 available:** Replica 1 becomes available
8. **Delete and recreate replica 2:** Replica 2 (new spec) in base PodGang
9. **Wait for replica 2 available:** Replica 2 becomes available
10. **Delete surge replica 3:** Final state replicas [0, 1, 2] - no holes

**Example with `replicas=3`, `minAvailable=2`, `maxSurge=1`:**

1. **Initial:** Replicas 0, 1 in base PodGang; Replica 2 in scaled PodGang (old spec)
2. **Create surge:** Replica 3 in scaled PodGang (surge, new spec)
3. **Update proceeds:** Replicas 0, 1, 2 updated, then surge replica 3 deleted

**Gang Scheduling Impact:**

- **Base PodGang (replicas 0 to minAvailable-1)**: All PodGroups in base PodGang must meet `MinReplicas` for gang scheduling to proceed.
- **Scaled PodGangs (replicas >= minAvailable)**: Surge replicas (always at indices >= replicas, which is >= minAvailable) get their own scaled PodGang. Scaled PodGangs are gated by base PodGang readiness - gates are removed only after base PodGang is ready.
- **Gang scheduling constraints**: Each PodGroup (one per PodClique in the PCSG replica) must meet its `MinReplicas` for the gang to be scheduled.

**PodGang/PodGroup Construction:**

- **Base PodGang**: Contains PodGroups for replicas 0 to `minAvailable-1`.
- **Scaled PodGangs**: Each replica >= `minAvailable` gets its own scaled PodGang. Surge replicas (always at indices >= replicas >= minAvailable) create new scaled PodGangs.
- **PodGroup per PodClique**: Each PodClique in a PCSG replica becomes a PodGroup. Surge replica creates PodGroups for all its PodCliques.

**Stuck Scenarios:**

- **Surge replica cannot be scheduled**: The surge replica is always in a scaled PodGang. If the scaled PodGang is blocked while the base PodGang is updating, this creates a circular dependency
- **Base PodGang update blocks surge scaled PodGang**: Surge replica in scaled PodGang is gated by base PodGang readiness. If base is updating, surge cannot proceed.
- **Surge replica scheduled but not ready**: Update cannot proceed if `maxUnavailable=0` requires surge replica to be available before deleting old replicas.

### PCS Replica-Level MaxSurge with ReplicaRecreate

**Behavior:**

With ReplicaRecreate, surge replicas are created at new indices above the desired replica count to avoid index holes. The update process:

1. Creates surge replicas at indices `[replicas, replicas+maxSurge-1]`
2. Recreates original indices `[0, replicas-1]` with the updated spec
3. Deletes surge replicas once original indices are recreated

**Example:**

With `replicas=3`, `maxSurge=1`, and `maxUnavailable=0`:

1. **Initial state:** Replicas 0, 1, 2 (old spec)
2. **Create surge replica:** Replicas 0, 1, 2 (old), 3 (surge, new spec)
3. **Wait for surge available:** Replica 3 becomes available
4. **Delete and recreate replica 0:** Replicas 0 (new), 1, 2 (old), 3 (surge, new)
5. **Wait for replica 0 available:** Replica 0 becomes available
6. **Delete and recreate replica 1:** Replicas 0, 1 (new), 2 (old), 3 (surge, new)
7. **Wait for replica 1 available:** Replica 1 becomes available
8. **Delete and recreate replica 2:** Replicas 0, 1, 2 (new), 3 (surge, new)
9. **Wait for replica 2 available:** Replica 2 becomes available
10. **Delete surge replica 3:** Replicas 0, 1, 2 (new spec) - no index holes

This approach maintains sequential indices throughout the update, avoiding DNS naming issues and ensuring applications always see consistent replica indices. With `maxUnavailable=0`, a surge replica must be available before deleting any original replica to maintain full capacity.
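
The ordering above can be sketched as follows for the `maxUnavailable=0` case. `replicaOps` and its helpers are hypothetical stand-ins used only to show the sequencing; they are not actual controller code.

```go
package sketch

// replicaOps groups hypothetical helpers for operating on PCS replicas by index.
type replicaOps struct {
	create, recreate, del func(index int) error
	waitAvailable         func(index int) error
}

// rollWithSurge sketches the ReplicaRecreate + maxSurge ordering for maxUnavailable=0.
func rollWithSurge(ops replicaOps, replicas, maxSurge int) error {
	// 1. Create surge replicas at indices [replicas, replicas+maxSurge-1] with the new spec.
	for i := replicas; i < replicas+maxSurge; i++ {
		if err := ops.create(i); err != nil {
			return err
		}
		// With maxUnavailable=0 the surge replica must become available before any
		// original replica is deleted; if it never does, the update is stuck here.
		if err := ops.waitAvailable(i); err != nil {
			return err
		}
	}
	// 2. Delete and recreate original indices [0, replicas-1] one at a time with the
	//    updated spec, waiting for each to become available again.
	for i := 0; i < replicas; i++ {
		if err := ops.recreate(i); err != nil {
			return err
		}
		if err := ops.waitAvailable(i); err != nil {
			return err
		}
	}
	// 3. Delete surge replicas, leaving sequential indices [0, replicas-1] and no holes.
	for i := replicas; i < replicas+maxSurge; i++ {
		if err := ops.del(i); err != nil {
			return err
		}
	}
	return nil
}
```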

**Stuck Scenarios with ReplicaRecreate and MaxSurge:**

When using ReplicaRecreate with `maxSurge > 0`, the update can get stuck if surge replicas fail to become available. This can happen in several scenarios:

1. **Surge replica is unscheduled**: The surge replica's pods cannot be scheduled due to insufficient cluster resources, topology constraints that cannot be satisfied, node selectors/affinity mismatches, or resource quotas exceeded.

2. **Surge replica has MinAvailable breached**: The surge replica's pods are scheduled but fail to become ready due to crash loops, health check failures, application startup failures, or dependency issues.

3. **Existing replicas are unhealthy**: Even if surge replica is healthy, if existing replicas are unscheduled or have MinAvailable breached, the update may be blocked by `maxUnavailable` constraints.

Users are responsible for identifying when a rolling update with `maxSurge` during ReplicaRecreate is stuck (e.g., update progress stalls, surge replica remains unscheduled or has MinAvailable breached) and manually intervening to unblock the update, such as by reducing `maxSurge` to 0 or deleting the stuck surge replica.

Contributor

I found this section a bit confusing. I worked with Opus 4.5 to make it a bit clearer. Please take a look and incorporate what you like.

# MaxSurge Considerations

## Overview

Enabling `maxSurge` changes the update strategy from **delete-then-create** (current default) to **create-then-delete**. This maintains full capacity during updates but introduces complexity around indexing, gang scheduling, and potential "stuck" scenarios.

**Key Risk**: With `maxSurge > 0`, updates can become **stuck** if surge resources fail to schedule or become healthy. Grove does not automatically detect or resolve these situations—users must monitor and manually intervene.


## Common Concepts Across All Levels

Before diving into level-specific details, here are concepts that apply at every level:

**Indexing Strategy**: Surge resources are assigned indices *above* the normal replica count to avoid index collisions. When old resources are deleted and recreated, they reclaim their original indices. Surge resources are deleted at the end of the update, leaving clean sequential indices.

**Availability Gating**: When `maxUnavailable=0`, the surge resource must become available *before* any old resource can be deleted. This is what enables zero-downtime updates but also creates the primary stuck scenario.


## PodClique MaxSurge (Pod Level)

**Scope**: Controls how individual pods update within a standalone PodClique.

**How It Works**:

- With `replicas=3` and `maxSurge=1`: surge pod gets index 3, update proceeds, surge pod deleted at end
- The index tracker fills holes from lowest to highest, ensuring no index gaps at completion

**Example with `replicas=3`, `maxSurge=1`, `maxUnavailable=0`:**

1. **Initial:** Pods with indices 0, 1, 2 (old spec)
2. **Create surge:** Pod with index 3 (new spec) — now have [0, 1, 2, 3]
3. **Delete pod 0:** Index 0 becomes available
4. **Recreate pod 0:** New pod fills index 0 (new spec) — now have [0, 1, 2, 3]
5. **Repeat for pods 1 and 2**
6. **Delete surge pod 3:** Final state [0, 1, 2] with new spec — no holes

**Gang Scheduling Impact**:

Surge pods are added to the **same PodGroup** as existing pods. This has an important implication:

> **Gang scheduling requires ALL pods in a PodGroup (including surge) to be schedulable together.**

If the cluster lacks resources for the surge pod, the entire gang becomes unschedulable, blocking the update.

**Stuck Scenarios**:

| Scenario | Cause | Result |
|----------|-------|--------|
| Surge pod unschedulable | Insufficient cluster resources | Gang blocked, update stuck |
| Surge pod not ready | Container failures, health check issues | Update blocked (if `maxUnavailable=0`) |


## PodCliqueScalingGroup MaxSurge (PCSG Replica Level)

**Scope**: Controls how PCSG replicas (groups of related PodCliques) update within a scaling group.

**How It Works**:

- With `replicas=3` and `maxSurge=1`: surge PCSG replica gets index 3
- Each PCSG replica contains multiple PodCliques that are updated together

**Example with `replicas=3`, `minAvailable=3`, `maxSurge=1`, `maxUnavailable=0`:**

1. **Initial:** Replicas 0, 1, 2 in base PodGang (old spec)
2. **Create surge:** Replica 3 in scaled PodGang (new spec)
3. **Wait for surge available:** Replica 3 becomes available
4. **Delete and recreate replicas 0, 1, 2** sequentially, waiting for each to become available
5. **Delete surge replica 3:** Final state replicas [0, 1, 2] with new spec

**Gang Scheduling Impact — The Base/Scaled PodGang Problem**:

This is where `maxSurge` becomes complicated. Grove uses a two-tier gang scheduling model:

- **Base PodGang**: Contains PCSG replicas 0 through `minAvailable-1`
- **Scaled PodGangs**: Contain PCSG replicas at index `minAvailable` and above

Since surge replicas are always at index ≥ `replicas` (which is ≥ `minAvailable`), **surge replicas always land in Scaled PodGangs**.

Scaled PodGangs have a dependency: they are **gated until the base PodGang is ready**. This creates a potential problem:

```
┌─────────────────────────────────────────────────────────────────┐
│ POTENTIAL CIRCULAR DEPENDENCY                                   │
│                                                                 │
│ 1. Surge replica (in scaled PodGang) waits for base to be ready │
│ 2. Base PodGang is being updated (may not be "ready")           │
│ 3. Update needs surge to be available before deleting old base  │
│ 4. Deadlock: surge waits for base, update waits for surge       │
└─────────────────────────────────────────────────────────────────┘
```

**Stuck Scenarios**:

| Scenario | Cause | Result |
|----------|-------|--------|
| Surge blocked by base PodGang | Base PodGang updating, not "ready" | Circular dependency, update stuck |
| Surge replica unschedulable | Resource constraints on scaled PodGang | Update stuck |
| Surge replica not available | Pod failures within the surge PCSG replica | Update blocked (if `maxUnavailable=0`) |

## PCS Replica MaxSurge with ReplicaRecreate (Top Level)

**Scope**: Controls how entire PCS replicas are recreated during version-incompatible updates.

**How It Works**:

With `replicas=2`, `maxSurge=1`, and `maxUnavailable=0`:

Step 1: [0-old, 1-old] Initial state
Step 2: [0-old, 1-old, 2-surge] Create surge replica at index 2
Step 3: Wait for replica 2 to become available
Step 4: [0-new, 1-old, 2-surge] Delete/recreate replica 0
Step 5: Wait for replica 0 to become available
Step 6: [0-new, 1-new, 2-surge] Delete/recreate replica 1
Step 7: Wait for replica 1 to become available
Step 8: [0-new, 1-new] Delete surge replica 2


This approach maintains full capacity (2 available replicas) throughout the update.

**Stuck Scenarios**:

Since surge PCS replicas are complete deployments (with their own PodGangs, PCSGs, and PodCliques), they can fail to become available for many reasons:

| Scenario | Examples |
|----------|----------|
| **Surge replica unscheduled** | Insufficient resources, topology constraints unsatisfiable, node selector mismatches, quota exceeded |
| **Surge replica unhealthy** | Container crash loops, health check failures, application startup failures, dependency issues |
| **Existing replicas degraded** | If `maxUnavailable` constraint prevents progress due to unhealthy existing replicas |


## User Responsibilities

**Grove does not automatically recover from stuck surge scenarios.** Users are responsible for:

1. **Monitoring update progress** — Watch for updates that stall (surge replica remains unscheduled or unhealthy)
2. **Diagnosing the cause** — Check pod events, resource availability, and PodGang status
3. **Manual intervention** — Options include:
   - Reducing `maxSurge` to 0 to switch to delete-then-create
   - Manually deleting the stuck surge replica
   - Freeing cluster resources to allow scheduling
   - Fixing application issues preventing readiness

## Summary

| Level | Surge Resource | Primary Risk | Gang Scheduling Concern |
|-------|---------------|--------------|------------------------|
| **PodClique** | Extra pod | Pod unschedulable | Surge pod blocks entire PodGroup gang |
| **PCSG** | Extra PCSG replica | Scaled PodGang gated | Base/scaled dependency creates circular wait |
| **PCS (ReplicaRecreate)** | Extra PCS replica | Replica unhealthy/unscheduled | Full replica must schedule and become healthy |

**Bottom Line**: `maxSurge` enables zero-downtime, full-capacity updates but shifts the failure mode from "reduced capacity during update" to "potentially stuck update requiring manual intervention."


Contributor

Please check out the draft PR for rolling update E2E tests; it seems like scaling during a roll is an edge case we currently care about. What are the impacts of this on maxSurge?

What if we just increase the replicas first (with the old revision), then treat this as a maxUnavailable situation once the pods are ready? Once the rollout is successful, reduce the replicas again. This might handle the intermingling mentioned above better. I suspect only really having to implement the maxUnavailable case would make the implementation a lot simpler too. The downside is you'll have to wait for new pods of the old revision to spin up, which could take quite a long time depending on what's being launched. That time would be wasted, but that might be a better trade-off than all the stuck cases.
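
To sketch the idea (the helpers here are hypothetical, and this assumes the cluster has headroom to scale up with the old revision first):

```go
package sketch

// scaleThenRoll sketches the alternative above: temporarily scale up with the old
// revision, roll using only maxUnavailable semantics, then scale back down.
// scalePCS, waitForReady, and rollWithMaxUnavailable are hypothetical helpers.
func scaleThenRoll(
	replicas, surge int,
	scalePCS func(n int) error,
	waitForReady func() error,
	rollWithMaxUnavailable func(maxUnavailable int) error,
) error {
	// 1. Scale up by `surge` replicas of the OLD revision and wait for them to be ready.
	if err := scalePCS(replicas + surge); err != nil {
		return err
	}
	if err := waitForReady(); err != nil {
		return err
	}
	// 2. Roll with plain maxUnavailable semantics: the extra old-revision replicas
	//    absorb the unavailability, so effective capacity never drops below `replicas`.
	if err := rollWithMaxUnavailable(surge); err != nil {
		return err
	}
	// 3. Scale back down to the desired replica count once the rollout succeeds.
	return scalePCS(replicas)
}
```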


### RollingUpdate (Default)

The default RollingUpdate behavior is described in the [Motivation](#motivation) section. When using the RollingUpdate strategy, `maxUnavailable` and `maxSurge` settings at the PodCliqueSet level are invalid and will be rejected by the validation webhook - PCS replicas are always updated one at a time sequentially. However, PC and PCSG `updateStrategy` settings (maxUnavailable/maxSurge) are observed and control how pods and PCSG replicas update within each PCS replica.
Contributor

Suggested change:
- The default RollingUpdate behavior is described in the [Motivation](#motivation) section. When using the RollingUpdate strategy, `maxUnavailable` and `maxSurge` settings at the PodCliqueSet level are invalid and will be rejected by the validation webhook - PCS replicas are always updated one at a time sequentially. However, PC and PCSG `updateStrategy` settings (maxUnavailable/maxSurge) are observed and control how pods and PCSG replicas update within each PCS replica.
+ The default ReplicaUpdate behavior is described in the [Motivation](#motivation) section. When using the ReplicaUpdate strategy, `maxUnavailable` and `maxSurge` settings at the PodCliqueSet level are invalid and will be rejected by the validation webhook - PCS replicas are always updated one at a time sequentially. However, PC and PCSG `updateStrategy` settings (maxUnavailable/maxSurge) are observed and control how pods and PCSG replicas update within each PCS replica.
