Description
Related problem
The operator appears to wait for the full 5m timeout before concluding that the pending pod it manually rolled did not become Ready. This adds significant latency to reconciliation in cases where we already know the pod can never become Ready, because it references a PVC that does not exist. See the logs from rolling one of these pods below:
2024-02-15 06:36:30 INFO AbstractOperator:265 - Reconciliation #123(watch) Kafka(kafka/cluster-13000): Kafka cluster-13000 will be checked for creation or modification
2024-02-15 06:36:30 WARN AbstractOperator:557 - Reconciliation #102(timer) Kafka(kafka/cluster-13000): Failed to reconcile
io.strimzi.operator.common.operator.resource.TimeoutException: Exceeded timeout of 300000ms while waiting for Pods resource cluster-13000-zookeeper-0 in namespace kafka to be ready
at io.strimzi.operator.common.VertxUtil$1.lambda$handle$1(VertxUtil.java:126) ~[io.strimzi.operator-common-0.39.0.jar:0.39.0]
at io.vertx.core.impl.future.FutureImpl$4.onFailure(FutureImpl.java:188) ~[io.vertx.vertx-core-4.5.0.jar:4.5.0]
at io.vertx.core.impl.future.FutureBase.lambda$emitFailure$1(FutureBase.java:75) ~[io.vertx.vertx-core-4.5.0.jar:4.5.0]
at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173) ~[io.netty.netty-common-4.1.100.Final.jar:4.1.100.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166) ~[io.netty.netty-common-4.1.100.Final.jar:4.1.100.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) ~[io.netty.netty-common-4.1.100.Final.jar:4.1.100.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569) ~[io.netty.netty-transport-4.1.100.Final.jar:4.1.100.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[io.netty.netty-common-4.1.100.Final.jar:4.1.100.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[io.netty.netty-common-4.1.100.Final.jar:4.1.100.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty.netty-common-4.1.100.Final.jar:4.1.100.Final]
at java.lang.Thread.run(Thread.java:840) ~[?:?]
2024-02-15 06:36:30 WARN ZookeeperLeaderFinder:248 - Reconciliation #123(watch) Kafka(kafka/cluster-13000): ZK cluster-13000-zookeeper-0.cluster-13000-zookeeper-nodes.kafka.svc.cluster.local:2181: failed to connect to zookeeper:
Suggested solution
Ideally, we could short-circuit this operation. As with initial pod startup, it would be nice if, when pods are manually rolled, the operator immediately recognized that a pod referencing a non-existent PVC is never going to become Ready, bailed out early, and allowed Strimzi to proceed to create the PVC for the new pod.
In theory, this would not even require a separate loop: the waiter that polls for the new pod to become Ready could itself check whether the referenced PVC exists on the cluster and fail fast if it does not.
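A minimal sketch of what that check could look like. This is not Strimzi's actual readiness waiter; the `PvcVolume` record and the `existingPvcs` set are simplified stand-ins for the Kubernetes model classes and for a live PVC lookup against the API server.

```java
import java.util.List;
import java.util.Set;

// Sketch of the proposed short-circuit: while waiting for a rolled pod to
// become Ready, check whether every PVC it references actually exists.
// If one is missing, the pod can never become Ready, so the waiter can
// fail fast instead of running out its full timeout.
public class PvcReadinessCheck {

    // Stand-in for a pod volume that references a PersistentVolumeClaim by name.
    record PvcVolume(String claimName) {}

    /**
     * Returns the name of the first referenced PVC that does not exist on the
     * cluster, or null if all referenced PVCs are present.
     */
    static String findMissingPvc(List<PvcVolume> volumes, Set<String> existingPvcs) {
        for (PvcVolume v : volumes) {
            if (!existingPvcs.contains(v.claimName())) {
                return v.claimName();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<PvcVolume> podVolumes = List.of(new PvcVolume("data-cluster-13000-zookeeper-0"));
        Set<String> pvcsOnCluster = Set.of(); // the referenced PVC has been deleted
        String missing = findMissingPvc(podVolumes, pvcsOnCluster);
        if (missing != null) {
            System.out.println("short-circuit: PVC " + missing
                + " does not exist, pod will never become Ready");
        }
    }
}
```

In the real waiter this check would run on each poll iteration, so the PVC could still be created concurrently without the waiter failing spuriously.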
Alternatives
No response
Additional context
No response