[BUG] v2 instance-manager pod stuck in create/delete loop when engine frontend recovery blocks gRPC startup

### Describe the Bug

When an instance-manager pod (v2 data engine) restarts while persisted engine frontend records reference NVMe-TCP targets that are no longer reachable (e.g., deleted volume, migrated engine), the `recoverEngineFrontends()` function blocks `NewServer()` synchronously. This prevents gRPC servers from starting, causing the liveness probe to fail and kubelet to kill the container. Since the pod has `restartPolicy: Never`, the Longhorn controller deletes and recreates the pod, but the same stale metadata still exists, creating an infinite crash loop.


### To Reproduce

1. Use Longhorn v1.12.x with v2 (SPDK) data engine enabled
2. Create a v2 volume and attach it, then delete it while the instance-manager pod is being recycled (or force-kill the IM pod while the volume is being deleted)
3. The instance-manager pod enters a create/delete loop.

### Expected Behavior

The instance-manager pod should start serving gRPC immediately regardless of engine frontend recovery status. Recovery of stale/unreachable targets should fail fast and clean up without affecting pod health.

### Support Bundle for Troubleshooting

N/A

### Environment

- Longhorn version: 
- Impacted volume (PV): 
- Installation method (e.g. Rancher Catalog App/Helm/Kubectl):
- Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version:
  - Number of control plane nodes in the cluster:
  - Number of worker nodes in the cluster:
- Node config
  - OS type and version:
  - Kernel version:
  - CPU per node:
  - Memory per node:
  - Disk type (e.g. SSD/NVMe/HDD):
  - Network bandwidth between the nodes (Gbps):
- Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal):
- Number of Longhorn volumes in the cluster:


### Additional context

_No response_

### Workaround and Mitigation

_No response_

[BUG] v2 instance-manager pod stuck in create/delete loop when engine frontend recovery blocks gRPC startup #13185
