Fleet rollout gets stuck #4283

@thardeck

This is a copy of the original issue #4144 by @manno, which was taken over by a similar but unrelated topic. The issues are split so that each keeps its own context.

I am using Fleet v0.13.2-rc.2 to deploy to 500 clusters. The rollout gets stuck: only 164/500 bundledeployments are created.
The deployment consists of 25k bundledeployments. The reconcile workers are configured as follows:

  --set controller.reconciler.workers.gitrepo=100 \
  --set controller.reconciler.workers.bundle=200 \
  --set controller.reconciler.workers.bundledeployment=200 \
  --set controller.reconciler.workers.cluster=100 \

Status of the stuck bundle:

   conditions:
   - lastUpdateTime: "2025-09-17T12:18:45Z"
     message: 'targeting error: failed to get options secret for bundledeployment cluster-fleet-default-d0-k3k-downstream001-downstream0050-fa83d/scale-50-single-scale-50-bundles-twenty-sixteen,
       this is likely temporary: secrets "scale-50-single-scale-50-bundles-twenty-sixteen"
       not found'
     reason: Error
     status: "False"
     type: Ready
   display:
     readyClusters: 164/500
     state: WaitApplied

When I check, the secret is missing.
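For anyone who wants to repeat that check, here is a minimal sketch that pulls the namespace and name out of the condition message and builds the matching `kubectl get secret` command. It assumes the message keeps the wording shown in the status above, and that the options secret lives in the same namespace as the bundledeployment; both are assumptions. The helper names `missing_secret_ref` and `kubectl_check_command` are made up for illustration and are not part of Fleet.

```python
import re
import shlex

# Assumed message shape, copied from the bundle status above.
# Other Fleet versions may word the error differently.
PATTERN = re.compile(
    r"failed to get options secret for bundledeployment\s+"
    r"(?P<namespace>[^/\s]+)/(?P<name>[^,\s]+)"
)


def missing_secret_ref(message):
    """Return (namespace, name) from the condition message, or None."""
    # The YAML status wraps the message over several lines; flatten it first.
    flat = " ".join(message.split())
    match = PATTERN.search(flat)
    if match is None:
        return None
    return match.group("namespace"), match.group("name")


def kubectl_check_command(namespace, name):
    """Build a command that looks up the secret.

    Assumption: the secret shares the bundledeployment's namespace and name.
    """
    return "kubectl get secret {} -n {}".format(
        shlex.quote(name), shlex.quote(namespace)
    )


if __name__ == "__main__":
    message = """targeting error: failed to get options secret for bundledeployment
        cluster-fleet-default-d0-k3k-downstream001-downstream0050-fa83d/scale-50-single-scale-50-bundles-twenty-sixteen,
        this is likely temporary: secrets "scale-50-single-scale-50-bundles-twenty-sixteen"
        not found"""
    ref = missing_secret_ref(message)
    if ref is not None:
        print(kubectl_check_command(*ref))
```

The script only prints the command. Running it against the upstream cluster is left to the reader.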

  • I waited 30 minutes.
  • Restarting the fleet-controller does not create the missing secret.
  • Upgrading to v0.14.0-alpha.2 does not create the secret.
  • Increasing forceSyncGeneration re-creates all bundledeployments, which may then get stuck again.
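
The forceSyncGeneration bump from the last bullet can be sketched as a merge patch on the GitRepo. This is only an illustration: the GitRepo name and namespace below are placeholders, not values from this report, and the helper `force_sync_patch_command` is hypothetical. The field itself is `spec.forceSyncGeneration` on the GitRepo resource.

```python
import json
import shlex


def force_sync_patch_command(gitrepo, namespace, generation):
    """Build a kubectl command that sets spec.forceSyncGeneration.

    gitrepo and namespace are supplied by the caller; the values used in the
    example below are placeholders, not taken from the issue.
    """
    patch = json.dumps({"spec": {"forceSyncGeneration": generation}})
    return "kubectl patch gitrepo {} -n {} --type merge -p {}".format(
        shlex.quote(gitrepo), shlex.quote(namespace), shlex.quote(patch)
    )


if __name__ == "__main__":
    # Placeholder name and namespace; substitute the real GitRepo.
    print(force_sync_patch_command("my-gitrepo", "my-namespace", 2))
```

As noted in the bullet, this re-creates all bundledeployments, so it is a workaround that may hit the same problem again rather than a fix.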
