This is a backport issue for #4948, automatically created via GitHub Actions workflow initiated by @0xavi0
Original issue body:
When a GitRepo reconcile fails to create a gitjob (e.g. a referenced Helm secret doesn't exist, a quota is exceeded, or an admission webhook rejects the job), Fleet incorrectly persists the updated status to Kubernetes before the job is actually created.
On the next reconcile, all trigger conditions evaluate to false, so job creation is never retried and the GitRepo appears healthy.
Three triggers are affected:
- A new commit arriving via webhook or polling (status.Commit is promoted prematurely)
- A spec.forceSyncGeneration bump (status.UpdateGeneration is consumed before the job runs)
- Any GitRepo spec change (status.ObservedGeneration is advanced before the job runs)
Why it happens:
In Reconcile, status.Commit is promoted in memory before manageGitJob is called. Inside manageGitJob, updateGenerationValuesIfNeeded advances UpdateGeneration and ObservedGeneration before validateExternalSecretExist and createJobAndResources — both of which can fail.
When either fails, updateErrorStatus persists all three already-mutated fields to Kubernetes. The next reconcile reads these values back, sees no difference between spec and status, and skips job creation entirely.
The error condition set during the failing reconcile is cleared seconds later when the follow-up reconcile completes cleanly. The GitRepo is then indistinguishable from a healthy one: it shows no errors and no indication that a deployment was missed.