ArgoCD Upgrade Failure (via Helm) #24504
Unanswered
VARUNGITGEEK asked this question in Q&A
Replies: 0 comments
Hey Team,
I performed an Argo CD upgrade on our production environment earlier this week. We were upgrading from 2.8.4 to 2.9.6 (app version), which is 5.46.6 to 5.54.0 (chart version). The helm upgrade itself went through and Argo CD was running on our target version (2.9.6). However, at that point all the data was gone and we were left with a blank UI (no apps, no projects, nothing). We then performed a helm rollback to the previous version. Argo CD went back to running 2.8.4, but the data still did not come back.
With no other options left, we applied the backups. These are the backups that were taken before performing the upgrade:
kubectl get applications.argoproj.io -n argocd -o yaml > applications-backup.yaml
kubectl get appprojects.argoproj.io -n argocd -o yaml > appprojects-backup.yaml
kubectl get cm argocd-cm -n argocd -o yaml > argocd-cm-backup.yaml
kubectl get cm argocd-rbac-cm -n argocd -o yaml > argocd-rbac-cm-backup.yaml
kubectl get cm argocd-tls-certs-cm -n argocd -o yaml > argocd-tls-certs-cm-backup.yaml
kubectl get cm argocd-cmd-params-cm -n argocd -o yaml > argocd-cmd-params-cm-backup.yaml
kubectl get cm argocd-ssh-known-hosts-cm -n argocd -o yaml > argocd-ssh-known-hosts-cm-backup.yaml
kubectl get crds -n argocd -o yaml > argocd-crds-backup.yaml
kubectl get secrets -n argocd -o yaml > secrets-backup.yaml
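For anyone repeating this procedure, the backup commands above could be wrapped roughly as follows. This is a minimal sketch, not a tested procedure: `backup_argocd` and the output file names are made up for illustration, and it backs up all ConfigMaps in the namespace rather than the five named ones. CRDs are cluster-scoped, so the sketch omits `-n` for them.

```shell
# Sketch only: wraps the backup commands in one function and stops on failure.
# backup_argocd and the "*-backup.yaml" naming are illustrative assumptions.
backup_argocd() {
  ns="$1"
  outdir="$2"
  mkdir -p "$outdir" || return 1

  for kind in applications.argoproj.io appprojects.argoproj.io configmaps secrets; do
    out="$outdir/$kind-backup.yaml"
    # Same "kubectl get ... -o yaml" form as the commands above.
    kubectl get "$kind" -n "$ns" -o yaml > "$out" || return 1
    # Catch the case where nothing at all was written to the file.
    [ -s "$out" ] || { echo "empty backup: $out" >&2; return 1; }
  done

  # CRDs are cluster-scoped, so no namespace flag is passed here.
  kubectl get crds -o yaml > "$outdir/crds-backup.yaml" || return 1
  [ -s "$outdir/crds-backup.yaml" ] || { echo "empty CRD backup" >&2; return 1; }

  echo "backups written to $outdir"
}
```

It would be called as `backup_argocd argocd ./argocd-backup` before running the helm upgrade.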
Applying these backups brought the data back and provided some relief. However, we are still not sure what actually happened, or why. The same activity was first performed on our dev environment and went through without any issues, although the upgrade path there was different (2.10 to 2.11, app version). After some deeper analysis we found a couple of gotchas and would appreciate some guidance on them.
1: The values.yml file of the production environment had "installCRDs: false", so it appears that the CRDs were skipped. The dev environment did not have this flag and therefore used the default value "true". Could preventing Helm from upgrading the CRDs, which might be needed for the target version, have caused a compatibility issue (or some other issue) that led to the blank UI? And does the fact that the rollback did not bring back the data mean that the rollback neither reverted the CRDs nor repaired the Custom Resource (CR) objects, because by the time it was executed the newer Argo CD pods had already interacted with the CRs, potentially leaving them in an incompatible or partially corrupted state? That would explain why reapplying the CRDs and the other backup files was ultimately needed.
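One way to narrow this down might be to look at the CRDs themselves. In Kubernetes, deleting a CRD also deletes every custom resource of that kind, so a CRD whose creation time matches the upgrade window would point to removal and recreation rather than corruption. Whether that happened here is only an assumption. The sketch below prints the creation timestamp, the `app.kubernetes.io/managed-by` label, and the `helm.sh/resource-policy` annotation for each Argo CD CRD; `crd_report` is a hypothetical helper name.

```shell
# Sketch only: crd_report is an illustrative helper, not part of Argo CD or Helm.
# Prints: <crd name> <creationTimestamp> <managed-by label> <resource-policy annotation>
crd_report() {
  crd="$1"
  printf '%s\t' "$crd"
  # Dots inside label/annotation keys are escaped with a backslash for jsonpath.
  kubectl get crd "$crd" -o jsonpath='{.metadata.creationTimestamp}{"\t"}{.metadata.labels.app\.kubernetes\.io/managed-by}{"\t"}{.metadata.annotations.helm\.sh/resource-policy}{"\n"}'
}

# Report on the three CRDs that Argo CD defines.
argocd_crd_report() {
  for crd in applications.argoproj.io appprojects.argoproj.io applicationsets.argoproj.io; do
    crd_report "$crd" || return 1
  done
}
```

Running `argocd_crd_report` on both dev and production and comparing the output would show whether the two clusters treat the CRDs differently. Empty fields mean the label or annotation is not set.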
2: The egress network policy was not applied as described in the documentation (https://argo-cd.readthedocs.io/en/stable/operator-manual/upgrading/2.8-2.9/). We chose not to be restrictive and allowed outbound traffic. Could this have contributed to the failure? Would this specific version have required the policy to be restrictive (or configured in the documented way), so that the lack of connectivity to the Redis cache caused the issue?
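To see which network policies are actually in effect, something like the sketch below could help. It lists each NetworkPolicy in the namespace with its `policyTypes`, which shows whether any egress rules exist at all. `list_netpols` is a hypothetical helper name. Argo CD keeps Applications and AppProjects as Kubernetes custom resources and uses Redis only as a cache, so a Redis connectivity problem alone would not be expected to remove them.

```shell
# Sketch only: list_netpols is an illustrative helper name.
# Prints one line per NetworkPolicy: <name> <policyTypes>
list_netpols() {
  ns="$1"
  kubectl get networkpolicy -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.policyTypes}{"\n"}{end}'
}
```

For example, `list_netpols argocd` run before and after the upgrade would show whether the chart changed any policy.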