fix(appcontroller): application controller in core mode fails to sync when server.secretkey is missing #26793

Open
anandf wants to merge 2 commits into argoproj:master from anandf:fix_missing_server_key

Member

@anandf anandf commented Mar 11, 2026

Fixes #12903

Assisted-by: Claude code, Opus 4.6 model for generating unit tests, reviewed by author.

Description of the problem:

Release 3.0 introduced an option to disable the in-cluster server. To support it, all cluster lookup methods in util/db/cluster.go, namely GetCluster(), GetClusterServerByName(), ListClusters(), and WatchClusters(), call settingsMgr.GetSettings() to check whether the user has explicitly set the inClusterEnabled flag to false. In core mode, the argocd-secret does not contain server.secretkey, because the argocd-server component never runs in core mode. As a result, GetSettings() fails with an incompleteSettingsErr, every cluster lookup method fails in turn, and the application controller cannot sync applications successfully.

Description of the solution:

  • Ignore the incompleteSettingsErr when these cluster lookup methods call settingsMgr.GetSettings(), and fall back to the default value inClusterEnabled=true instead.
  • Return the error lazily from settingsMgr.GetSettings(), so that updateSettingsFromConfigMap() also runs before the error is returned to the caller.
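
The first bullet above can be sketched as follows. This is a minimal, self-contained illustration and not the code in this PR: errIncompleteSettings, argoSettings, and settingsGetter are hypothetical stand-ins for the real types in util/settings, and falling back to true on unrelated errors is an assumption made for the sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// errIncompleteSettings is a hypothetical stand-in for incompleteSettingsErr.
var errIncompleteSettings = errors.New("server.secretkey is missing")

// argoSettings is a hypothetical, trimmed-down settings struct.
type argoSettings struct {
	InClusterEnabled bool
}

// settingsGetter abstracts the single call the cluster lookups depend on.
type settingsGetter func() (*argoSettings, error)

// isInClusterEnabled returns false only when the user explicitly disabled
// the in-cluster server. An incomplete-settings error is tolerated and the
// partially populated settings are still consulted.
func isInClusterEnabled(get settingsGetter) bool {
	s, err := get()
	if err != nil && !errors.Is(err, errIncompleteSettings) {
		return true // assumption: use the default on unrelated errors
	}
	if s == nil {
		return true // nothing to read, use the default
	}
	return s.InClusterEnabled
}

func main() {
	// Core mode: settings are incomplete, but the flag keeps its default.
	coreMode := func() (*argoSettings, error) {
		return &argoSettings{InClusterEnabled: true}, errIncompleteSettings
	}
	// The user explicitly disabled the in-cluster server.
	optedOut := func() (*argoSettings, error) {
		return &argoSettings{InClusterEnabled: false}, nil
	}
	fmt.Println(isInClusterEnabled(coreMode)) // true
	fmt.Println(isInClusterEnabled(optedOut)) // false
}
```

Passing the getter as a function keeps the sketch independent of SettingsManager, which also makes the fallback easy to unit test.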

This fix needs to be backported to release-3.0, release-3.1, release-3.2, and release-3.3.

Steps to reproduce this issue

Create a kind cluster and install Argo CD in core mode

# kind create cluster --name argocd
# kubectl create ns argocd; kubectl apply -f manifests/core-install.yaml -n argocd --server-side

Check the application controller logs; they contain multiple warnings about the missing server.secretkey

# kubectl logs argocd-application-controller-0 -n argocd
{"built":"2026-02-18T16:10:10Z","commit":"6a902023b2675253856306222c470f7a35e4bd02","level":"info","msg":"ArgoCD Application Controller is starting","namespace":"argocd","time":"2026-03-12T03:57:49Z","version":"v3.4.0+6a90202"}
...
...
{"level":"warning","msg":"error collecting cluster metrics: server.secretkey is missing","time":"2026-03-12T03:57:49Z"}
{"level":"warning","msg":"Failed to save clusters info: server.secretkey is missing","time":"2026-03-12T03:57:49Z"}
{"error":"server.secretkey is missing","level":"warning","msg":"Cannot init sharding. Error while querying clusters list from database","time":"2026-03-12T03:57:49Z"}
{"level":"warning","msg":"Failed to save clusters info: server.secretkey is missing","time":"2026-03-12T03:57:59Z"}

Deploy a sample app

# kubectl apply -f - <<EOF
kind: Application
apiVersion: argoproj.io/v1alpha1
metadata:
  name: guestbook
spec:
  destination:
    namespace: guestbook
    server: https://kubernetes.default.svc
  project: default
  source:
    directory:
      recurse: true
    path: guestbook
    repoURL: https://github.com/anandf/argocd-example-apps.git
  syncPolicy:
    automated: {}
    syncOptions:
    - CreateNamespace=true
    - ServerSideApply=true
EOF

The application remains in the Unknown state, and the application controller pod logs show multiple warnings about the missing server.secretkey

# kubectl get application -A
NAME        SYNC STATUS   HEALTH STATUS
guestbook   Unknown       Unknown

# kubectl logs argocd-application-controller-0 -n argocd | grep 'server.secretkey'

Steps to test the fix

Set the container image to the one that contains the fix, quay.io/anjoseph/argocd:master

# kubectl set image statefulset/argocd-application-controller -n argocd '*=quay.io/anjoseph/argocd:master'
# kubectl set image deployments/argocd-applicationset-controller -n argocd '*=quay.io/anjoseph/argocd:master'
# kubectl set image deployments/argocd-repo-server -n argocd '*=quay.io/anjoseph/argocd:master'

Watch the application status; it should reach Synced and Healthy

# kubectl get applications -w
NAME        SYNC STATUS   HEALTH STATUS
guestbook   OutOfSync     Missing
guestbook   Synced        Progressing
guestbook   Synced        Healthy

Check the application controller pod logs and confirm there are no warnings about the missing server.secretkey

# kubectl logs argocd-application-controller-0 -n argocd | grep 'server.secretkey'

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issues number (used for the release note).
  • The title of the PR conforms to the Title of the PR
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Does this PR require documentation updates?
  • I've updated documentation as required by this PR.
  • I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My build is green (troubleshooting builds).
  • My new feature complies with the feature status guidelines.
  • I have added a brief description of why this PR is necessary and/or what this PR solves.
  • Optional. My organization is added to USERS.md.
  • Optional. For bug fixes, I've indicated what older releases this fix should be cherry-picked into (this may or may not happen depending on risk/complexity).

bunnyshell bot commented Mar 11, 2026

🔴 Preview Environment stopped on Bunnyshell

See: Environment Details | Pipeline Logs

Available commands (reply to this comment):

  • 🔵 /bns:start to start the environment
  • 🚀 /bns:deploy to redeploy the environment
  • /bns:delete to remove the environment

codecov bot commented Mar 11, 2026

Codecov Report

❌ Patch coverage is 88.88889% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 62.94%. Comparing base (3cb4955) to head (bb817fb).

Files with missing lines   Patch %   Lines
util/db/cluster.go         88.23%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #26793      +/-   ##
==========================================
- Coverage   62.95%   62.94%   -0.01%     
==========================================
  Files         414      414              
  Lines       56152    56146       -6     
==========================================
- Hits        35349    35343       -6     
+ Misses      17447    17446       -1     
- Partials     3356     3357       +1     

@anandf anandf changed the title from "fix(appcontroller): application controller in core mode fails to sync in core mode" to "fix(appcontroller): application controller in core mode fails to sync when server.secretkey is missing" Mar 11, 2026
@anandf anandf force-pushed the fix_missing_server_key branch from 8ea2917 to 0811fc4 Compare March 11, 2026 17:34
@anandf anandf marked this pull request as ready for review March 11, 2026 17:34
@anandf anandf requested a review from a team as a code owner March 11, 2026 17:34
if err := mgr.updateSettingsFromSecret(&settings, argoCDSecret, secrets); err != nil {
	errs = append(errs, err)
}
updateSettingsFromConfigMap(&settings, argoCDCM)
Contributor

I am concerned this change can make the contract of the function somewhat hard to understand: for certain errors, the return value is partially invalid/empty. I see this fix is pushing the current design to its limits, but I would like to see if we can find a cleaner way around it...

Can we treat the server.secretkey as optional in mgr.updateSettingsFromSecret so it will not error, since there is a situation when we know it is missing (the core deployment)? Or, eventually, ignore the key iff core mode is detected?

Member Author

@anandf anandf Mar 12, 2026

Thanks @olivergondza for reviewing the PR. I agree that the contract is a bit ambiguous here. Generally, when a function encounters an error, the practice is to return a nil object and a non-nil error. In this case, however, the returned settings object is valid even when there is an incompleteSettingsErr. So the contract appears to be (AFAIK) that you always get a settings object, which may or may not be complete, and the caller needs to find that out by inspecting the error and reacting accordingly.

> Can we treat the server.secretkey as optional in mgr.updateSettingsFromSecret so it will not error, since there is a situation when we know it is missing (the core deployment)? Or, eventually, ignore the key iff core mode is detected?

As per my understanding, it is not possible to detect which mode the app controller is running in. Changing the behaviour by not returning the error may be beyond the scope of this fix and could affect other callers. I want to limit the fix to the cluster lookup methods, which is where the breaking change I am trying to address occurs. inClusterEnabled has a well-defined default value of true, so even if no setting is set explicitly, the cluster lookup methods can proceed.

Member Author

Just to be clear, I am open to fixing the method contract for GetSettings() once we have complete clarity on the various scenarios in which it is invoked. However, I would prefer to do that in a separate PR rather than within the scope of this bug fix, to avoid breaking anything, especially when backporting to older versions.

Contributor

Good point about the incompleteSettingsErr, what I was concerned about was already happening. Ack, let's polish things later.
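
The contract discussed in this thread (a settings object is always returned, and the error says whether it is complete) can be sketched as below. This is an illustrative, self-contained approximation and not the actual GetSettings() implementation: the settings struct, updateFromSecret, updateFromConfigMap, and the key names are assumptions introduced for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical key names, assumed for this sketch only.
const (
	secretKeyName       = "server.secretkey"
	inClusterEnabledKey = "cluster.inClusterEnabled"
)

var errMissingSecretKey = errors.New("server.secretkey is missing")

// settings is a trimmed-down stand-in for the real settings struct.
type settings struct {
	ServerSignature  []byte
	InClusterEnabled bool
}

// updateFromSecret fails when the signing key is absent, as in core mode.
func updateFromSecret(s *settings, secret map[string][]byte) error {
	key, ok := secret[secretKeyName]
	if !ok {
		return errMissingSecretKey
	}
	s.ServerSignature = key
	return nil
}

// updateFromConfigMap never fails; the flag defaults to true.
func updateFromConfigMap(s *settings, cm map[string]string) {
	s.InClusterEnabled = cm[inClusterEnabledKey] != "false"
}

// getSettings collects errors instead of returning early, so the ConfigMap
// values are applied even when the Secret is incomplete.
func getSettings(secret map[string][]byte, cm map[string]string) (*settings, error) {
	var s settings
	var errs []error
	if err := updateFromSecret(&s, secret); err != nil {
		errs = append(errs, err)
	}
	updateFromConfigMap(&s, cm)
	return &s, errors.Join(errs...) // nil when errs is empty
}

func main() {
	// Core mode: no server.secretkey, but argocd-cm still opts out.
	s, err := getSettings(map[string][]byte{}, map[string]string{inClusterEnabledKey: "false"})
	fmt.Println(s.InClusterEnabled, errors.Is(err, errMissingSecretKey)) // false true
}
```

Callers that only need ConfigMap-derived fields can then read them from the partial result, while callers that need the signing key still see the error.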

… when server.secretkey is missing

Signed-off-by: anandf <anjoseph@redhat.com>
@anandf anandf force-pushed the fix_missing_server_key branch from 32801a3 to 46ff9dc Compare March 12, 2026 07:48
Signed-off-by: anandf <anjoseph@redhat.com>
Comment on lines +445 to +447
// isInClusterEnabled returns false if explicitly disabled by the user, true otherwise.
func isInClusterEnabled(settingsMgr *settings.SettingsManager) bool {
	argoSettings, err := settingsMgr.GetSettings()
Member

@choejwoo choejwoo Mar 13, 2026

Hi, thanks for the PR!

This is slightly different from the previous contract discussion, but since this code path only needs InClusterEnabled while GetSettings() assembles much broader state, would it make sense to add a narrower helper on SettingsManager for just this flag so callers do not need to depend on the broader GetSettings() contract here?

Something along these lines, for example,

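// IsInClusterEnabled reports whether the in-cluster server is enabled.
// It defaults to true unless argocd-cm explicitly sets the flag to "false".
func (mgr *SettingsManager) IsInClusterEnabled() (bool, error) {
	argoCDCM, err := mgr.getConfigMap()
	if err != nil {
		return true, fmt.Errorf("error retrieving argocd-cm: %w", err)
	}
	return argoCDCM.Data[inClusterEnabledKey] != "false", nil
}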

Also, I notice some paths in this file now call isInClusterEnabled(...) multiple times within the same flow, so even if introducing a narrower helper feels out of scope here, it may still be worth reading the value once and reusing it locally where possible.
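
The "read once and reuse" suggestion above might look roughly like this. It is a hypothetical sketch rather than code from util/db/cluster.go: the cluster type, visibleClusters, and the lookup callback are names invented for illustration.

```go
package main

import "fmt"

const inClusterServer = "https://kubernetes.default.svc"

// cluster is a hypothetical, minimal cluster record.
type cluster struct {
	Name   string
	Server string
}

// visibleClusters hides the in-cluster entry when it is disabled. The flag
// is looked up once and reused for every element instead of being
// re-evaluated inside the loop.
func visibleClusters(all []cluster, inClusterEnabled func() bool) []cluster {
	enabled := inClusterEnabled() // single lookup per flow
	visible := make([]cluster, 0, len(all))
	for _, c := range all {
		if c.Server == inClusterServer && !enabled {
			continue
		}
		visible = append(visible, c)
	}
	return visible
}

func main() {
	calls := 0
	lookup := func() bool { calls++; return false }
	all := []cluster{
		{Name: "in-cluster", Server: inClusterServer},
		{Name: "prod", Server: "https://prod.example.com"},
	}
	visible := visibleClusters(all, lookup)
	fmt.Println(len(visible), calls) // 1 1
}
```

Reading the flag once also guarantees that a single flow sees a consistent value even if the ConfigMap changes midway.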

Member Author

Thanks @choejwoo for your review, I agree that a direct helper in SettingsManager will be better. Let me make that change.

Development

Successfully merging this pull request may close these issues.

server.secretkey is missing when installing from core-install

3 participants