fix(appcontroller): application controller in core mode fails to sync when server.secretkey is missing#26793
fix(appcontroller): application controller in core mode fails to sync when server.secretkey is missing#26793anandf wants to merge 2 commits intoargoproj:masterfrom
Conversation
🔴 Preview Environment stopped on BunnyshellSee: Environment Details | Pipeline Logs Available commands (reply to this comment):
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #26793 +/- ##
==========================================
- Coverage 62.95% 62.94% -0.01%
==========================================
Files 414 414
Lines 56152 56146 -6
==========================================
- Hits 35349 35343 -6
+ Misses 17447 17446 -1
- Partials 3356 3357 +1 ☔ View full report in Codecov by Sentry. |
8ea2917 to
0811fc4
Compare
| if err := mgr.updateSettingsFromSecret(&settings, argoCDSecret, secrets); err != nil { | ||
| errs = append(errs, err) | ||
| } | ||
| updateSettingsFromConfigMap(&settings, argoCDCM) |
There was a problem hiding this comment.
I am concerned this change can make the contract of the function somewhat hard to understand - in case of certain error, the return value is partially invalid/empty. I mean, I see this fix is pulling the current design to its limits, but I would like to see if we can find a cleaner way around...
Can we treat the server.secretkey as optional in mgr.updateSettingsFromSecret so it will not error, since there is a situation when we know it is missing (the core deployment)? Or, eventually, ignore the key iff core mode is detected?
There was a problem hiding this comment.
Thanks @olivergondza for reviewing the PR. I agree with your view that the contract is bit ambiguous here. Generally when encountering an error, the practice is to return nil object and a non-nil error. But in this case, even if there is an incompleteSettingsErr, the settings returned has a valid object. So it looks like the contract is (AFAIK), you will get a settings object whatsoever, it may or may not be a complete one and the caller needs find it out by reading the error object and react accordingly.
Can we treat the server.secretkey as optional in mgr.updateSettingsFromSecret so it will not error, since there is a situation when we know it is missing (the core deployment)? Or, eventually, ignore the key iff core mode is detected?
As per my understanding, it is not possible to detect which mode the app controller is running in. Changing the behaviour by not returning the error may be beyond the scope of this fix and can impact other calling methods. I want to limit the fix to the Cluster Lookup methods which is the breaking change that I am trying to address. inClusterEnabled has a well defined default value true. So even if no setting is set explicitly, we can handle the cluster lookup methods.
There was a problem hiding this comment.
Just to be clear, I am open to fix the method contract for GetSettings() once we get complete clarity of the various scenarios that it is invoked. However, I prefer doing it as a separate PR, rather than doing it in the scope of this bug fix to avoid breaking anything especially when backporting to older versions.
There was a problem hiding this comment.
Good point about the incompleteSettingsErr, what I was concerned about was already happening. Ack, let's polish things later.
… when server.secretkey is missing Signed-off-by: anandf <anjoseph@redhat.com>
32801a3 to
46ff9dc
Compare
Signed-off-by: anandf <anjoseph@redhat.com>
| // isInClusterEnabled returns false if explicitly disabled by the user, true otherwise. | ||
| func isInClusterEnabled(settingsMgr *settings.SettingsManager) bool { | ||
| argoSettings, err := settingsMgr.GetSettings() |
There was a problem hiding this comment.
Hi, thanks for the PR!
This is slightly different from the previous contract discussion, but since this code path only needs InClusterEnabled while GetSettings() assembles much broader state, would it make sense to add a narrower helper on SettingsManager for just this flag so callers do not need to depend on the broader GetSettings() contract here?
Something along these lines, for example,
func (mgr *SettingsManager) IsInClusterEnabled() (bool, error) {
argoCDCM, err := mgr.getConfigMap()
if err != nil {
return true, fmt.Errorf("error retrieving argocd-cm: %w", err)
}
return argoCDCM.Data[inClusterEnabledKey] != "false", nil
}
Also, I notice some paths in this file now call isInClusterEnabled(...) multiple times within the same flow, so even if introducing a narrower helper feels out of scope here, it may still be worth reading the value once and reusing it locally where possible.
There was a problem hiding this comment.
Thanks @choejwoo for your review, I agree that a direct helper in SettingsManager will be better. Let me make that change.
Fixes #12903
Assisted-by: Claude code, Opus 4.6 model for generating unit tests, reviewed by author.
Description of the problem:
In 3.0 release, there was a new feature that provided an option to disable
in-clusterserver. This required all cluster lookup methods inutil/db/cluster.gonamelyGetCluster(),GetClusterServerByName(),ListClusters(),WatchClusters()to make a call tosettingsMgr.GetSettings()and to see if the flaginClusterEnabledis explicitly set tofalseby the user. In core mode, theargocd-secretdoes not have theserver.secretkeyas theargocd-servercomponent is never available in core mode. This causes theGetSettings()method to fail with anincompleteSettingsErrand all the cluster lookup methods inturn fail causing the application controller to not able to sync an application sucessfully.Description of the solution:
incompleteSettingsErrwhen callingsettingsMgr.GetSettings()method from these cluster lookup methods and instead use the default value ofinClusterEnabled=truesettingsMgr.GetSettings()method, so that the call toupdateSettingsFromConfigMap()is also executed before returning error to the caller.This fix needs to be backported to
release-3.0,release-3.1,release-3.2,release-3.3Steps to reproduce this issue
Create a kind cluster and install Argo CD in core mode
Check the application controller logs, you can find multiple warnings about missing
server.secretkeyDeploy a sample app
Application remains in
Unknownstate and multiple warnings in application controller pod logs about missing server.secretkeySteps to test the fix
Set the container image to the one having the fix
quay.io/anjoseph/argocd:masterGet the latest application status, it should get to
SyncedandHealthyCheck application controller pod logs and ensure no warnings about missing server.secretkey
Checklist: