Skip to content

Conversation

@olivergondza
Copy link
Contributor

@olivergondza olivergondza commented Sep 18, 2025

What type of PR is this?

/kind enhancement

What does this PR do / why we need it:

Permit users to trust CAs on a repo-server system level

Have you updated the necessary documentation?

  • Documentation update is required by this PR.
  • Documentation has been updated.
Screenshot From 2025-09-23 13-17-05

Which issue(s) this PR fixes:

Fixes #1830

How to test changes / Special notes to the reviewer:

Can be tested against:

k3d cluster create localtest \
              --k3s-arg "--kube-apiserver-arg=feature-gates=ClusterTrustBundle=true,ClusterTrustBundleProjection=true@server:*" \
              --k3s-arg "--kube-apiserver-arg=runtime-config=certificates.k8s.io/v1beta1/clustertrustbundles=true@server:*" \
              --k3s-arg "--kubelet-arg=feature-gates=ClusterTrustBundle=true,ClusterTrustBundleProjection=true@agent:*" \
              --image rancher/k3s:v1.33.0-k3s1

Summary by CodeRabbit

  • New Features

    • System CA trust injection for the repo server and its plugins via ClusterTrustBundles, Secrets, and ConfigMaps; mounts, init-container CA installation, and targeted repo-server rollouts on trust changes.
    • CRD/schema extended with a new systemCATrust section to declare trust sources and drop-image-certificates behavior.
    • RBAC updated to allow reading ClusterTrustBundles.
  • Documentation

    • Added reference docs and YAML examples for system CA trust configuration and semantics.
  • Tests

    • New end-to-end tests and helper matchers to validate trust projection and dynamic updates.
  • Chores

    • CI matrix expanded to cover additional k3s versions and enable trust-related flags.

✏️ Tip: You can customize this high-level summary in your review settings.

@olivergondza olivergondza changed the title feat(repo-server): Declare custom trust anchors to use by repo-server or plugins feat(repo-server): Declare custom trust anchors to used by repo-server or plugins Sep 18, 2025
@olivergondza
Copy link
Contributor Author

Compared to the proposal in #1876, it turned out 1 init container is enough. Also, this implements DropImageAnchors to suppress whatever CAs was in the image originally.

@olivergondza
Copy link
Contributor Author

/ok-to-test

@olivergondza olivergondza changed the title feat(repo-server): Declare custom trust anchors to used by repo-server or plugins feat(repo-server): Declare custom trust certs for repo-server and plugins Sep 22, 2025
@olivergondza
Copy link
Contributor Author

The "Code scans / Run golangci-lint and gosec (pull_request)" failure to be adressed by #1880

@olivergondza
Copy link
Contributor Author

The test failures are related to the fact the code depends on a tech-preview features. Any advise on how to handle such functionality?

https://github.com/argoproj-labs/argocd-operator/actions/runs/17917840047/job/50944661583?pr=1876#step:11:45

@olivergondza olivergondza force-pushed the ClusterTrustBundle branch 6 times, most recently from 93a8f75 to 2e5afb6 Compare September 24, 2025 13:58
@olivergondza olivergondza force-pushed the ClusterTrustBundle branch 2 times, most recently from de515ac to 448e641 Compare September 30, 2025 13:09
@olivergondza olivergondza marked this pull request as draft September 30, 2025 14:43
@olivergondza olivergondza force-pushed the ClusterTrustBundle branch 6 times, most recently from c0a9ad2 to 448e641 Compare October 4, 2025 07:01
@olivergondza olivergondza marked this pull request as ready for review October 10, 2025 12:52
@olivergondza olivergondza force-pushed the ClusterTrustBundle branch 3 times, most recently from 198d41d to 9c3639c Compare October 27, 2025 11:12
@olivergondza
Copy link
Contributor Author

@jannfis, after your review, this is the added feature of change detection: 9c3639c

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go (2)

118-399: Excellent comprehensive test coverage of CA trust sources and reconciliation.

These test cases thoroughly exercise:

  • Trust establishment from ClusterTrustBundles, ConfigMaps, and Secrets
  • Plugin sidecar trust propagation
  • DropImageCertificates behavior
  • Dynamic reconciliation on resource create/update/delete
  • Optional vs required resource handling

The defensive delete-before-create pattern (lines 129, 357) with explanatory comments is good practice for handling test retry scenarios. The mix of positive and negative tests provides strong validation.

Optional: Review 6-minute reconciliation timeout

The CTB reconciliation tests use 6m timeouts (lines 391, 398) with 15s polling intervals. If cluster-wide CTB propagation is typically faster in your test environment, consider reducing these to improve test execution time (e.g., 3m). If the long timeout is necessary for CI stability, this is fine as-is.


860-899: Comprehensive trust verification and cleanup.

The verifyCorrectlyConfiguredTrust function provides excellent coverage by testing both Helm chart fetches and plugin git clones against trusted and untrusted hosts. The selection of different certificate authorities (GitHub CA vs. Google CA, line 865 comment) demonstrates careful test design.

The purgeCtbs cleanup function properly handles cluster-wide resource cleanup using label selectors, preventing test pollution.

Optional: External service dependency consideration

The tests depend on external public services (helm.nginx.com, stefanprodan.github.io, github.com, kernel.googlesource.com) maintaining their current certificate configurations. While this provides realistic E2E coverage, certificate changes or service unavailability could cause test flakiness. Consider:

  • Documenting this dependency in test comments
  • Having a fallback plan if tests become unstable due to external cert changes

This is an acceptable trade-off for E2E test realism, but worth noting.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 253e8d5 and 6204ade.

📒 Files selected for processing (1)
  • tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-06T10:50:06.487Z
Learnt from: olivergondza
Repo: argoproj-labs/argocd-operator PR: 1876
File: tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go:59-62
Timestamp: 2026-01-06T10:50:06.487Z
Learning: In the file tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go (argocd-operator), the CA bundle path should be set to /etc/ssl/certs/ca-certificates.crt on Ubuntu. Downstream, for RHEL/Fedora/UBI-specific logic, see the gitops-operator fork; these OS-specific differences should not be mixed in this file.

Applied to files:

  • tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build
  • GitHub Check: build-operator
  • GitHub Check: Run end-to-end tests (v1.27.1-k3s1)
  • GitHub Check: Run end-to-end tests (v1.33.5-k3s1)
  • GitHub Check: Run golangci-lint and gosec
🔇 Additional comments (7)
tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go (7)

17-81: Test file structure and setup look good.

The imports are well-organized, and the global variables appropriately scope test configuration. The Ubuntu-specific CA bundle path is correct for upstream argocd-operator per the established pattern. The use of public Helm registries for trusted/untrusted test sources provides good coverage.


83-98: Clean test lifecycle management.

The BeforeEach/AfterEach hooks properly ensure test isolation with cleanup and cluster capability detection. The CTB support detection allows tests to gracefully skip when the feature isn't available.


99-116: Solid negative test for missing dependencies.

This test correctly verifies that repo-server startup is blocked when referenced Secrets are missing, while the server component continues running. The use of Eventually and Consistently provides proper async verification without brittle sleep calls.


401-476: Smart reconciliation selectivity test.

This test properly validates that the controller only reconciles when relevant ClusterTrustBundles change, avoiding unnecessary pod restarts. The use of random label values (line 410) prevents test collision, and the negative assertions with Consistently and expectReconcile(reconcile: false) confirm stable behavior when unrelated resources change.


480-663: Well-designed helper functions.

The helper functions provide clean abstractions for test setup:

  • ctbUpdate properly handles update conflicts with retry logic
  • detectClusterTrustBundleSupport gracefully detects cluster capability
  • createGitPullingPlugin constructs a realistic plugin that exercises TLS trust through git clone subprocess calls
  • Resource creators are concise and focused

The ClusterTrustBundle APIVersion is correctly set to v1beta1 (line 651), matching the import at line 44.


690-709: CA certificate fetching is sound.

The approach of fetching real CA certificates from public TLS endpoints provides realistic test data. Extracting the root CA (last in chain) is correct for ClusterTrustBundle requirements per the inline comment.

Note on TLS 1.3 minimum: The MinVersion: tls.VersionTLS13 configuration (line 691) ensures modern crypto but could fail if any test hosts (github.com, github.io, helm.nginx.com, etc.) don't support TLS 1.3. If E2E tests fail with handshake errors, consider falling back to tls.VersionTLS12. However, if all current test hosts support TLS 1.3, this is fine.


711-858: Excellent test helpers and custom matchers.

The podTrust abstraction and trustCerts custom matcher demonstrate thoughtful test design:

  • trustCerts combines count matching with log inspection and attaches diagnostics on failure, significantly improving debuggability
  • getTrustedCertCount correctly parses PEM certificates using strings.Lines (valid in Go 1.24+, per past discussion)
  • expectReconcile elegantly verifies pod restart behavior for both positive and negative reconciliation cases
  • findRunningRepoServerPod uses Eventually for robust pod discovery

The loop variable comment at line 845 is good defensive practice, though Go 1.22+ loop scoping makes it unnecessary.

@olivergondza
Copy link
Contributor Author

@jgwest, do you see anything else needed to proceed with this?

… or plugins

Signed-off-by: Oliver Gondža <ogondza@gmail.com>
…plugins from Secrets and CMs

Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Signed-off-by: Oliver Gondža <ogondza@gmail.com>
… CTB support

Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Signed-off-by: Oliver Gondža <ogondza@gmail.com>
…rustBundles to repo-server automatically

Signed-off-by: Oliver Gondža <ogondza@gmail.com>
…rest

- Use default image pull policy
- Use default security context
- Propagate proxy env vars

Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
controllers/argocd/util.go (1)

1033-1116: Add RBAC rule for ClusterTrustBundle watches.

Line 1114 watches certificates.ClusterTrustBundle{} resources, but the controller RBAC (config/rbac/role.yaml and kubebuilder markers) lacks permissions for the certificates.k8s.io/clustertrustbundles resource. Add a kubebuilder:rbac marker granting get, list, and watch verbs on certificates.k8s.io/clustertrustbundles to prevent runtime watch failures on clusters with ClusterTrustBundle API support.

🤖 Fix all issues with AI agents
In `@controllers/argocd/repo_server.go`:
- Around line 597-617: The projected CA-trust volume creation in repo_server.go
(variables volumeSource/volumeTarget and slice repoServerVolumes) can produce a
ProjectedVolumeSource with zero Sources when r.caTrustVolumes(cr) returns an
empty sources list; update the logic in the code that constructs
repoServerVolumes (around the caTrustVolumes call) to detect when sources is
empty (len(sources) == 0) and either (a) do not append the projected volume and
the corresponding emptyDir target to repoServerVolumes, or (b) surface/return a
clear validation error from the function (or caller) indicating
spec.repo.systemCATrust has no sources; ensure you change only the
construction/append logic around repoServerVolumes and reference
r.caTrustVolumes, volumeSource, and volumeTarget so the pod never receives a
ProjectedVolumeSource with zero sources.

In `@tests/ginkgo/fixture/pod/fixture.go`:
- Around line 17-30: The GetPodByNameRegexp function dereferences nameRegexp and
calls nameRegexp.String() without checking for nil; add an explicit nil guard at
the start (e.g., Expect(nameRegexp).ToNot(BeNil())) and change the failure
message in the final Expect to avoid calling nameRegexp.String() when nil (use a
safe representation or include the regexp only after the nil check) so the test
fails with a clear assertion rather than panicking.
🧹 Nitpick comments (2)
tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go (1)

720-730: Replace panic with Ginkgo's Fail for idiomatic test failure handling.

Using panic(err) bypasses Ginkgo's error handling and reporting. Per Ginkgo best practices, use Fail() or Expect() for test failures.

🔧 Suggested fix
 func (pt *podTrust) fetch() {
 	pod := findRunningRepoServerPod(pt.k8sClient, pt.ns)
 	pt.count = getTrustedCertCount(pod)
 	pt.log = getRepoCertGenerationLog(pod)
 
 	out, err := osFixture.ExecCommandWithOutputParam(false, false, "kubectl", "-n", pt.ns.Name, "events")
-	if err != nil {
-		panic(err)
-	}
+	Expect(err).ToNot(HaveOccurred(), "failed to get events: %s", out)
 	pt.events = out
 }
docs/reference/argocd.md (1)

1138-1143: Nested list indentation style (optional).

The static analysis hints flag 4-space indentation on the nested list items. While both 2-space and 4-space indentation render correctly in markdown, you may want to align with the project's markdown style guide if one exists.

🔧 Optional fix for 2-space indentation
 - The certificates are properly configured inside the container, permitting more sophisticated (yet secure) plugin logic. For example:
-    - Kustomize can invoke Helm reaching to other hosts than the source repo
-    - Kustomize can pull resources from other repositories/sources over HTTPS
-    - In general, Config Management Plugin can invoke any TLS-enabled tool present in the image with TLS verification on.
+  - Kustomize can invoke Helm reaching to other hosts than the source repo
+  - Kustomize can pull resources from other repositories/sources over HTTPS
+  - In general, Config Management Plugin can invoke any TLS-enabled tool present in the image with TLS verification on.

Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go`:
- Around line 711-730: In podTrust.fetch(), replace the kubectl invocation that
currently uses the "events" subcommand with the broader-compatible form
["kubectl", "-n", pt.ns.Name, "get", "events"] by updating the
osFixture.ExecCommandWithOutputParam call in podTrust.fetch() so it runs
"kubectl get events" (matching the pattern used elsewhere, e.g. fixture.go) to
maintain compatibility with older kubectl versions.
- Around line 507-514: The detectClusterTrustBundleSupport function currently
only treats a discovery failure as absence of CTB support; update it to also
treat meta.IsNoMatchError and apierrors.IsNotFound as unsupported so the test
will skip instead of fail: when calling k8sClient.List(ctx,
&certificatesv1beta1.ClusterTrustBundleList{}), check if the returned err is an
apiutil.ErrResourceDiscoveryFailed, meta.IsNoMatchError(err), or
apierrors.IsNotFound(err) and return false for any of those cases, otherwise
Expect(err).ToNot(HaveOccurred()) and return true.
- Around line 690-701: The getCACert function uses tls.Dial without a timeout
and forces TLS 1.3; replace that by creating a net.Dialer with a short timeout
(e.g. 5s) and call tls.DialWithDialer, and change the tls.Config MinVersion from
tls.VersionTLS13 to tls.VersionTLS12 so restricted environments still work; also
add the missing imports "net" and "time". Ensure you still close the connection
and return encodeCert(pcs[len(pcs)-1]) as before.

Comment on lines +507 to +514
func detectClusterTrustBundleSupport(k8sClient client.Client, ctx context.Context) bool {
err := k8sClient.List(ctx, &certificatesv1beta1.ClusterTrustBundleList{})
if _, ok := err.(*apiutil.ErrResourceDiscoveryFailed); ok {
return false
}
Expect(err).ToNot(HaveOccurred()) // Every other error is an error
return true
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the file structure and current imports
head -50 tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go | cat -n

Repository: argoproj-labs/argocd-operator

Length of output: 1960


🏁 Script executed:

# Check the imports in the file
grep -A 30 "^import" tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go | head -40

Repository: argoproj-labs/argocd-operator

Length of output: 1041


🏁 Script executed:

# Look at the exact context around the function (lines 500-520)
sed -n '500,520p' tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go | cat -n

Repository: argoproj-labs/argocd-operator

Length of output: 875


🏁 Script executed:

# Search for other uses of meta.IsNoMatchError or apierrors.IsNotFound in the codebase to see if this pattern is used elsewhere
rg "meta\.IsNoMatchError|apierrors\.IsNotFound" --type go

Repository: argoproj-labs/argocd-operator

Length of output: 4411


🏁 Script executed:

# Search for ErrResourceDiscoveryFailed usage in the codebase
rg "ErrResourceDiscoveryFailed" --type go

Repository: argoproj-labs/argocd-operator

Length of output: 194


🏁 Script executed:

# Search for meta package usage patterns in the codebase
rg "meta\." --type go | head -30

Repository: argoproj-labs/argocd-operator

Length of output: 1438


🏁 Script executed:

# Check if there's any documentation or comments about CTB support detection
rg -B 5 -A 5 "detectClusterTrustBundleSupport" tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go

Repository: argoproj-labs/argocd-operator

Length of output: 681


🏁 Script executed:

# Search for usages of this function to understand the context better
rg "detectClusterTrustBundleSupport" --type go

Repository: argoproj-labs/argocd-operator

Length of output: 372


🌐 Web query:

Kubernetes client-go meta.IsNoMatchError vs apierrors.IsNotFound error handling

💡 Result:

Short answer

  • meta.IsNoMatchError(err): the API discovery says "no matches for resource/kind/version" — the server doesn't know that resource (e.g., CRD not installed or using wrong apiVersion). Use when your code attempted to map a GVK/GVR and discovery failed. [1]
  • apierrors.IsNotFound(err): the API server returned a 404 for a specific object (resource kind exists, but the named object/namespace was not found). Use when a Get/Delete/Update targeted a specific object that isn't present. [2]

When to check which

  • If you called discovery, REST mapping, or used a dynamic client and mapping failed => check meta.IsNoMatchError.
  • If you called clientset.CoreV1().Pods(...).Get(...) or server returned a Status with NotFound => check apierrors.IsNotFound.

Minimal Go examples

  • NoMatch (discovery/mapping):
    if meta.IsNoMatchError(err) {
    // resource kind/version not available (CRD missing, wrong apiVersion)
    }

  • NotFound (object-level 404):
    if apierrors.IsNotFound(err) {
    // object doesn't exist
    }

Recommendation

  • Prefer handling IsNoMatchError when you depend on a resource type (e.g., optional CRD) so you can fall back or skip; handle IsNotFound for expected-missing object semantics (create-if-missing or ignore). [1][2]

Sources

  • meta.IsNoMatchError docs. [1]
  • k8s api/errors IsNotFound docs. [2]

🌐 Web query:

Kubernetes ClusterTrustBundle disabled runtime-config error type client List

💡 Result:

You're seeing that error because the ClusterTrustBundle API is off by default (alpha/beta) and the API group/version is disabled in the API server's runtime-config. Fix: enable the API (or the feature gate) on the kube-apiserver so the /apis/certificates.k8s.io/... endpoints exist.

Quick fixes:

  • If your cluster runs kube-apiserver directly, start it with the runtime-config flag to enable the API group/version:
    --runtime-config=certificates.k8s.io/v1alpha1=true
  • If you use kind, add runtimeConfig in the cluster config (maps to --runtime-config). Example:
    runtimeConfig:
    "certificates.k8s.io/v1alpha1": "true"
  • Also ensure the ClusterTrustBundle feature gate is enabled if your K8s release requires it (ClusterTrustBundle is alpha/beta and may need the feature gate).
  • After enabling, retry: kubectl get clustertrustbundles (or the plural in the API version you're enabling).

References:

  • ClusterTrustBundle docs (feature state, API group requirements). [1]
  • kind runtimeConfig / kube-apiserver --runtime-config mapping. [2]

[1] https://kubernetes.io/docs/reference/access-authn-authz/certificate-signing-requests/
[2] https://kind.sigs.k8s.io/docs/user/configuration/


Handle unsupported CTB APIs beyond discovery failures.

When ClusterTrustBundles are disabled (e.g., runtime-config off), client.List returns NoMatch or NotFound errors instead of ErrResourceDiscoveryFailed, causing the test suite to fail instead of skip. Add handling for meta.IsNoMatchError() and apierrors.IsNotFound() alongside the existing discovery failure check.

🛠️ Suggested fix
@@
 import (
 	"context"
 	"crypto/tls"
 	"crypto/x509"
 	"encoding/pem"
 	"fmt"
 	"regexp"
 	"strings"
 
+	apierrors "k8s.io/apimachinery/pkg/api/errors"
+	"k8s.io/apimachinery/pkg/api/meta"
 	"github.com/onsi/gomega/gcustom"
 	matcher "github.com/onsi/gomega/types"
 	"k8s.io/apimachinery/pkg/util/rand"
@@
 func detectClusterTrustBundleSupport(k8sClient client.Client, ctx context.Context) bool {
 	err := k8sClient.List(ctx, &certificatesv1beta1.ClusterTrustBundleList{})
-	if _, ok := err.(*apiutil.ErrResourceDiscoveryFailed); ok {
-		return false
-	}
-	Expect(err).ToNot(HaveOccurred()) // Every other error is an error
-	return true
+	if err == nil {
+		return true
+	}
+	if _, ok := err.(*apiutil.ErrResourceDiscoveryFailed); ok ||
+		meta.IsNoMatchError(err) || apierrors.IsNotFound(err) {
+		return false
+	}
+	Expect(err).ToNot(HaveOccurred()) // Every other error is an error
+	return true
 }
🤖 Prompt for AI Agents
In `@tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go` around lines
507 - 514, The detectClusterTrustBundleSupport function currently only treats a
discovery failure as absence of CTB support; update it to also treat
meta.IsNoMatchError and apierrors.IsNotFound as unsupported so the test will
skip instead of fail: when calling k8sClient.List(ctx,
&certificatesv1beta1.ClusterTrustBundleList{}), check if the returned err is an
apiutil.ErrResourceDiscoveryFailed, meta.IsNoMatchError(err), or
apierrors.IsNotFound(err) and return false for any of those cases, otherwise
Expect(err).ToNot(HaveOccurred()) and return true.

Comment on lines +690 to +701
func getCACert(host string) string {
config := &tls.Config{MinVersion: tls.VersionTLS13}
conn, err := tls.Dial("tcp", host+":443", config)
Expect(err).ToNot(HaveOccurred())
defer func() { _ = conn.Close() }()

pcs := conn.ConnectionState().PeerCertificates

// ClusterTrustBundle cannot hold leaf certificates, so testing with CA cert at least. In theory, some of the hosts
// we test against can share the same CA cert, so albeit not likely, rudimentary negative testing is needed.
return encodeCert(pcs[len(pcs)-1])
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go | head -50

Repository: argoproj-labs/argocd-operator

Length of output: 202


🏁 Script executed:

wc -l tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go

Repository: argoproj-labs/argocd-operator

Length of output: 137


🏁 Script executed:

head -50 tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go

Repository: argoproj-labs/argocd-operator

Length of output: 1610


🏁 Script executed:

sed -n '680,710p' tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go

Repository: argoproj-labs/argocd-operator

Length of output: 964


🏁 Script executed:

grep -n "tls.Dial\|net.Dialer" tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go

Repository: argoproj-labs/argocd-operator

Length of output: 127


🏁 Script executed:

grep -n "VersionTLS" tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go

Repository: argoproj-labs/argocd-operator

Length of output: 129


🌐 Web query:

Go tls.Dial timeout net.Dialer best practice

💡 Result:

Short answer — prefer explicit DialContext + tls.Client (or net.Dialer with a context/deadline) and set timeouts for both the TCP connect and the TLS handshake. Example pattern and rationale:

Why

  • tls.Dial / DialWithDialer will apply the dialer Timeout/deadline to the entire connect+handshake, but using DialContext gives you explicit cancellation and finer control. (sources.debian.org)
  • Use a context (or net.Dialer.Timeout/Deadline) to cancel connect attempts; use a separate deadline around the TLS handshake (or Transport.TLSHandshakeTimeout for http) so a slow handshake can be aborted. (stackoverflow.com)
  • Be aware that DialContext may try multiple resolved IPs and timeouts can interact per-address; test in your environment. (forum.golangbridge.org)

Recommended minimal pattern (conceptual)

  • Create a net.Dialer. Use DialContext with a context.WithTimeout for the TCP connect.
  • After getting the net.Conn, set a short per-operation deadline for the handshake (conn.SetDeadline or use context with tls.Handshake) and call tls.Client(...).Handshake(). Clear deadlines after handshake.

Example (sketch)

  • ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second); defer cancel()
  • d := &net.Dialer{Timeout: 0, KeepAlive: 30*time.Second}
  • rawConn, err := d.DialContext(ctx, "tcp", "host:443")
  • rawConn.SetDeadline(time.Now().Add(5*time.Second)) // handshake deadline
  • tlsConn := tls.Client(rawConn, &tls.Config{ServerName: "host"})
  • if err := tlsConn.Handshake(); err != nil { rawConn.Close(); return err }
  • rawConn.SetDeadline(time.Time{}) // clear deadlines

Notes / best-practices checklist

  • Prefer DialContext for cancellability. (stackoverflow.com)
  • Use separate timeouts/deadlines for connect vs handshake (or http.Transport.TLSHandshakeTimeout). (hustcat.github.io)
  • Clear deadlines after handshake so I/O isn’t unintentionally limited.
  • Test behavior with multiple A records and in your network (per-IP dialing behavior). (forum.golangbridge.org)

References
[1] crypto/tls: DialWithDialer docs (connect+handshake semantics). (sources.debian.org)
[2] net.Dialer / DialContext usage and examples (timeouts & cancellation). (stackoverflow.com)
[3] TLS handshake timeout guidance (http.Transport.TLSHandshakeTimeout and separate handshake timeouts). (hustcat.github.io)

If you want, I can produce a ready-to-copy Go function implementing this pattern with proper deadline handling and comments.

Citations:


Add a network timeout and lower TLS floor for cert fetches.

tls.Dial without a timeout can hang indefinitely, and requiring TLS 1.3 can cause failures in restricted environments. Use net.Dialer with a timeout and lower the minimum to TLS 1.2.

Imports needed: "net" and "time" (currently missing).

Suggested fix
import (
 	"context"
 	"crypto/tls"
 	"crypto/x509"
 	"encoding/pem"
 	"fmt"
+	"net"
 	"regexp"
 	"strings"
+	"time"
func getCACert(host string) string {
-	config := &tls.Config{MinVersion: tls.VersionTLS13}
-	conn, err := tls.Dial("tcp", host+":443", config)
+	dialer := &net.Dialer{Timeout: 10 * time.Second}
+	config := &tls.Config{MinVersion: tls.VersionTLS12}
+	conn, err := tls.DialWithDialer(dialer, "tcp", host+":443", config)
 	Expect(err).ToNot(HaveOccurred())
 	defer func() { _ = conn.Close() }()
🤖 Prompt for AI Agents
In `@tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go` around lines
690 - 701, The getCACert function uses tls.Dial without a timeout and forces TLS
1.3; replace that by creating a net.Dialer with a short timeout (e.g. 5s) and
call tls.DialWithDialer, and change the tls.Config MinVersion from
tls.VersionTLS13 to tls.VersionTLS12 so restricted environments still work; also
add the missing imports "net" and "time". Ensure you still close the connection
and return encodeCert(pcs[len(pcs)-1]) as before.

Comment on lines +711 to +730
type podTrust struct {
ns *corev1.Namespace
k8sClient client.Client

count int
log string
events string
}

func (pt *podTrust) fetch() {
pod := findRunningRepoServerPod(pt.k8sClient, pt.ns)
pt.count = getTrustedCertCount(pod)
pt.log = getRepoCertGenerationLog(pod)

out, err := osFixture.ExecCommandWithOutputParam(false, false, "kubectl", "-n", pt.ns.Name, "events")
if err != nil {
Expect(err).NotTo(HaveOccurred())
}
pt.events = out
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's verify the file exists and confirm the code snippet
fd "1-120_repo_server_system_ca_trust.go" 2>/dev/null | head -5

Repository: argoproj-labs/argocd-operator

Length of output: 133


🌐 Web query:

kubectl events subcommand kubectl version history compatibility

💡 Result:

Short summary — history and compatibility

  • History: events were long exposed via the Events API (kubectl get events). A community "kubectl-events" plugin (krew/standalone) existed for richer views. SIG-CLI moved the feature into kubectl proper (KEP/PRs) and promoted it to a top-level command (kubectl events); it reached beta/ship as a top-level command around the 1.26–1.28 timeframe. [1][2][3]

  • Current docs/behaviour: kubectl events is now a built-in kubectl subcommand with --for, --watch, --types, output formats, etc.; see the kubectl events reference. [2]

  • Compatibility: follow the Kubernetes Version Skew Policy — kubectl is supported within one minor version older or newer than the kube-apiserver it talks to (i.e., kubectl may be ±1 minor relative to the API server). If your cluster has mixed apiserver versions (HA skew), that narrows supported kubectl versions. For safe operation, use a kubectl release within one minor of your kube-apiserver. [4]

Sources

  1. SIG-CLI / release discussion & KEP/exception for moving kubectl events to top-level. [kubernetes-sig-cli Google Group]. [1]
  2. kubectl events reference (official docs). [2]
  3. History / community plugin info (kubectl-events via krew). [3]
  4. Kubernetes Version Skew Policy — kubectl ±1 minor compatibility. [4]

If you want, I can list exact Kubernetes release numbers/dates for when events moved between alpha/beta/stable.


🏁 Script executed:

# Check kubectl documentation and search for any kubectl usage patterns in the codebase
rg "kubectl.*events" --type go

Repository: argoproj-labs/argocd-operator

Length of output: 487


Prefer kubectl get events for broader kubectl compatibility.

The kubectl events subcommand was introduced in Kubernetes 1.26+. For compatibility with older kubectl versions, use kubectl get events, which leverages the traditional Events API and works across all versions. The codebase already uses this pattern in fixture.go.

🛠️ Suggested fix
-	out, err := osFixture.ExecCommandWithOutputParam(false, false, "kubectl", "-n", pt.ns.Name, "events")
+	out, err := osFixture.ExecCommandWithOutputParam(false, false, "kubectl", "-n", pt.ns.Name, "get", "events")
🤖 Prompt for AI Agents
In `@tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go` around lines
711 - 730, In podTrust.fetch(), replace the kubectl invocation that currently
uses the "events" subcommand with the broader-compatible form ["kubectl", "-n",
pt.ns.Name, "get", "events"] by updating the
osFixture.ExecCommandWithOutputParam call in podTrust.fetch() so it runs
"kubectl get events" (matching the pattern used elsewhere, e.g. fixture.go) to
maintain compatibility with older kubectl versions.

@AdamSaleh
Copy link
Contributor

As far as tests go, it looks throroug! LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Proposal: Populate repo-server container with ClusterTrustBundle CA certificates

5 participants