-
Notifications
You must be signed in to change notification settings - Fork 1.1k
feat(repo-server): Declare custom trust certs for repo-server and plugins #1876
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
feat(repo-server): Declare custom trust certs for repo-server and plugins #1876
Conversation
|
Compared to the proposal in #1876, it turned out 1 init container is enough. Also, this implements |
|
/ok-to-test |
9e0a19d to
4653fab
Compare
|
The "Code scans / Run golangci-lint and gosec (pull_request)" failure to be adressed by #1880 |
|
The test failures are related to the fact the code depends on a tech-preview features. Any advise on how to handle such functionality? |
93a8f75 to
2e5afb6
Compare
de515ac to
448e641
Compare
c0a9ad2 to
448e641
Compare
198d41d to
9c3639c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go (2)
118-399: Excellent comprehensive test coverage of CA trust sources and reconciliation.These test cases thoroughly exercise:
- Trust establishment from ClusterTrustBundles, ConfigMaps, and Secrets
- Plugin sidecar trust propagation
- DropImageCertificates behavior
- Dynamic reconciliation on resource create/update/delete
- Optional vs required resource handling
The defensive delete-before-create pattern (lines 129, 357) with explanatory comments is good practice for handling test retry scenarios. The mix of positive and negative tests provides strong validation.
Optional: Review 6-minute reconciliation timeout
The CTB reconciliation tests use 6m timeouts (lines 391, 398) with 15s polling intervals. If cluster-wide CTB propagation is typically faster in your test environment, consider reducing these to improve test execution time (e.g., 3m). If the long timeout is necessary for CI stability, this is fine as-is.
860-899: Comprehensive trust verification and cleanup.The
verifyCorrectlyConfiguredTrustfunction provides excellent coverage by testing both Helm chart fetches and plugin git clones against trusted and untrusted hosts. The selection of different certificate authorities (GitHub CA vs. Google CA, line 865 comment) demonstrates careful test design.The
purgeCtbscleanup function properly handles cluster-wide resource cleanup using label selectors, preventing test pollution.Optional: External service dependency consideration
The tests depend on external public services (helm.nginx.com, stefanprodan.github.io, github.com, kernel.googlesource.com) maintaining their current certificate configurations. While this provides realistic E2E coverage, certificate changes or service unavailability could cause test flakiness. Consider:
- Documenting this dependency in test comments
- Having a fallback plan if tests become unstable due to external cert changes
This is an acceptable trade-off for E2E test realism, but worth noting.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-06T10:50:06.487Z
Learnt from: olivergondza
Repo: argoproj-labs/argocd-operator PR: 1876
File: tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go:59-62
Timestamp: 2026-01-06T10:50:06.487Z
Learning: In the file tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go (argocd-operator), the CA bundle path should be set to /etc/ssl/certs/ca-certificates.crt on Ubuntu. Downstream, for RHEL/Fedora/UBI-specific logic, see the gitops-operator fork; these OS-specific differences should not be mixed in this file.
Applied to files:
tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: build
- GitHub Check: build-operator
- GitHub Check: Run end-to-end tests (v1.27.1-k3s1)
- GitHub Check: Run end-to-end tests (v1.33.5-k3s1)
- GitHub Check: Run golangci-lint and gosec
🔇 Additional comments (7)
tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go (7)
17-81: Test file structure and setup look good.The imports are well-organized, and the global variables appropriately scope test configuration. The Ubuntu-specific CA bundle path is correct for upstream argocd-operator per the established pattern. The use of public Helm registries for trusted/untrusted test sources provides good coverage.
83-98: Clean test lifecycle management.The BeforeEach/AfterEach hooks properly ensure test isolation with cleanup and cluster capability detection. The CTB support detection allows tests to gracefully skip when the feature isn't available.
99-116: Solid negative test for missing dependencies.This test correctly verifies that repo-server startup is blocked when referenced Secrets are missing, while the server component continues running. The use of
EventuallyandConsistentlyprovides proper async verification without brittle sleep calls.
401-476: Smart reconciliation selectivity test.This test properly validates that the controller only reconciles when relevant ClusterTrustBundles change, avoiding unnecessary pod restarts. The use of random label values (line 410) prevents test collision, and the negative assertions with
ConsistentlyandexpectReconcile(reconcile: false)confirm stable behavior when unrelated resources change.
480-663: Well-designed helper functions.The helper functions provide clean abstractions for test setup:
ctbUpdateproperly handles update conflicts with retry logicdetectClusterTrustBundleSupportgracefully detects cluster capabilitycreateGitPullingPluginconstructs a realistic plugin that exercises TLS trust through git clone subprocess calls- Resource creators are concise and focused
The ClusterTrustBundle APIVersion is correctly set to
v1beta1(line 651), matching the import at line 44.
690-709: CA certificate fetching is sound.The approach of fetching real CA certificates from public TLS endpoints provides realistic test data. Extracting the root CA (last in chain) is correct for ClusterTrustBundle requirements per the inline comment.
Note on TLS 1.3 minimum: The
MinVersion: tls.VersionTLS13configuration (line 691) ensures modern crypto but could fail if any test hosts (github.com, github.io, helm.nginx.com, etc.) don't support TLS 1.3. If E2E tests fail with handshake errors, consider falling back totls.VersionTLS12. However, if all current test hosts support TLS 1.3, this is fine.
711-858: Excellent test helpers and custom matchers.The
podTrustabstraction andtrustCertscustom matcher demonstrate thoughtful test design:
trustCertscombines count matching with log inspection and attaches diagnostics on failure, significantly improving debuggabilitygetTrustedCertCountcorrectly parses PEM certificates usingstrings.Lines(valid in Go 1.24+, per past discussion)expectReconcileelegantly verifies pod restart behavior for both positive and negative reconciliation casesfindRunningRepoServerPodusesEventuallyfor robust pod discoveryThe loop variable comment at line 845 is good defensive practice, though Go 1.22+ loop scoping makes it unnecessary.
|
@jgwest, do you see anything else needed to proceed with this? |
… or plugins Signed-off-by: Oliver Gondža <ogondza@gmail.com>
…plugins from Secrets and CMs Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Signed-off-by: Oliver Gondža <ogondza@gmail.com>
… CTB support Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Signed-off-by: Oliver Gondža <ogondza@gmail.com>
…rustBundles to repo-server automatically Signed-off-by: Oliver Gondža <ogondza@gmail.com>
…rest - Use default image pull policy - Use default security context - Propagate proxy env vars Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Signed-off-by: Oliver Gondža <ogondza@gmail.com>
Signed-off-by: Oliver Gondža <ogondza@gmail.com>
ca12fc2 to
ec3ebf2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
controllers/argocd/util.go (1)
1033-1116: Add RBAC rule for ClusterTrustBundle watches.Line 1114 watches
certificates.ClusterTrustBundle{}resources, but the controller RBAC (config/rbac/role.yaml and kubebuilder markers) lacks permissions for thecertificates.k8s.io/clustertrustbundlesresource. Add a kubebuilder:rbac marker grantingget,list, andwatchverbs oncertificates.k8s.io/clustertrustbundlesto prevent runtime watch failures on clusters with ClusterTrustBundle API support.
🤖 Fix all issues with AI agents
In `@controllers/argocd/repo_server.go`:
- Around line 597-617: The projected CA-trust volume creation in repo_server.go
(variables volumeSource/volumeTarget and slice repoServerVolumes) can produce a
ProjectedVolumeSource with zero Sources when r.caTrustVolumes(cr) returns an
empty sources list; update the logic in the code that constructs
repoServerVolumes (around the caTrustVolumes call) to detect when sources is
empty (len(sources) == 0) and either (a) do not append the projected volume and
the corresponding emptyDir target to repoServerVolumes, or (b) surface/return a
clear validation error from the function (or caller) indicating
spec.repo.systemCATrust has no sources; ensure you change only the
construction/append logic around repoServerVolumes and reference
r.caTrustVolumes, volumeSource, and volumeTarget so the pod never receives a
ProjectedVolumeSource with zero sources.
In `@tests/ginkgo/fixture/pod/fixture.go`:
- Around line 17-30: The GetPodByNameRegexp function dereferences nameRegexp and
calls nameRegexp.String() without checking for nil; add an explicit nil guard at
the start (e.g., Expect(nameRegexp).ToNot(BeNil())) and change the failure
message in the final Expect to avoid calling nameRegexp.String() when nil (use a
safe representation or include the regexp only after the nil check) so the test
fails with a clear assertion rather than panicking.
🧹 Nitpick comments (2)
tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go (1)
720-730: Replacepanicwith Ginkgo'sFailfor idiomatic test failure handling.Using
panic(err)bypasses Ginkgo's error handling and reporting. Per Ginkgo best practices, useFail()orExpect()for test failures.🔧 Suggested fix
func (pt *podTrust) fetch() { pod := findRunningRepoServerPod(pt.k8sClient, pt.ns) pt.count = getTrustedCertCount(pod) pt.log = getRepoCertGenerationLog(pod) out, err := osFixture.ExecCommandWithOutputParam(false, false, "kubectl", "-n", pt.ns.Name, "events") - if err != nil { - panic(err) - } + Expect(err).ToNot(HaveOccurred(), "failed to get events: %s", out) pt.events = out }docs/reference/argocd.md (1)
1138-1143: Nested list indentation style (optional).The static analysis hints flag 4-space indentation on the nested list items. While both 2-space and 4-space indentation render correctly in markdown, you may want to align with the project's markdown style guide if one exists.
🔧 Optional fix for 2-space indentation
- The certificates are properly configured inside the container, permitting more sophisticated (yet secure) plugin logic. For example: - - Kustomize can invoke Helm reaching to other hosts than the source repo - - Kustomize can pull resources from other repositories/sources over HTTPS - - In general, Config Management Plugin can invoke any TLS-enabled tool present in the image with TLS verification on. + - Kustomize can invoke Helm reaching to other hosts than the source repo + - Kustomize can pull resources from other repositories/sources over HTTPS + - In general, Config Management Plugin can invoke any TLS-enabled tool present in the image with TLS verification on.
Signed-off-by: Oliver Gondža <ogondza@gmail.com>
8dd2b79 to
761c828
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go`:
- Around line 711-730: In podTrust.fetch(), replace the kubectl invocation that
currently uses the "events" subcommand with the broader-compatible form
["kubectl", "-n", pt.ns.Name, "get", "events"] by updating the
osFixture.ExecCommandWithOutputParam call in podTrust.fetch() so it runs
"kubectl get events" (matching the pattern used elsewhere, e.g. fixture.go) to
maintain compatibility with older kubectl versions.
- Around line 507-514: The detectClusterTrustBundleSupport function currently
only treats a discovery failure as absence of CTB support; update it to also
treat meta.IsNoMatchError and apierrors.IsNotFound as unsupported so the test
will skip instead of fail: when calling k8sClient.List(ctx,
&certificatesv1beta1.ClusterTrustBundleList{}), check if the returned err is an
apiutil.ErrResourceDiscoveryFailed, meta.IsNoMatchError(err), or
apierrors.IsNotFound(err) and return false for any of those cases, otherwise
Expect(err).ToNot(HaveOccurred()) and return true.
- Around line 690-701: The getCACert function uses tls.Dial without a timeout
and forces TLS 1.3; replace that by creating a net.Dialer with a short timeout
(e.g. 5s) and call tls.DialWithDialer, and change the tls.Config MinVersion from
tls.VersionTLS13 to tls.VersionTLS12 so restricted environments still work; also
add the missing imports "net" and "time". Ensure you still close the connection
and return encodeCert(pcs[len(pcs)-1]) as before.
| func detectClusterTrustBundleSupport(k8sClient client.Client, ctx context.Context) bool { | ||
| err := k8sClient.List(ctx, &certificatesv1beta1.ClusterTrustBundleList{}) | ||
| if _, ok := err.(*apiutil.ErrResourceDiscoveryFailed); ok { | ||
| return false | ||
| } | ||
| Expect(err).ToNot(HaveOccurred()) // Every other error is an error | ||
| return true | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# First, let's examine the file structure and current imports
head -50 tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go | cat -nRepository: argoproj-labs/argocd-operator
Length of output: 1960
🏁 Script executed:
# Check the imports in the file
grep -A 30 "^import" tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go | head -40Repository: argoproj-labs/argocd-operator
Length of output: 1041
🏁 Script executed:
# Look at the exact context around the function (lines 500-520)
sed -n '500,520p' tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go | cat -nRepository: argoproj-labs/argocd-operator
Length of output: 875
🏁 Script executed:
# Search for other uses of meta.IsNoMatchError or apierrors.IsNotFound in the codebase to see if this pattern is used elsewhere
rg "meta\.IsNoMatchError|apierrors\.IsNotFound" --type goRepository: argoproj-labs/argocd-operator
Length of output: 4411
🏁 Script executed:
# Search for ErrResourceDiscoveryFailed usage in the codebase
rg "ErrResourceDiscoveryFailed" --type goRepository: argoproj-labs/argocd-operator
Length of output: 194
🏁 Script executed:
# Search for meta package usage patterns in the codebase
rg "meta\." --type go | head -30Repository: argoproj-labs/argocd-operator
Length of output: 1438
🏁 Script executed:
# Check if there's any documentation or comments about CTB support detection
rg -B 5 -A 5 "detectClusterTrustBundleSupport" tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.goRepository: argoproj-labs/argocd-operator
Length of output: 681
🏁 Script executed:
# Search for usages of this function to understand the context better
rg "detectClusterTrustBundleSupport" --type goRepository: argoproj-labs/argocd-operator
Length of output: 372
🌐 Web query:
Kubernetes client-go meta.IsNoMatchError vs apierrors.IsNotFound error handling
💡 Result:
Short answer
- meta.IsNoMatchError(err): the API discovery says "no matches for resource/kind/version" — the server doesn't know that resource (e.g., CRD not installed or using wrong apiVersion). Use when your code attempted to map a GVK/GVR and discovery failed. [1]
- apierrors.IsNotFound(err): the API server returned a 404 for a specific object (resource kind exists, but the named object/namespace was not found). Use when a Get/Delete/Update targeted a specific object that isn't present. [2]
When to check which
- If you called discovery, REST mapping, or used a dynamic client and mapping failed => check meta.IsNoMatchError.
- If you called clientset.CoreV1().Pods(...).Get(...) or server returned a Status with NotFound => check apierrors.IsNotFound.
Minimal Go examples
-
NoMatch (discovery/mapping):
if meta.IsNoMatchError(err) {
// resource kind/version not available (CRD missing, wrong apiVersion)
} -
NotFound (object-level 404):
if apierrors.IsNotFound(err) {
// object doesn't exist
}
Recommendation
- Prefer handling IsNoMatchError when you depend on a resource type (e.g., optional CRD) so you can fall back or skip; handle IsNotFound for expected-missing object semantics (create-if-missing or ignore). [1][2]
Sources
- meta.IsNoMatchError docs. [1]
- k8s api/errors IsNotFound docs. [2]
🌐 Web query:
Kubernetes ClusterTrustBundle disabled runtime-config error type client List
💡 Result:
You're seeing that error because the ClusterTrustBundle API is off by default (alpha/beta) and the API group/version is disabled in the API server's runtime-config. Fix: enable the API (or the feature gate) on the kube-apiserver so the /apis/certificates.k8s.io/... endpoints exist.
Quick fixes:
- If your cluster runs kube-apiserver directly, start it with the runtime-config flag to enable the API group/version:
--runtime-config=certificates.k8s.io/v1alpha1=true - If you use kind, add runtimeConfig in the cluster config (maps to --runtime-config). Example:
runtimeConfig:
"certificates.k8s.io/v1alpha1": "true" - Also ensure the ClusterTrustBundle feature gate is enabled if your K8s release requires it (ClusterTrustBundle is alpha/beta and may need the feature gate).
- After enabling, retry: kubectl get clustertrustbundles (or the plural in the API version you're enabling).
References:
- ClusterTrustBundle docs (feature state, API group requirements). [1]
- kind runtimeConfig / kube-apiserver --runtime-config mapping. [2]
[1] https://kubernetes.io/docs/reference/access-authn-authz/certificate-signing-requests/
[2] https://kind.sigs.k8s.io/docs/user/configuration/
Handle unsupported CTB APIs beyond discovery failures.
When ClusterTrustBundles are disabled (e.g., runtime-config off), client.List returns NoMatch or NotFound errors instead of ErrResourceDiscoveryFailed, causing the test suite to fail instead of skip. Add handling for meta.IsNoMatchError() and apierrors.IsNotFound() alongside the existing discovery failure check.
🛠️ Suggested fix
@@
import (
"context"
"crypto/tls"
"crypto/x509"
"encoding/pem"
"fmt"
"regexp"
"strings"
+ apierrors "k8s.io/apimachinery/pkg/api/errors"
+ "k8s.io/apimachinery/pkg/api/meta"
"github.com/onsi/gomega/gcustom"
matcher "github.com/onsi/gomega/types"
"k8s.io/apimachinery/pkg/util/rand"@@
func detectClusterTrustBundleSupport(k8sClient client.Client, ctx context.Context) bool {
err := k8sClient.List(ctx, &certificatesv1beta1.ClusterTrustBundleList{})
- if _, ok := err.(*apiutil.ErrResourceDiscoveryFailed); ok {
- return false
- }
- Expect(err).ToNot(HaveOccurred()) // Every other error is an error
- return true
+ if err == nil {
+ return true
+ }
+ if _, ok := err.(*apiutil.ErrResourceDiscoveryFailed); ok ||
+ meta.IsNoMatchError(err) || apierrors.IsNotFound(err) {
+ return false
+ }
+ Expect(err).ToNot(HaveOccurred()) // Every other error is an error
+ return true
}🤖 Prompt for AI Agents
In `@tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go` around lines
507 - 514, The detectClusterTrustBundleSupport function currently only treats a
discovery failure as absence of CTB support; update it to also treat
meta.IsNoMatchError and apierrors.IsNotFound as unsupported so the test will
skip instead of fail: when calling k8sClient.List(ctx,
&certificatesv1beta1.ClusterTrustBundleList{}), check if the returned err is an
apiutil.ErrResourceDiscoveryFailed, meta.IsNoMatchError(err), or
apierrors.IsNotFound(err) and return false for any of those cases, otherwise
Expect(err).ToNot(HaveOccurred()) and return true.
| func getCACert(host string) string { | ||
| config := &tls.Config{MinVersion: tls.VersionTLS13} | ||
| conn, err := tls.Dial("tcp", host+":443", config) | ||
| Expect(err).ToNot(HaveOccurred()) | ||
| defer func() { _ = conn.Close() }() | ||
|
|
||
| pcs := conn.ConnectionState().PeerCertificates | ||
|
|
||
| // ClusterTrustBundle cannot hold leaf certificates, so testing with CA cert at least. In theory, some of the hosts | ||
| // we test against can share the same CA cert, so albeit not likely, rudimentary negative testing is needed. | ||
| return encodeCert(pcs[len(pcs)-1]) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
cat -n tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go | head -50Repository: argoproj-labs/argocd-operator
Length of output: 202
🏁 Script executed:
wc -l tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.goRepository: argoproj-labs/argocd-operator
Length of output: 137
🏁 Script executed:
head -50 tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.goRepository: argoproj-labs/argocd-operator
Length of output: 1610
🏁 Script executed:
sed -n '680,710p' tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.goRepository: argoproj-labs/argocd-operator
Length of output: 964
🏁 Script executed:
grep -n "tls.Dial\|net.Dialer" tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.goRepository: argoproj-labs/argocd-operator
Length of output: 127
🏁 Script executed:
grep -n "VersionTLS" tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.goRepository: argoproj-labs/argocd-operator
Length of output: 129
🌐 Web query:
Go tls.Dial timeout net.Dialer best practice
💡 Result:
Short answer — prefer explicit DialContext + tls.Client (or net.Dialer with a context/deadline) and set timeouts for both the TCP connect and the TLS handshake. Example pattern and rationale:
Why
- tls.Dial / DialWithDialer will apply the dialer Timeout/deadline to the entire connect+handshake, but using DialContext gives you explicit cancellation and finer control. (sources.debian.org)
- Use a context (or net.Dialer.Timeout/Deadline) to cancel connect attempts; use a separate deadline around the TLS handshake (or Transport.TLSHandshakeTimeout for http) so a slow handshake can be aborted. (stackoverflow.com)
- Be aware that DialContext may try multiple resolved IPs and timeouts can interact per-address; test in your environment. (forum.golangbridge.org)
Recommended minimal pattern (conceptual)
- Create a net.Dialer. Use DialContext with a context.WithTimeout for the TCP connect.
- After getting the net.Conn, set a short per-operation deadline for the handshake (conn.SetDeadline or use context with tls.Handshake) and call tls.Client(...).Handshake(). Clear deadlines after handshake.
Example (sketch)
- ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second); defer cancel()
- d := &net.Dialer{Timeout: 0, KeepAlive: 30*time.Second}
- rawConn, err := d.DialContext(ctx, "tcp", "host:443")
- rawConn.SetDeadline(time.Now().Add(5*time.Second)) // handshake deadline
- tlsConn := tls.Client(rawConn, &tls.Config{ServerName: "host"})
- if err := tlsConn.Handshake(); err != nil { rawConn.Close(); return err }
- rawConn.SetDeadline(time.Time{}) // clear deadlines
Notes / best-practices checklist
- Prefer DialContext for cancellability. (stackoverflow.com)
- Use separate timeouts/deadlines for connect vs handshake (or http.Transport.TLSHandshakeTimeout). (hustcat.github.io)
- Clear deadlines after handshake so I/O isn’t unintentionally limited.
- Test behavior with multiple A records and in your network (per-IP dialing behavior). (forum.golangbridge.org)
References
[1] crypto/tls: DialWithDialer docs (connect+handshake semantics). (sources.debian.org)
[2] net.Dialer / DialContext usage and examples (timeouts & cancellation). (stackoverflow.com)
[3] TLS handshake timeout guidance (http.Transport.TLSHandshakeTimeout and separate handshake timeouts). (hustcat.github.io)
If you want, I can produce a ready-to-copy Go function implementing this pattern with proper deadline handling and comments.
Citations:
- 1: https://sources.debian.org/src/golang-refraction-networking-utls/1.2.1-3.1/tls.go/?utm_source=openai
- 2: https://stackoverflow.com/questions/46279205/how-to-abort-a-net-dial-call-in-go?utm_source=openai
- 3: https://forum.golangbridge.org/t/net-dialer-calls-dialcontext-for-network-connection-when-the-passed-address-is-a-domain-name-causes-probabilistic-connection-timeouts/36796?utm_source=openai
- 4: https://stackoverflow.com/questions/46279205/how-to-abort-a-net-dial-call-in-go?utm_source=openai
- 5: https://hustcat.github.io/timeout-in-net-http-of-golang/?utm_source=openai
- 6: https://forum.golangbridge.org/t/net-dialer-calls-dialcontext-for-network-connection-when-the-passed-address-is-a-domain-name-causes-probabilistic-connection-timeouts/36796?utm_source=openai
- 7: https://sources.debian.org/src/golang-refraction-networking-utls/1.2.1-3.1/tls.go/?utm_source=openai
- 8: https://stackoverflow.com/questions/46279205/how-to-abort-a-net-dial-call-in-go?utm_source=openai
- 9: https://hustcat.github.io/timeout-in-net-http-of-golang/?utm_source=openai
Add a network timeout and lower TLS floor for cert fetches.
tls.Dial without a timeout can hang indefinitely, and requiring TLS 1.3 can cause failures in restricted environments. Use net.Dialer with a timeout and lower the minimum to TLS 1.2.
Imports needed: "net" and "time" (currently missing).
Suggested fix
import (
"context"
"crypto/tls"
"crypto/x509"
"encoding/pem"
"fmt"
+ "net"
"regexp"
"strings"
+ "time"func getCACert(host string) string {
- config := &tls.Config{MinVersion: tls.VersionTLS13}
- conn, err := tls.Dial("tcp", host+":443", config)
+ dialer := &net.Dialer{Timeout: 10 * time.Second}
+ config := &tls.Config{MinVersion: tls.VersionTLS12}
+ conn, err := tls.DialWithDialer(dialer, "tcp", host+":443", config)
Expect(err).ToNot(HaveOccurred())
defer func() { _ = conn.Close() }()🤖 Prompt for AI Agents
In `@tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go` around lines
690 - 701, The getCACert function uses tls.Dial without a timeout and forces TLS
1.3; replace that by creating a net.Dialer with a short timeout (e.g. 5s) and
call tls.DialWithDialer, and change the tls.Config MinVersion from
tls.VersionTLS13 to tls.VersionTLS12 so restricted environments still work; also
add the missing imports "net" and "time". Ensure you still close the connection
and return encodeCert(pcs[len(pcs)-1]) as before.
| type podTrust struct { | ||
| ns *corev1.Namespace | ||
| k8sClient client.Client | ||
|
|
||
| count int | ||
| log string | ||
| events string | ||
| } | ||
|
|
||
| func (pt *podTrust) fetch() { | ||
| pod := findRunningRepoServerPod(pt.k8sClient, pt.ns) | ||
| pt.count = getTrustedCertCount(pod) | ||
| pt.log = getRepoCertGenerationLog(pod) | ||
|
|
||
| out, err := osFixture.ExecCommandWithOutputParam(false, false, "kubectl", "-n", pt.ns.Name, "events") | ||
| if err != nil { | ||
| Expect(err).NotTo(HaveOccurred()) | ||
| } | ||
| pt.events = out | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# First, let's verify the file exists and confirm the code snippet
fd "1-120_repo_server_system_ca_trust.go" 2>/dev/null | head -5Repository: argoproj-labs/argocd-operator
Length of output: 133
🌐 Web query:
kubectl events subcommand kubectl version history compatibility
💡 Result:
Short summary — history and compatibility
-
History: events were long exposed via the Events API (kubectl get events). A community "kubectl-events" plugin (krew/standalone) existed for richer views. SIG-CLI moved the feature into kubectl proper (KEP/PRs) and promoted it to a top-level command (kubectl events); it reached beta/ship as a top-level command around the 1.26–1.28 timeframe. [1][2][3]
-
Current docs/behaviour: kubectl events is now a built-in kubectl subcommand with --for, --watch, --types, output formats, etc.; see the kubectl events reference. [2]
-
Compatibility: follow the Kubernetes Version Skew Policy — kubectl is supported within one minor version older or newer than the kube-apiserver it talks to (i.e., kubectl may be ±1 minor relative to the API server). If your cluster has mixed apiserver versions (HA skew), that narrows supported kubectl versions. For safe operation, use a kubectl release within one minor of your kube-apiserver. [4]
Sources
- SIG-CLI / release discussion & KEP/exception for moving kubectl events to top-level. [kubernetes-sig-cli Google Group]. [1]
- kubectl events reference (official docs). [2]
- History / community plugin info (kubectl-events via krew). [3]
- Kubernetes Version Skew Policy — kubectl ±1 minor compatibility. [4]
If you want, I can list exact Kubernetes release numbers/dates for when events moved between alpha/beta/stable.
🏁 Script executed:
# Check kubectl documentation and search for any kubectl usage patterns in the codebase
rg "kubectl.*events" --type goRepository: argoproj-labs/argocd-operator
Length of output: 487
Prefer kubectl get events for broader kubectl compatibility.
The kubectl events subcommand was introduced in Kubernetes 1.26+. For compatibility with older kubectl versions, use kubectl get events, which leverages the traditional Events API and works across all versions. The codebase already uses this pattern in fixture.go.
🛠️ Suggested fix
- out, err := osFixture.ExecCommandWithOutputParam(false, false, "kubectl", "-n", pt.ns.Name, "events")
+ out, err := osFixture.ExecCommandWithOutputParam(false, false, "kubectl", "-n", pt.ns.Name, "get", "events")🤖 Prompt for AI Agents
In `@tests/ginkgo/sequential/1-120_repo_server_system_ca_trust.go` around lines
711 - 730, In podTrust.fetch(), replace the kubectl invocation that currently
uses the "events" subcommand with the broader-compatible form ["kubectl", "-n",
pt.ns.Name, "get", "events"] by updating the
osFixture.ExecCommandWithOutputParam call in podTrust.fetch() so it runs
"kubectl get events" (matching the pattern used elsewhere, e.g. fixture.go) to
maintain compatibility with older kubectl versions.
|
As far as tests go, it looks throroug! LGTM |
What type of PR is this?
/kind enhancement
What does this PR do / why we need it:
Permit users to trust CAs on a repo-server system level
Have you updated the necessary documentation?
Which issue(s) this PR fixes:
Fixes #1830
How to test changes / Special notes to the reviewer:
Can be tested against:
Summary by CodeRabbit
New Features
Documentation
Tests
Chores
✏️ Tip: You can customize this high-level summary in your review settings.