Skip to content

S3CSI-204: Support CA certificates via ConfigMap#351

Open
anurag4DSB wants to merge 5 commits intomainfrom
improvement/S3CSI-204-support-CA-certificates
Open

S3CSI-204: Support CA certificates via ConfigMap#351
anurag4DSB wants to merge 5 commits intomainfrom
improvement/S3CSI-204-support-CA-certificates

Conversation

@anurag4DSB
Copy link
Copy Markdown
Collaborator

@anurag4DSB anurag4DSB commented Mar 19, 2026

Summary

Adds custom CA certificate support so customers can validate TLS certificates from their RING S3 endpoints. CA certificates are injected via Kubernetes ConfigMaps (not Secrets) since they are public configuration data, not confidential.

The controller pod mounts the CA ConfigMap and sets AWS_CA_BUNDLE for dynamic provisioning S3 calls. Mounter pods use an Alpine initContainer to install the CA cert into the system trust store (/etc/ssl/certs/) because mount-s3's s2n-tls only reads from there.

Changes

  • Go core: Add TLSConfig, configureTLS() for ConfigMap-based CA injection in mounter pods, buildTLSConfig() in controller with flag/env parsing and defaults, unit tests
  • Helm chart: Add tls.* values, conditional AWS_CA_BUNDLE env var, ConfigMap volume mount, and TLS env vars in controller template
  • CI/E2E: Add ensureCACertConfigMap helper, TLSAll Mage target, and e2e-tests-tls workflow job for full TLS integration testing
  • Documentation: TLS configuration guide, Helm reference update, glossary terms, mkdocs nav entry

Test plan

  • Unit tests pass (TestCreatingMountpointPodsWithTLS, TestCreatingMountpointPodsWithoutTLS)
  • Helm lint passes
  • Helm template renders correctly with/without tls.caCertConfigMap
  • Controller binary compiles
  • [x ] e2e-tests-tls CI job passes

@claude
Copy link
Copy Markdown

claude bot commented Mar 19, 2026

  • Init container security context is less hardened than the main container
    • Add AllowPrivilegeEscalation: false and Drop: ALL capabilities to match the main mountpoint container
  • emptyDir for shared cert store (etc-ssl-certs) has no SizeLimit
    • Add a SizeLimit to match the pattern used by the communication dir volume
  • Error wrapping uses v-formatting instead of w-formatting in magefiles/e2e.go
    • Use w-formatting consistently in ensureCACertConfigMap and TLSAll functions

Review by Claude Code

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.44%. Comparing base (b5ead2c) to head (f0fbf68).
✅ All tests successful. No failed tests found.

Additional details and impacted files

Impacted file tree graph

Files with missing lines Coverage Δ
pkg/podmounter/mppod/creator.go 97.87% <100.00%> (+1.20%) ⬆️

... and 2 files with indirect coverage changes

@@            Coverage Diff             @@
##             main     #351      +/-   ##
==========================================
+ Coverage   73.60%   74.44%   +0.83%     
==========================================
  Files          48       48              
  Lines        2800     2868      +68     
==========================================
+ Hits         2061     2135      +74     
+ Misses        607      600       -7     
- Partials      132      133       +1     

@claude
Copy link
Copy Markdown

claude bot commented Mar 19, 2026

  • Init container in configureTLS lacks security hardening (AllowPrivilegeEscalation, capability drop, seccomp profile) that the main container has. Add these fields to match the security posture.

    Review by Claude Code

@claude
Copy link
Copy Markdown

claude bot commented Mar 19, 2026

Well-structured PR. Clean separation between controller CA injection (AWS_CA_BUNDLE) and mounter pod CA injection (initContainer + shared emptyDir). Security context on the init container properly follows the restricted PodSecurity policy. Tests cover both TLS and non-TLS paths.

- TLS emptyDir volume (etc-ssl-certs) has no SizeLimit, inconsistent with communication dir pattern
- Add a SizeLimit (e.g., 2MiB) for consistency and resource protection

Review by Claude Code

@claude
Copy link
Copy Markdown

claude bot commented Mar 19, 2026

  • The TLS emptyDir volume (etc-ssl-certs) has no sizeLimit, unlike the communication emptyDir. Add a conservative limit (e.g., 1Mi) for consistency and defense in depth.
  • The init container shell command lacks set -e. If cp fails, cat still runs and the container exits 0 with an incomplete CA bundle.
  • buildTLSConfig silently falls back to defaults on parse errors. Consider failing hard so operators notice misconfigurations immediately.
  • The controller Helm template mounts the TLS ConfigMap without optional: true. If the ConfigMap is missing, the pod hangs in ContainerCreating.

Review by Claude Code

@claude
Copy link
Copy Markdown

claude bot commented Mar 19, 2026

Minor issues found, see inline comments for details.

  • Glossary entries not in alphabetical order (CA before CLI, PEM after KMS)
  • TLS emptyDir (etc-ssl-certs) has no SizeLimit unlike the communication dir

Review by Claude Code

@claude
Copy link
Copy Markdown

claude bot commented Mar 19, 2026

  • emptyDir for TLS cert store (etc-ssl-certs) has no SizeLimit, unlike the communication emptyDir. Add a limit (e.g., 1MiB) for consistency and to bound resource usage.

Review by Claude Code

@anurag4DSB anurag4DSB force-pushed the improvement/S3CSI-204-support-CA-certificates branch from e547516 to d3784c5 Compare March 19, 2026 13:04
@claude
Copy link
Copy Markdown

claude bot commented Mar 19, 2026

Clean, well-structured PR. Good security context on the init container, proper error wrapping throughout, and solid test coverage for both TLS and non-TLS paths.

One issue:
- Stale comment in TLSAll() (magefiles/e2e.go:455-457) claims the controller pod stays in ContainerCreating if the ConfigMap is missing, but the Helm template uses optional: true on the volume, so this is incorrect.
- Update the comment to reflect the actual motivation (avoiding transient S3 TLS errors at startup).

Review by Claude Code

@anurag4DSB anurag4DSB force-pushed the improvement/S3CSI-204-support-CA-certificates branch from d3784c5 to 32aca2d Compare March 19, 2026 13:25
@claude
Copy link
Copy Markdown

claude bot commented Mar 19, 2026

Clean, well-structured PR. Two minor observations:

- TLSEmptyDirSizeLimit (1MiB) is adequate today but leaves little margin for larger custom CA bundles or Alpine base bundle growth. Consider 2MiB or a sizing comment.
- The system CA bundle (~200KB) + custom CAs should fit, but a silent pod failure if exceeded would be hard to debug.

- Mounter pod ConfigMap volume is not optional, while the controller ConfigMap volume is. If the ConfigMap is missing in mount-s3 namespace, mounter pods get stuck in ContainerCreating with an opaque Kubernetes event.
- Consider making it optional and failing in the init container with a clear error, or documenting this failure mode more prominently in the troubleshooting section.

Review by Claude Code

@anurag4DSB anurag4DSB force-pushed the improvement/S3CSI-204-support-CA-certificates branch from 32aca2d to 7aede71 Compare March 19, 2026 13:47
@claude
Copy link
Copy Markdown

claude bot commented Mar 19, 2026

LGTM

Well-structured feature. The ConfigMap-based CA injection, init container approach for s2n-tls trust store, restricted security contexts, and test coverage all look solid. A few minor observations (non-blocking):

- The optional: true on ConfigMap volumes is a pragmatic choice for ordering flexibility, but in production if someone sets tls.caCertConfigMap without creating the ConfigMap, the controller's AWS_CA_BUNDLE will point to a missing file — the resulting AWS SDK error may not immediately point users to the missing ConfigMap. The init container's explicit test -f check on mounter pods is much better UX by comparison.
- The shell command in creator.go:234 is long — if it grows further, consider breaking it into a small shell script embedded as a ConfigMap or using environment variables for the error message to improve readability.

Review by Claude Code

@anurag4DSB anurag4DSB force-pushed the improvement/S3CSI-204-support-CA-certificates branch from 7aede71 to 79b979d Compare March 19, 2026 14:29
@anurag4DSB anurag4DSB force-pushed the improvement/S3CSI-204-support-CA-certificates branch from 79b979d to efbf43b Compare March 19, 2026 14:32
@claude
Copy link
Copy Markdown

claude bot commented Mar 19, 2026

LGTM — well-structured PR with good separation between controller (AWS_CA_BUNDLE) and mounter pod (initContainer + system trust store) TLS paths. Security contexts on the init container are properly restricted. Two minor observations:

- ConfigMap Optional asymmetry (creator.go:202): Mounter pod ConfigMap volume is Optional=true (better error via init container), but controller Helm template ConfigMap is not optional (kubelet scheduling failure). The asymmetry is reasonable but worth a comment explaining the design choice.
- Init container shell command readability (creator.go:234): The long fmt.Sprintf shell command could be more readable by passing the ConfigMap name as a container env var instead of embedding it in the command string.

Review by Claude Code

@claude
Copy link
Copy Markdown

claude bot commented Mar 19, 2026

LGTM

Clean implementation. TLS config is global per controller deployment so it doesn't affect Mountpoint Pod sharing decisions. Helm chart correctly passes CA cert volume + AWS_CA_BUNDLE to the s3-csi-controller container (for dynamic provisioning) and TLS env vars to the s3-pod-reconciler container (for mounter pod creation). Init container has proper restricted security context matching the main container. Tests cover both TLS-enabled and TLS-disabled paths.

Review by Claude Code

Add TLS configuration to the CSI controller and mounter pod creator:
- New --tls-* flags on controller for CA cert ConfigMap, init container
  image, and resource limits (fail-fast on invalid quantities)
- Init container appends custom CA bundle to system certs in a shared
  EmptyDir (1MiB limit) mounted at /etc/ssl/certs
- Init container uses set -e for proper error propagation
- AWS_CA_BUNDLE env var set on mounter container when TLS is enabled

Issue: S3CSI-204
Add tls.caCertConfigMap value to configure custom CA certificate
support. When set, the controller passes --tls-ca-cert-configmap
to the CSI controller container, and mounts the ConfigMap as a
volume (marked optional to avoid blocking pod startup).

Issue: S3CSI-204
Add mage e2e:tlsAll target that deploys an HTTPS S3 proxy with
custom CA, creates CA cert ConfigMaps, installs the CSI driver
with TLS enabled, and runs E2E tests against the HTTPS endpoint.
Add tls-e2e job to the CI workflow. Uses %w error wrapping
throughout for proper error chain preservation.

Issue: S3CSI-204
Add dedicated TLS configuration guide covering prerequisites,
Helm values, verification steps, and troubleshooting. Update
Helm chart reference with tls.* parameters. Add CA/PEM/TLS
terms to glossary. Add TLS note to volume provisioning docs.

Issue: S3CSI-204
@anurag4DSB anurag4DSB force-pushed the improvement/S3CSI-204-support-CA-certificates branch from efbf43b to 164fdb1 Compare March 19, 2026 15:22
@claude
Copy link
Copy Markdown

claude bot commented Mar 19, 2026

  • Init container CA bundle concatenation lacks newline guard between system bundle and custom CA, risking PEM corruption
    • Add a newline between the cp and cat commands in the init container shell script

Review by Claude Code

Ensure a newline separates the system CA bundle from the custom
CA certificate when building the combined trust store. Prevents
PEM corruption if the system bundle does not end with a newline.

Issue: S3CSI-204
@claude
Copy link
Copy Markdown

claude bot commented Mar 19, 2026

LGTM

Review by Claude Code


## TLS Configuration

<!-- markdownlint-disable MD046 -->
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well deactivate MD046 as a whole in the .markdownlint.yaml I find it very hard to support on mkdocs.

Comment on lines +24 to +31
Create a ConfigMap containing your CA certificate in the CSI driver namespace
(typically `kube-system`):

```bash
kubectl create configmap s3-ca-cert \
--from-file=ca-bundle.crt=/path/to/your/ca.crt \
-n kube-system
```
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like I'm missing a piece (it's been a while that I haven't written helm charts) why are we not doing this step within the helm chart? (For step 3 as well).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a good point. We also have a feedback from @davidtencer as well

At first sight I am in favor of option 4 and maybe with an improvement: can we create the configmap in both namespaces using the helm chart ?

Reference

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds support for trusting custom TLS CA certificates when the driver talks to HTTPS S3 endpoints (controller-side AWS SDK calls and mounter pod mount-s3/s2n-tls).

Changes:

  • Add mounter-pod TLS config to mount a CA ConfigMap and use an initContainer to build a combined trust bundle in /etc/ssl/certs.
  • Add controller-side TLS config parsing (flags/env) and Helm wiring to mount the CA ConfigMap + set AWS_CA_BUNDLE.
  • Add TLS documentation and CI/E2E workflow coverage for TLS scenarios.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
pkg/podmounter/mppod/creator.go Introduces TLSConfig and TLS volume/initContainer wiring for mounter pods.
pkg/podmounter/mppod/creator_test.go Adds unit tests validating pod spec changes with/without TLS enabled.
cmd/scality-csi-controller/main.go Adds flag/env parsing to construct TLS config for spawned mounter pods.
charts/scality-mountpoint-s3-csi-driver/values.yaml Adds tls.* Helm values for CA ConfigMap + initContainer settings.
charts/scality-mountpoint-s3-csi-driver/templates/controller.yaml Mounts CA ConfigMap into controller and sets TLS-related env vars.
magefiles/e2e.go Adds helpers/targets for TLS E2E flow and ConfigMap creation.
.github/workflows/e2e-tests.yaml Adds a dedicated TLS E2E job to validate end-to-end TLS behavior in CI.
docs/driver-deployment/tls-configuration.md New TLS configuration guide explaining setup and troubleshooting.
docs/concepts-and-reference/helm-chart-configuration-reference.md Documents new tls.* chart values and ordering constraints.
docs/troubleshooting.md Adds TLS-related troubleshooting entries.
docs/volume-provisioning/index.md Links to the TLS configuration guide from provisioning docs.
docs/glossary.md Adds TLS/CA-related glossary entries.
mkdocs.yml Adds TLS configuration page to the MkDocs navigation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

| "Access Denied" | Invalid S3 credentials | 1. Check secret contains `access_key_id` and `secret_access_key`<br/>2. Test credentials with AWS CLI<br/>3. Check bucket policy |
| "InvalidBucketName" | Bucket name issue | 1. Check bucket exists<br/>2. Check bucket name format<br/>3. Ensure no typos |
| "AWS_ENDPOINT_URL environment variable must be set" | Missing endpoint configuration | Set `s3EndpointUrl` in Helm values or driver configuration |
| TLS handshake failure or certificate verify failed | CA certificate ConfigMap missing or incorrect | Check ConfigMap exists in both `kube-system` and `mount-s3` namespaces with key `ca-bundle.crt`. See [TLS Configuration](driver-deployment/tls-configuration.md#certificate-not-found) |
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This troubleshooting entry hard-codes kube-system and mount-s3 as the namespaces to check for the CA ConfigMap. The controller namespace and the mounter namespace are configurable (--namespace / Helm mountpointPod.namespace), so users installing into a different namespace could be misled. Consider wording this as “controller namespace” and “mounter pod namespace (mountpointPod.namespace)” rather than specific names, or call out that kube-system/mount-s3 are only defaults.

Suggested change
| TLS handshake failure or certificate verify failed | CA certificate ConfigMap missing or incorrect | Check ConfigMap exists in both `kube-system` and `mount-s3` namespaces with key `ca-bundle.crt`. See [TLS Configuration](driver-deployment/tls-configuration.md#certificate-not-found) |
| TLS handshake failure or certificate verify failed | CA certificate ConfigMap missing or incorrect | Check the CA ConfigMap exists in both the controller namespace (default: `kube-system`) and the mounter pod namespace (`mountpointPod.namespace`, default: `mount-s3`) with key `ca-bundle.crt`. See [TLS Configuration](driver-deployment/tls-configuration.md#certificate-not-found) |

Copilot uses AI. Check for mistakes.
Comment on lines +416 to +417
if _, err := os.Stat(caCertPath); os.IsNotExist(err) {
return fmt.Errorf("CA certificate not found at %s (run GenerateTLSCerts first)", caCertPath)
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

os.Stat errors other than IsNotExist (e.g., permission issues, transient filesystem errors) are currently ignored and will fall through to later kubectl steps with a less direct failure. Consider handling any non-nil error from os.Stat and returning it with context so failures are easier to diagnose.

Suggested change
if _, err := os.Stat(caCertPath); os.IsNotExist(err) {
return fmt.Errorf("CA certificate not found at %s (run GenerateTLSCerts first)", caCertPath)
if _, err := os.Stat(caCertPath); err != nil {
if os.IsNotExist(err) {
return fmt.Errorf("CA certificate not found at %s (run GenerateTLSCerts first)", caCertPath)
}
return fmt.Errorf("failed to stat CA certificate at %s: %w", caCertPath, err)

Copilot uses AI. Check for mistakes.
ImagePullPolicy: c.config.TLS.InitImagePullPolicy,
Command: []string{
"sh", "-c",
"set -e; cp /etc/ssl/certs/ca-certificates.crt /shared-certs/ca-certificates.crt; echo >> /shared-certs/ca-certificates.crt; cat /custom-ca/ca-bundle.crt >> /shared-certs/ca-certificates.crt",
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initContainer command assumes the init image already contains /etc/ssl/certs/ca-certificates.crt. The default init image is alpine:3.21, which may not include the ca-certificates bundle by default; if the file is missing the cp will fail and mounter pods will never start when TLS is enabled. Consider installing/refreshing CA certs in the initContainer (or switching the default init image to one that guarantees the bundle is present) and fail with a clearer message if the bundle path is missing.

Suggested change
"set -e; cp /etc/ssl/certs/ca-certificates.crt /shared-certs/ca-certificates.crt; echo >> /shared-certs/ca-certificates.crt; cat /custom-ca/ca-bundle.crt >> /shared-certs/ca-certificates.crt",
"set -eu; if [ ! -f /etc/ssl/certs/ca-certificates.crt ]; then echo 'ERROR: system CA bundle /etc/ssl/certs/ca-certificates.crt not found in TLS init container image; ensure the init image includes ca-certificates.' 1>&2; exit 1; fi; cp /etc/ssl/certs/ca-certificates.crt /shared-certs/ca-certificates.crt; echo >> /shared-certs/ca-certificates.crt; cat /custom-ca/ca-bundle.crt >> /shared-certs/ca-certificates.crt",

Copilot uses AI. Check for mistakes.
Comment on lines +322 to +348
- name: Run TLS E2E Tests
run: |
mkdir -p test-results
S3_ENDPOINT_URL=https://s3.scality.com:8443 \
CSI_IMAGE_TAG=${{ github.sha }} \
CSI_IMAGE_REPOSITORY=ghcr.io/${{ github.repository }} \
JUNIT_REPORT=./test-results/e2e-tls-tests-results.xml \
mage e2e:tlsAll

- name: Stop Capture and Collect Artifacts
if: always()
run: mage e2e:stopCapture

- name: Upload Test Artifacts
if: always()
uses: actions/upload-artifact@v4
with:
name: e2e-tls-test-artifacts
path: artifacts

- name: Upload test results to Codecov
if: ${{ always() }}
uses: codecov/test-results-action@v1
with:
token: ${{ secrets.CODECOV_TOKEN }}
file: ./tests/e2e/test-results/e2e-tls-tests-results.xml
flags: e2e_tls_tests
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JUnit report is written to ./test-results/e2e-tls-tests-results.xml via JUNIT_REPORT, but the Codecov upload step points to ./tests/e2e/test-results/e2e-tls-tests-results.xml. This mismatch will cause the Codecov step to fail to find the report (or upload an empty/missing file). Align the paths (either write into ./tests/e2e/test-results/ or upload from ./test-results/).

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants