Skip to content

feat(results-cache): Add TLS support for external MongoDB/DocumentDB connections#2226

Open
chandle-revans wants to merge 3 commits intoy-scope:mainfrom
chandle-revans:feat/results-cache-tls-support
Open

feat(results-cache): Add TLS support for external MongoDB/DocumentDB connections#2226
chandle-revans wants to merge 3 commits intoy-scope:mainfrom
chandle-revans:feat/results-cache-tls-support

Conversation

@chandle-revans
Copy link
Copy Markdown

@chandle-revans chandle-revans commented Apr 24, 2026

Description

Adds optional TLS support for the results cache (MongoDB) connection across the Python components, the webui server, and the Helm chart. This enables CLP to connect to externally managed MongoDB compatible services that require TLS, most notably AWS DocumentDB, which enforces TLS by default without requiring users to disable TLS on the upstream service.

Changes

clp-py-utils: ResultsCache model (components/clp-py-utils/clp_py_utils/clp_config.py)

  • Added two optional fields:
    • tls: bool = False
    • tls_ca_file: NonEmptyStr | None = None
  • Added a model_validator that rejects tls_ca_file being set while tls is False (prevents silent misconfiguration).
  • Updated get_uri() to append ?tls=true and optionally &tlsCAFile=<path> when configured.
    • NOTE: Both pymongo (Python) and mongocxx (C++ reducer) parse these query parameters natively, so no additional driver-side changes are needed.

Webui server: (components/webui/server)

  • settings.json: added MongoDbTls: false and MongoDbTlsCaFile: null placeholders.
  • src/plugins/external/mongo.ts: appends &tls=true and &tlsCAFile=<path> to the connection URL when configured. Uses explicit type assertions on the JSON-derived literals to satisfy strict TypeScript settings.

Helm chart: (tools/deployment/package-helm)

  • values.yaml: added documented tls, tls_ca_file, and tlsVolume fields under clpConfig.results_cache. The tlsVolume shape mirrors Kubernetes' native volume/volumeMount structures, so users can supply the CA bundle from any source (Secret, ConfigMap, hostPath, etc.) keeping the chart cloud agnostic.
  • templates/_helpers.tpl: three new helpers:
    • clp.resultsCacheTlsVolume / clp.resultsCacheTlsVolumeMount: render the volume/mount from tlsVolume config.
    • clp.resultsCacheUri: builds the full MongoDB URI including TLS params; output matches Python's get_uri().
  • templates/configmap.yaml: renders tls/tls_ca_file into clp-config.yaml, and MongoDbTls/MongoDbTlsCaFile into the webui server settings.
  • templates/results-cache-indices-creator-job.yaml: uses the resultsCacheUri helper, optionally mounts the TLS volume.
  • Deployments updated with optional TLS volume + volumeMount: query-scheduler, query-worker, garbage-collector, reducer, webui. All additions are guarded by {{- if .Values.clpConfig.results_cache.tlsVolume }} so they're no-ops when unused.

Tests: (components/clp-py-utils/tests/test_results_cache.py)

New unit tests covering:

  • Default URI (no TLS params)
  • Custom host/port/db_name
  • TLS enabled (URI includes ?tls=true)
  • TLS + CA file (URI includes ?tls=true&tlsCAFile=<path>)
  • Validation error when tls_ca_file is set without tls=True
  • Default field values

Documentation (docs/src/user-docs/guides-external-database.md)

  • Added a new Enabling TLS for the results cache subsection to guides-external-database.md, plus a callout in the AWS DocumentDB/MongoDB Atlas section noting that DocumentDB enforces TLS by default.

Backwards Compatibility

Fully backwards-compatible. All new fields default to TLS being disabled:

  • tls: false
  • tls_ca_file: null
  • tlsVolume: null

When none of the new fields are set, the rendered Helm output is byte-identical to the previous version aside from two new tls: false / tls_ca_file: null lines in the clp-config.yaml (which the updated ResultsCache model accepts as defaults). Existing deployments require no changes.

Example Usage (AWS DocumentDB)

clpConfig:
  bundled:
    - "queue"
    - "redis"
    # results_cache removed since we're using external DocumentDB

  results_cache:
    host: "my-cluster.cluster-xyz.us-east-1.docdb.amazonaws.com"
    port: 27017
    db_name: "clp-query-results"
    tls: true
    tls_ca_file: "/etc/ssl/certs/global-bundle.pem"
    tlsVolume:
      volume:
        name: results-cache-ca
        secret:
          secretName: docdb-ca-bundle
      mount:
        mountPath: /etc/ssl/certs/global-bundle.pem
        subPath: global-bundle.pem

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • all TLS settings are disabled by default, no breaking changes observed when running helm template
  • added 6 new unit tests for TLS configuration

Summary by CodeRabbit

Release Notes

  • New Features

    • Added MongoDB TLS support for the results cache, enabling secure connections to MongoDB-compatible services such as AWS DocumentDB
    • Users can now enable TLS and optionally specify custom CA certificate files for authentication
  • Documentation

    • Added guide for configuring TLS connections to MongoDB-compatible databases across deployment methods

@chandle-revans chandle-revans requested a review from a team as a code owner April 24, 2026 19:23
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 24, 2026

Walkthrough

This pull request adds MongoDB TLS support throughout CLP, enabling secure connections to TLS-enforcing databases like AWS DocumentDB. Changes include Python configuration models with TLS validation, TypeScript URI construction logic, Helm templates for CA certificate volume mounting, and comprehensive documentation on configuring and deploying TLS-secured results caches.

Changes

Cohort / File(s) Summary
Core Configuration & Validation
components/clp-py-utils/clp_py_utils/clp_config.py
Added tls and tls_ca_file fields to ResultsCache model with post-validation to enforce that tls_ca_file requires tls enabled. Updated get_uri() method to conditionally append tls=true and tlsCAFile query parameters based on configuration.
Python Tests
components/clp-py-utils/tests/test_results_cache.py
New test suite validating URI construction with default settings, custom host/port/db_name, TLS parameter appending, CA file handling, and validation error when tls_ca_file is provided without tls enabled.
Web UI Configuration
components/webui/server/settings.json
Added MongoDbTls (defaults to false) and MongoDbTlsCaFile (defaults to null) configuration fields.
Application Integration
components/webui/server/src/plugins/external/mongo.ts
Updated autoConfig to dynamically construct MongoDB connection string by conditionally appending &tls=true and &tlsCAFile=<path> query parameters based on settings.
Documentation
docs/src/user-docs/guides-external-database.md
Added comprehensive guide for enabling TLS with MongoDB-compatible services, including CLP configuration options, URI modification details, deployment-specific instructions (Docker Compose and Helm), and AWS DocumentDB example commands.
Helm Helper Templates
tools/deployment/package-helm/templates/_helpers.tpl
Added three new Helm helper templates: clp.resultsCacheTlsVolumeMount for CA certificate volume mount, clp.resultsCacheTlsVolume for volume definition, and clp.resultsCacheUri for constructing the complete MongoDB connection URI with conditional TLS parameters.
Helm Configuration
tools/deployment/package-helm/templates/configmap.yaml
Extended clp-config.yaml and webui-server-settings.json within the ConfigMap to inject tls and tls_ca_file fields using Helm templating.
Helm Deployment Templates
tools/deployment/package-helm/templates/garbage-collector-deployment.yaml, query-scheduler-deployment.yaml, query-worker-deployment.yaml, reducer-deployment.yaml, webui-deployment.yaml, results-cache-indices-creator-job.yaml
Conditionally wire TLS volume mounts and pod volumes across all results-cache-connecting components when clpConfig.results_cache.tlsVolume is enabled. Results cache indices creator additionally uses the new clp.resultsCacheUri helper for URI construction.
Helm Values Configuration
tools/deployment/package-helm/values.yaml
Extended clpConfig.results_cache with three new configuration keys: tls (boolean, defaults to false), tls_ca_file (nullable path, defaults to null), and tlsVolume (nullable volume object, defaults to null).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and accurately summarizes the main change: adding TLS support for external MongoDB/DocumentDB connections to the results cache component.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Ruff (0.15.11)
components/clp-py-utils/clp_py_utils/clp_config.py

Unexpected Ruff issue shape at index 94


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@components/clp-py-utils/clp_py_utils/clp_config.py`:
- Around line 520-529: The MongoDB URI builder in get_uri currently injects
self.tls_ca_file unencoded into the query string, which breaks for paths with
reserved URI chars; update get_uri to percent-encode the tls_ca_file value (e.g.
using urllib.parse.quote) before appending f"tlsCAFile=..." so characters like
spaces, :, \, &, ? etc. are safely encoded; ensure you import urllib.parse.quote
at top and only encode the tls_ca_file when it is truthy, leaving other params
unchanged.

In `@components/webui/server/src/plugins/external/mongo.ts`:
- Around line 12-14: The mongoDbTlsCaFile value is concatenated raw into the
connection URI which can break the query string if it contains reserved
characters; change the append logic to URL-encode mongoDbTlsCaFile (use
encodeURIComponent or equivalent) when building the url variable so the
tlsCAFile query param is safe, and apply the same encoding pattern to the other
places mentioned (e.g., ResultsCache.get_uri and the Helm helper
clp.resultsCacheUri) for consistency.
- Around line 8-9: The settings type assertions are unsafe: remove the `as`
casts on settings.MongoDbTls and settings.MongoDbTlsCaFile and instead coerce or
validate at runtime; for mongoDbTls (variable mongoDbTls) use an explicit
default/coercion from settings.MongoDbTls (e.g. treat missing/invalid as false),
and for mongoDbTlsCaFile (variable mongoDbTlsCaFile) validate that
settings.MongoDbTlsCaFile is a string and otherwise set it to null; modify the
code in the module that references settings to perform these runtime checks
rather than using `as` to silence the compiler.

In `@docs/src/user-docs/guides-external-database.md`:
- Line 280: Update the `[helm-values]` reference so it uses a relative
repository path to the values.yaml file instead of the hard-coded GitHub URL
that points to `main`; locate the `[helm-values]` link definition and replace
the absolute GitHub URL with a relative file path (so it follows the docs build
ref), and if you need to link to the directory `tools/deployment/package-helm/`
use a GitHub URL that injects the DOCS_VAR_CLP_GIT_REF variable (not `main`) to
keep the directory-level reference version-agnostic.
- Around line 224-228: Update the AWS DocumentDB CA bundle download snippet to
instruct users to verify the file integrity and authenticity: mention including
an expected checksum (e.g., SHA-256) alongside the wget URL and show how to
verify with sha256sum (compare against the published checksum), and add a short
note that HTTPS/TLS validates the server by default so users should ensure they
fetch from the official AWS URL (or verify the checksum) before trusting the CA
bundle; reference the existing wget download line so reviewers can replace or
augment that snippet with the verification guidance.

In `@tools/deployment/package-helm/templates/_helpers.tpl`:
- Around line 446-453: The helpers clp.resultsCacheTlsVolumeMount and
clp.resultsCacheTlsVolume currently assume
.Values.clpConfig.results_cache.tlsVolume is defined; update these helper
templates to either (a) use the required function to fail fast with a clear
error (e.g. required "message" .Values.clpConfig.results_cache.tlsVolume) before
dereferencing volume.* and mount.* or (b) wrap all dereferences in an if block
that returns nothing when .Values.clpConfig.results_cache.tlsVolume is nil/empty
so the helper becomes a no-op; apply the same pattern to both helpers so callers
no longer get confusing template errors.

In `@tools/deployment/package-helm/values.yaml`:
- Line 199: The example value mount.readOnly is misleading because the template
helper clp.resultsCacheTlsVolumeMount in _helpers.tpl hardcodes readOnly: true
and ignores tlsVolume.mount.readOnly; either remove mount.readOnly from the
example in values.yaml or modify the helper clp.resultsCacheTlsVolumeMount to
honor the value by reading
.Values.clpConfig.results_cache.tlsVolume.mount.readOnly (with a default true)
so user changes to mount.readOnly take effect.
- Around line 186-200: The values.yaml entry tlsVolume lacks validation and can
produce invalid Pod specs; add validation to ensure tlsVolume has the required
shape (volume.name and mount.mountPath present, mount.subPath optional) by
either adding a values.schema.json JSON Schema entry for tlsVolume (require
properties: volume object with required name, mount object with required
mountPath, optional subPath) or adding a fail check in the Helm helpers (e.g.,
in _helpers.tpl) that inspects .Values.tlsVolume and calls fail with a clear
message if volume.name or mount.mountPath are missing; reference
clp.resultsCacheTlsVolume and clp.resultsCacheTlsVolumeMount to ensure the same
validated keys are used across templates.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3d59b046-76c0-46b9-9bed-dc0c57be5b6b

📥 Commits

Reviewing files that changed from the base of the PR and between b121bf9 and b65488f.

📒 Files selected for processing (14)
  • components/clp-py-utils/clp_py_utils/clp_config.py
  • components/clp-py-utils/tests/test_results_cache.py
  • components/webui/server/settings.json
  • components/webui/server/src/plugins/external/mongo.ts
  • docs/src/user-docs/guides-external-database.md
  • tools/deployment/package-helm/templates/_helpers.tpl
  • tools/deployment/package-helm/templates/configmap.yaml
  • tools/deployment/package-helm/templates/garbage-collector-deployment.yaml
  • tools/deployment/package-helm/templates/query-scheduler-deployment.yaml
  • tools/deployment/package-helm/templates/query-worker-deployment.yaml
  • tools/deployment/package-helm/templates/reducer-deployment.yaml
  • tools/deployment/package-helm/templates/results-cache-indices-creator-job.yaml
  • tools/deployment/package-helm/templates/webui-deployment.yaml
  • tools/deployment/package-helm/values.yaml

Comment thread components/clp-py-utils/clp_py_utils/clp_config.py
Comment thread components/webui/server/src/plugins/external/mongo.ts
Comment thread components/webui/server/src/plugins/external/mongo.ts
Comment thread docs/src/user-docs/guides-external-database.md
Comment thread docs/src/user-docs/guides-external-database.md
Comment on lines +446 to +453
{{- define "clp.resultsCacheTlsVolumeMount" -}}
name: {{ .Values.clpConfig.results_cache.tlsVolume.volume.name | quote }}
mountPath: {{ .Values.clpConfig.results_cache.tlsVolume.mount.mountPath | quote }}
{{- if .Values.clpConfig.results_cache.tlsVolume.mount.subPath }}
subPath: {{ .Values.clpConfig.results_cache.tlsVolume.mount.subPath | quote }}
{{- end }}
readOnly: true
{{- end }}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Helper assumes tlsVolume is non-null; add a guard or document the precondition.

clp.resultsCacheTlsVolumeMount and clp.resultsCacheTlsVolume dereference .Values.clpConfig.results_cache.tlsVolume.volume.* / .mount.* directly. If a future caller forgets the {{- if .Values.clpConfig.results_cache.tlsVolume }} guard (or tlsVolume is set to an empty map rather than null), Helm will render confusing errors.

Consider either (a) failing fast with a clear message using required, or (b) making the helpers internally no-op when tlsVolume is unset, so call sites don't all need the guard:

♻️ Option A — fail fast with a clear message
 {{- define "clp.resultsCacheTlsVolumeMount" -}}
-name: {{ .Values.clpConfig.results_cache.tlsVolume.volume.name | quote }}
-mountPath: {{ .Values.clpConfig.results_cache.tlsVolume.mount.mountPath | quote }}
+{{- $tlsVolume := .Values.clpConfig.results_cache.tlsVolume | required
+    "clpConfig.results_cache.tlsVolume must be set to use this helper" -}}
+name: {{ $tlsVolume.volume.name | quote }}
+mountPath: {{ $tlsVolume.mount.mountPath | quote }}
-{{- if .Values.clpConfig.results_cache.tlsVolume.mount.subPath }}
-subPath: {{ .Values.clpConfig.results_cache.tlsVolume.mount.subPath | quote }}
+{{- with $tlsVolume.mount.subPath }}
+subPath: {{ . | quote }}
 {{- end }}
 readOnly: true
 {{- end }}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/deployment/package-helm/templates/_helpers.tpl` around lines 446 - 453,
The helpers clp.resultsCacheTlsVolumeMount and clp.resultsCacheTlsVolume
currently assume .Values.clpConfig.results_cache.tlsVolume is defined; update
these helper templates to either (a) use the required function to fail fast with
a clear error (e.g. required "message"
.Values.clpConfig.results_cache.tlsVolume) before dereferencing volume.* and
mount.* or (b) wrap all dereferences in an if block that returns nothing when
.Values.clpConfig.results_cache.tlsVolume is nil/empty so the helper becomes a
no-op; apply the same pattern to both helpers so callers no longer get confusing
template errors.

Comment on lines +186 to +200
# Optional volume and mount to inject a TLS CA certificate into pods that connect to the
# results cache. When set, the volume is added to all pods that use the results cache
# connection (query-scheduler, query-worker, garbage-collector, reducer, webui, and
# results-cache-indices-creator).
# Example:
# tlsVolume:
# volume:
# name: results-cache-ca
# secret:
# secretName: my-docdb-ca-bundle
# mount:
# mountPath: /etc/ssl/certs/results-cache-ca.pem
# subPath: global-bundle.pem
# readOnly: true
tlsVolume: null
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider documenting/validating the required shape of tlsVolume.

tlsVolume is consumed across multiple templates (clp.resultsCacheTlsVolume, clp.resultsCacheTlsVolumeMount) and must contain exactly volume.name, mount.mountPath, and optionally mount.subPath. A malformed tlsVolume (e.g., missing volume.name) will silently render an invalid Pod spec and fail at kubectl apply.

Consider adding a values.schema.json entry (or a Helm fail check in _helpers.tpl) to catch misconfiguration earlier with a clear error message. Optional — helpful for operators adopting TLS.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/deployment/package-helm/values.yaml` around lines 186 - 200, The
values.yaml entry tlsVolume lacks validation and can produce invalid Pod specs;
add validation to ensure tlsVolume has the required shape (volume.name and
mount.mountPath present, mount.subPath optional) by either adding a
values.schema.json JSON Schema entry for tlsVolume (require properties: volume
object with required name, mount object with required mountPath, optional
subPath) or adding a fail check in the Helm helpers (e.g., in _helpers.tpl) that
inspects .Values.tlsVolume and calls fail with a clear message if volume.name or
mount.mountPath are missing; reference clp.resultsCacheTlsVolume and
clp.resultsCacheTlsVolumeMount to ensure the same validated keys are used across
templates.

# mount:
# mountPath: /etc/ssl/certs/results-cache-ca.pem
# subPath: global-bundle.pem
# readOnly: true
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

mount.readOnly in the example is ignored by the helper.

clp.resultsCacheTlsVolumeMount in _helpers.tpl hard-codes readOnly: true and never reads tlsVolume.mount.readOnly. Including readOnly: true in this example is harmless today, but misleading: a user who copies the example and sets readOnly: false would incorrectly assume their value takes effect.

Either drop this key from the example, or update the helper to honor it (e.g., readOnly: {{ .Values.clpConfig.results_cache.tlsVolume.mount.readOnly | default true }}).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/deployment/package-helm/values.yaml` at line 199, The example value
mount.readOnly is misleading because the template helper
clp.resultsCacheTlsVolumeMount in _helpers.tpl hardcodes readOnly: true and
ignores tlsVolume.mount.readOnly; either remove mount.readOnly from the example
in values.yaml or modify the helper clp.resultsCacheTlsVolumeMount to honor the
value by reading .Values.clpConfig.results_cache.tlsVolume.mount.readOnly (with
a default true) so user changes to mount.readOnly take effect.

@hoophalab
Copy link
Copy Markdown
Contributor

Hi @chandle-revans , thank you for your contribution! We are looking into this internally and will review this PR in the next two days.

Copy link
Copy Markdown
Contributor

@hoophalab hoophalab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @chandle-revans, thanks for the contribution! This PR looks solid overall. I’ve left a few comments below

If you're tied up with other priorities, just let us know. We’re happy to handle these edits for you, provided you're happy with the high-level design.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The workflow doesn't pass. Can you run npm run lint:fix under components/webui and fix lint errors?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you bump version in Charts.yaml to 0.3.2-dev.1 to satisfy helm lint?

[docdb-ca]: https://docs.aws.amazon.com/documentdb/latest/developerguide/security.encryption.ssl.html
[docdb-tls-param]: https://docs.aws.amazon.com/documentdb/latest/developerguide/security.encryption.in-transit.html
[docker-compose-orchestration]: guides-docker-compose-deployment.md
[helm-values]: https://github.com/y-scope/clp/blob/main/tools/deployment/package-helm/values.yaml
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[helm-values]: https://github.com/y-scope/clp/blob/main/tools/deployment/package-helm/values.yaml
[helm-values]: https://github.com/y-scope/clp/blob/DOCS_VAR_CLP_GIT_REF/tools/deployment/package-helm/values.yaml

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Can we rename this file as test_clp_config.py?
  2. pytest isn't configured -- this is the first unit test in this component. Can we add pytest to pyproject.toml?
diff --git a/components/clp-py-utils/pyproject.toml b/components/clp-py-utils/pyproject.toml
index 46bd38a9..40e06835 100644
--- a/components/clp-py-utils/pyproject.toml
+++ b/components/clp-py-utils/pyproject.toml
@@ -28,6 +28,7 @@ clp = [
 ]
 dev = [
     "mypy>=1.18.2",
+    "pytest>=9.0.2",
     "ruff>=0.14.4",
 ]

Comment on lines 520 to +529
def get_uri(self):
return f"mongodb://{self.host}:{self.port}/{self.db_name}"
uri = f"mongodb://{self.host}:{self.port}/{self.db_name}"
params = []
if self.tls:
params.append("tls=true")
if self.tls_ca_file:
params.append(f"tlsCAFile={self.tls_ca_file}")
if params:
uri += "?" + "&".join(params)
return uri
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. We might need to escape url parameters since tlsCAFile could contain spaces in the filename. urlencode can escape reserved url chars and join params with &.
  2. If you apply the suggestion below, you need to import quote: from urllib.parse import quote, urlencode
Suggested change
def get_uri(self):
return f"mongodb://{self.host}:{self.port}/{self.db_name}"
uri = f"mongodb://{self.host}:{self.port}/{self.db_name}"
params = []
if self.tls:
params.append("tls=true")
if self.tls_ca_file:
params.append(f"tlsCAFile={self.tls_ca_file}")
if params:
uri += "?" + "&".join(params)
return uri
def get_uri(self):
uri = f"mongodb://{self.host}:{self.port}/{self.db_name}"
query: dict[str, str] = {}
if self.tls:
query["tls"] = "true"
if self.tls_ca_file:
query["tlsCAFile"] = self.tls_ca_file
if query:
uri += "?" + urlencode(query, safe="", quote_via=quote)
return uri

If you agree with the suggestions above, we also need to update unit tests because urlencode escapes /

diff --git a/components/clp-py-utils/tests/test_results_cache.py b/components/clp-py-utils/tests/test_resul
ts_cache.py
index 64b586ef..262908e2 100644
--- a/components/clp-py-utils/tests/test_results_cache.py
+++ b/components/clp-py-utils/tests/test_results_cache.py
@@ -30,10 +30,26 @@ def test_get_uri_tls_with_ca_file():
     assert (
         cache.get_uri()
         == "mongodb://localhost:27017/clp-query-results"
-        "?tls=true&tlsCAFile=/etc/ssl/certs/global-bundle.pem"
+        "?tls=true&tlsCAFile=%2Fetc%2Fssl%2Fcerts%2Fglobal-bundle.pem"
     )
 
 
+def test_get_uri_tls_with_ca_file_url_encoded():
+    """Special characters in tls_ca_file are percent-encoded."""
+    cache = ResultsCache(tls=True, tls_ca_file="/etc/ssl/certs/ca & bundle.pem")
+    assert (
+        cache.get_uri()
+        == "mongodb://localhost:27017/clp-query-results"
+        "?tls=true&tlsCAFile=%2Fetc%2Fssl%2Fcerts%2Fca%20%26%20bundle.pem"
+    )
+
+
+def test_get_uri_special_chars_in_host_and_db():
+    """Special characters in host and db_name are percent-encoded."""
+    cache = ResultsCache(host="my host", db_name="my/db")
+    assert cache.get_uri() == "mongodb://my%20host:27017/my%2Fdb"
+
+

Comment on lines 6 to 19
export const autoConfig = () => {
let url = `mongodb://${settings.MongoDbHost}:${settings.MongoDbPort}/${settings.MongoDbName}?directConnection=true`;
const mongoDbTls = settings.MongoDbTls as boolean;
const mongoDbTlsCaFile = settings.MongoDbTlsCaFile as string | null;
if (mongoDbTls) {
url += "&tls=true";
if (mongoDbTlsCaFile) {
url += `&tlsCAFile=${mongoDbTlsCaFile}`;
}
}
return {
forceClose: true,
url: `mongodb://${settings.MongoDbHost}:${settings.MongoDbPort}/${settings.MongoDbName}?directConnection=true`,
url,
};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably also need to escape url parameters here. How about:

Suggested change
export const autoConfig = () => {
let url = `mongodb://${settings.MongoDbHost}:${settings.MongoDbPort}/${settings.MongoDbName}?directConnection=true`;
const mongoDbTls = settings.MongoDbTls as boolean;
const mongoDbTlsCaFile = settings.MongoDbTlsCaFile as string | null;
if (mongoDbTls) {
url += "&tls=true";
if (mongoDbTlsCaFile) {
url += `&tlsCAFile=${mongoDbTlsCaFile}`;
}
}
return {
forceClose: true,
url: `mongodb://${settings.MongoDbHost}:${settings.MongoDbPort}/${settings.MongoDbName}?directConnection=true`,
url,
};
export const autoConfig = () => {
const params: Record<string, string> = {directConnection: "true"};
if (settings.MongoDbTls) {
params["tls"] = "true";
if (settings.MongoDbTlsCaFile) {
params["tlsCAFile"] = settings.MongoDbTlsCaFile;
}
}
const authority = `${settings.MongoDbHost}:${settings.MongoDbPort}`;
const path = settings.MongoDbName;
const query = new URLSearchParams(params).toString();
const url = `mongodb://${authority}/${path}?${query}`;
return {
forceClose: true,
url: url,
};
};

Comment on lines +186 to +200
# Optional volume and mount to inject a TLS CA certificate into pods that connect to the
# results cache. When set, the volume is added to all pods that use the results cache
# connection (query-scheduler, query-worker, garbage-collector, reducer, webui, and
# results-cache-indices-creator).
# Example:
# tlsVolume:
# volume:
# name: results-cache-ca
# secret:
# secretName: my-docdb-ca-bundle
# mount:
# mountPath: /etc/ssl/certs/results-cache-ca.pem
# subPath: global-bundle.pem
# readOnly: true
tlsVolume: null
Copy link
Copy Markdown
Contributor

@hoophalab hoophalab May 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One high level comment on changes to helm charts:

I’d suggest avoiding user-configurable volume mounts in values.yaml to keep things simple. Additionally, since a root CA isn't sensitive data, using a Kubernetes Secret feels like overkill.

How about adding the CA file to a ConfigMap and mounting it to a fixed location across all pods. This streamlines the TLS setup for users

helm install ... \
--set clpConfig.results_cache.tls=true
--set-file clpConfig.results_cache.tls_ca_cert_content=/path/to/ca.crt

See details in the prototype diff below:

diff --git a/tools/deployment/package-helm/templates/_helpers.tpl b/tools/deployment/package-helm/templates/_helpers.tpl
index db2d5a80..0a0e71fa 100644
--- a/tools/deployment/package-helm/templates/_helpers.tpl
+++ b/tools/deployment/package-helm/templates/_helpers.tpl
@@ -436,3 +436,59 @@ topologySpreadConstraints:
   {{- toYaml . | nindent 2 }}
 {{- end }}
 {{- end }}{{/* define "clp.createSchedulingConfigs" */}}
+
+{{/*
+Path inside the container where the results cache CA certificate is mounted.
+
+@return {string} The pod-local path for the results cache CA certificate
+*/}}
+{{- define "clp.resultsCacheTlsCaFile" -}}
+/etc/clp-results-cache-ca-cert/ca.crt
+{{- end }}
+
+{{/*
+Creates a volumeMount for the results cache TLS CA certificate.
+
+@return {string} YAML-formatted volumeMount definition
+*/}}
+{{- define "clp.resultsCacheTlsVolumeMount" -}}
+name: "results-cache-ca-cert"
+mountPath: {{ include "clp.resultsCacheTlsCaFile" . | dir | quote }}
+readOnly: true
+{{- end }}
+
+{{/*
+Creates a volume for the results cache TLS CA certificate, referencing the ConfigMap.
+
+@param {object} root Root template context
+@return {string} YAML-formatted volume definition
+*/}}
+{{- define "clp.resultsCacheTlsVolume" -}}
+name: "results-cache-ca-cert"
+configMap:
+  name: {{ include "clp.fullname" .root }}-config
+  items:
+    - key: "results-cache-ca.crt"
+      path: "ca.crt"
+{{- end }}
+
+{{/*
+Builds the full results cache MongoDB URI, including TLS query parameters when configured.
+
+@param {object} . Root template context
+@return {string} The MongoDB connection URI
+*/}}
+{{- define "clp.resultsCacheUri" -}}
+{{- $base := printf "mongodb://%s:%s/%s"
+    (include "clp.resultsCacheHost" .)
+    (include "clp.resultsCachePort" . | toString)
+    .Values.clpConfig.results_cache.db_name
+-}}
+{{- if and .Values.clpConfig.results_cache.tls .Values.clpConfig.results_cache.tls_ca_cert_content -}}
+{{- printf "%s?tls=true&tlsCAFile=%s" $base (include "clp.resultsCacheTlsCaFile" .) -}}
+{{- else if .Values.clpConfig.results_cache.tls -}}
+{{- printf "%s?tls=true" $base -}}
+{{- else -}}
+{{- $base -}}
+{{- end -}}
+{{- end }}
diff --git a/tools/deployment/package-helm/templates/api-server-deployment.yaml b/tools/deployment/package-helm/templates/api-server-deployment.yaml
index d36bb1e3..a4341b27 100644
--- a/tools/deployment/package-helm/templates/api-server-deployment.yaml
+++ b/tools/deployment/package-helm/templates/api-server-deployment.yaml
@@ -56,6 +56,9 @@ spec:
               mountPath: "/etc/clp-config.yaml"
               subPath: "clp-config.yaml"
               readOnly: true
+            {{- if .Values.clpConfig.results_cache.tls_ca_cert_content }}
+            - {{- include "clp.resultsCacheTlsVolumeMount" . | nindent 14 }}
+            {{- end }}
             {{- if eq .Values.clpConfig.stream_output.storage.type "fs" }}
             - name: {{ include "clp.volumeName" (dict
                 "component_category" "shared-data"
@@ -81,6 +84,9 @@ spec:
         - name: "config"
           configMap:
             name: {{ include "clp.fullname" . }}-config
+        {{- if .Values.clpConfig.results_cache.tls_ca_cert_content }}
+        - {{- include "clp.resultsCacheTlsVolume" (dict "root" .) | nindent 10 }}
+        {{- end }}
         {{- if eq .Values.clpConfig.stream_output.storage.type "fs" }}
         - {{- include "clp.pvcVolume" (dict
             "root" .
diff --git a/tools/deployment/package-helm/templates/configmap.yaml b/tools/deployment/package-helm/templates/configmap.yaml
index a66be78d..2c4c86ae 100644
--- a/tools/deployment/package-helm/templates/configmap.yaml
+++ b/tools/deployment/package-helm/templates/configmap.yaml
@@ -164,6 +164,10 @@ data:
       retention_period: null
       {{- end }}
       stream_collection_name: {{ .Values.clpConfig.results_cache.stream_collection_name | quote }}
+      tls: {{ .Values.clpConfig.results_cache.tls }}
+      {{- if .Values.clpConfig.results_cache.tls_ca_cert_content }}
+      tls_ca_file: {{ include "clp.resultsCacheTlsCaFile" . | quote }}
+      {{- end }}
     {{- with .Values.clpConfig.stream_output }}
     stream_output:
       {{- with .storage }}
@@ -278,6 +282,10 @@ data:
       "MongoDbHost": "{{ include "clp.resultsCacheHost" . }}",
       "MongoDbPort": {{ include "clp.resultsCachePort" . | int }},
       "MongoDbName": {{ .Values.clpConfig.results_cache.db_name | quote }},
+      "MongoDbTls": {{ .Values.clpConfig.results_cache.tls }},
+{{- if .Values.clpConfig.results_cache.tls_ca_cert_content }}
+      "MongoDbTlsCaFile": {{ include "clp.resultsCacheTlsCaFile" . | quote }},
+{{- end }}
       "MongoDbSearchResultsMetadataCollectionName":
         {{ .Values.clpConfig.webui.results_metadata_collection_name | quote }},
       "MongoDbStreamFilesCollectionName":
@@ -435,3 +443,7 @@ data:
     echo "node.id=$(hostname)" >> "${PRESTO_CONFIG_DIR}/node.properties"
 {{- end }}{{/* with .Values.clpConfig.presto */}}
 {{- end }}{{/* if has "presto" .Values.clpConfig.bundled */}}
+{{- if .Values.clpConfig.results_cache.tls_ca_cert_content }}
+  results-cache-ca.crt: |
+{{ .Values.clpConfig.results_cache.tls_ca_cert_content | indent 4 }}
+{{- end }}
diff --git a/tools/deployment/package-helm/templates/garbage-collector-deployment.yaml b/tools/deployment/package-helm/templates/garbage-collector-deployment.yaml
index 1b202b15..a93123e2 100644
--- a/tools/deployment/package-helm/templates/garbage-collector-deployment.yaml
+++ b/tools/deployment/package-helm/templates/garbage-collector-deployment.yaml
@@ -74,6 +74,9 @@ spec:
             {{- if .Values.clpConfig.aws_config_directory }}
             - {{- include "clp.awsConfigVolumeMount" . | nindent 14 }}
             {{- end }}
+            {{- if .Values.clpConfig.results_cache.tls_ca_cert_content }}
+            - {{- include "clp.resultsCacheTlsVolumeMount" . | nindent 14 }}
+            {{- end }}
             {{- if eq .Values.clpConfig.stream_output.storage.type "fs" }}
             - name: {{ include "clp.volumeName" (dict
                 "component_category" "shared-data"
@@ -105,6 +108,9 @@ spec:
         {{- if .Values.clpConfig.aws_config_directory }}
         - {{- include "clp.awsConfigVolume" . | nindent 10 }}
         {{- end }}
+        {{- if .Values.clpConfig.results_cache.tls_ca_cert_content }}
+        - {{- include "clp.resultsCacheTlsVolume" (dict "root" .) | nindent 10 }}
+        {{- end }}
         {{- if eq .Values.clpConfig.stream_output.storage.type "fs" }}
         - {{- include "clp.pvcVolume" (dict
             "root" .
diff --git a/tools/deployment/package-helm/templates/query-scheduler-deployment.yaml b/tools/deployment/package-helm/templates/query-scheduler-deployment.yaml
index a61816f3..0145e3e7 100644
--- a/tools/deployment/package-helm/templates/query-scheduler-deployment.yaml
+++ b/tools/deployment/package-helm/templates/query-scheduler-deployment.yaml
@@ -74,6 +74,9 @@ spec:
               mountPath: "/etc/clp-config.yaml"
               subPath: "clp-config.yaml"
               readOnly: true
+            {{- if .Values.clpConfig.results_cache.tls_ca_cert_content }}
+            - {{- include "clp.resultsCacheTlsVolumeMount" . | nindent 14 }}
+            {{- end }}
           command: [
             "python3", "-u",
             "-m", "job_orchestration.scheduler.query.query_scheduler",
@@ -90,4 +93,7 @@ spec:
         - name: "config"
           configMap:
             name: {{ include "clp.fullname" . }}-config
+        {{- if .Values.clpConfig.results_cache.tls_ca_cert_content }}
+        - {{- include "clp.resultsCacheTlsVolume" (dict "root" .) | nindent 10 }}
+        {{- end }}
 {{- end }}
diff --git a/tools/deployment/package-helm/templates/query-worker-deployment.yaml b/tools/deployment/package-helm/templates/query-worker-deployment.yaml
index 6e6a67b1..e0a3f9b2 100644
--- a/tools/deployment/package-helm/templates/query-worker-deployment.yaml
+++ b/tools/deployment/package-helm/templates/query-worker-deployment.yaml
@@ -66,6 +66,9 @@ spec:
             {{- if .Values.clpConfig.aws_config_directory }}
             - {{- include "clp.awsConfigVolumeMount" . | nindent 14 }}
             {{- end }}
+            {{- if .Values.clpConfig.results_cache.tls_ca_cert_content }}
+            - {{- include "clp.resultsCacheTlsVolumeMount" . | nindent 14 }}
+            {{- end }}
             {{- if eq .Values.clpConfig.stream_output.storage.type "fs" }}
             - name: {{ include "clp.volumeName" (dict
                 "component_category" "shared-data"
@@ -108,6 +111,9 @@ spec:
         {{- if .Values.clpConfig.aws_config_directory }}
         - {{- include "clp.awsConfigVolume" . | nindent 10 }}
         {{- end }}
+        {{- if .Values.clpConfig.results_cache.tls_ca_cert_content }}
+        - {{- include "clp.resultsCacheTlsVolume" (dict "root" .) | nindent 10 }}
+        {{- end }}
         {{- if eq .Values.clpConfig.stream_output.storage.type "fs" }}
         - {{- include "clp.pvcVolume" (dict
             "root" .
diff --git a/tools/deployment/package-helm/templates/reducer-deployment.yaml b/tools/deployment/package-helm/templates/reducer-deployment.yaml
index 1b33d03c..e6f10988 100644
--- a/tools/deployment/package-helm/templates/reducer-deployment.yaml
+++ b/tools/deployment/package-helm/templates/reducer-deployment.yaml
@@ -53,6 +53,9 @@ spec:
               mountPath: "/etc/clp-config.yaml"
               subPath: "clp-config.yaml"
               readOnly: true
+            {{- if .Values.clpConfig.results_cache.tls_ca_cert_content }}
+            - {{- include "clp.resultsCacheTlsVolumeMount" . | nindent 14 }}
+            {{- end }}
           command: [
             "python3", "-u",
             "-m", "job_orchestration.reducer.reducer",
@@ -64,4 +67,7 @@ spec:
         - name: "config"
           configMap:
             name: {{ include "clp.fullname" . }}-config
+        {{- if .Values.clpConfig.results_cache.tls_ca_cert_content }}
+        - {{- include "clp.resultsCacheTlsVolume" (dict "root" .) | nindent 10 }}
+        {{- end }}
 {{- end }}
diff --git a/tools/deployment/package-helm/templates/results-cache-indices-creator-job.yaml b/tools/deployment/package-helm/templates/results-cache-indices-creator-job.yaml
index c2074bf0..3a28a1eb 100644
--- a/tools/deployment/package-helm/templates/results-cache-indices-creator-job.yaml
+++ b/tools/deployment/package-helm/templates/results-cache-indices-creator-job.yaml
@@ -33,9 +33,15 @@ spec:
             "python3", "-u",
             "-m", "clp_py_utils.initialize-results-cache",
             "--uri",
-            "mongodb://{{ include "clp.resultsCacheHost" . }}:{{
-              include "clp.resultsCachePort" . | int
-            }}/{{ .Values.clpConfig.results_cache.db_name }}",
+            "{{ include "clp.resultsCacheUri" . }}",
             "--stream-collection",
             {{ .Values.clpConfig.results_cache.stream_collection_name | quote }}
           ]
+          {{- if .Values.clpConfig.results_cache.tls_ca_cert_content }}
+          volumeMounts:
+            - {{- include "clp.resultsCacheTlsVolumeMount" . | nindent 14 }}
+          {{- end }}
+      {{- if .Values.clpConfig.results_cache.tls_ca_cert_content }}
+      volumes:
+        - {{- include "clp.resultsCacheTlsVolume" (dict "root" .) | nindent 10 }}
+      {{- end }}
diff --git a/tools/deployment/package-helm/templates/webui-deployment.yaml b/tools/deployment/package-helm/templates/webui-deployment.yaml
index 71edbc6e..3c7081bc 100644
--- a/tools/deployment/package-helm/templates/webui-deployment.yaml
+++ b/tools/deployment/package-helm/templates/webui-deployment.yaml
@@ -79,6 +79,9 @@ spec:
             {{- if .Values.clpConfig.aws_config_directory }}
             - {{- include "clp.awsConfigVolumeMount" . | nindent 14 }}
             {{- end }}
+            {{- if .Values.clpConfig.results_cache.tls_ca_cert_content }}
+            - {{- include "clp.resultsCacheTlsVolumeMount" . | nindent 14 }}
+            {{- end }}
             {{- if eq .Values.clpConfig.logs_input.type "fs" }}
             - {{- include "clp.logsInputVolumeMount" . | nindent 14 }}
             {{- end }}
@@ -111,6 +114,9 @@ spec:
         {{- if .Values.clpConfig.aws_config_directory }}
         - {{- include "clp.awsConfigVolume" . | nindent 10 }}
         {{- end }}
+        {{- if .Values.clpConfig.results_cache.tls_ca_cert_content }}
+        - {{- include "clp.resultsCacheTlsVolume" (dict "root" .) | nindent 10 }}
+        {{- end }}
         {{- if eq .Values.clpConfig.logs_input.type "fs" }}
         - {{- include "clp.logsInputVolume" . | nindent 10 }}
         {{- end }}
diff --git a/tools/deployment/package-helm/values.yaml b/tools/deployment/package-helm/values.yaml
index ff4f0ba6..d4a24879 100644
--- a/tools/deployment/package-helm/values.yaml
+++ b/tools/deployment/package-helm/values.yaml
@@ -175,6 +175,15 @@ clpConfig:
     # Retention period for search results, in minutes. Set to null to disable automatic deletion.
     retention_period: 60
 
+    # Whether to use TLS for the results cache (MongoDB) connection.
+    tls: false
+
+    # Content of the results cache (MongoDB) CA certificate file.
+    # Provide using --set-file when installing, e.g.:
+    #   helm install ... --set-file clpConfig.results_cache.tls_ca_cert_content=/path/to/ca.crt
+    # Only used when tls is true.
+    tls_ca_cert_content: ""
+
   compression_worker:
     logging_level: "INFO"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious - does this PR enable CLP to run on DocumentDB?
By default, results-cache-indices-creator, webui, and reducer all use features that aren't supported by DocumentDB.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants