Skip to content
Open
Show file tree
Hide file tree
Changes from 3 commits
Commits
Show all changes
38 commits
Select commit Hold shift + click to select a range
57be8f7
feat(helm): Add Presto query engine support to the Helm chart.
junhaoliao Feb 17, 2026
96e9a45
docs: Remove redundant worker configuration comment in Presto guide
junhaoliao Feb 17, 2026
8089254
chore(helm): Bump chart version to 0.1.4-dev.4
junhaoliao Feb 17, 2026
d389542
Merge branch 'main' into k8s-presto
hoophalab Feb 18, 2026
00d7292
feat(helm): Improve modularity and add Presto support refinements
junhaoliao Feb 18, 2026
ef19fba
Merge branch 'main' into k8s-presto
junhaoliao Mar 10, 2026
9c83eb3
chore(helm): Bump chart version to 0.2.1-dev.1
junhaoliao Mar 10, 2026
5fc55fc
docs: Clarify Helm configuration for enabling Presto and resource opt…
junhaoliao Mar 10, 2026
6611a46
docs: Remove redundant comments on Presto configuration in K8s guide
junhaoliao Mar 10, 2026
4ac6a71
chore(helm): Remove extraneous newline in `values.yaml`
junhaoliao Mar 10, 2026
ef6be81
Merge branch 'main' into k8s-presto
junhaoliao Mar 12, 2026
3ea1670
docs(helm): Clarify Presto SQL configuration in `values.yaml`
junhaoliao Mar 12, 2026
80204d7
chore(helm): Bump chart version to 0.2.1-dev.2
junhaoliao Mar 12, 2026
e0d175a
Merge remote-tracking branch 'origin/main' into k8s-presto
junhaoliao Mar 12, 2026
dda97e2
refactor(helm/docs): Rename `package.query_engine` to `webui.query_en…
junhaoliao Mar 12, 2026
5525e7e
refactor(helm): Use dynamic host and port for CLP metadata DB in Pres…
junhaoliao Mar 16, 2026
55d32d4
refactor(helm): Rename service account for Presto deployments to `ser…
junhaoliao Mar 16, 2026
72160c1
docs: Simplify Presto setup instructions in K8s deployment guide
junhaoliao Mar 16, 2026
4337cd1
docs: Remove dead link to logging infrastructure issue in K8s deploym…
junhaoliao Mar 16, 2026
efb640b
refactor(helm): Replace wget with curl for fetching Presto version in…
junhaoliao Mar 16, 2026
9df3643
refactor(helm): Update Presto version extraction to use `nodeVersion.…
junhaoliao Mar 16, 2026
c8fd988
docs: Fix incorrect property name in Presto setup instructions for K8…
junhaoliao Mar 16, 2026
1939ce1
refactor(helm): Set default Presto replicas to 0 in multi-shared and …
junhaoliao Mar 16, 2026
6688324
Add support for enabling Presto in test setups with configurable Helm…
junhaoliao Mar 16, 2026
f6ff83a
fix(deployment): Pre-create shared-data PVs in `set-up-multi-dedicate…
junhaoliao Mar 16, 2026
eec6709
chore(helm): bump chart version to 0.2.1-dev.2
junhaoliao Mar 16, 2026
48aa730
fix(deployment): correct heredoc identifier in `set-up-multi-dedicate…
junhaoliao Mar 16, 2026
f6d3bc0
update PV configuration in `set-up-multi-dedicated-test.sh` (remove r…
junhaoliao Mar 17, 2026
75626e0
Merge remote-tracking branch 'junhao/fix-helm-pv' into k8s-presto
junhaoliao Mar 17, 2026
ffab28b
fix(deployment): pre-create shared-data directories on nodes in `set-…
junhaoliao Mar 17, 2026
fdac633
fix(deployment): update shared-data directory creation to use `CLP_HO…
junhaoliao Mar 17, 2026
2df432c
Merge remote-tracking branch 'junhao/fix-helm-pv' into k8s-presto
junhaoliao Mar 17, 2026
304e97d
Merge branch 'main' into k8s-presto
junhaoliao Mar 18, 2026
96912c3
bump chart version
junhaoliao Mar 18, 2026
578fb0d
fix(helm): update Presto configurations and add S3 storage support in…
junhaoliao Mar 18, 2026
e144e69
fix(helm): increase liveness probe initial delay to 180s in `_helpers…
junhaoliao Mar 18, 2026
333941c
remove cert hack
junhaoliao Mar 18, 2026
8c5f6d8
fix(deployment): strip whitespace in JSON heredoc for `split_filter` …
junhaoliao Mar 18, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
73 changes: 73 additions & 0 deletions docs/src/user-docs/guides-k8s-deployment.md
Original file line number Diff line number Diff line change
Expand Up @@ -298,11 +298,82 @@ helm template clp . -f custom-values.yaml

::::

### Using Presto as the query engine
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this section duplicates with guides-using-presto.md. Shall we use a reference link?


To use [Presto][presto-guide] as the query engine instead of the default clp-s query pipeline, set
`query_engine` to `"presto"` and configure the Presto-specific settings:

```{code-block} yaml
:caption: presto-values.yaml

image:
prestoCoordinator:
repository: "ghcr.io/y-scope/presto/coordinator"
tag: "dev"
prestoWorker:
repository: "ghcr.io/y-scope/presto/prestissimo-worker"
tag: "dev"

prestoWorker:
# See below "Worker scheduling" for more details on configuring Presto scheduling
replicas: 2

clpConfig:
package:
storage_engine: "clp-s"
query_engine: "presto"

# Disable results cache retention since the Presto integration doesn't yet support garbage
# collection of search results.
results_cache:
retention_period: null

presto:
coordinator:
logging_level: "INFO"
query_max_memory_gb: 1
query_max_memory_per_node_gb: 1
worker:
query_memory_gb: 4
system_memory_gb: 8
# Split filter config for the Presto CLP connector. For each dataset you want to query, add a
# filter entry. Replace <dataset> with the dataset name (use "default" if you didn't specify one
# when compressing) and <timestamp-key> with the timestamp key used during compression.
# See https://docs.yscope.com/presto/connector/clp.html#split-filter-config-file
split_filter:
clp.default.<dataset>:
- columnName: "<timestamp-key>"
customOptions:
rangeMapping:
lowerBound: "begin_timestamp"
upperBound: "end_timestamp"
required: false
```

Install with the Presto values:

```bash
helm install clp clp/clp DOCS_VAR_HELM_VERSION_FLAG -f presto-values.yaml
```

:::{note}
When `query_engine` is set to `"presto"`, the chart deploys a Presto coordinator and Presto
worker(s) instead of the query scheduler, query workers, reducers, and results cache.
:::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for conditional logic around results_cache, api-server, and other components
# in the Helm templates to determine what's excluded when query_engine="presto"

echo "=== Searching for results_cache conditional logic ==="
rg -n 'results.cache|resultsCache' tools/deployment/package-helm/templates/ \
  -A 5 -B 5 --type yaml

echo -e "\n=== Searching for api-server conditional logic ==="
rg -n 'api.server|apiServer|api_server' tools/deployment/package-helm/templates/ \
  -A 5 -B 5 --type yaml

echo -e "\n=== Searching for query_engine conditional logic ==="
rg -n 'query_engine|queryEngine' tools/deployment/package-helm/templates/ \
  -A 5 -B 5 --type yaml

Repository: y-scope/clp

Length of output: 50367


Note is inaccurate: results_cache should not be listed as excluded, and api-server is missing.

The note states Presto replaces "the query scheduler, query workers, reducers, and results cache", but:

  1. The Helm templates show no conditional gating on query_engine for any results-cache-* manifests (service, statefulset, indices-creator job), confirming it remains deployed when Presto is enabled — only with retention disabled.
  2. The api-server deployment is actually gated by {{- if and .Values.clpConfig.api_server (ne .Values.clpConfig.package.query_engine "presto") }}, so it is excluded and should be mentioned.
  3. The query scheduler, query workers, and reducers are all gated by {{- if ne .Values.clpConfig.package.query_engine "presto" }}, confirming they are excluded.

Update the note to list: query scheduler, query workers, reducers, and api-server—not results cache.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/src/user-docs/guides-k8s-deployment.md` around lines 359 - 362, Update
the note text in the docs to correct the list of components replaced when
`query_engine` = "presto": remove "results cache" from the excluded list and add
"api-server"; reference the Helm gating logic when editing the sentence so it
reads that Presto replaces the "query scheduler, query workers, reducers, and
api-server" (since `query_scheduler`, `query_workers`, and `reducers` are gated
by `{{- if ne .Values.clpConfig.package.query_engine \"presto\" }}` and the
`api-server` deployment is gated by `{{- if and .Values.clpConfig.api_server (ne
.Values.clpConfig.package.query_engine \"presto\") }}`), while `results-cache-*`
manifests are not conditionally removed.


For more details on querying logs through Presto, see the [Using Presto][presto-guide] guide.

### Worker scheduling

You can control where workers are scheduled using standard Kubernetes scheduling primitives
(`nodeSelector`, `affinity`, `tolerations`, `topologySpreadConstraints`).

:::{note}
When using Presto as the query engine, use `prestoWorker:` instead of `queryWorker:` and `reducer:`
to configure Presto worker scheduling. The `prestoWorker:` key supports the same `scheduling:`
options.
:::

#### Dedicated node pools

To run compression workers, query workers, and reducers in separate node pools:
Expand Down Expand Up @@ -588,6 +659,7 @@ To tear down a `kubeadm` cluster:
* [External database setup][external-db-guide]: Using external MariaDB and MongoDB
* [Using object storage][s3-storage]: Configuring S3 storage
* [Configuring retention periods][retention-guide]: Setting up data retention policies
* [Using Presto][presto-guide]: Distributed SQL queries on compressed logs

[aks]: https://azure.microsoft.com/en-us/products/kubernetes-service
[api-server]: guides-using-the-api-server.md
Expand All @@ -605,6 +677,7 @@ To tear down a `kubeadm` cluster:
[kubeadm]: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
[kubectl]: https://kubernetes.io/docs/tasks/tools/
[logging-infra-issue]: https://github.com/y-scope/clp/issues/1760
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not used

[presto-guide]: guides-using-presto.md
[quick-start]: quick-start/index.md
[retention-guide]: guides-retention.md
[rfc-1918]: https://datatracker.ietf.org/doc/html/rfc1918#section-3
Expand Down
80 changes: 77 additions & 3 deletions docs/src/user-docs/guides-using-presto.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,17 +14,89 @@ maintained in a [fork][yscope-presto] of the Presto project. At some point, thes
been merged into the main Presto repository so that you can use official Presto releases with CLP.
:::

## Requirements
## Deployment options

CLP supports Presto through two deployment methods:

* **[Kubernetes (Helm)](#kubernetes-helm)**: Presto is deployed as part of the CLP Helm chart. This
is the simplest option if you are already using the [Kubernetes deployment][k8s-deployment].
* **[Docker Compose](#docker-compose)**: Presto is deployed separately using Docker Compose alongside
a CLP package installation.

## Kubernetes (Helm)

When deploying CLP on Kubernetes using Helm, Presto can be enabled by setting the `query_engine` to
`"presto"` in your Helm values.

### Requirements

* A running CLP Kubernetes deployment (see the [Kubernetes deployment guide][k8s-deployment])

### Set up

1. Create a values file to enable Presto:

```{code-block} yaml
:caption: presto-values.yaml

clpConfig:
package:
query_engine: "presto"

# Disable results cache retention since the Presto integration doesn't yet support
# garbage collection of search results.
results_cache:
retention_period: null

presto:
# Split filter config for the Presto CLP connector. For each dataset, add a filter entry.
# Replace <dataset> with the dataset name (use "default" if you didn't specify one when
# compressing) and <timestamp-key> with the timestamp key used during compression.
# See https://docs.yscope.com/presto/connector/clp.html#split-filter-config-file
split_filter:
clp.default.<dataset>:
- columnName: "<timestamp-key>"
customOptions:
rangeMapping:
lowerBound: "begin_timestamp"
upperBound: "end_timestamp"
required: false
```

2. Install (or upgrade) the Helm chart with the Presto values:

```bash
helm install clp clp/clp DOCS_VAR_HELM_VERSION_FLAG -f presto-values.yaml
```
Comment on lines +83 to +87
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

helm install will fail for users with an existing CLP deployment.

The requirement at line 33 explicitly states the user already has "a running CLP Kubernetes deployment," yet step 2 only shows helm install, which would error out with release already exists. The parenthetical "(or upgrade)" acknowledges the upgrade path but never shows the corresponding command.

💡 Suggested fix
-2. Install (or upgrade) the Helm chart with the Presto values:
+2. If you haven't installed CLP yet, install it with Presto enabled:
 
    ```bash
    helm install clp clp/clp DOCS_VAR_HELM_VERSION_FLAG -f presto-values.yaml
    ```
+
+   If CLP is already installed, upgrade the release instead:
+
+   ```bash
+   helm upgrade clp clp/clp DOCS_VAR_HELM_VERSION_FLAG -f presto-values.yaml
+   ```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
2. Install (or upgrade) the Helm chart with the Presto values:
```bash
helm install clp clp/clp DOCS_VAR_HELM_VERSION_FLAG -f presto-values.yaml
```
2. If you haven't installed CLP yet, install it with Presto enabled:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/src/user-docs/guides-using-presto.md` around lines 66 - 70, The docs
show only a helm install command which will fail when CLP is already deployed;
update the step under the Presto install to include the alternative helm upgrade
invocation so users can perform an upgrade when a release exists—specifically
add guidance to run `helm upgrade clp clp/clp DOCS_VAR_HELM_VERSION_FLAG -f
presto-values.yaml` as the alternative to `helm install` (referencing the
existing `helm install clp clp/clp DOCS_VAR_HELM_VERSION_FLAG -f
presto-values.yaml` and the `presto-values.yaml` values file).


3. Verify that the Presto coordinator and worker pods are running:

```bash
kubectl get pods -l "app.kubernetes.io/component in (presto-coordinator, presto-worker)"
```

Once the pods are ready, you can [query your logs through Presto](#querying-your-logs-through-presto)
using CLP's Web UI.

:::{note}
When using Kubernetes, Presto worker scheduling can be configured using the `prestoWorker.scheduling`
key in Helm values. See the [worker scheduling][k8s-scheduling] section of the Kubernetes deployment
guide for details.
:::

## Docker Compose

### Requirements

* [CLP][clp-releases] (clp-json) v0.5.0 or higher
* [Docker] v28 or higher
* [Docker Compose][docker-compose] v2.20.2 or higher
* Python
* python3-venv (for the version of Python installed)

## Set up
### Set up

Using Presto with CLP requires:
Using Presto with CLP via Docker Compose requires:

* [Setting up CLP](#setting-up-clp) and compressing some logs.
* [Setting up Presto](#setting-up-presto) to query CLP's metadata database and archives.
Expand Down Expand Up @@ -227,6 +299,8 @@ These limitations will be addressed in a future release of the Presto integratio
[clp-releases]: https://github.com/y-scope/clp/releases
[docker-compose]: https://docs.docker.com/compose/install/
[Docker]: https://docs.docker.com/engine/install/
[k8s-deployment]: guides-k8s-deployment.md
[k8s-scheduling]: guides-k8s-deployment.md#worker-scheduling
[postgresql]: https://zenodo.org/records/10516401
[Presto]: https://prestodb.io/
[y-scope/presto#8]: https://github.com/y-scope/presto/issues/8
Expand Down
2 changes: 1 addition & 1 deletion tools/deployment/package-helm/Chart.yaml
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
apiVersion: "v2"
name: "clp"
version: "0.1.4-dev.2"
version: "0.1.4-dev.4"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check whether 0.1.4-dev.3 is used in any branch or tag
git grep "0.1.4-dev.3" $(git for-each-ref --format='%(refname)' refs/heads refs/remotes refs/tags) -- "tools/deployment/package-helm/Chart.yaml" 2>/dev/null || echo "Not found on any ref"

Repository: y-scope/clp

Length of output: 75


Change version to 0.1.4-dev.3 to maintain sequential pre-release versioning.

The version skips from 0.1.4-dev.2 directly to 0.1.4-dev.4, but 0.1.4-dev.3 does not exist on any branch or tag in the repository. This skip appears unintentional and breaks the contiguous pre-release sequence.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/deployment/package-helm/Chart.yaml` at line 3, The Chart.yaml
pre-release version was mistakenly advanced to "0.1.4-dev.4", skipping
"0.1.4-dev.3"; update the version field in Chart.yaml from "0.1.4-dev.4" to
"0.1.4-dev.3" so the pre-release sequence remains contiguous (look for the
existing version: "0.1.4-dev.4" and replace it with "0.1.4-dev.3").

description: "A Helm chart for CLP's (Compressed Log Processor) package deployment"
type: "application"
appVersion: "0.9.1-dev"
Expand Down
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
{{- if .Values.clpConfig.api_server }}
{{- if and .Values.clpConfig.api_server (ne .Values.clpConfig.package.query_engine "presto") }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we disable api server? Shouldn't we also disable query scheduler and worker if clp search is not needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently, the API server does not work with Presto.

we do have this specified in tools/deployment/package-helm/templates/query-scheduler-deployment.yaml as well

{{- if ne .Values.clpConfig.package.query_engine "presto" }}

though, given some users may want to use both the API Server the Presto, i think we may discard .Values.clpConfig.package.query_engine values and simply check whether query_scheduler / query_worker / API is null before launching the services. what do you think?

cc @kirkrodrigues for discussion since this is interface related

apiVersion: "apps/v1"
kind: "Deployment"
metadata:
Expand Down
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
{{- if .Values.clpConfig.api_server }}
{{- if and .Values.clpConfig.api_server (ne .Values.clpConfig.package.query_engine "presto") }}
{{- include "clp.createStaticPv" (dict
"root" .
"component_category" "api-server"
Expand Down
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
{{- if .Values.clpConfig.api_server }}
{{- if and .Values.clpConfig.api_server (ne .Values.clpConfig.package.query_engine "presto") }}
{{- include "clp.createPvc" (dict
"root" .
"component_category" "api-server"
Expand Down
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
{{- if .Values.clpConfig.api_server }}
{{- if and .Values.clpConfig.api_server (ne .Values.clpConfig.package.query_engine "presto") }}
apiVersion: "v1"
kind: "Service"
metadata:
Expand Down
92 changes: 92 additions & 0 deletions tools/deployment/package-helm/templates/configmap.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -219,6 +219,13 @@ data:
{{- else }}
mcp_server: null
{{- end }}
{{- if eq .Values.clpConfig.package.query_engine "presto" }}
presto:
host: "{{ include "clp.fullname" . }}-presto-coordinator"
port: 8080
{{- else }}
presto: null
{{- end }}

mysql-logging.cnf: |
[mysqld]
Expand Down Expand Up @@ -321,6 +328,91 @@ data:
{{ .Values.clpConfig.archive_output.target_segment_size | int }},
"ClpQueryEngine": {{ .Values.clpConfig.package.query_engine | quote }},
"ClpStorageEngine": {{ .Values.clpConfig.package.storage_engine | quote }},
{{- if eq .Values.clpConfig.package.query_engine "presto" }}
"PrestoHost": "{{ include "clp.fullname" . }}-presto-coordinator",
"PrestoPort": 8080
{{- else }}
"PrestoHost": null,
"PrestoPort": null
{{- end }}
}

{{- if eq .Values.clpConfig.package.query_engine "presto" }}
{{- with .Values.clpConfig.presto }}
presto-coordinator-catalog-clp.properties: |
connector.name=clp
clp.metadata-provider-type=mysql
clp.metadata-db-url=jdbc:mysql://{{ include "clp.fullname" $ }}-database:3306
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we use {{ include "clp.databaseHost" . }} to support un-bundled database?

clp.metadata-db-name={{ $.Values.clpConfig.database.names.clp }}
clp.metadata-db-user={{ $.Values.credentials.database.username }}
clp.metadata-db-password={{ $.Values.credentials.database.password }}
Comment on lines +343 to +344
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Credentials stored in plaintext in a ConfigMap — use a Secret instead.

clp.metadata-db-password (and clp.metadata-db-user) are written directly into the ConfigMap. ConfigMaps are stored unencrypted in etcd by default, are readable by any principal with get/list on ConfigMaps in the namespace, and are fully visible in kubectl get configmap -o yaml. Kubernetes Secrets are the appropriate resource for sensitive values: they can be encrypted at rest, carry separate RBAC controls, and are omitted from audit logs by default.

The same concern applies to the S3 credentials at lines 396/398 (clp.s3-access-key-id, clp.s3-secret-access-key).

Consider rendering these files from a Kubernetes Secret (or multiple projected volume mounts) rather than embedding credentials in the ConfigMap.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/deployment/package-helm/templates/configmap.yaml` around lines 347 -
348, The ConfigMap is embedding sensitive values (clp.metadata-db-user,
clp.metadata-db-password, clp.s3-access-key-id, clp.s3-secret-access-key);
remove these keys from templates/configmap.yaml and instead create a Helm Secret
template that stores those values (e.g., templates/secret.yaml with
.Values.credentials.database and .Values.credentials.s3 entries), then update
consumers to read them from the Secret (use env valueFrom.secretKeyRef, secret
projection into a volume, or template lookup/include to inject non-plain text
values) so credentials are stored as Kubernetes Secrets rather than literal
ConfigMap entries.

clp.metadata-table-prefix=clp_
clp.split-provider-type=mysql
clp.split-filter-provider-type=mysql
clp.split-filter-config=/opt/presto-server/etc/split-filter.json

presto-coordinator-config-config.properties: |
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory={{ .coordinator.query_max_memory_gb }}GB
query.max-memory-per-node={{ .coordinator.query_max_memory_per_node_gb }}GB
discovery-server.enabled=true
discovery.uri=http://{{ include "clp.fullname" $ }}-presto-coordinator:8080
optimizer.optimize-hash-generation=false
regex-library=RE2J
use-alternative-function-signatures=true
inline-sql-functions=false
nested-data-serialization-enabled=false
native-execution-enabled=true

presto-coordinator-config-jvm.config: |
-server
-Xmx4G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-Djdk.attach.allowAttachSelf=true
Comment on lines +365 to +374
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Hardcoded -Xmx4G should be a configurable value.

The coordinator JVM heap is fixed at 4 GB regardless of the resource limits/requests set on the coordinator pod. If the pod is scheduled with less memory, it will OOM-kill; if scheduled with more, the heap cannot grow to use it. Expose this as a value (e.g., .Values.clpConfig.presto.coordinator.jvm_max_heap) so operators can tune it alongside query.max-memory and query.max-memory-per-node.

♻️ Suggested approach
  presto-coordinator-config-jvm.config: |
    -server
-   -Xmx4G
+   -Xmx{{ .coordinator.jvm_max_heap_gb }}G
    -XX:+UseG1GC
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/deployment/package-helm/templates/configmap.yaml` around lines 369 -
378, The JVM Xmx in presto-coordinator-config-jvm.config is hardcoded as
"-Xmx4G"; change it to render a Helm value so operators can configure it:
replace the literal "-Xmx4G" with a template that reads
.Values.clpConfig.presto.coordinator.jvm_max_heap (with a sensible default like
"4G"), ensure consumers pass the unit (e.g., "4G") or document it, and add the
new key plus a descriptive default and comment to values.yaml so the chart is
configurable and documented.


presto-coordinator-config-log.properties: |
com.facebook.presto={{ .coordinator.logging_level }}

presto-coordinator-config-node.properties: |
node.environment=production
node.id=presto-coordinator

presto-coordinator-config-split-filter.json: |
{{ .split_filter | toJson }}

presto-worker-catalog-clp.properties: |
connector.name=clp
{{- with $.Values.clpConfig.archive_output.storage }}
{{- if eq .type "s3" }}
clp.storage-type=s3
clp.s3-auth-provider=clp_package
clp.s3-access-key-id={{ .s3_config.aws_authentication.credentials.access_key_id }}
clp.s3-end-point=https://{{ .s3_config.bucket }}.s3.{{ .s3_config.region_code }}.amazonaws.com/
clp.s3-secret-access-key={{ .s3_config.aws_authentication.credentials.secret_access_key }}
{{- end }}{{/* if eq .type "s3" */}}
Comment on lines +388 to +400
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fail fast when Presto S3 is given a non-static auth config.

This block always dereferences aws_authentication.credentials.access_key_id/secret_access_key, but the same chart still allows other auth shapes elsewhere (type: profile, optional session_token). In Presto mode those inputs will either render broken connector config or silently drop required auth data. Either reject non-static auth here with fail, or plumb the supported auth mode explicitly into the connector properties.

Possible guard
   {{- with $.Values.clpConfig.archive_output.storage }}
   {{- if eq .type "s3" }}
+  {{- $auth := .s3_config.aws_authentication | default dict }}
+  {{- if not $auth.credentials }}
+  {{- fail "Presto S3 mode currently requires archive_output.storage.s3_config.aws_authentication.credentials" }}
+  {{- end }}
     clp.storage-type=s3
     clp.s3-auth-provider=clp_package
-    clp.s3-access-key-id={{ .s3_config.aws_authentication.credentials.access_key_id }}
+    clp.s3-access-key-id={{ $auth.credentials.access_key_id }}
     clp.s3-end-point=https://{{ .s3_config.bucket }}.s3.{{ .s3_config.region_code }}.amazonaws.com/
-    clp.s3-secret-access-key={{ .s3_config.aws_authentication.credentials.secret_access_key }}
+    clp.s3-secret-access-key={{ $auth.credentials.secret_access_key }}
   {{- end }}{{/* if eq .type "s3" */}}
   {{- end }}{{/* with $.Values.clpConfig.archive_output.storage */}}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/deployment/package-helm/templates/configmap.yaml` around lines 388 -
395, The S3 block under the with $.Values.clpConfig.archive_output.storage
blindly dereferences
.s3_config.aws_authentication.credentials.{access_key_id,secret_access_key}
which breaks for non-static auth shapes; update the template around the eq .type
"s3" branch to validate that .s3_config.aws_authentication is the
static/credentials form (e.g. credentials exist and no session_token/profile
type) and call the Helm fail function with a clear message if it is not, or
alternatively only render the clp.s3-access-key-id/clp.s3-secret-access-key and
clp.s3-auth-provider=clp_package properties when the credentials fields are
present; reference the $.Values.clpConfig.archive_output.storage /
.s3_config.aws_authentication and the clp.s3-access-key-id /
clp.s3-secret-access-key symbols when making the change.

{{- end }}{{/* with $.Values.clpConfig.archive_output.storage */}}

presto-worker-config-config.properties: |
discovery.uri=http://{{ include "clp.fullname" $ }}-presto-coordinator:8080
http-server.http.port=8080
query-memory-gb={{ .worker.query_memory_gb }}
shutdown-onset-sec=1
system-memory-gb={{ .worker.system_memory_gb }}
register-test-functions=false
runtime-metrics-collection-enabled=false

presto-worker-config-node.properties: |
node.environment=production
node.location=worker-location

presto-worker-config-velox.properties: |
mutable-config=true
{{- end }}{{/* with .Values.clpConfig.presto */}}
{{- end }}{{/* if eq .Values.clpConfig.package.query_engine "presto" */}}
Loading
Loading