113 changes: 107 additions & 6 deletions docs/src/user-docs/guides-k8s-deployment.md
@@ -200,6 +200,8 @@ clpConfig:
# Use clp-text, instead of clp-json (default)
package:
storage_engine: "clp" # Use "clp-s" for clp-json, "clp" for clp-text

webui:
query_engine: "clp" # Use "clp-s" for clp-json, "clp" for clp-text, "presto" for Presto

# Configure archive output
@@ -246,11 +248,92 @@ helm template clp . -f custom-values.yaml

::::

### Using Presto as the query engine
> **Reviewer comment (Contributor):** this section duplicates with guides-using-presto.md. Shall we use a reference link?

To use [Presto][presto-guide] as the query engine, set `webui.query_engine` to `"presto"` and
configure the Presto-specific settings. The `query_engine` setting controls which search interface
the Web UI displays. Presto runs alongside the existing compression pipeline; setting the clp-s
native query components to `null` is optional but recommended to save resources when you don't need
both query paths:

```{code-block} yaml
:caption: presto-values.yaml

image:
  prestoCoordinator:
    repository: "ghcr.io/y-scope/presto/coordinator"
    tag: "clp-v0.10.0"
  prestoWorker:
    repository: "ghcr.io/y-scope/presto/prestissimo-worker"
    tag: "clp-v0.10.0"

prestoWorker:
  # See the "Worker scheduling" section below for details on configuring Presto worker scheduling.
  replicas: 2

clpConfig:
  webui:
    query_engine: "presto"

  # Optional: Disable the clp-s native query pipeline to save resources.
  # NOTE: The API server depends on the clp-s native query pipeline.
  api_server: null
  query_scheduler: null
  query_worker: null
  reducer: null

  # Disable results cache retention since the Presto integration doesn't yet support garbage
  # collection of search results.
  results_cache:
    retention_period: null

  presto:
    port: 30889
    coordinator:
      logging_level: "INFO"
      query_max_memory_gb: 1
      query_max_memory_per_node_gb: 1
    worker:
      query_memory_gb: 4
      system_memory_gb: 8
    # Split filter config for the Presto CLP connector. For each dataset you want to query, add a
    # filter entry. Replace <dataset> with the dataset name (use "default" if you didn't specify one
    # when compressing) and <timestamp-key> with the timestamp key used during compression.
    # See https://docs.yscope.com/presto/connector/clp.html#split-filter-config-file
    split_filter:
      clp.default.<dataset>:
        - columnName: "<timestamp-key>"
          customOptions:
            rangeMapping:
              lowerBound: "begin_timestamp"
              upperBound: "end_timestamp"
          required: false
```
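
As a concrete illustration, suppose a dataset named `web-logs` was compressed with the timestamp key `ts` (both names are hypothetical, not values from any real deployment); the placeholder entry would then read:

```yaml
split_filter:
  clp.default.web-logs:        # <dataset> replaced with the hypothetical name "web-logs"
    - columnName: "ts"         # <timestamp-key> replaced with the hypothetical key "ts"
      customOptions:
        rangeMapping:
          lowerBound: "begin_timestamp"
          upperBound: "end_timestamp"
      required: false
```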

Install with the Presto values:

```bash
helm install clp clp/clp DOCS_VAR_HELM_VERSION_FLAG -f presto-values.yaml
```

:::{note}
Presto is deployed when `clpConfig.presto` is set to a non-null value. To disable the clp-s native query
components, set their config keys to `null` as shown above.
:::

For more details on querying logs through Presto, see the [Using Presto][presto-guide] guide.

### Worker scheduling

You can control where workers are scheduled using standard Kubernetes scheduling primitives
(`nodeSelector`, `affinity`, `tolerations`, `topologySpreadConstraints`).
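
For example, a sketch combining a node selector with a toleration under a worker's `scheduling:` key (the taint key and value here are hypothetical and must match whatever taint you applied to the node pool; they are not defined by the chart):

```yaml
compressionWorker:
  scheduling:
    nodeSelector:
      yscope.io/nodeType: "compression"
    tolerations:
      # Hypothetical taint; substitute the taint actually applied to your nodes.
      - key: "example.com/dedicated"
        operator: "Equal"
        value: "compression"
        effect: "NoSchedule"
```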

:::{note}
When using Presto as the query engine, use `prestoWorker:` instead of `queryWorker:` and `reducer:`
to configure Presto worker scheduling. The `prestoWorker:` key supports the same `scheduling:`
options.
:::

#### Dedicated node pools

To run compression workers, query workers, and reducers in separate node pools:
@@ -263,6 +346,9 @@

# Label query nodes
kubectl label nodes node3 node4 yscope.io/nodeType=query

# Label Presto nodes (if using Presto as the query engine)
kubectl label nodes node5 node6 yscope.io/nodeType=presto
```

2. Configure scheduling:
@@ -276,19 +362,25 @@
  replicas: 2
  scheduling:
    nodeSelector:
      yscope.io/nodeType: compression
      yscope.io/nodeType: "compression"

queryWorker:
  replicas: 2
  scheduling:
    nodeSelector:
      yscope.io/nodeType: query
      yscope.io/nodeType: "query"

reducer:
  replicas: 2
  scheduling:
    nodeSelector:
      yscope.io/nodeType: query
      yscope.io/nodeType: "query"

prestoWorker:
  replicas: 2
  scheduling:
    nodeSelector:
      yscope.io/nodeType: "presto"
```

3. Install:
@@ -318,7 +410,7 @@ To run all worker types in the same node pool:
  replicas: 2
  scheduling:
    nodeSelector:
      yscope.io/nodeType: compute
      yscope.io/nodeType: "compute"
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: "kubernetes.io/hostname"
@@ -331,13 +423,19 @@ To run all worker types in the same node pool:
  replicas: 2
  scheduling:
    nodeSelector:
      yscope.io/nodeType: compute
      yscope.io/nodeType: "compute"

reducer:
  replicas: 2
  scheduling:
    nodeSelector:
      yscope.io/nodeType: compute
      yscope.io/nodeType: "compute"

prestoWorker:
  replicas: 2
  scheduling:
    nodeSelector:
      yscope.io/nodeType: "compute"
```

3. Install:
@@ -542,6 +640,7 @@ To tear down a `kubeadm` cluster:
* [External database setup][external-db-guide]: Using external MariaDB and MongoDB
* [Using object storage][s3-storage]: Configuring S3 storage
* [Configuring retention periods][retention-guide]: Setting up data retention policies
* [Using Presto][presto-guide]: Distributed SQL queries on compressed logs

[admin-tools]: reference-sbin-scripts/admin-tools.md
[aks]: https://azure.microsoft.com/en-us/products/kubernetes-service
@@ -559,6 +658,8 @@
[kind]: https://kind.sigs.k8s.io/
[kubeadm]: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
[kubectl]: https://kubernetes.io/docs/tasks/tools/
[logging-infra-issue]: https://github.com/y-scope/clp/issues/1760
> **Reviewer comment (CodeRabbit, ⚠️ Potential issue, 🟡 Minor):** Remove the stale link reference. `[logging-infra-issue]` is not referenced anywhere in this document, so markdownlint (MD053, link-image-reference-definitions) will keep flagging it until it is deleted.
> **Reviewer comment (Contributor):** not used

[presto-guide]: guides-using-presto.md
[quick-start]: quick-start/index.md
[retention-guide]: guides-retention.md
[rfc-1918]: https://datatracker.ietf.org/doc/html/rfc1918#section-3
97 changes: 94 additions & 3 deletions docs/src/user-docs/guides-using-presto.md
@@ -14,17 +14,106 @@ maintained in a [fork][yscope-presto] of the Presto project. At some point, these
been merged into the main Presto repository so that you can use official Presto releases with CLP.
:::

## Requirements
## Deployment options

CLP supports Presto through two deployment methods:

* **[Kubernetes (Helm)](#kubernetes-helm)**: Presto is deployed as part of the CLP Helm chart. This
is the simplest option if you are already using the [Kubernetes deployment][k8s-deployment].
* **[Docker Compose](#docker-compose)**: Presto is deployed separately using Docker Compose alongside
a CLP package installation.

## Kubernetes (Helm)

When deploying CLP on Kubernetes using Helm, Presto can be enabled by setting `clpConfig.presto` to
a non-null configuration and `webui.query_engine` to `"presto"`. The `query_engine` setting controls
which search interface the Web UI displays. Presto runs alongside the existing compression pipeline;
the clp-s native query components can optionally be disabled to save resources.

### Requirements

* A running CLP Kubernetes deployment (see the [Kubernetes deployment guide][k8s-deployment])

### Set up

1. Create a values file to enable Presto:

```{code-block} yaml
:caption: presto-values.yaml

clpConfig:
  webui:
    query_engine: "presto"

  # Optional: Disable the clp-s native query pipeline to save resources.
  # NOTE: The API server depends on the clp-s native query pipeline.
  api_server: null
  query_scheduler: null
  query_worker: null
  reducer: null

  # Disable results cache retention since the Presto integration doesn't yet support
  # garbage collection of search results.
  results_cache:
    retention_period: null

  presto:
    port: 30889
    coordinator:
      logging_level: "INFO"
      query_max_memory_gb: 1
      query_max_memory_per_node_gb: 1
    worker:
      query_memory_gb: 4
      system_memory_gb: 8
    # Split filter config for the Presto CLP connector. For each dataset, add a filter entry.
    # Replace <dataset> with the dataset name (use "default" if you didn't specify one when
    # compressing) and <timestamp-key> with the timestamp key used during compression.
    # See https://docs.yscope.com/presto/connector/clp.html#split-filter-config-file
    split_filter:
      clp.default.<dataset>:
        - columnName: "<timestamp-key>"
          customOptions:
            rangeMapping:
              lowerBound: "begin_timestamp"
              upperBound: "end_timestamp"
          required: false
```

2. Install (or upgrade) the Helm chart with the Presto values:

```bash
helm install clp clp/clp DOCS_VAR_HELM_VERSION_FLAG -f presto-values.yaml
```
> **Reviewer comment (CodeRabbit, ⚠️ Potential issue, 🟡 Minor) on lines +83 to +87:** `helm install` will fail for users with an existing CLP deployment. The requirements state the user already has "a running CLP Kubernetes deployment," yet step 2 only shows `helm install`, which would error out with "release already exists". The parenthetical "(or upgrade)" acknowledges the upgrade path but never shows the corresponding command; the suggested fix is to also show `helm upgrade clp clp/clp DOCS_VAR_HELM_VERSION_FLAG -f presto-values.yaml` for existing installations.

3. Verify that the Presto coordinator and worker pods are running:

```bash
kubectl get pods -l "app.kubernetes.io/component in (presto-coordinator, presto-worker)"
```

Once the pods are ready, you can [query your logs through Presto](#querying-your-logs-through-presto)
using CLP's Web UI.

:::{note}
When using Kubernetes, Presto worker scheduling can be configured using the `prestoWorker.scheduling`
key in Helm values. See the [worker scheduling][k8s-scheduling] section of the Kubernetes deployment
guide for details.
:::

## Docker Compose

### Requirements

* [CLP][clp-releases] (clp-json) v0.5.0 or higher
* [Docker] v28 or higher
* [Docker Compose][docker-compose] v2.20.2 or higher
* Python
* python3-venv (for the version of Python installed)

## Set up
### Set up

Using Presto with CLP requires:
Using Presto with CLP via Docker Compose requires:

* [Setting up CLP](#setting-up-clp) and compressing some logs.
* [Setting up Presto](#setting-up-presto) to query CLP's metadata database and archives.
@@ -227,6 +316,8 @@ These limitations will be addressed in a future release of the Presto integration
[clp-releases]: https://github.com/y-scope/clp/releases
[docker-compose]: https://docs.docker.com/compose/install/
[Docker]: https://docs.docker.com/engine/install/
[k8s-deployment]: guides-k8s-deployment.md
[k8s-scheduling]: guides-k8s-deployment.md#worker-scheduling
[postgresql]: https://zenodo.org/records/10516401
[Presto]: https://prestodb.io/
[y-scope/presto#8]: https://github.com/y-scope/presto/issues/8
2 changes: 1 addition & 1 deletion docs/src/user-docs/quick-start/clp-text.md
@@ -100,7 +100,7 @@ helm repo update clp

helm install clp clp/clp DOCS_VAR_HELM_VERSION_FLAG \
  --set clpConfig.package.storage_engine=clp \
  --set clpConfig.package.query_engine=clp \
  --set clpConfig.webui.query_engine=clp \
  --set clpConfig.webui.port="$CLP_WEBUI_PORT" \
  --set clpConfig.results_cache.port="$CLP_RESULTS_CACHE_PORT" \
  --set clpConfig.database.port="$CLP_DATABASE_PORT" \
3 changes: 3 additions & 0 deletions tools/deployment/package-helm/.set-up-common.sh
@@ -131,6 +131,9 @@ nodes:
      - containerPort: 30800
        hostPort: 30800
        protocol: TCP
      - containerPort: 30889
        hostPort: 30889
        protocol: TCP
EOF

for ((i = 0; i < num_workers; i++)); do
2 changes: 1 addition & 1 deletion tools/deployment/package-helm/Chart.yaml
@@ -1,6 +1,6 @@
apiVersion: "v2"
name: "clp"
version: "0.2.1-dev.1"
version: "0.2.1-dev.2"
description: "A Helm chart for CLP's (Compressed Log Processor) package deployment"
type: "application"
appVersion: "0.10.1-dev"
18 changes: 16 additions & 2 deletions tools/deployment/package-helm/set-up-multi-dedicated-test.sh
@@ -10,9 +10,11 @@ CLP_HOME="${CLP_HOME:-/tmp/clp}"
CLUSTER_NAME="${CLUSTER_NAME:-clp-test}"
NUM_COMPRESSION_NODES="${NUM_COMPRESSION_NODES:-2}"
NUM_QUERY_NODES="${NUM_QUERY_NODES:-2}"
NUM_PRESTO_NODES="${NUM_PRESTO_NODES:-2}"
COMPRESSION_WORKER_REPLICAS="${COMPRESSION_WORKER_REPLICAS:-2}"
QUERY_WORKER_REPLICAS="${QUERY_WORKER_REPLICAS:-2}"
REDUCER_REPLICAS="${REDUCER_REPLICAS:-2}"
PRESTO_WORKER_REPLICAS="${PRESTO_WORKER_REPLICAS:-2}"
> **Reviewer comment (CodeRabbit, ⚠️ Potential issue, 🟠 Major) on lines +13 to +17:** The default dedicated test now allocates unused Presto nodes. `NUM_PRESTO_NODES` is included in `total_workers`, but this script still installs the chart without enabling Presto mode, so the default path grows the kind cluster and labels a Presto node pool that no pod can use. Default the Presto node count to 0, or switch the Helm install into Presto mode when these settings are present. Also applies to lines 37-38 and 58-78.

# shellcheck source=.set-up-common.sh
source "${script_dir}/.set-up-common.sh"
@@ -23,14 +25,16 @@ echo "=== Multi-node setup with dedicated worker nodes ==="
echo "Cluster: ${CLUSTER_NAME}"
echo "Compression nodes: ${NUM_COMPRESSION_NODES}"
echo "Query nodes: ${NUM_QUERY_NODES}"
echo "Presto nodes: ${NUM_PRESTO_NODES}"
echo "Compression workers: ${COMPRESSION_WORKER_REPLICAS}"
echo "Query workers: ${QUERY_WORKER_REPLICAS}"
echo "Reducers: ${REDUCER_REPLICAS}"
echo "Presto workers: ${PRESTO_WORKER_REPLICAS}"
echo ""

prepare_environment "${CLUSTER_NAME}"

total_workers=$((NUM_COMPRESSION_NODES + NUM_QUERY_NODES))
total_workers=$((NUM_COMPRESSION_NODES + NUM_QUERY_NODES + NUM_PRESTO_NODES))

echo "Creating kind cluster..."
generate_kind_config "${total_workers}" | kind create cluster --name "${CLUSTER_NAME}" --config=-
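
The cluster-size arithmetic above is just the sum of the three pools; a standalone check (not part of the script) using the script's default pool sizes:

```shell
#!/bin/bash
# Mirror of the total_workers computation with the default pool sizes.
NUM_COMPRESSION_NODES=2
NUM_QUERY_NODES=2
NUM_PRESTO_NODES=2
total_workers=$((NUM_COMPRESSION_NODES + NUM_QUERY_NODES + NUM_PRESTO_NODES))
echo "${total_workers}"  # prints 6
```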
@@ -45,11 +49,18 @@ for ((i = 0; i < NUM_COMPRESSION_NODES; i++)); do
done

# Label query nodes
for ((i = NUM_COMPRESSION_NODES; i < total_workers; i++)); do
query_end=$((NUM_COMPRESSION_NODES + NUM_QUERY_NODES))
for ((i = NUM_COMPRESSION_NODES; i < query_end; i++)); do
    echo "Labeling ${worker_nodes[$i]} as query node"
    kubectl label node "${worker_nodes[$i]}" yscope.io/nodeType=query --overwrite
done

# Label Presto nodes
for ((i = query_end; i < total_workers; i++)); do
    echo "Labeling ${worker_nodes[$i]} as presto node"
    kubectl label node "${worker_nodes[$i]}" yscope.io/nodeType=presto --overwrite
done

echo "Installing Helm chart..."
helm uninstall test --ignore-not-found
sleep 2
@@ -62,6 +73,9 @@ helm install test "${script_dir}" \
    --set "queryWorker.replicas=${QUERY_WORKER_REPLICAS}" \
    --set "queryWorker.scheduling.nodeSelector.yscope\.io/nodeType=query" \
    --set "reducer.replicas=${REDUCER_REPLICAS}" \
    --set "reducer.scheduling.nodeSelector.yscope\.io/nodeType=query" \
    --set "prestoWorker.replicas=${PRESTO_WORKER_REPLICAS}" \
    --set "prestoWorker.scheduling.nodeSelector.yscope\.io/nodeType=presto" \
    $(get_image_helm_args "${CLUSTER_NAME}" "${CLP_PACKAGE_IMAGE}")

wait_for_cluster_ready
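
The two labeling loops in this script partition worker indices contiguously: compression nodes first, then query nodes up to `query_end`, then Presto nodes for the remainder. A self-contained sketch of that mapping, with `echo` standing in for the `kubectl label` calls and the default pool sizes assumed:

```shell
#!/bin/bash
NUM_COMPRESSION_NODES=2
NUM_QUERY_NODES=2
NUM_PRESTO_NODES=2
query_end=$((NUM_COMPRESSION_NODES + NUM_QUERY_NODES))
total_workers=$((query_end + NUM_PRESTO_NODES))

# Map a worker index to the node-pool label the script would assign it.
label_for_index() {
    if (( $1 < NUM_COMPRESSION_NODES )); then
        echo "compression"
    elif (( $1 < query_end )); then
        echo "query"
    else
        echo "presto"
    fi
}

# Indices 0-1 -> compression, 2-3 -> query, 4-5 -> presto.
for ((i = 0; i < total_workers; i++)); do
    echo "worker ${i}: $(label_for_index "${i}")"
done
```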