This guide explains how to deploy CLP on Kubernetes using Helm. This provides an alternative to Docker Compose and enables deployment on Kubernetes clusters ranging from local development setups to production environments.
:::{note}
For a detailed overview of CLP's services and their dependencies, see the deployment orchestration design doc.
:::
The following tools are required to deploy CLP on Kubernetes:
- kubectl >= 1.30
- Helm >= 4.0
- A Kubernetes cluster (see Setting up a cluster below)
- When not using S3 storage, a shared filesystem accessible by all worker pods (e.g., NFS, SeaweedFS) or local storage for single-node deployments
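To confirm the installed tools meet these minimums, one option is a `sort -V` comparison. The version strings below are hard-coded for illustration; in practice, substitute the output of `kubectl version --client` and `helm version`:

```shell
#!/bin/sh
# Succeeds if $1 >= $2, comparing version strings with sort -V.
version_ge() {
    [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Hard-coded example versions; replace with the real CLI output.
kubectl_version="1.31.2"
helm_version="4.0.1"

version_ge "$kubectl_version" "1.30" && echo "kubectl ok"
version_ge "$helm_version" "4.0" && echo "helm ok"
```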
You can deploy CLP on either a local development cluster or a production Kubernetes cluster.
kind (Kubernetes in Docker) is ideal for testing and development. It runs a Kubernetes
cluster inside Docker containers on your local machine.
For single-host kind deployments, see the quick-start guides, which cover creating
a kind cluster and installing the Helm chart.
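For reference, a multi-node kind cluster can be created from a small config file. The cluster name and node counts below are illustrative:

```yaml
# kind-config.yaml: one control-plane node plus two workers (node counts are illustrative)
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
```

Create the cluster with `kind create cluster --name clp --config kind-config.yaml`.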
For production deployments, you can use any Kubernetes distribution:
- Managed Kubernetes services: Amazon EKS, Azure AKS, Google GKE
- Self-hosted: kubeadm, k3s, RKE2
kubeadm is the official Kubernetes tool for bootstrapping clusters. You can follow the
official kubeadm installation guide to install the prerequisites, container runtime,
and kubeadm on all nodes. Then follow the steps below to create a cluster.
1. Initialize the control plane (on the control-plane node only):

   ```bash
   sudo kubeadm init --pod-network-cidr=10.244.0.0/16
   ```

   :::{tip}
   Save the `kubeadm join` command printed at the end of the output. You'll need it to join worker nodes later.
   :::

   :::{note}
   `--pod-network-cidr` specifies the IP range for pods. If `10.244.0.0/16` conflicts with your network, use a different RFC 1918 private range (e.g., `192.168.0.0/16`, `172.16.0.0/16`, or `10.200.0.0/16`).
   :::

   To set up kubectl for your user:

   ```bash
   mkdir -p "$HOME/.kube"
   sudo cp -i /etc/kubernetes/admin.conf "$HOME/.kube/config"
   sudo chown "$(id -u):$(id -g)" "$HOME/.kube/config"
   ```

2. Install a CNI plugin (on the control-plane node):

   A CNI plugin is required for pod-to-pod networking. The following installs Cilium, a high-performance CNI that uses eBPF:

   ```bash
   helm repo add cilium https://helm.cilium.io/
   helm repo update
   helm install cilium cilium/cilium --namespace kube-system \
     --set ipam.operator.clusterPoolIPv4PodCIDRList=10.244.0.0/16
   ```

   :::{note}
   The `clusterPoolIPv4PodCIDRList` must match the `--pod-network-cidr` used in `kubeadm init`.
   :::

3. Join worker nodes (on each worker node):

   Run the `kubeadm join` command you saved from step 1. It should look something like:

   ```bash
   sudo kubeadm join <control-plane-ip>:6443 \
     --token <token> \
     --discovery-token-ca-cert-hash sha256:<hash>
   ```

   If you need to regenerate the command, on the control-plane node, run:

   ```bash
   kubeadm token create --print-join-command
   ```
Once your cluster is ready, you can install CLP using the Helm chart.
The CLP Helm chart is published to a Helm repository hosted on GitHub Pages.
```bash
helm repo add clp https://y-scope.github.io/clp
helm repo update clp
```

The following configurations are optional but recommended for production deployments. You can skip this section for testing or development.
- Shared storage for workers (required for multi-node clusters using filesystem storage):

  :::{tip}
  S3 storage is strongly recommended for multi-node clusters as it does not require shared local storage between workers. If you use S3 storage, you can skip this section.
  :::

  If the storage type is set to `fs`, you must manually provision the persistent volumes and update the `accessModes` of the PVCs.

- External databases (recommended for production):

  - See the external database setup guide for using external MariaDB/MySQL and MongoDB databases.
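As an illustration of manual PV provisioning, the following sketches an NFS-backed PersistentVolume with `ReadWriteMany` access. The name, capacity, server address, and export path are all hypothetical; the PVCs to bind against depend on the chart's storage configuration:

```yaml
# Hypothetical NFS-backed PersistentVolume for shared worker storage.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: clp-archives-pv           # example name
spec:
  capacity:
    storage: 500Gi                # size it for your retention settings
  accessModes:
    - ReadWriteMany               # lets every worker pod mount the same volume
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 10.0.0.5              # example NFS server address
    path: /exports/clp-archives   # example export path
```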
Generate credentials and install CLP:

```bash
# Credentials (change these for production)
export CLP_DB_PASS="pass"
export CLP_DB_ROOT_PASS="root-pass"
export CLP_QUEUE_PASS="pass"
export CLP_REDIS_PASS="pass"

# Worker replicas (increase for multi-node clusters)
export CLP_COMPRESSION_WORKER_REPLICAS=1
export CLP_QUERY_WORKER_REPLICAS=1
export CLP_REDUCER_REPLICAS=1

helm install clp clp/clp DOCS_VAR_HELM_VERSION_FLAG \
  --set credentials.database.password="$CLP_DB_PASS" \
  --set credentials.database.root_password="$CLP_DB_ROOT_PASS" \
  --set credentials.queue.password="$CLP_QUEUE_PASS" \
  --set credentials.redis.password="$CLP_REDIS_PASS" \
  --set compressionWorker.replicas="$CLP_COMPRESSION_WORKER_REPLICAS" \
  --set queryWorker.replicas="$CLP_QUERY_WORKER_REPLICAS" \
  --set reducer.replicas="$CLP_REDUCER_REPLICAS"
```

For multi-node clusters with shared storage mounted on all nodes (e.g., NFS/CephFS via `/etc/fstab`), enable distributed storage mode and configure multiple worker replicas:
```bash
helm install clp clp/clp DOCS_VAR_HELM_VERSION_FLAG \
  --set distributedDeployment=true \
  --set compressionWorker.replicas=3 \
  --set queryWorker.replicas=3 \
  --set reducer.replicas=3
```

For highly customized deployments, create a values file instead of using many `--set` flags:
```{code-block} yaml
:caption: custom-values.yaml
# Use a custom image. For local images, import to each node's container runtime first.
image:
  clpPackage:
    repository: "clp-package"
    pullPolicy: "Never"  # Use "Never" for local images, "IfNotPresent" for remote
    tag: "latest"

# Adjust worker concurrency
workerConcurrency: 16

# Configure CLP settings
clpConfig:
  # Use clp-text instead of clp-json (the default)
  package:
    storage_engine: "clp"  # Use "clp-s" for clp-json, "clp" for clp-text
  webui:
    query_engine: "clp"  # Use "clp-s" for clp-json, "clp" for clp-text, "presto" for Presto

  # Configure archive output
  archive_output:
    target_archive_size: 536870912  # 512 MB
    compression_level: 6
    retention_period: 43200  # (in minutes) 30 days

  # Enable MCP server
  mcp_server:
    port: 30800
    logging_level: "INFO"

  # Configure results cache
  results_cache:
    retention_period: 120  # (in minutes) 2 hours

# Override credentials (use secrets in production!)
credentials:
  database:
    username: "clp-user"
    password: "your-db-password"
    root_username: "root"
    root_password: "your-db-root-password"
  queue:
    username: "clp-user"
    password: "your-queue-password"
  redis:
    password: "your-redis-password"
```
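The size and retention values above are plain numbers in bytes and minutes respectively; the arithmetic behind the comments can be checked in the shell:

```shell
# target_archive_size is in bytes: 512 MB = 512 * 1024 * 1024
echo $((512 * 1024 * 1024))    # 536870912

# retention_period values are in minutes:
echo $((30 * 24 * 60))         # 30 days -> 43200
echo $((2 * 60))               # 2 hours -> 120
```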
Install with custom values:

```bash
helm install clp clp/clp DOCS_VAR_HELM_VERSION_FLAG -f custom-values.yaml
```

::::{tip}
To preview the generated Kubernetes manifests before installing, use `helm template`:

```bash
helm template clp . -f custom-values.yaml
```
::::
To use Presto as the query engine, set `webui.query_engine` to `"presto"` and configure the Presto-specific settings. The `query_engine` setting controls which search interface the Web UI displays. Presto runs alongside the existing compression pipeline; setting the clp-s native query components to `null` is optional but recommended to save resources when you don't need both query paths:
```{code-block} yaml
:caption: presto-values.yaml
image:
  prestoCoordinator:
    repository: "ghcr.io/y-scope/presto/coordinator"
    tag: "clp-v0.10.0"
  prestoWorker:
    repository: "ghcr.io/y-scope/presto/prestissimo-worker"
    tag: "clp-v0.10.0"

prestoWorker:
  # See "Worker scheduling" below for more details on configuring Presto scheduling
  replicas: 2

clpConfig:
  webui:
    query_engine: "presto"

  # Optional: Disable the clp-s native query pipeline to save resources.
  # NOTE: The API server depends on the clp-s native query pipeline.
  api_server: null
  query_scheduler: null
  query_worker: null
  reducer: null

  # Disable results cache retention since the Presto integration doesn't yet support garbage
  # collection of search results.
  results_cache:
    retention_period: null

  presto:
    port: 30889
    coordinator:
      logging_level: "INFO"
      query_max_memory_gb: 1
      query_max_memory_per_node_gb: 1
    worker:
      query_memory_gb: 4
      system_memory_gb: 8

    # Split filter config for the Presto CLP connector. For each dataset you want to query, add a
    # filter entry. Replace <dataset> with the dataset name (use "default" if you didn't specify one
    # when compressing) and <timestamp-key> with the timestamp key used during compression.
    # See https://docs.yscope.com/presto/connector/clp.html#split-filter-config-file
    split_filter:
      clp.default.<dataset>:
        - columnName: "<timestamp-key>"
          customOptions:
            rangeMapping:
              lowerBound: "begin_timestamp"
              upperBound: "end_timestamp"
          required: false
```
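For example, assuming a dataset that was compressed under the (hypothetical) name `web_logs` with timestamp key `ts`, the placeholders would be filled in as:

```yaml
# Hypothetical concrete split-filter entry; "web_logs" and "ts" are example values.
split_filter:
  clp.default.web_logs:
    - columnName: "ts"
      customOptions:
        rangeMapping:
          lowerBound: "begin_timestamp"
          upperBound: "end_timestamp"
      required: false
```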
Install with the Presto values:

```bash
helm install clp clp/clp DOCS_VAR_HELM_VERSION_FLAG -f presto-values.yaml
```

:::{note}
Presto is deployed when `clpConfig.presto` is set to a non-null value. To disable the clp-s native query components, set their config keys to `null` as shown above.
:::
For more details on querying logs through Presto, see the Using Presto guide.
You can control where workers are scheduled using standard Kubernetes scheduling primitives (`nodeSelector`, `affinity`, `tolerations`, `topologySpreadConstraints`).

:::{note}
When using Presto as the query engine, use `prestoWorker:` instead of `queryWorker:` and `reducer:` to configure Presto worker scheduling. The `prestoWorker:` key supports the same `scheduling:` options.
:::
To run compression workers, query workers, and reducers in separate node pools:

1. Label your nodes:

   ```bash
   # Label compression nodes
   kubectl label nodes node1 node2 yscope.io/nodeType=compression

   # Label query nodes
   kubectl label nodes node3 node4 yscope.io/nodeType=query

   # Label Presto nodes (if using Presto as the query engine)
   kubectl label nodes node5 node6 yscope.io/nodeType=presto
   ```

2. Configure scheduling:

   ```{code-block} yaml
   :caption: dedicated-scheduling.yaml
   distributedDeployment: true

   compressionWorker:
     replicas: 2
     scheduling:
       nodeSelector:
         yscope.io/nodeType: "compression"

   queryWorker:
     replicas: 2
     scheduling:
       nodeSelector:
         yscope.io/nodeType: "query"

   reducer:
     replicas: 2
     scheduling:
       nodeSelector:
         yscope.io/nodeType: "query"

   prestoWorker:
     replicas: 2
     scheduling:
       nodeSelector:
         yscope.io/nodeType: "presto"
   ```

3. Install:

   ```bash
   helm install clp clp/clp DOCS_VAR_HELM_VERSION_FLAG -f dedicated-scheduling.yaml
   ```
To run all worker types in the same node pool:

1. Label your nodes:

   ```bash
   kubectl label nodes node1 node2 node3 node4 yscope.io/nodeType=compute
   ```

2. Configure scheduling:

   ```{code-block} yaml
   :caption: shared-scheduling.yaml
   distributedDeployment: true

   compressionWorker:
     replicas: 2
     scheduling:
       nodeSelector:
         yscope.io/nodeType: "compute"
       topologySpreadConstraints:
         - maxSkew: 1
           topologyKey: "kubernetes.io/hostname"
           whenUnsatisfiable: "DoNotSchedule"
           labelSelector:
             matchLabels:
               app.kubernetes.io/component: compression-worker

   queryWorker:
     replicas: 2
     scheduling:
       nodeSelector:
         yscope.io/nodeType: "compute"

   reducer:
     replicas: 2
     scheduling:
       nodeSelector:
         yscope.io/nodeType: "compute"

   prestoWorker:
     replicas: 2
     scheduling:
       nodeSelector:
         yscope.io/nodeType: "compute"
   ```

3. Install:

   ```bash
   helm install clp clp/clp DOCS_VAR_HELM_VERSION_FLAG -f shared-scheduling.yaml
   ```
After installing the Helm chart, you can verify that all components are running correctly as follows.
Wait for all pods to be ready:

```bash
# Watch pod status
kubectl get pods -w

# Wait for all pods to be ready
kubectl wait pods --all --for=condition=Ready --timeout=300s
```

The output should show all pods are in the Running state:

```text
NAME                            READY   STATUS    RESTARTS   AGE
clp-api-server-...              1/1     Running   0          2m
clp-compression-scheduler-...   1/1     Running   0          2m
clp-compression-worker-...      1/1     Running   0          2m
clp-database-0                  1/1     Running   0          2m
clp-garbage-collector-...       1/1     Running   0          2m
clp-query-scheduler-...         1/1     Running   0          2m
clp-query-worker-...            1/1     Running   0          2m
clp-queue-0                     1/1     Running   0          2m
clp-reducer-...                 1/1     Running   0          2m
clp-redis-0                     1/1     Running   0          2m
clp-results-cache-0             1/1     Running   0          2m
clp-webui-...                   1/1     Running   0          2m
```
CLP runs initialization jobs on first deployment. Check that these jobs completed successfully:

```bash
# Check job completion
kubectl get jobs

# Expected output:
# NAME                                COMPLETIONS   DURATION   AGE
# clp-db-table-creator                1/1           5s         2m
# clp-results-cache-indices-creator   1/1           3s         2m
```

Once all pods are ready, you can access the CLP Web UI at `http://<node-ip>:30000` (the value of `clpConfig.webui.port`).
With CLP deployed on Kubernetes, you can compress and search logs using the same workflows as Docker Compose deployments. Refer to the quick-start guide for your chosen flavor:
::::{grid} 1 1 2 2
:gutter: 2

:::{grid-item-card}
:link: quick-start/clp-json
Using clp-json
^^^
How to compress and search JSON logs.
:::

:::{grid-item-card}
:link: quick-start/clp-text
Using clp-text
^^^
How to compress and search unstructured text logs.
:::
::::
:::{note}
By default (`allowHostAccessForSbinScripts: true`), the database and results cache are exposed on NodePorts, allowing you to use `sbin/compress.sh` and `sbin/search.sh` from the CLP package. Download a release matching the chart's `appVersion`, then update the following configurations in `etc/clp-config.yaml`:

```yaml
database:
  port: 30306  # Match `clpConfig.database.port` in Helm values
results_cache:
  port: 30017  # Match `clpConfig.results_cache.port` in Helm values
```

Alternatively, use the Web UI (clp-json or clp-text) to compress logs and search interactively, or the API server to submit queries and view results programmatically.

The admin tools (`sbin/admin-tools/archive-manager.sh` and `sbin/admin-tools/dataset-manager.sh`) are not supported in Kubernetes deployments with filesystem storage (`archive_output.storage.type: "fs"`). Those scripts require direct filesystem access to the archive directory via Docker bind mounts, which is not possible when archives are backed by PVCs inside the cluster.
:::
To check the status of pods:

```bash
kubectl get pods
```

To view logs for a specific pod:

```bash
kubectl logs -f <pod-name>
```

To execute commands in a pod:

```bash
kubectl exec -it <pod-name> -- /bin/bash
```

To debug Helm chart issues:

```bash
# For debugging the published chart from the repository
helm install clp clp/clp DOCS_VAR_HELM_VERSION_FLAG --dry-run --debug

# For debugging local chart changes during development
helm install clp /path/to/local/chart --dry-run --debug
```

This section covers how to manage your CLP Helm release.
:::{note}
Upgrade and rollback are not yet supported. We plan to add support as we finalize the migration mechanism.
:::
```bash
helm uninstall clp
```

:::{warning}
Uninstalling the Helm release will delete all CLP pods and services. However, dynamically provisioned PersistentVolumeClaims (database, results cache, archives, streams) may be retained depending on the cluster's `reclaimPolicy`. To completely remove all data, delete the PVCs manually.
:::
To tear down a kubeadm cluster:

1. Uninstall Cilium (on the control-plane):

   ```bash
   helm uninstall cilium --namespace kube-system
   ```

2. Reset each node (run on all worker nodes first, then the control-plane):

   ```bash
   sudo kubeadm reset -f
   sudo rm -rf /etc/cni/net.d/*
   sudo umount /var/run/cilium/cgroupv2/
   sudo rm -rf /var/run/cilium
   ```

3. Clean up kubeconfig (on the control-plane):

   ```bash
   rm -rf ~/.kube
   ```
- Docker Compose deployment: Docker Compose orchestration for single or multi-host setups
- External database setup: Using external MariaDB and MongoDB
- Using object storage: Configuring S3 storage
- Configuring retention periods: Setting up data retention policies
- Using Presto: Distributed SQL queries on compressed logs