This guide explains how to deploy CLP using Docker Compose. Docker Compose provides a straightforward way to orchestrate CLP's services, suitable for both development and production environments.
Docker Compose can be used for:
- A single-host deployment: Run all CLP services on a single machine. This is the simplest setup, covered in the quick-start guides.
- A multi-host deployment: Distribute CLP services across multiple machines for higher throughput and scalability. This is covered in detail below.
:::{note}
The instructions below use manual Docker Compose orchestration. Compared to using Kubernetes (via Helm), Docker Compose is more lightweight and provides fine-grained control over service placement, but requires more configuration.
:::
- Docker and Docker Compose
- If you're not running as root, ensure Docker can be run without superuser privileges.
- One or more hosts networked together
- When not using S3 storage, a shared filesystem accessible by all worker hosts (e.g., NFS, SeaweedFS)
  - See below for how to set up a simple SeaweedFS cluster.
The CLP package is composed of several components (services in orchestrator terminology) including infrastructure services, schedulers, workers, and supporting services. For a detailed overview of all services and their dependencies, see the deployment orchestration design doc.
In a multi-host cluster:
- infrastructure services and schedulers should be run once per cluster (they're singleton services).
- workers can be run on multiple hosts to increase parallelism.
To configure CLP for multi-host deployment, you'll need to:
- configure and run CLP's environment setup scripts.
- distribute and configure the CLP package on all hosts in your cluster.
- Extract the CLP package on one host (the "setup host").
- Configure credentials:
  - Copy `etc/credentials.template.yaml` to `etc/credentials.yaml`.
  - Edit `etc/credentials.yaml` to set usernames and passwords.
- Edit CLP's configuration file:
  - Open `etc/clp-config.yaml`.
  - Configure which services should be bundled (managed by the `clp-package` Docker Compose project) vs. external:

    ```yaml
    bundled:
      # Remove services you want to run on specific hosts or use external instances
      - database       # Remove if running on a dedicated host or using external MySQL-compatible DB
      - queue          # Remove if running on a dedicated host or using external RabbitMQ
      - redis          # Remove if running on a dedicated host or using external Redis
      - results_cache  # Remove if running on a dedicated host or using external MongoDB
    ```

  - For each service, set the `host` and `port` fields to the actual hostname/IP and port where you plan to run the specific service.
  - When using local filesystem storage (i.e., not S3), set `logs_input.directory`, `archive_output.storage.directory`, and `stream_output.storage.directory` to directories on the shared filesystem.
- Set up the CLP package's environment:

  ```shell
  sbin/start-clp.sh --setup-only
  ```

  This will:

  - Validate your configuration
  - Create any necessary directories
  - Generate an `.env` file with all necessary environment variables
  - Create `var/log/.clp-config.yaml` (the container-specific configuration file)
  - Create `var/www/webui/server/dist/settings.json` (the `webui` server's configuration file)
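For illustration, the host/port and storage-directory edits described above might look like the following fragment of `etc/clp-config.yaml`. The hostnames, ports, and paths are example values, and only fields mentioned in the steps above are shown:

```yaml
# Illustrative fragment of etc/clp-config.yaml. Hostnames, ports, and paths
# are examples; substitute the hosts where you plan to run each service.
database:
  host: db-host.example.com
  port: 3306
queue:
  host: queue-host.example.com
  port: 5672
redis:
  host: redis-host.example.com
  port: 6379
results_cache:
  host: mongo-host.example.com
  port: 27017
logs_input:
  directory: /mnt/shared/logs        # on the shared filesystem
archive_output:
  storage:
    directory: /mnt/shared/archives  # on the shared filesystem
stream_output:
  storage:
    directory: /mnt/shared/streams   # on the shared filesystem
```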
With the package set up, we can now distribute it to all hosts in the cluster:
- Copy the set-up package to all hosts where you want to run CLP services.
  - Ensure the package is copied to the same location on every host or else, on each host, you'll need to modify the paths in `.env` as appropriate.
- Configure worker concurrency (optional). On each worker host, edit the `.env` file to adjust the worker concurrency settings as needed:
  - `CLP_COMPRESSION_WORKER_CONCURRENCY`
  - `CLP_QUERY_WORKER_CONCURRENCY`
  - `CLP_REDUCER_CONCURRENCY`

  Recommended settings:

  - If workers are started on separate hosts, set each concurrency value to match the CPU count on that host.
  - If compression and query/reducer workers are started on the same host, set each concurrency value to half the CPU count (e.g., for a 16-core host, set all three to 8).
You can start CLP across multiple hosts by starting each service on the relevant host. The commands below show how to do so, with comments indicating the startup order and dependencies between services.
:::{note}
For clp-json + Presto deployments (`package.storage_engine: clp-s` with
`package.query_engine: presto`), you can omit starting the `query-scheduler`, `query-worker`, and
`reducer` services.
:::
:::{tip}
If you want to use your own MariaDB/MySQL or MongoDB servers instead of the Docker Compose-managed
databases, see the external database setup guide. When using external
databases, skip starting the database and results-cache services below.
:::
All commands below assume you are running them from the root of the CLP package directory.
```shell
################################################################################
# Infrastructure services
################################################################################

# Start database (skip if using external database)
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up database \
  --no-deps --wait

# Initialize database
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up db-table-creator \
  --no-deps

# Start queue (if using Celery)
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up queue \
  --no-deps --wait

# Start redis (if using Celery)
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up redis \
  --no-deps --wait

# Start results cache (skip if using external MongoDB)
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up results-cache \
  --no-deps --wait

# Initialize results cache
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up results-cache-indices-creator \
  --no-deps

################################################################################
# Controller services (schedulers, UI, and supporting services)
################################################################################

# Start compression scheduler
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up compression-scheduler \
  --no-deps --wait

# Start Spider scheduler (optional, only if using Spider)
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up spider-scheduler \
  --no-deps --wait

# Start query scheduler
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up query-scheduler \
  --no-deps --wait

# Start API server
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up api-server \
  --no-deps --wait

# Start log-ingestor (optional, only if configured)
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up log-ingestor \
  --no-deps --wait

# Start webui
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up webui \
  --no-deps --wait

# Start garbage collector (optional, only if retention is configured)
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up garbage-collector \
  --no-deps --wait

# Start MCP server (optional, only if configured)
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up mcp-server \
  --no-deps --wait

################################################################################
# Worker services (can be started on multiple hosts)
################################################################################

# Start compression worker (if using Celery)
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up compression-worker \
  --no-deps --wait

# Start Spider compression worker (optional, only if using Spider)
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up spider-compression-worker \
  --no-deps --wait

# Start query worker
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up query-worker \
  --no-deps --wait

# Start reducer
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up reducer \
  --no-deps --wait
```

:::{note}
To increase parallelism, start worker services (compression-worker, query-worker, reducer) on
multiple hosts.
:::
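Typing the full `docker compose` invocation for every service gets repetitive. One hedged convenience is a small loop that prints the startup commands for the services a given host should run, so you can review them before executing. The service list below is an example (infrastructure services only, excluding the one-shot initializer jobs); trim it to match your placement plan:

```shell
# Sketch: generate the per-service startup commands for this host. The
# service list is an example; keep only the services this host should run,
# and remove the `echo` to execute each command directly.
project="clp-package-$(cat var/log/instance-id)"
for svc in database queue redis results-cache; do
  echo docker compose --project-name "$project" up "$svc" --no-deps --wait
done
```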
To learn how to compress and search your logs, check out the quick-start guide that corresponds to the flavor of CLP you're running:
::::{grid} 1 1 2 2
:gutter: 2

:::{grid-item-card}
:link: quick-start/clp-json
Using clp-json
^^^
How to compress and search JSON logs.
:::

:::{grid-item-card}
:link: quick-start/clp-text
Using clp-text
^^^
How to compress and search unstructured text logs.
:::
::::
To stop CLP, on every host where it's running, run:
```shell
sbin/stop-clp.sh
```

This will stop all CLP services managed by Docker Compose on the current host.
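If you manage many hosts, one convenience is to drive the stop script over SSH from a single machine. This is only a sketch: the host list and the `CLP_HOME` path below are placeholders, and it assumes key-based SSH access and the same package path on every host:

```shell
# Sketch: stop CLP on several hosts over SSH. The host list and CLP_HOME are
# placeholder examples; assumes key-based SSH and an identical package path
# everywhere. Remove the leading `echo` to actually execute each command.
CLP_HOME=/opt/clp
for host in host1.example.com host2.example.com; do
  echo ssh "$host" "cd '$CLP_HOME' && sbin/stop-clp.sh"
done
```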
First, determine your instance ID from `<clp-package>/var/log/instance-id`.

To check the status of services:

```shell
docker compose --project-name clp-package-<instance-id> ps
```

To view logs for a specific service:

```shell
docker compose --project-name clp-package-<instance-id> logs -f <service-name>
```

To execute commands in a running container:

```shell
docker compose --project-name clp-package-<instance-id> exec <service-name> /bin/bash
```

To validate your Docker Compose configuration:

```shell
docker compose config
```

The instructions below are for running a simple SeaweedFS cluster on a set of hosts. For other use cases, see the SeaweedFS docs.
1. Install SeaweedFS.
2. Start the master and a filer on one of the hosts:

   ```shell
   weed master -port 9333
   weed filer -port 8888 -master "localhost:9333"
   ```

3. Start one or more volume servers on one or more hosts:

   {style=lower-alpha}
   1. Create a directory where you want SeaweedFS to store data.
   2. Start the volume server:

      ```shell
      weed volume -mserver "<master-host>:9333" -dir <storage-dir> -max 0
      ```

      - `<master-host>` is the hostname/IP of the master host.
      - `<storage-dir>` is the directory where you want SeaweedFS to store data.

4. Start a FUSE mount on every host on which you want to run a CLP worker:

   ```shell
   weed mount -filer "<master-host>:8888" -dir <mount-path>
   ```

   - `<master-host>` is the hostname/IP of the master host.
   - `<mount-path>` is the path where you want the mount to be.
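Before starting workers, it can be worth confirming that the FUSE mount is actually active on each worker host; a stale or failed mount would silently break shared-filesystem access. A minimal sketch (the mount path is an example; use the `<mount-path>` you chose above):

```shell
# Sketch: check whether the SeaweedFS FUSE mount is active on this host.
# /mnt/seaweedfs is an example path; substitute your <mount-path>.
mount_path=/mnt/seaweedfs
if mountpoint -q "$mount_path" 2> /dev/null; then
  status=mounted
else
  status="not mounted"
fi
echo "$mount_path: $status"
```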