This guide explains how to deploy CLP using Docker Compose. Docker Compose provides a straightforward way to orchestrate CLP's services, suitable for both development and production environments.
Docker Compose can be used for:
- A single-host deployment: Run all CLP services on a single machine. This is the simplest setup, covered in the quick-start guides.
- A multi-host deployment: Distribute CLP services across multiple machines for higher throughput and scalability. This is covered in detail below.
:::{note}
The instructions below use manual Docker Compose orchestration. Compared to using Kubernetes (via Helm), Docker Compose is more lightweight and provides fine-grained control over service placement, but requires more configuration.
:::
- Docker and Docker Compose
- If you're not running as root, ensure Docker can be run without superuser privileges.
- One or more hosts networked together
- When not using S3 storage, a shared filesystem accessible by all worker hosts (e.g., NFS, SeaweedFS)
  - See below for how to set up a simple SeaweedFS cluster.
The CLP package is composed of several components (services in orchestrator terminology) including infrastructure services, schedulers, workers, and supporting services. For a detailed overview of all services and their dependencies, see the deployment orchestration design doc.
In a multi-host cluster:
- infrastructure services and schedulers should be run once per cluster (they're singleton services).
- workers can be run on multiple hosts to increase parallelism.
To configure CLP for multi-host deployment, you'll need to:
- configure and run CLP's environment setup scripts.
- distribute and configure the CLP package on all hosts in your cluster.
- Extract the CLP package on one host (the "setup host").
- Configure credentials:
  - Copy `etc/credentials.template.yaml` to `etc/credentials.yaml`.
  - Edit `etc/credentials.yaml` to set usernames and passwords.
- Edit CLP's configuration file:
  - Open `etc/clp-config.yaml`.
  - Configure which services should be bundled (managed by the `clp-package` Docker Compose project) vs. external:

    ```yaml
    bundled:
      # Remove services you want to run on specific hosts or use external instances
      - database       # Remove if running on a dedicated host or using external MySQL-compatible DB
      - queue          # Remove if running on a dedicated host or using external RabbitMQ
      - redis          # Remove if running on a dedicated host or using external Redis
      - results_cache  # Remove if running on a dedicated host or using external MongoDB
    ```

  - For each service, set the `host` and `port` fields to the actual hostname/IP and port where you plan to run the specific service.
  - When using local filesystem storage (i.e., not S3), set `logs_input.directory`, `archive_output.storage.directory`, and `stream_output.storage.directory` to directories on the shared filesystem.
- Set up the CLP package's environment:

  ```shell
  sbin/start-clp.sh --setup-only
  ```

  This will:

  - Validate your configuration
  - Create any necessary directories
  - Generate an `.env` file with all necessary environment variables
  - Create `var/log/.clp-config.yaml` (the container-specific configuration file)
  - Create `var/www/webui/server/dist/settings.json` (the `webui` server's configuration file)
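For illustration, the host/port and storage-directory edits described above might look like the following fragment of `etc/clp-config.yaml`. The hostnames, ports, and paths are example values, and only fields mentioned in the steps above are shown:

```yaml
# Illustrative fragment of etc/clp-config.yaml. Hostnames, ports, and paths
# are examples; substitute the hosts where you plan to run each service.
database:
  host: db-host.example.com
  port: 3306
queue:
  host: queue-host.example.com
  port: 5672
redis:
  host: redis-host.example.com
  port: 6379
results_cache:
  host: mongo-host.example.com
  port: 27017
logs_input:
  directory: /mnt/shared/logs        # on the shared filesystem
archive_output:
  storage:
    directory: /mnt/shared/archives  # on the shared filesystem
stream_output:
  storage:
    directory: /mnt/shared/streams   # on the shared filesystem
```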
With the package set up, we can now distribute it to all hosts in the cluster:
- Copy the set-up package to all hosts where you want to run CLP services.
  - Ensure the package is copied to the same location on every host or else, on each host, you'll need to modify the paths in `.env` as appropriate.
- Configure worker concurrency (optional). On each worker host, edit the `.env` file to adjust the worker concurrency settings as needed:
  - `CLP_COMPRESSION_WORKER_CONCURRENCY`
  - `CLP_QUERY_WORKER_CONCURRENCY`
  - `CLP_REDUCER_CONCURRENCY`

  Recommended settings:

  - If workers are started on separate hosts, set each concurrency value to match the CPU count on that host.
  - If compression and query/reducer workers are started on the same host, set each concurrency value to half the CPU count (e.g., for a 16-core host, set all three to 8).
You can start CLP across multiple hosts by starting each service on the relevant host. The commands below show how to do so, with comments indicating the startup order and dependencies between services.
:::{note}
For clp-json + Presto deployments (`package.storage_engine: clp-s` with
`package.query_engine: presto`), you can omit starting the `query-scheduler`, `query-worker`, and
`reducer` services.
:::
:::{tip}
If you want to use your own MariaDB/MySQL or MongoDB servers instead of the Docker Compose-managed
databases, see the external database setup guide. When using external
databases, skip starting the database and results-cache services below.
:::
All commands below assume you are running them from the root of the CLP package directory.
```shell
################################################################################
# Infrastructure services
################################################################################

# Start database (skip if using external database)
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up database \
  --no-deps --wait

# Initialize database
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up db-table-creator \
  --no-deps

# Start queue (if using Celery)
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up queue \
  --no-deps --wait

# Start redis (if using Celery)
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up redis \
  --no-deps --wait

# Start results cache (skip if using external MongoDB)
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up results-cache \
  --no-deps --wait

# Initialize results cache
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up results-cache-indices-creator \
  --no-deps

################################################################################
# Controller services (schedulers, UI, and supporting services)
################################################################################

# Start compression scheduler
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up compression-scheduler \
  --no-deps --wait

# Start Spider scheduler (optional, only if using Spider)
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up spider-scheduler \
  --no-deps --wait

# Start query scheduler
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up query-scheduler \
  --no-deps --wait

# Start API server
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up api-server \
  --no-deps --wait

# Start log-ingestor (optional, only if configured)
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up log-ingestor \
  --no-deps --wait

# Start webui
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up webui \
  --no-deps --wait

# Start garbage collector (optional, only if retention is configured)
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up garbage-collector \
  --no-deps --wait

# Start MCP server (optional, only if configured)
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up mcp-server \
  --no-deps --wait

################################################################################
# Worker services (can be started on multiple hosts)
################################################################################

# Start compression worker (if using Celery)
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up compression-worker \
  --no-deps --wait

# Start Spider compression worker (optional, only if using Spider)
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up spider-compression-worker \
  --no-deps --wait

# Start query worker
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up query-worker \
  --no-deps --wait

# Start reducer
docker compose \
  --project-name "clp-package-$(cat var/log/instance-id)" \
  up reducer \
  --no-deps --wait
```

:::{note}
To increase parallelism, start worker services (compression-worker, query-worker, reducer) on
multiple hosts.
:::
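Typing the full `docker compose` invocation for every service gets repetitive. One hedged convenience is a small loop that prints the startup commands for the services a given host should run, so you can review them before executing. The service list below is an example (infrastructure services only, excluding the one-shot initializer jobs); trim it to match your placement plan:

```shell
# Sketch: generate the per-service startup commands for this host. The
# service list is an example; keep only the services this host should run,
# and remove the `echo` to execute each command directly.
project="clp-package-$(cat var/log/instance-id)"
for svc in database queue redis results-cache; do
  echo docker compose --project-name "$project" up "$svc" --no-deps --wait
done
```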
To learn how to compress and search your logs, check out the quick-start guide that corresponds to the flavor of CLP you're running:
::::{grid} 1 1 2 2
:gutter: 2

:::{grid-item-card}
:link: quick-start/clp-json
Using clp-json
^^^
How to compress and search JSON logs.
:::

:::{grid-item-card}
:link: quick-start/clp-text
Using clp-text
^^^
How to compress and search unstructured text logs.
:::
::::
To stop CLP, on every host where it's running, run:
```shell
sbin/stop-clp.sh
```

This will stop all CLP services managed by Docker Compose on the current host.
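If you manage many hosts, one convenience is to drive the stop script over SSH from a single machine. This is only a sketch: the host list and the `CLP_HOME` path below are placeholders, and it assumes key-based SSH access and the same package path on every host:

```shell
# Sketch: stop CLP on several hosts over SSH. The host list and CLP_HOME are
# placeholder examples; assumes key-based SSH and an identical package path
# everywhere. Remove the leading `echo` to actually execute each command.
CLP_HOME=/opt/clp
for host in host1.example.com host2.example.com; do
  echo ssh "$host" "cd '$CLP_HOME' && sbin/stop-clp.sh"
done
```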
First, determine your instance ID from `<clp-package>/var/log/instance-id`.

To check the status of services:

```shell
docker compose --project-name clp-package-<instance-id> ps
```

To view logs for a specific service:

```shell
docker compose --project-name clp-package-<instance-id> logs -f <service-name>
```

To execute commands in a running container:

```shell
docker compose --project-name clp-package-<instance-id> exec <service-name> /bin/bash
```

To validate your Docker Compose configuration:

```shell
docker compose config
```

The instructions below are for running a simple SeaweedFS cluster on a set of hosts. For other use cases, see the SeaweedFS docs.
1. Install SeaweedFS.
2. Start the master and a filer on one of the hosts:

   ```shell
   weed master -port 9333
   weed filer -port 8888 -master "localhost:9333"
   ```

3. Start one or more volume servers on one or more hosts:

   {style=lower-alpha}
   1. Create a directory where you want SeaweedFS to store data.
   2. Start the volume server:

      ```shell
      weed volume -mserver "<master-host>:9333" -dir <storage-dir> -max 0
      ```

      - `<master-host>` is the hostname/IP of the master host.
      - `<storage-dir>` is the directory where you want SeaweedFS to store data.

4. Start a FUSE mount on every host on which you want to run a CLP worker:

   ```shell
   weed mount -filer "<master-host>:8888" -dir <mount-path>
   ```

   - `<master-host>` is the hostname/IP of the master host.
   - `<mount-path>` is the path where you want the mount to be.
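Before starting workers, it can be worth confirming that the FUSE mount is actually active on each worker host; a stale or failed mount would silently break shared-filesystem access. A minimal sketch (the mount path is an example; use the `<mount-path>` you chose above):

```shell
# Sketch: check whether the SeaweedFS FUSE mount is active on this host.
# /mnt/seaweedfs is an example path; substitute your <mount-path>.
mount_path=/mnt/seaweedfs
if mountpoint -q "$mount_path" 2> /dev/null; then
  status=mounted
else
  status="not mounted"
fi
echo "$mount_path: $status"
```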