Skip to content

Latest commit

 

History

History
executable file
·
289 lines (230 loc) · 11.4 KB

File metadata and controls

executable file
·
289 lines (230 loc) · 11.4 KB

Deployment orchestration

The CLP package comprises several components that are designed to be deployed in a set of interdependent containers, and orchestrated by a framework that ensures the containers work together to facilitate CLP's different functions correctly. This document explains the architecture of the package components, and describes the two orchestration frameworks that CLP supports:

Architecture

Figure 1 shows the components (services in orchestrator terminology) in the CLP package as well as their dependencies. The CLP package consists of several long-running services (e.g., database) and some one-time initialization jobs (e.g., db-table-creator). Some of the long-running services depend on the successful completion of the one-time jobs (e.g., webui depends on results-cache-indices-creator), while others depend on the health of other long-running services (e.g., compression-scheduler depends on queue).

Table 1 below lists the services their functions, while Table 2 lists the one-time initialization jobs and their functions.

(figure-1)= ::::{card}

:::{mermaid} %%{ init: { "theme": "base", "themeVariables": { "primaryColor": "#0066cc", "primaryTextColor": "#fff", "primaryBorderColor": "transparent", "lineColor": "#007fff", "secondaryColor": "#007fff", "tertiaryColor": "#fff" } } }%% graph LR %% Services database["database (MySQL)"] queue["queue (RabbitMQ)"] redis["redis (Redis)"] results_cache["results-cache (MongoDB)"] compression_scheduler["compression-scheduler"] query_scheduler["query-scheduler"] spider_scheduler["spider-scheduler"] compression_worker["compression-worker"] spider_compression_worker["spider-compression-worker"] query_worker["query-worker"] reducer["reducer"] api_server["api-server"] garbage_collector["garbage-collector"] webui["webui"] mcp_server["mcp-server"] log_ingestor["log-ingestor"]

%% One-time jobs db_table_creator["db-table-creator"] results_cache_indices_creator["results-cache-indices-creator"]

%% Dependencies %% Link 0-1: Database --> Database initialization jobs database -->|healthy| db_table_creator results_cache -->|healthy| results_cache_indices_creator linkStyle 0,1 stroke:#ffa500

%% Link 2-5: Celery dependencies --> Schedulers queue -->|healthy| compression_scheduler redis -->|healthy| compression_scheduler queue -->|healthy| query_scheduler redis -->|healthy| query_scheduler linkStyle 2,3,4,5 stroke:#ff0000

%% Link 6: Schedulers --> Workers query_scheduler -->|healthy| reducer linkStyle 6 stroke:#800080

%% Link 7-15: Database initialization job --> Services db_table_creator -->|completed_successfully| api_server db_table_creator -->|completed_successfully| compression_scheduler db_table_creator -->|completed_successfully| garbage_collector db_table_creator -->|completed_successfully| log_ingestor db_table_creator -->|completed_successfully| mcp_server db_table_creator -->|completed_successfully| query_scheduler db_table_creator -->|completed_successfully| spider_compression_worker db_table_creator -->|completed_successfully| spider_scheduler db_table_creator -->|completed_successfully| webui linkStyle 7,8,9,10,11,12,13,14,15 stroke:#0000ff

%% Link 16-20: Results cache initialization job --> Services results_cache_indices_creator -->|completed_successfully| api_server results_cache_indices_creator -->|completed_successfully| garbage_collector results_cache_indices_creator -->|completed_successfully| mcp_server results_cache_indices_creator -->|completed_successfully| reducer results_cache_indices_creator -->|completed_successfully| webui linkStyle 16,17,18,19,20 stroke:#008000

subgraph Databases database results_cache subgraph celery_dependencies[Celery Dependencies] queue redis end end

subgraph Initialization jobs db_table_creator results_cache_indices_creator end

subgraph Schedulers compression_scheduler query_scheduler spider_scheduler end

subgraph Workers compression_worker spider_compression_worker query_worker reducer end

subgraph Management & UI api_server log_ingestor garbage_collector webui end

subgraph AI mcp_server end

%% Subgraph styles style celery_dependencies fill:#ffffe0 style spider_compression_worker fill:#008080 style spider_scheduler fill:#008080

+++ Figure 1: Orchestration architecture of the services in the CLP package. ::::

(table-1)= ::::{card}

:::{table} :align: left

Service Description
database Database for archive metadata, compression jobs, and query jobs
queue Task queue for schedulers
redis Task result storage for workers
compression_scheduler Scheduler for compression jobs
query_scheduler Scheduler for search/aggregation jobs
spider_scheduler Scheduler for Spider distributed task execution framework
results_cache Storage for the workers to return search results to the UI
compression_worker Worker processes for compression jobs using Celery
spider_compression_worker Worker processes for compression jobs using Spider
query_worker Worker processes for search/aggregation jobs using Celery
reducer Reducers for performing the final stages of aggregation jobs
api_server API server for submitting queries
webui Web server for the UI
mcp_server MCP server for AI agent to access CLP functionalities
garbage_collector Process to manage data retention
log_ingestor Server for orchestrating and running continuous log ingestion jobs

:::

+++ Table 1: Long-running services in the CLP package. ::::

(table-2)= ::::{card}

:::{table} :align: left

Job Description
db-table-creator Creates and initializes database tables
results-cache-indices-creator Creates a single-node replica set and sets up indices

:::

+++ Table 2: One-time initialization jobs in the CLP package. ::::

Orchestration methods

CLP supports two orchestration methods: Docker Compose for single-host or manual multi-host deployments, and Helm for Kubernetes deployments. Both methods share the same configuration interface (clp-config.yaml and credentials.yaml) and support the same deployment types.

Configuration

Each service requires configuration values passed through config files, environment variables, and/or command line arguments. Since services run in containers, some values must be adapted for the orchestration environment. Specifically, host paths must be converted to container paths, and hostnames/ports must use service discovery mechanisms.

The orchestration controller (e.g., DockerComposeController) reads etc/clp-config.yaml and etc/credentials.yaml, then generates:

  • A container-specific CLP config file with adapted paths and service names
  • Runtime configuration (environment variables or ConfigMaps)
  • Required directories (e.g., data output directories)

For Docker Compose, this generates var/log/.clp-config.yaml and .env. For Kubernetes, the Helm chart generates a ConfigMap and Secrets from values.yaml.

:::{note} We are currently developing a KubernetesController, which will unify the configuration experience across both orchestration methods. The new controller will read clp-config.yaml and credentials.yaml like DockerComposeController, then set up the Helm release accordingly. :::

Secrets

Sensitive credentials (database passwords, API keys) are stored in etc/credentials.yaml and require special handling to avoid exposure.

  • Docker Compose: Credentials are written to .env and passed as environment variables
  • Kubernetes: Credentials are stored in Kubernetes Secrets

Dependencies

As shown in Figure 1, services have complex interdependencies. Both orchestrators ensure services start only after their dependencies are healthy.

  • Docker Compose: Uses depends_on with condition: service_healthy and container healthchecks
  • Kubernetes: Uses init containers (via the clp.waitFor helper) and readiness/liveness probes

Storage

Services require persistent storage for logs, data, archives, and streams.

  • Docker Compose: Uses bind mounts for host directories and named volumes for database data. Conditional mounts use variable interpolation to mount empty tmpfs when not needed.
  • Kubernetes: Uses dynamically provisioned PersistentVolumeClaims for persistent data (database, results cache, archives, streams) and emptyDir volumes for ephemeral state (Redis, staging directories). Service logs are emitted to pod stdout/stderr.

Deployment types

CLP supports multiple deployment configurations based on the compression scheduler and query engine.

Deployment Type Compression Scheduler Query Engine
Base Celery Presto
Full Celery Native
Spider Base Spider Presto
Spider Full Spider Native

:::{note} Spider support is not yet available for Helm. :::

Docker Compose selects the appropriate compose file (e.g., docker-compose.yaml for Full, docker-compose-spider.yaml for Spider Full) and uses deploy.replicas with environment variables (e.g., CLP_MCP_SERVER_ENABLED) to toggle optional services. Helm uses conditional templating to include/exclude resources.

Troubleshooting

When issues arise, use the appropriate commands for your orchestration method:

User guides