[rust/docs] Add kafka-es-indexer sample config file and update documentation

ramonfigueiredo · ramonfigueiredo · commit eb0122cf10cd · 2025-12-04T18:09:06.000-08:00
Add rust/config/kafka-es-indexer.yaml with comprehensive documentation of all configuration options, following the same pattern as rqd.yaml.

Documentation updates:
- Reference sample config file in kafka-es-indexer README
- Add config file section to rust/README.md listing all config files
- Update monitoring-reference.md with config file usage example
- Update deploying-monitoring.md with Docker mount example for config
- Update monitoring-development.md with config file example

Addresses review feedback requesting a sample config file in rust/config that documents every option.
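The sample file's documented defaults and index naming pattern can be collected into plain Rust types. This is a hypothetical sketch, not the kafka-es-indexer crate's code. `IndexerSettings`, `KafkaSettings`, `ElasticsearchSettings`, and `index_name` are illustrative names that mirror the YAML keys. The date is assumed to arrive pre-formatted as `YYYY.MM.DD`.

```rust
/// Hypothetical mirror of the `kafka` section; field names follow the YAML keys.
#[derive(Debug, Clone, PartialEq)]
pub struct KafkaSettings {
    pub bootstrap_servers: String,
    pub group_id: String,
    pub auto_offset_reset: String,
    pub enable_auto_commit: bool,
    pub auto_commit_interval_ms: u64,
    pub max_poll_records: usize,
    pub session_timeout_ms: u64,
    pub topics: Vec<String>,
}

impl Default for KafkaSettings {
    // Values copied from the "Default:" comments in the sample file.
    fn default() -> Self {
        Self {
            bootstrap_servers: "localhost:9092".to_string(),
            group_id: "opencue-elasticsearch-indexer".to_string(),
            auto_offset_reset: "earliest".to_string(),
            enable_auto_commit: true,
            auto_commit_interval_ms: 5000,
            max_poll_records: 500,
            session_timeout_ms: 30000,
            // Expands to opencue.job.events, opencue.layer.events, ..., opencue.proc.events
            topics: ["job", "layer", "frame", "host", "proc"]
                .iter()
                .map(|kind| format!("opencue.{kind}.events"))
                .collect(),
        }
    }
}

/// Hypothetical mirror of the `elasticsearch` section.
#[derive(Debug, Clone, PartialEq)]
pub struct ElasticsearchSettings {
    pub url: String,
    // Optional credentials; the sample file leaves both commented out.
    pub username: Option<String>,
    pub password: Option<String>,
    pub index_prefix: String,
    pub num_shards: u32,
    pub num_replicas: u32,
    pub bulk_size: usize,
    pub flush_interval_ms: u64,
}

impl Default for ElasticsearchSettings {
    fn default() -> Self {
        Self {
            url: "http://localhost:9200".to_string(),
            username: None,
            password: None,
            index_prefix: "opencue".to_string(),
            num_shards: 1,
            num_replicas: 0,
            bulk_size: 100,
            flush_interval_ms: 5000,
        }
    }
}

impl ElasticsearchSettings {
    /// Builds an index name with the documented `{prefix}-{event-type}-{date}`
    /// pattern. Assumes `date` is already formatted, e.g. "2024.11.29".
    pub fn index_name(&self, event_type: &str, date: &str) -> String {
        format!("{}-{}-{}", self.index_prefix, event_type, date)
    }
}

/// Hypothetical top-level settings matching the two YAML sections.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct IndexerSettings {
    pub kafka: KafkaSettings,
    pub elasticsearch: ElasticsearchSettings,
}

fn main() {
    let settings = IndexerSettings::default();
    println!("{settings:#?}");
    println!("{}", settings.elasticsearch.index_name("frame-events", "2024.11.29"));
}
```

The sketch leaves out two pieces: loading the YAML itself (for example with serde) and merging environment variables such as ELASTICSEARCH_USERNAME.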
diff --git a/docs/_docs/developer-guide/monitoring-development.md b/docs/_docs/developer-guide/monitoring-development.md
@@ -361,6 +361,14 @@ export ELASTICSEARCH_INDEX_PREFIX=opencue
 kafka-es-indexer
 ```
 
+Example with a config file:
+
+```bash
+kafka-es-indexer --config /path/to/kafka-es-indexer.yaml
+```
+
+A sample configuration file with complete documentation of all options is available at `rust/config/kafka-es-indexer.yaml`.
+
 ### Prometheus configuration
 
 | Property | Default | Description |
diff --git a/docs/_docs/getting-started/deploying-monitoring.md b/docs/_docs/getting-started/deploying-monitoring.md
@@ -183,6 +183,18 @@ The `kafka-es-indexer` is a standalone Rust service that consumes events from Ka
      --index-prefix opencue
    ```
 
+   Or with a configuration file (mount the config file into the container):
+
+   ```bash
+   docker run -d --name kafka-es-indexer \
+     --network your-network \
+     -v /path/to/kafka-es-indexer.yaml:/etc/opencue/kafka-es-indexer.yaml \
+     opencue/kafka-es-indexer \
+     --config /etc/opencue/kafka-es-indexer.yaml
+   ```
+
+   A sample configuration file with complete documentation is available at `rust/config/kafka-es-indexer.yaml`.
+
 3. Verify the indexer is running:
 
    ```bash
diff --git a/docs/_docs/reference/monitoring-reference.md b/docs/_docs/reference/monitoring-reference.md
@@ -304,6 +304,14 @@ kafka-es-indexer \
   --index-prefix opencue
 ```
 
+Example using a configuration file:
+
+```bash
+kafka-es-indexer --config /path/to/kafka-es-indexer.yaml
+```
+
+A sample configuration file with complete documentation of all options is available at `rust/config/kafka-es-indexer.yaml`.
+
 ### Prometheus configuration
 
 ```properties
diff --git a/rust/README.md b/rust/README.md
@@ -8,6 +8,11 @@ Project crates:
  * opencue_proto: Wrapper around grpc's generated code for the project protobuf modules
  * kafka-es-indexer: Kafka to Elasticsearch indexer for OpenCue monitoring events
 
+Sample configuration files are available in the `config/` directory:
+ * `config/rqd.yaml`: RQD configuration
+ * `config/rqd.fake_linux.yaml`: RQD configuration for simulating Linux on macOS
+ * `config/kafka-es-indexer.yaml`: Kafka-Elasticsearch indexer configuration
+
 ## Build Instructions
 
 Follow these steps to build and run the Rust-based RQD and Dummy Cuebot modules.
diff --git a/rust/config/kafka-es-indexer.yaml b/rust/config/kafka-es-indexer.yaml
@@ -0,0 +1,132 @@
+# Kafka-Elasticsearch Indexer Configuration File
+#
+# This file configures the kafka-es-indexer service that consumes OpenCue
+# monitoring events from Kafka and indexes them into Elasticsearch for
+# historical analysis and querying.
+#
+# Data Flow: Cuebot (Producer) -> Kafka -> kafka-es-indexer (Consumer) -> Elasticsearch
+
+# =============================================================================
+# KAFKA CONFIGURATION
+# =============================================================================
+kafka:
+  # Kafka bootstrap servers (comma-separated list)
+  # Multiple brokers can be specified for high availability
+  # Default: localhost:9092
+  bootstrap_servers: "localhost:9092"
+
+  # Consumer group ID
+  # Indexer instances that share a group_id split the topic partitions between
+  # them and share committed offsets. Give each independent deployment its own ID.
+  # Default: opencue-elasticsearch-indexer
+  group_id: "opencue-elasticsearch-indexer"
+
+  # Where to start when the group has no committed offset (or it is out of range)
+  # Options:
+  #   earliest - Start from the oldest available message
+  #   latest   - Start from the end of the log (only new messages are consumed)
+  # Default: earliest
+  auto_offset_reset: "earliest"
+
+  # Enable automatic offset commits
+  # When true, offsets are committed periodically based on auto_commit_interval_ms
+  # When false, offsets are committed manually after each message is processed
+  # Default: true
+  enable_auto_commit: true
+
+  # Interval between automatic offset commits (in milliseconds)
+  # Only used when enable_auto_commit is true
+  # Lower values reduce duplicate processing on restart but increase overhead
+  # Default: 5000 (5 seconds)
+  auto_commit_interval_ms: 5000
+
+  # Maximum number of records to fetch per poll
+  # Higher values improve throughput but increase memory usage
+  # Default: 500
+  max_poll_records: 500
+
+  # Kafka session timeout (in milliseconds)
+  # If the consumer doesn't send heartbeats within this interval,
+  # it will be removed from the consumer group and partitions will be rebalanced
+  # Default: 30000 (30 seconds)
+  session_timeout_ms: 30000
+
+  # Kafka topics to subscribe to
+  # These are the event topics published by Cuebot
+  # Default: all OpenCue event topics
+  topics:
+    - "opencue.job.events"
+    - "opencue.layer.events"
+    - "opencue.frame.events"
+    - "opencue.host.events"
+    - "opencue.proc.events"
+
+# =============================================================================
+# ELASTICSEARCH CONFIGURATION
+# =============================================================================
+elasticsearch:
+  # Elasticsearch URL
+  # Can be a single node or a load balancer in front of a cluster
+  # Default: http://localhost:9200
+  url: "http://localhost:9200"
+
+  # Username for Elasticsearch authentication (optional)
+  # Required when Elasticsearch has security features enabled
+  # Can also be set via the ELASTICSEARCH_USERNAME environment variable
+  # username: "elastic"
+
+  # Password for Elasticsearch authentication (optional)
+  # Required when Elasticsearch has security features enabled
+  # Can also be set via the ELASTICSEARCH_PASSWORD environment variable
+  # password: "changeme"
+
+  # Index name prefix for all OpenCue event indices
+  # Indices are named using the pattern: {prefix}-{event-type}-{date}
+  # Example: opencue-frame-events-2024.11.29
+  # Default: opencue
+  index_prefix: "opencue"
+
+  # Number of primary shards for event indices
+  # More shards allow parallel indexing and searching
+  # For small deployments, 1 shard is sufficient
+  # For large deployments with many events, consider 3-5 shards
+  # Default: 1
+  num_shards: 1
+
+  # Number of replica shards for event indices
+  # Replicas provide redundancy and improve read throughput
+  # Set to 0 for development/testing, 1+ for production
+  # Default: 0
+  num_replicas: 0
+
+  # Maximum number of events to batch before sending to Elasticsearch
+  # Higher values improve throughput but increase latency and memory usage
+  # Events are also flushed based on flush_interval_ms
+  # Default: 100
+  bulk_size: 100
+
+  # Maximum time to wait before flushing events to Elasticsearch (in milliseconds)
+  # Events are flushed when either bulk_size is reached or this interval elapses
+  # Lower values reduce latency but increase indexing overhead
+  # Default: 5000 (5 seconds)
+  flush_interval_ms: 5000
+
+# =============================================================================
+# LOGGING CONFIGURATION
+# =============================================================================
+#
+# Logging is configured via the LOG_LEVEL environment variable or --log-level
+# CLI argument. This section documents the available options.
+#
+# Options: trace, debug, info, warn, error
+# Default: info
+#
+# Level descriptions:
+#   - trace: Very verbose, includes all debug info (development only)
+#   - debug: Detailed info including each received event
+#   - info: Standard operation logs (recommended for production)
+#   - warn: Only warnings and errors
+#   - error: Only errors
+#
+# The RUST_LOG environment variable can also be used for per-module control:
+#   RUST_LOG=kafka_es_indexer=debug,rdkafka=info
diff --git a/rust/crates/kafka-es-indexer/README.md b/rust/crates/kafka-es-indexer/README.md
@@ -50,28 +50,14 @@ kafka-es-indexer
 
 ### Configuration File
 
-```yaml
-# config.yaml
-kafka:
-  bootstrap_servers: "localhost:9092"
-  group_id: "opencue-elasticsearch-indexer"
-  auto_offset_reset: "earliest"
-  enable_auto_commit: true
-  auto_commit_interval_ms: 5000
-
-elasticsearch:
-  url: "http://localhost:9200"
-  index_prefix: "opencue"
-  num_shards: 1
-  num_replicas: 0
-  bulk_size: 100
-  flush_interval_ms: 5000
-```
+Pass a YAML configuration file with `--config`:
 
 ```bash
-kafka-es-indexer --config config.yaml
+kafka-es-indexer --config /path/to/kafka-es-indexer.yaml
 ```
 
+See the [sample config](../../config/kafka-es-indexer.yaml) for all available options and their descriptions.
+
 ## Docker
 
 Build the Docker image: