---
id: aggregate-cce-logs-with-grafana-alloy-and-grafana-loki
title: Aggregate CCE Logs with Grafana Alloy & Grafana Loki
tags: [cce, observability, logging, grafana, loki, alloy]
sidebar_position: 4
---

# Aggregate CCE Logs with Grafana Alloy & Grafana Loki

This blueprint explains how to collect and centralize logs from Cloud Container Engine (CCE) using [Grafana Alloy](https://grafana.com/docs/alloy/latest/) and [Grafana Loki](https://grafana.com/oss/loki/). It walks through configuring Grafana Alloy as a unified telemetry collector within Kubernetes and integrating it with Grafana Loki for efficient storage and visualization. By the end, you will have a modern, future-proof, and scalable logging setup that simplifies monitoring and troubleshooting and improves operational insight across your CCE workloads.

## What is Grafana Alloy?

Grafana Alloy is a flexible, high-performance, vendor-neutral telemetry collector. It replaces [Promtail](https://grafana.com/docs/loki/latest/send-data/promtail/) as the actively maintained log collection agent. Alloy is fully compatible with popular open source observability standards such as [OpenTelemetry](https://opentelemetry.io/) and [Prometheus](https://prometheus.io/), focusing on ease of use and the ability to adapt to the needs of power users.

Unlike Promtail, which was designed solely for log collection, Alloy is a unified telemetry collector that natively supports all observability signals, including logs, metrics, traces, and profiles. This "big tent" approach means you can deploy a single agent per node instead of managing multiple specialized collectors.

Grafana Loki serves as a log aggregation system optimized for scalability, availability, and cost efficiency. Drawing inspiration from Prometheus, Loki indexes only metadata through labels rather than the log content itself. Loki groups log entries into streams and indexes them with labels, which reduces overall costs and the time between log entry ingestion and query availability. For example, all log lines that share the label set `{namespace="production", container="nginx"}` belong to a single stream, and only those labels are indexed, never the message text.

## Why Choose Grafana Alloy?

Grafana Alloy represents the future of telemetry collection in the Grafana ecosystem. Its unified approach to collecting logs, metrics, traces, and profiles reduces operational complexity while providing enterprise-grade features like clustering, GitOps support, and advanced debugging capabilities. With [Promtail reaching end-of-life in March 2026](https://grafana.com/docs/loki/latest/send-data/promtail/), migrating to Alloy ensures your logging infrastructure remains supported and gains access to ongoing feature development.

The component-based architecture provides flexibility to adapt to changing requirements without replacing the entire collector. Whether you're collecting simple container logs or building complex observability pipelines with multiple data sources and destinations, Alloy's extensibility and OpenTelemetry-native design future-proof your investment.

## Installing Grafana Loki

If you don't already have a Grafana Loki instance running, set it up before proceeding with log aggregation. The installation process is covered in detail in the companion blueprint [Deploy Grafana Loki on CCE](/docs/blueprints/by-use-case/observability/deploy-grafana-loki-on-cce), which explains how to deploy Loki in microservices mode on Cloud Container Engine (CCE) with Open Telekom Cloud Object Storage (OBS) as the backend. Once Loki is up and running, continue here to install and configure Grafana Alloy and start collecting and centralizing logs from your CCE workloads.
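
If Loki is already deployed, you can quickly confirm that its gateway service is reachable before continuing; the service and namespace names below assume the defaults used in the companion blueprint:

```bash
# The gateway service is the endpoint Alloy will push logs to
kubectl get svc loki-gateway -n monitoring
```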

## Installing Grafana Alloy

### Configuring Grafana Alloy for CCE Log Collection

Create a ConfigMap for Alloy's configuration. This will be referenced in the Helm values file.

```yaml title="alloy-configmap.yaml"
apiVersion: v1
kind: ConfigMap
metadata:
  name: alloy-logs-config
  namespace: monitoring
data:
  config: |
    // Discover all pods in the cluster
    discovery.kubernetes "pods" {
      role = "pod"

      // Restrict discovery to pods on the same node to reduce resource usage
      selectors {
        role  = "pod"
        field = "spec.nodeName=" + coalesce(sys.env("HOSTNAME"), constants.hostname)
      }

      // Attach node metadata to pod targets
      attach_metadata {
        node = true
      }
    }

    // Relabel discovered pods and create file paths
    discovery.relabel "pod_logs" {
      targets = discovery.kubernetes.pods.targets

      // Extract namespace
      rule {
        source_labels = ["__meta_kubernetes_namespace"]
        action        = "replace"
        target_label  = "namespace"
      }

      // Extract pod name
      rule {
        source_labels = ["__meta_kubernetes_pod_name"]
        action        = "replace"
        target_label  = "pod"
      }

      // Extract container name
      rule {
        source_labels = ["__meta_kubernetes_pod_container_name"]
        action        = "replace"
        target_label  = "container"
      }

      // Add region label from the node
      rule {
        source_labels = ["__meta_kubernetes_node_label_topology_kubernetes_io_region"]
        target_label  = "region"
      }

      // Add availability zone label from the node
      rule {
        source_labels = ["__meta_kubernetes_node_label_topology_kubernetes_io_zone"]
        target_label  = "zone"
      }

      // Create the job label as namespace/container
      rule {
        source_labels = ["__meta_kubernetes_namespace", "__meta_kubernetes_pod_container_name"]
        action        = "replace"
        target_label  = "job"
        separator     = "/"
        replacement   = "$1"
      }

      // Extract the app label if it exists
      rule {
        source_labels = ["__meta_kubernetes_pod_label_app_kubernetes_io_name"]
        action        = "replace"
        target_label  = "app"
      }

      // Create the file path for pod logs
      rule {
        source_labels = ["__meta_kubernetes_pod_uid", "__meta_kubernetes_pod_container_name"]
        action        = "replace"
        target_label  = "__path__"
        separator     = "/"
        replacement   = "/var/log/pods/*$1/*.log"
      }

      // Extract the container runtime
      rule {
        source_labels = ["__meta_kubernetes_pod_container_id"]
        action        = "replace"
        target_label  = "tmp_container_runtime"
        regex         = "^(\\w+):\\/\\/.+$"
        replacement   = "$1"
      }

      // Drop pods with no container ID (not yet running)
      rule {
        source_labels = ["__meta_kubernetes_pod_container_id"]
        action        = "drop"
        regex         = ""
      }
    }

    // Match actual log files on disk
    local.file_match "pod_logs" {
      path_targets = discovery.relabel.pod_logs.output
    }

    // Read logs from files
    loki.source.file "pod_logs" {
      targets    = local.file_match.pod_logs.targets
      forward_to = [loki.process.pod_logs.receiver]
    }

    // Process and enrich logs
    loki.process "pod_logs" {

      // Parse CRI-formatted lines from containerd
      stage.match {
        selector = "{tmp_container_runtime=\"containerd\"}"

        stage.cri {}

        // Extract the stream label (stdout or stderr)
        stage.labels {
          values = {
            stream = "",
          }
        }
      }

      // Add static labels such as a cluster identifier
      stage.static_labels {
        values = {
          cluster = "production",
        }
      }

      // Drop temporary labels
      stage.label_drop {
        values = ["tmp_container_runtime", "filename"]
      }

      forward_to = [loki.write.default.receiver]
    }

    // Write configuration - sends logs to Loki
    loki.write "default" {
      endpoint {
        url = "http://loki-gateway.monitoring.svc.cluster.local/loki/api/v1/push"
      }

      // External labels applied to all logs
      external_labels = {
        aggregator = "alloy",
      }
    }
```

:::important
This is a baseline configuration; you may need to adjust it for your specific environment and requirements.
:::

:::tip Node Metadata for Geographic Labels
By setting `attach_metadata { node = true }`, Alloy attaches node-level metadata to pod targets, which enables extraction of **availability zone** and **region** labels in the subsequent relabeling rules. This is useful for multi-region deployments and for debugging location-specific issues.
:::
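
You can check that your worker nodes actually carry these topology labels before relying on them:

```bash
# Show the region and availability zone labels for each node
kubectl get nodes -L topology.kubernetes.io/region -L topology.kubernetes.io/zone
```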

Then apply the ConfigMap:

```bash
kubectl apply -f alloy-configmap.yaml
```
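
Optionally, confirm that the ConfigMap was created and carries the expected `config` key:

```bash
# The output should list alloy-logs-config with a single data key
kubectl get configmap alloy-logs-config -n monitoring
```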

### Understanding the Alloy Configuration

The Alloy [configuration](https://grafana.com/docs/alloy/latest/reference/config-blocks/) uses a [component-based](https://grafana.com/docs/alloy/latest/reference/components/) approach in which each component performs a specific task and forwards data to the next component in the pipeline.

* **Discovery Components**: The `discovery.kubernetes` component discovers pods in the cluster, while `discovery.relabel` filters and labels the discovered targets. This is similar to Prometheus service discovery, but integrated directly into the collector.
* **Source Component**: The `loki.source.file` component reads log files for the discovered pod targets and forwards them to the processing stage.
* **Processing Pipeline**: The `loki.process` component applies multiple stages to transform and enrich the logs. In this configuration it parses the CRI log format, extracts the `stream` label, adds static labels, and drops temporary labels.
* **Write Component**: The `loki.write` component sends the processed logs to Loki with configurable batching, retry, and timeout settings.
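
If the `alloy` binary is installed on your workstation, you can sanity-check the configuration syntax before applying it. This assumes the pipeline definition (the contents of the `config` key) has been saved to a standalone file; the file name below is only an example:

```bash
# alloy fmt parses the configuration and exits with an error on invalid syntax
alloy fmt alloy-logs.alloy
```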

### Installing Grafana Alloy via Helm

Now create a values file called **values-alloy.yaml**:

```yaml title="values-alloy.yaml"
alloy:
  # Use the externally managed ConfigMap created above
  configMap:
    create: false
    name: alloy-logs-config
    key: config

  # Clustering should stay disabled when running as a DaemonSet for log collection
  clustering:
    enabled: false

  # Mount host paths for log collection
  mounts:
    # Mount /var/log for pod logs
    varlog: true
    # On CCE, containerd logs live under /var/lib/containerd/container_logs and must be mounted explicitly
    extra:
      - name: containerd-logs
        mountPath: /var/lib/containerd/container_logs
        readOnly: true

  # Resource limits for production
  resources:
    limits:
      cpu: 1000m
      memory: 1Gi
    requests:
      cpu: 500m
      memory: 512Mi

  # Security context required for reading pod logs from the host
  securityContext:
    privileged: true
    runAsUser: 0
    runAsGroup: 0
    fsGroup: 0

  # Expose the node name so the discovery selector can filter by node
  extraEnv:
    - name: HOSTNAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName

# Deploy as a DaemonSet
controller:
  type: 'daemonset'
  volumes:
    extra:
      - name: containerd-logs
        hostPath:
          path: /var/lib/containerd/container_logs

  # Update strategy
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1

# Service account settings
serviceAccount:
  create: true

# RBAC permissions
rbac:
  create: true
```
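
To compare these overrides against the chart defaults, or to discover additional options, you can inspect the upstream values (after adding the Grafana Helm repository as shown below):

```bash
# Print the default values of the grafana/alloy chart
helm show values grafana/alloy
```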

:::note Log Collection Methods
Alloy can also collect logs through the Kubernetes API server instead of mounting host paths. That approach doesn't require privileged security contexts and can be useful for development or for environments with strict security policies. For production systems, however, mounting the log directories directly is recommended, as it performs better and removes the log-fetching load from the Kubernetes API server.
:::

:::danger CCE Containerd Log Path
On CCE, containerd stores container logs under `/var/lib/containerd/container_logs`; the standard `/var/log/pods` path is a symbolic link to that location. You must explicitly mount this directory in your Alloy DaemonSet configuration (as shown in the `mounts.extra` section above) so that all container logs are collected. Without this mount, logs from containerd-based containers will not be accessible to Alloy.
:::
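
If you want to confirm this layout on one of your own nodes before deploying, one option is a short-lived node debug pod (the node name is a placeholder):

```bash
# kubectl debug mounts the node's root filesystem at /host inside the debug pod
kubectl debug node/<node-name> -it --image=busybox -- \
  ls -l /host/var/log/pods /host/var/lib/containerd/container_logs
```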

Deploy Grafana Alloy via Helm:

```bash
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

helm upgrade --install alloy grafana/alloy \
  -f values-alloy.yaml \
  -n monitoring --create-namespace \
  --reset-values
```
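
Wait for the DaemonSet rollout to finish before moving on; the release and namespace names match the command above:

```bash
# Blocks until an up-to-date Alloy pod is ready on every node
kubectl rollout status daemonset/alloy -n monitoring
```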

### Verifying the Installation

After deploying Alloy, verify that it's collecting and forwarding logs correctly.

First, check that all Alloy pods are running:

```bash
kubectl get pods -n monitoring -l app.kubernetes.io/name=alloy
```

All pods should show a **Running** status. Next, access the Alloy UI by port-forwarding to one of the pods:

```bash
kubectl port-forward -n monitoring daemonset/alloy 12345:12345
```

Open your browser and navigate to `http://localhost:12345`. In the Alloy UI:

1. Click **Graph** to view the component pipeline visualization

   

2. Click the **Alloy logo** to see the list of defined components and verify that all of them show a green status indicator

   

3. Click the `loki.source.file` component to see the active targets and the log files being read

   To confirm logs are arriving in Loki, navigate to **Grafana** and run a simple query in **Explore** or view the **Drilldown** section; an example query is shown after this list:

   

   You should see logs from your pods. If the logs carry labels like `pod`, `namespace`, `container`, `region`, and `zone`, your Alloy configuration is working correctly. If no logs appear, check the Alloy component details for error messages and verify that the Loki endpoint URL in your configuration is correct.
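
A label-based query such as `{namespace="monitoring", container="alloy"}` is a good first test in Explore, since it returns Alloy's own logs. If you prefer an API-level check, you can also query the Loki gateway directly; the service name and port below assume the defaults from the companion Loki blueprint:

```bash
# Port-forward the gateway and list the labels that Alloy has pushed so far
kubectl port-forward -n monitoring svc/loki-gateway 3100:80 &
curl -s http://localhost:3100/loki/api/v1/labels
```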