Commit 8a2f426: "285 aggregating cce logs with grafana alloy (#330)". Parent: aba44e2. 8 files changed, +488 −3 lines.
---
id: aggregate-cce-logs-with-grafana-alloy-and-grafana-loki
title: Aggregate CCE Logs with Grafana Alloy & Grafana Loki
tags: [cce, observability, logging, grafana, loki, alloy]
sidebar_position: 4
---

# Aggregate CCE Logs with Grafana Alloy & Grafana Loki
This blueprint explains how to collect and centralize logs from Cloud Container Engine (CCE) using [Grafana Alloy](https://grafana.com/docs/alloy/latest/) and [Grafana Loki](https://grafana.com/oss/loki/). It outlines the process of configuring Grafana Alloy as a unified telemetry collector within Kubernetes and integrating it with Grafana Loki for efficient storage and visualization. By the end, you will have a modern, future-proof, and scalable logging setup that simplifies monitoring, troubleshooting, and operational insights across your CCE workloads.
## What is Grafana Alloy?

![image](/img/docs/blueprints/by-use-case/observability/kubernetes-logging-with-loki/grfana-alloy-overview.png)

Grafana Alloy is a flexible, high-performance, vendor-neutral telemetry collector. It replaces [Promtail](https://grafana.com/docs/loki/latest/send-data/promtail/) as the actively maintained log collection agent. Alloy is fully compatible with popular open source observability standards such as [OpenTelemetry](https://opentelemetry.io/) and [Prometheus](https://prometheus.io/), focusing on ease of use and the ability to adapt to the needs of power users.
Unlike Promtail, which was designed solely for log collection, Alloy is a unified telemetry collector that natively supports all observability signals, including logs, metrics, traces, and profiles. This "big tent" approach means you can deploy a single agent per node instead of managing multiple specialized collectors.

Grafana Loki serves as a log aggregation system optimized for scalability, availability, and cost efficiency. Drawing inspiration from Prometheus, Loki indexes only metadata through labels rather than the log content itself. Loki groups log entries into streams and indexes them with labels, which reduces overall costs and the time between log entry ingestion and query availability.
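To illustrate this label-based model, a LogQL query first selects streams by their indexed labels and only then filters the unindexed log content (the label values here are illustrative):

```logql
{namespace="monitoring", container="loki"} |= "error"
```

Because only the labels are indexed, the stream selector inside `{}` is cheap to evaluate, while the `|= "error"` line filter scans just the matching streams.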
## Why Choose Grafana Alloy?

Grafana Alloy represents the future of telemetry collection in the Grafana ecosystem. Its unified approach to collecting logs, metrics, traces, and profiles reduces operational complexity while providing enterprise-grade features like clustering, GitOps support, and advanced debugging capabilities. With [Promtail reaching end-of-life in March 2026](https://grafana.com/docs/loki/latest/send-data/promtail/), migrating to Alloy ensures your logging infrastructure remains supported and gains access to ongoing feature development.
The component-based architecture provides flexibility to adapt to changing requirements without replacing the entire collector. Whether you're collecting simple container logs or building complex observability pipelines with multiple data sources and destinations, Alloy's extensibility and OpenTelemetry-native design future-proof your investment.

## Installing Grafana Loki

If you don't already have a Grafana Loki instance running, set one up before proceeding with log aggregation. The installation process is covered in detail in the companion blueprint [Deploy Grafana Loki on CCE](/docs/blueprints/by-use-case/observability/deploy-grafana-loki-on-cce), which explains how to deploy Loki in microservices mode on Cloud Container Engine (CCE) with Open Telekom Cloud Object Storage (OBS) as the backend. Once Loki is up and running, continue here to install and configure Grafana Alloy and start collecting and centralizing logs from your CCE workloads.
31+
32+
## Installing Grafana Alloy
33+
34+
### Configuring Grafana Alloy for CCE Log Collection
35+
36+
Create a ConfigMap for Alloy's configuration. This will be referenced in the Helm values file.
37+
38+
```yaml title="alloy-configmap.yaml"
apiVersion: v1
kind: ConfigMap
metadata:
  name: alloy-logs-config
  namespace: monitoring
data:
  config: |
    // Discover all pods in the cluster
    discovery.kubernetes "pods" {
      role = "pod"

      // Restrict to pods on the same node to reduce resource usage
      selectors {
        role  = "pod"
        field = "spec.nodeName=" + coalesce(sys.env("HOSTNAME"), constants.hostname)
      }

      // This attaches node metadata to pod targets
      attach_metadata {
        node = true
      }
    }

    // Relabel discovered pods and create file paths
    discovery.relabel "pod_logs" {
      targets = discovery.kubernetes.pods.targets

      // Extract namespace
      rule {
        source_labels = ["__meta_kubernetes_namespace"]
        action        = "replace"
        target_label  = "namespace"
      }

      // Extract pod name
      rule {
        source_labels = ["__meta_kubernetes_pod_name"]
        action        = "replace"
        target_label  = "pod"
      }

      // Extract container name
      rule {
        source_labels = ["__meta_kubernetes_pod_container_name"]
        action        = "replace"
        target_label  = "container"
      }

      // Add region label from node
      rule {
        source_labels = ["__meta_kubernetes_node_label_topology_kubernetes_io_region"]
        target_label  = "region"
      }

      // Add availability zone label from node
      rule {
        source_labels = ["__meta_kubernetes_node_label_topology_kubernetes_io_zone"]
        target_label  = "zone"
      }

      // Create job label from namespace/container
      rule {
        source_labels = ["__meta_kubernetes_namespace", "__meta_kubernetes_pod_container_name"]
        action        = "replace"
        target_label  = "job"
        separator     = "/"
        replacement   = "$1"
      }

      // Extract app label if it exists
      rule {
        source_labels = ["__meta_kubernetes_pod_label_app_kubernetes_io_name"]
        action        = "replace"
        target_label  = "app"
      }

      // Create file path for pod logs
      rule {
        source_labels = ["__meta_kubernetes_pod_uid", "__meta_kubernetes_pod_container_name"]
        action        = "replace"
        target_label  = "__path__"
        separator     = "/"
        replacement   = "/var/log/pods/*$1/*.log"
      }

      // Extract container runtime
      rule {
        source_labels = ["__meta_kubernetes_pod_container_id"]
        action        = "replace"
        target_label  = "tmp_container_runtime"
        regex         = "^(\\w+):\\/\\/.+$"
        replacement   = "$1"
      }

      // Drop pods with no container ID (not yet running)
      rule {
        source_labels = ["__meta_kubernetes_pod_container_id"]
        action        = "drop"
        regex         = ""
      }
    }

    // Match actual log files on disk
    local.file_match "pod_logs" {
      path_targets = discovery.relabel.pod_logs.output
    }

    // Read logs from files
    loki.source.file "pod_logs" {
      targets    = local.file_match.pod_logs.targets
      forward_to = [loki.process.pod_logs.receiver]
    }

    // Process and enrich logs
    loki.process "pod_logs" {

      // Parse containerd/CRI-O logs
      stage.match {
        selector = "{tmp_container_runtime=\"containerd\"}"

        stage.cri {}

        // Extract stream label (stdout or stderr)
        stage.labels {
          values = {
            stream = "",
          }
        }
      }

      // Add static labels like cluster identifier
      stage.static_labels {
        values = {
          cluster = "production",
        }
      }

      // Drop temporary labels
      stage.label_drop {
        values = ["tmp_container_runtime", "filename"]
      }

      forward_to = [loki.write.default.receiver]
    }

    // Write configuration - sends logs to Loki
    loki.write "default" {
      endpoint {
        url = "http://loki-gateway.monitoring.svc.cluster.local/loki/api/v1/push"
      }

      // External labels applied to all logs
      external_labels = {
        aggregator = "alloy",
      }
    }
```
:::important
This is a baseline configuration; you may need to adjust it for your specific environment and requirements.
:::

:::tip Node Metadata for Geographic Labels
By setting `attach_metadata {node = true}`, Alloy attaches node-level metadata to pod targets, which enables extraction of **availability zone** and **region** labels in the subsequent relabeling rules. This is useful for multi-region deployments and for debugging location-specific issues.
:::

Then apply the ConfigMap:

```bash
kubectl apply -f alloy-configmap.yaml
```
### Understanding the Alloy Configuration

The Alloy [configuration](https://grafana.com/docs/alloy/latest/reference/config-blocks/) uses a [component-based](https://grafana.com/docs/alloy/latest/reference/components/) approach, where each component performs a specific task and forwards data to the next component in the pipeline.

* **Discovery components**: The `discovery.kubernetes` component discovers pods in the cluster, while `discovery.relabel` filters and labels the discovered targets. This is similar to Prometheus service discovery, but integrated directly into the collector.
* **Source component**: The `loki.source.file` component reads log files from the discovered pod targets and forwards them to the processing stage.
* **Processing pipeline**: The `loki.process` component applies multiple stages to transform and enrich the logs. In this configuration it parses the CRI log format, extracts the `stream` label, adds a static `cluster` label, and drops temporary labels.
* **Write component**: The `loki.write` component sends the processed logs to Loki with configurable batching, retry, and timeout settings.
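Reduced to its wiring, the configuration above forms a single forward chain. This sketch elides all relabeling rules and processing stages and uses a placeholder push URL:

```alloy
discovery.kubernetes "pods" { role = "pod" }

discovery.relabel "pod_logs" {
  targets = discovery.kubernetes.pods.targets
  // relabeling rules omitted
}

local.file_match "pod_logs" {
  path_targets = discovery.relabel.pod_logs.output
}

loki.source.file "pod_logs" {
  targets    = local.file_match.pod_logs.targets
  forward_to = [loki.process.pod_logs.receiver]
}

loki.process "pod_logs" {
  // processing stages omitted
  forward_to = [loki.write.default.receiver]
}

loki.write "default" {
  endpoint {
    url = "<your-loki-push-endpoint>"
  }
}
```

Each component's exported fields (`targets`, `output`, `receiver`) are what the next component consumes, so the pipeline is declared entirely through these references.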
### Installing Grafana Alloy via Helm

Now create a values file called **values-alloy.yaml**:
```yaml title="values-alloy.yaml"
alloy:
  # Use an existing ConfigMap for configuration
  configMap:
    create: false
    name: alloy-logs-config
    key: config

  # Should be disabled when using DaemonSet as controller
  clustering:
    enabled: false # Enable for high availability

  # Mount host paths for log collection
  mounts:
    # Mount /var/log for pod logs
    varlog: true
    # On CCE, containerd logs are under /var/lib/containerd/container_logs
    # and should be mounted explicitly
    extra:
      - name: containerd-logs
        mountPath: /var/lib/containerd/container_logs
        readOnly: true

  # Resource limits for production
  resources:
    limits:
      cpu: 1000m
      memory: 1Gi
    requests:
      cpu: 500m
      memory: 512Mi

  # Security context required for reading pod logs
  securityContext:
    privileged: true
    runAsUser: 0
    runAsGroup: 0
    fsGroup: 0

  # Extra environment variables for selecting based on node name
  extraEnv:
    - name: HOSTNAME
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName

# Deploy as DaemonSet
controller:
  type: 'daemonset'
  volumes:
    extra:
      - name: containerd-logs
        hostPath:
          path: /var/lib/containerd/container_logs

  # Update strategy
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1

# Service account settings
serviceAccount:
  create: true

# RBAC permissions
rbac:
  create: true
```
:::note Log Collection Methods
Alloy can also collect logs through the Kubernetes API server instead of mounting host paths. This approach doesn't require a privileged security context and can be useful for development or for environments with strict security policies. For production systems, however, directly mounting the log directories is recommended, as it performs better by removing the log request load from the Kubernetes API server.
:::
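As a sketch of that API-based alternative (not part of this blueprint's configuration, and assuming the discovery and processing components defined earlier), Alloy's `loki.source.kubernetes` component tails pod logs through the Kubernetes API instead of reading files from disk:

```alloy
// Replaces the local.file_match + loki.source.file pair;
// no host path mounts or privileged security context required
loki.source.kubernetes "pods" {
  targets    = discovery.relabel.pod_logs.output
  forward_to = [loki.process.pod_logs.receiver]
}
```

With this component, the `__path__` relabeling rule in the ConfigMap becomes unnecessary, since no file paths are involved.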
:::danger CCE Containerd Log Path
On CCE, containerd stores container logs at `/var/lib/containerd/container_logs`; the standard `/var/log/pods` path is a symbolic link to that path. You must explicitly mount this directory in your Alloy DaemonSet configuration (as shown in the `mounts.extra` section above) to ensure all container logs are collected. Without this mount, logs from containerd-based containers will not be accessible to Alloy.
:::
Deploy Grafana Alloy via Helm:

```bash
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

helm upgrade --install alloy grafana/alloy \
  -f values-alloy.yaml \
  -n monitoring --create-namespace \
  --reset-values
```
### Verifying the Installation

After deploying Alloy, verify that it's collecting and forwarding logs correctly.

First, check that all Alloy pods are running:

```bash
kubectl get pods -n monitoring -l app.kubernetes.io/name=alloy
```

All pods should show a **Running** status. Next, access the Alloy UI by port-forwarding to one of the pods:

```bash
kubectl port-forward -n monitoring daemonset/alloy 12345:12345
```
Open your browser and navigate to `http://localhost:12345`. In the Alloy UI:

1. Click **Graph** to view the component pipeline visualization.

   ![image](/img/docs/blueprints/by-use-case/observability/kubernetes-logging-with-loki/grafana-alloy-dashboard-graph.png)

2. Click the **Alloy logo** to get the list of defined components and verify that all components show a green status indicator.

   ![image](/img/docs/blueprints/by-use-case/observability/kubernetes-logging-with-loki/grafana-alloy-dashboard-status.png)

3. Click the `loki.source.file` component to see the active targets and the log files being read.

To confirm logs are arriving in Loki, navigate to **Grafana** and run a simple query in **Explore**, or view the **Drilldown** section:

![image](/img/docs/blueprints/by-use-case/observability/kubernetes-logging-with-loki/grafana-alloy-grafana-dashboard.png)

You should see logs from your pods. If logs appear with labels like `pod`, `namespace`, `container`, `region`, and `zone`, your Alloy configuration is working correctly. If no logs appear, check the Alloy component details for error messages and verify that the Loki endpoint URL in your configuration is correct.
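For example, a query like the following in **Explore** should return labeled log lines (the `cluster` value matches the `stage.static_labels` block in the configuration above; the namespace is illustrative):

```logql
{cluster="production", namespace="kube-system"}
```

If this returns streams carrying the expected `pod`, `container`, `region`, and `zone` labels, the whole pipeline, from discovery through relabeling to the Loki push endpoint, is functioning.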

docs/blueprints/by-use-case/observability/aggregate-cce-logs-with-promtail-and-grafana-loki.md (1 addition, 1 deletion):

````diff
@@ -139,7 +139,7 @@ helm repo update

 helm upgrade --install promtail grafana/promtail \
   -f values-promtail.yaml \
-  -n monitoring --create-namespace
+  -n monitoring --create-namespace \
   --reset-values
 ```
````

docs/blueprints/by-use-case/observability/deploy-grafana-loki-on-cce.md (2 additions, 2 deletions; indentation fix):

```diff
@@ -94,8 +94,8 @@ In this blueprint, Loki will be deployed on Cloud Container Engine (CCE) in micr
       object_store: s3
       schema: v13
       index:
-      prefix: loki_index_
-      period: 24h
+        prefix: loki_index_
+        period: 24h
     storage:
       type: s3
```