This centralized logging solution is designed to aggregate logs from multiple sources, including HPC cluster jobs for both the BioImage Archive (BIA) and EMPIAR. The setup ensures that logs from different applications and environments can be collected, processed, and visualized in a unified manner.
For details on how logs are forwarded from the HPC cluster, see the Fluent Bit setup guide.
This project is structured as a Helm chart (efk-stack). The Kubernetes resources are defined as templates in the templates/ directory and are configurable via the values.yaml file.
- values.yaml: The primary configuration file. Use this to toggle optional components (like test pods), configure image tags, and set service types or ports.
- templates/1_elasticsearch.yaml: Deploys a single-node Elasticsearch cluster (Deployment) and its associated Service.
- templates/2_fluentd.yaml: Deploys Fluentd as a DaemonSet. This template consolidates the ServiceAccount, ConfigMap (fluentd config), and Services (fluentd-http and fluentd-forward).
- templates/3_kibana.yaml: Deploys the Kibana Deployment and Service for log visualization.
- templates/4_counter-pod.yaml: (Optional) A simple pod that generates logs internally to test cluster-level logging. Enable it by setting counterPod.enabled: true in values.yaml.
- templates/5_bia-test-pod.yaml: (Optional) A test pod (bia-log-generator) designed to generate logs mimicking the BIA application structure. Enable it by setting biaTestPod.enabled: true in values.yaml.
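The two enabled flags above come from this chart; the sketch below shows how they might sit in values.yaml alongside the image and service settings mentioned earlier. The kibana keys are illustrative names only and may differ from the chart's actual layout.

```yaml
# Sketch of values.yaml toggles. counterPod.enabled and biaTestPod.enabled are
# the chart's own keys; the kibana block uses assumed key names.
counterPod:
  enabled: false      # set true to deploy templates/4_counter-pod.yaml
biaTestPod:
  enabled: false      # set true to deploy templates/5_bia-test-pod.yaml
kibana:
  service:
    type: NodePort    # how the Kibana UI is exposed
    port: 5601
```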
Before using the automated deployment, the following configurations must be applied manually to the cluster.
Specific permissions for Fluentd (like ClusterRole for log reading) are handled separately via fluentd-rbac-manual.yaml by IT.
Once the prerequisites are met, deployment is fully automated via GitLab CI/CD.
The deployment pipeline is defined in .gitlab-ci.yml. When changes are pushed to the repository, the deploy-efk-stack job automatically upgrades the Helm release in the target namespace (logging-test).
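As a rough orientation, the deploy job might look like the sketch below. The job name (deploy-efk-stack) and namespace (logging-test) match this repository; the image and exact helm flags are assumptions, not the pipeline's actual contents.

```yaml
# Hedged sketch of the deploy job in .gitlab-ci.yml (image and flags assumed).
deploy-efk-stack:
  image: alpine/helm:3.14.0          # assumed helper image
  script:
    - helm upgrade --install efk-stack .
        --namespace logging-test
        --values values.yaml
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
```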
Once the pipeline completes, verify that all pods (Elasticsearch, Kibana, Fluentd, and the counter) are up and running in the cluster.
kubectl get pods -n logging-test
To visualize the logs, you need to access the Kibana UI. The service is exposed via a NodePort.
- Get the NodePort for the Kibana service:
kubectl get service kibana -n logging-test
Look for the port mapping in the output (e.g., 5601:31234/TCP). The higher number is the NodePort.
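If you want to pull the NodePort out of that mapping in a script, a small sketch (using the sample value from above; substitute the real kubectl output):

```shell
# Parse the NodePort from a PORT(S) mapping like "5601:31234/TCP".
# The mapping string here is the sample value from the text above.
mapping="5601:31234/TCP"
node_port=$(echo "$mapping" | awk -F'[:/]' '{print $2}')
echo "$node_port"
```

Alternatively, kubectl can print it directly with `-o jsonpath='{.spec.ports[0].nodePort}'`.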
- Get a node's IP address:
kubectl get nodes -o wide
You can use any of the INTERNAL-IP addresses listed.
- Open Kibana in your browser by navigating to http://<NODE_IP>:<NODE_PORT>.
- Once in Kibana, you can filter for logs from the test counter pod using the Kibana Query Language (KQL) in the search bar. Search for kubernetes.pod_name : "counter" to see its logs.
This step validates that Fluentd is collecting logs from namespaces other than logging-test.
- Create the bia-log-test namespace:
  kubectl create namespace bia-log-test
- Deploy a standalone test pod into the new namespace. Note: we deploy this manually because the Helm chart deploys components to the main logging namespace.
  kubectl run bia-log-generator \
    --image=busybox \
    --namespace=bia-log-test \
    --restart=Never \
    -- /bin/sh -c 'i=0; while true; do echo "BIA-LOG-TEST: $i"; i=$((i+1)); sleep 5; done'
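To preview what this generator writes before deploying it, the same loop can be run locally, bounded to three iterations instead of the pod's infinite loop:

```shell
# Locally preview the log lines the bia-log-generator pod emits
# (3 iterations; the pod's 5-second sleep is skipped here).
i=0
while [ "$i" -lt 3 ]; do
  echo "BIA-LOG-TEST: $i"
  i=$((i+1))
done
```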
The Fluentd configuration in 2_fluentd.yaml has been specifically adapted for environments where cluster-wide permissions are not available.
- RBAC: It uses a namespace-scoped Role and RoleBinding instead of a ClusterRole.
- Fluentd Configuration: The kubernetes_metadata filter is configured with watch false.
This has a critical implication: Fluentd only has permission to enrich logs with Kubernetes metadata for pods running within its own namespace (logging-test).
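The relevant piece of the Fluentd configuration is roughly the filter block below. The @type and the watch parameter are real directives of the fluent-plugin-kubernetes_metadata_filter plugin; the match pattern and any surrounding options are assumptions, not a copy of 2_fluentd.yaml.

```
<filter kubernetes.**>
  @type kubernetes_metadata
  # Disable the Kubernetes API watch: with only a namespace-scoped Role,
  # Fluentd cannot list/watch pods cluster-wide.
  watch false
</filter>
```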
Logs from pods like the counter are fully enriched. You can use detailed KQL queries:
kubernetes.pod_name : "counter"
Sample entry:
{
  "_index": "logstash-2025.11.14",
  "_type": "_doc",
  "_id": "GqIqg5oBSuSMnNNQ7N_v",
  "_version": 1,
  "_score": null,
  "_source": {
    "log": "193087: Fri Nov 14 16:20:18 UTC 2025\n",
    "stream": "stdout",
    "docker": {
      "container_id": "843ce906552d02715f0b02a9c978adb0c7707cbf854ca2b0b17556815b804715"
    },
    "kubernetes": {
      "container_name": "count",
      "namespace_name": "logging-test",
      "pod_name": "counter",
      "container_image": "busybox:latest",
      "container_image_id": "docker-pullable://busybox@sha256:e3652a00a2fabd16ce889f0aa32c38eec347b997e73bd09e69c962ec7f8732ee",
      "pod_id": "5c7201ec-85d3-469c-981a-8bacbe8c3f18",
      "pod_ip": "10.251.2.127",
      "host": "hh-rke-wp-webadmin-32-worker-1.caas.ebi.ac.uk",
      "master_url": "https://10.252.0.1:443/api",
      "namespace_id": "5713880b-f88b-4628-9a2f-9d0133b9a862",
      "namespace_labels": {
        "kubernetes.io/metadata.name": "logging-test"
      }
    },
    "@timestamp": "2025-11-14T16:20:18.642953828+00:00",
    "tag": "kubernetes.var.log.containers.counter_logging-test_count-843ce906552d02715f0b02a9c978adb0c7707cbf854ca2b0b17556815b804715.log"
  },
  "fields": {
    "@timestamp": [
      "2025-11-14T16:20:18.642Z"
    ]
  },
  "highlight": {
    "kubernetes.pod_name": [
      "@kibana-highlighted-field@counter@/kibana-highlighted-field@"
    ]
  },
  "sort": [
    1763137218642
  ]
}
Fluentd will still collect logs from all other namespaces (e.g., bia, empiar), but it cannot add the kubernetes.* metadata fields. To find these logs, you must search for the raw log content or parts of the filename tag.
The Fluentd plugin has a fallback mechanism where it parses the log's filename to extract basic metadata. This means that for pods outside the logging-test namespace, you can often still filter by pod name.
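As an illustration of that fallback, the pod name and namespace can be recovered from the file-name tag format shown in the sample document above. This mirrors the idea, not the plugin's actual code:

```shell
# Recover pod name and namespace from a container log tag of the form
# kubernetes.var.log.containers.<pod>_<namespace>_<container>-<id>.log
tag="kubernetes.var.log.containers.counter_logging-test_count-843ce906552d02715f0b02a9c978adb0c7707cbf854ca2b0b17556815b804715.log"
file=${tag#kubernetes.var.log.containers.}   # strip the path prefix
pod_name=${file%%_*}                         # first "_"-separated field
namespace=${file#*_}; namespace=${namespace%%_*}  # second field
echo "$pod_name $namespace"
```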
For example, to find logs from the bia-log-generator pod, you can use:
kubernetes.pod_name : "bia-log-generator"
Alternatively, searching for a unique string from the log output is also a reliable method:
log : "BIA-LOG-TEST"
See the fluentbit folder for details on how logs are forwarded from the HPC cluster.