Description
Capture EKS logs from Windows nodes
AWS-managed Windows nodes store the EKS component logs (kubelet, kube-proxy, ...) in the Windows Event Log rather than on disk.
LogMonitor not possible
Microsoft's LogMonitor cannot capture this information because it cannot fetch the EKS log entries. LogMonitor can only access the Event Logs inside the running container, that is, the application's own Event Logs.
Alloy (default) not sufficient
Alloy is a better way to capture Windows Event Logs and other node information. The default setup can capture the node's Windows Event Logs and send them to Grafana via Loki, but it cannot access the EKS Event Log entries because it lacks the required permissions. To read them, the container needs to run as NT AUTHORITY\System.
Alloy running as hostProcess
Running Alloy as a hostProcess container on Windows nodes fixes this: it can then capture all Event Logs and send them to Loki.
Values file for Windows:
...
controller:
  hostNetwork: true
...
alloy:
  listenScheme: "HTTP"
  securityContext:
    windowsOptions:
      hostProcess: true
      runAsUserName: NT AUTHORITY\System
...
Issues
The Helm chart output has two issues.
- First issue
When hostNetwork and hostProcess are both set, the Alloy container rendered by the Helm chart cannot start, because the generated path to alloy.exe is incorrect. Containerd on Windows nodes behaves differently when a container runs as hostProcess: it creates a sandbox for the container, and the system drive mount changes from c:/ to c:/hpc, or from / to /hpc.
- Second issue
The readinessProbe cannot reach the Alloy application to check whether it is ready, because routing via the pod IP does not work. When hostProcess is active, the probe needs to use localhost.
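To make the path change in the first issue concrete, here is a minimal sketch of the remapping, assuming the sandbox mount is always named hpc as described above. The helper name `to_hpc_path` is hypothetical and is not part of the Helm chart or Alloy; it only illustrates how a regular container path would translate into the hostProcess sandbox.

```python
import ntpath

# Sandbox mount name for hostProcess containers, as described in this issue.
HPC_MOUNT = "hpc"


def to_hpc_path(path: str) -> str:
    """Hypothetical helper: prefix a container path with the /hpc sandbox mount."""
    # Normalise separators, then split off an optional drive letter such as "c:".
    drive, rest = ntpath.splitdrive(path.replace("\\", "/"))
    rest = rest.lstrip("/")
    # Leave paths alone that already point into the sandbox mount.
    lowered = rest.lower()
    if lowered == HPC_MOUNT or lowered.startswith(HPC_MOUNT + "/"):
        return f"{drive}/{rest}"
    return f"{drive}/{HPC_MOUNT}/{rest}"
```

For example, `/Program Files/GrafanaLabs/Alloy/alloy.exe` maps to `/hpc/Program Files/GrafanaLabs/Alloy/alloy.exe`, which is the path used in the mitigation.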
Mitigation
I mitigated both issues with a post-render Python script that changes the alloy.exe startup command and the readiness probe host. Could the Helm chart be fixed so that it renders this correctly for Windows hostProcess containers?
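import sys, yaml

# Read every rendered manifest from stdin (Helm post-renderer input).
docs = yaml.safe_load_all(sys.stdin.read())
output = []
for doc in docs:
    if doc and doc.get("kind") == "DaemonSet":
        c = doc["spec"]["template"]["spec"]["containers"]
        for container in c:
            if container["name"] == "alloy":
                # hostProcess sandbox: the system drive is mounted under /hpc.
                container["command"] = [
                    "/hpc/Program Files/GrafanaLabs/Alloy/alloy.exe"
                ]
                # The probe cannot route to the pod; use localhost instead.
                container["readinessProbe"]["httpGet"]["host"] = "localhost"
    output.append(doc)
# Write the patched manifests back to stdout for Helm.
yaml.safe_dump_all(output, sys.stdout)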
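A slightly more defensive variant of the patching step might look like the sketch below. It works on already-parsed documents, so it can be tested without PyYAML, skips empty documents, and does not fail when a container has no readinessProbe. The function name `patch_docs` and the constant `ALLOY_EXE` are illustrative assumptions, not part of the chart.

```python
# Path to alloy.exe inside the hostProcess sandbox (taken from the script above).
ALLOY_EXE = "/hpc/Program Files/GrafanaLabs/Alloy/alloy.exe"


def patch_docs(docs):
    """Patch the alloy container in every DaemonSet; return the non-empty docs."""
    patched = []
    for doc in docs:
        if not doc:
            continue  # skip empty YAML documents (parsed as None)
        if doc.get("kind") == "DaemonSet":
            containers = doc["spec"]["template"]["spec"]["containers"]
            for container in containers:
                if container.get("name") != "alloy":
                    continue
                container["command"] = [ALLOY_EXE]
                # Only touch the probe if the chart actually rendered an httpGet probe.
                probe = container.get("readinessProbe") or {}
                if "httpGet" in probe:
                    probe["httpGet"]["host"] = "localhost"
        patched.append(doc)
    return patched
```

In the script above, this would replace the loop: pass `yaml.safe_load_all(sys.stdin.read())` to `patch_docs` and hand the result to `yaml.safe_dump_all(..., sys.stdout)`.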