Run Alloy on Windows as hostProcess to capture Windows EKS EventLogs not possible #4037

@wouterkestevens

Capture EKS logs from Windows nodes

AWS-managed Windows nodes do not store the EKS component logs (kubelet, kube-proxy, ...) on disk; they write them to the Windows EventLogs.

LogMonitor not possible

Microsoft's LogMonitor cannot capture this information because it cannot fetch the EKS log entries. LogMonitor can only access the EventLogs inside the running container, which means only the application EventLogs.

Alloy (default) not sufficient

Alloy is a better way to capture Windows EventLogs and other node information. The default setup can capture the node's Windows EventLogs and send them to Grafana via Loki, but it cannot access the EKS EventLog entries because it lacks permissions. To read them, the container needs to run as NT AUTHORITY\System.

Alloy running as hostProcess

Running Alloy as a hostProcess container on the Windows nodes fixes this: Alloy can then capture all EventLogs and send them to Loki.

Values file for Windows:

...
controller:
  hostNetwork: true
...
alloy:
  listenScheme: "HTTP"
  securityContext:
    windowsOptions:
      hostProcess: true
      runAsUserName: NT AUTHORITY\System
...

Issues

The Helm chart output has two issues.

  • First issue
    When hostNetwork and hostProcess are set, the Alloy container rendered by the Helm chart cannot start, because the generated path to alloy.exe is incorrect. Containerd, which runs the containers on the Windows nodes, behaves differently when a container runs as hostProcess: it creates a sandbox for the container, and the system drive mount changes from c:/ to c:/hpc, or from / to /hpc.

  • Second issue
    The readinessProbe cannot reach the Alloy application to check whether it is ready, because routing via the pod IP does not work. When hostProcess is active, the probe needs to use localhost.
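
To make the two changes concrete, here is a minimal sketch of what the rendered container spec would need, written against plain Python dicts. The helper names `to_host_process_path` and `patch_container` are hypothetical, and the sample container is an assumption for illustration: the original command path is inferred from the fixed path used in the mitigation below, and the probe path and port are placeholders, not values taken from the chart. Only forward-slash paths are handled, matching the c:/ to c:/hpc and / to /hpc mapping described above.

```python
import copy


def to_host_process_path(path: str, mount: str = "hpc") -> str:
    """Prefix a system-drive path with the hostProcess mount directory.

    "/x" becomes "/hpc/x" and "c:/x" becomes "c:/hpc/x".
    """
    drive = ""
    # Split off a drive prefix such as "c:" if one is present.
    if len(path) >= 2 and path[1] == ":":
        drive, path = path[:2], path[2:]
    return f"{drive}/{mount}/{path.lstrip('/')}"


def patch_container(container: dict) -> dict:
    """Return a copy with the command remapped and the probe host set to localhost."""
    patched = copy.deepcopy(container)
    command = patched["command"]
    patched["command"] = [to_host_process_path(command[0])] + command[1:]
    probe = patched.get("readinessProbe", {}).get("httpGet")
    if probe is not None:
        probe["host"] = "localhost"
    return patched


# Hypothetical rendered container; path, probe path and port are assumptions.
example = {
    "name": "alloy",
    "command": ["/Program Files/GrafanaLabs/Alloy/alloy.exe"],
    "readinessProbe": {"httpGet": {"path": "/-/ready", "port": 12345}},
}

print(patch_container(example))
```

If the chart applied something along these lines whenever hostProcess is enabled, no post-processing of the rendered manifests would be needed.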

Mitigation

I mitigated both issues with a Helm post-renderer Python script that changes the alloy.exe startup command and points the readinessProbe at localhost. Could the Helm chart be changed to render this correctly for Windows hostProcess containers?

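import sys

import yaml  # PyYAML

# Path to alloy.exe as seen from inside a hostProcess container.
ALLOY_EXE = "/hpc/Program Files/GrafanaLabs/Alloy/alloy.exe"

# Helm pipes the rendered manifests to stdin as a multi-document YAML stream.
output = []
for doc in yaml.safe_load_all(sys.stdin):
    if doc is None:
        continue  # skip empty documents
    if doc.get("kind") == "DaemonSet":
        for container in doc["spec"]["template"]["spec"]["containers"]:
            if container["name"] == "alloy":
                # Start alloy.exe from the hostProcess mount point.
                container["command"] = [ALLOY_EXE]
                # Probe via localhost instead of the pod IP.
                container["readinessProbe"]["httpGet"]["host"] = "localhost"
    output.append(doc)

# Helm reads the patched manifests back from stdout.
yaml.safe_dump_all(output, sys.stdout)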
