Split Kubernetes into node-local and global agents #3113

@therc

Description

Given that:

  • Right now, the recommended installation runs a dd-agent pod on each node, through a DaemonSet. That works reasonably well, apart from minor issues with regular apps discovering where the node-local statsd really is (I'm working to make this easier upstream).

  • But also, if you collect events, kubernetes.yaml.example tells you to do that from only a single agent in the cluster. How would one do that? It's not trivial with just a DaemonSet: you have to resort to a StatefulSet or some label-based hack, which is brittle and prone to fail in unexpected ways.

  • Last but not least, kubernetes_state.yaml.example does NOT tell you to run the check from only a single agent. In my experience, that causes nothing but headaches: you get alerts from every node in the cluster. I'm sure you folks aren't too thrilled about receiving a lot of duplicate data, either.
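For reference, the per-node setup from the first point looks roughly like the sketch below (the image tag and names are placeholders; a real manifest also needs the API-key secret, the Docker socket mount, and so on):

```yaml
# Minimal per-node agent sketch, one pod per node via a DaemonSet.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: dd-agent
spec:
  selector:
    matchLabels:
      app: dd-agent
  template:
    metadata:
      labels:
        app: dd-agent
    spec:
      containers:
        - name: dd-agent
          image: datadog/docker-dd-agent:latest   # placeholder tag
          ports:
            - containerPort: 8125   # node-local dogstatsd endpoint
              protocol: UDP
```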

I have a proposal to combine the above:

  1. Keep running regular statsd/agents as a DaemonSet.
  2. Recommend using Docker-based (i.e. based on image name) service discovery. The agent that lives on the same node as the kube-state-metrics pod will be the lucky one to send kubernetes_state data. Make clear in the example that the check should only run from one place in the cluster and that service discovery is the best way to achieve that. In other words, tell users that in most cases configuring the check manually is not a good idea.
  3. Have a separate Deployment, with one replica, which only runs checks that are global in scope.
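Point 3 could look something like this sketch (the name, image tag, and the environment-variable toggle for the proposed mode are all hypothetical):

```yaml
# Sketch of the single-replica "global" agent from point 3.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dd-agent-global
spec:
  replicas: 1          # exactly one pod in the cluster runs the global checks
  selector:
    matchLabels:
      app: dd-agent-global
  template:
    metadata:
      labels:
        app: dd-agent-global
    spec:
      containers:
        - name: dd-agent
          image: datadog/docker-dd-agent:latest   # placeholder tag
          env:
            - name: DD_GLOBAL_CHECKS   # hypothetical toggle for the proposed mode
              value: "true"
```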

That's all very easy to implement, except for 3). The current check logic is:

  • Get the node's list of pods from the Kubelet
  • Perform Kubelet health checks
  • Fetch cAdvisor/Docker metrics about every pod
  • If event collection is enabled, call the API server and fetch them. Curiously enough, _process_events gets passed the list of pods, but it doesn't do anything with that.

The simplest fix is a new setting ("global checks"?) that prevents the check from talking to the Kubelet, Docker and cAdvisor. The new, alternate mode would only talk to the API servers and the rest of the control plane in order to collect events and control-plane data.
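To make the proposed split concrete, here is a minimal control-flow sketch. The setting name "global_checks" and the collector names are hypothetical stand-ins, not dd-agent's real API:

```python
class KubernetesCheck:
    """Sketch of the proposed node-local vs. global split.

    "global_checks" is the hypothetical new setting; the collector
    methods below are stubs standing in for the real check logic.
    """

    def __init__(self, instance):
        self.global_checks = instance.get("global_checks", False)
        self.collect_events = instance.get("collect_events", False)

    def run(self):
        collected = []
        if not self.global_checks:
            # Node-local mode: talk to the local Kubelet, Docker and cAdvisor.
            pods = self._get_pods_from_kubelet()
            collected.append("kubelet_health")
            collected.append("cadvisor_metrics:%d" % len(pods))
        if self.global_checks or self.collect_events:
            # Global mode: only the API servers / control plane.
            collected.append("events")
        return collected

    # Stubbed collector, standing in for the real Kubelet query.
    def _get_pods_from_kubelet(self):
        return ["pod-a", "pod-b"]
```

With this shape, the global Deployment sets the flag and never touches node-local endpoints, while the DaemonSet agents keep the current behavior.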

Does it sound reasonable? Anything missing?
