12 changes: 12 additions & 0 deletions .chloggen/eks-batch-nodes.yaml
@@ -0,0 +1,12 @@
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement
# The name of the component, or a single word describing the area of concern, (e.g. agent, clusterReceiver, gateway, operator, chart, other)
component: agent
# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Document how to run the Collector agent daemonset on AWS Batch-managed EKS nodes
# One or more tracking issues related to the change
issues: [2398]
# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:
46 changes: 46 additions & 0 deletions docs/advanced-configuration.md
@@ -348,6 +348,52 @@ aws eks create-pod-identity-association \
--region $AWS_REGION
```

## EKS: Running on AWS Batch nodes

AWS Batch on EKS taints its managed nodes with `batch.amazonaws.com/batch-node`
to prevent general workloads from scheduling there. The Collector agent daemonset
must tolerate this taint to collect logs and metrics from those nodes.
This configuration is for EKS clusters that run the agent daemonset; it does not
apply to `eks/fargate`.

The top-level [`tolerations`](../helm-charts/splunk-otel-collector/values.yaml)
value controls the agent daemonset tolerations only (not the cluster receiver,
gateway, or operator; each has its own `tolerations` sub-key). Because Helm
replaces list values entirely on upgrade, your custom `tolerations` list must
include **both** the chart's default tolerations and the new Batch entries:

```yaml
# Set distribution to match your cluster type (eks or eks/auto-mode).
distribution: eks
cloudProvider: aws

tolerations:
  # Chart defaults - keep these to continue collecting from control-plane and
  # infra nodes.
  - key: node-role.kubernetes.io/master
    effect: NoSchedule
    operator: Exists
  - key: node-role.kubernetes.io/control-plane
    effect: NoSchedule
    operator: Exists
  - key: kubernetes.io/system-node
    effect: NoSchedule
    operator: Exists
  - key: node-role.kubernetes.io/infra
    effect: NoSchedule
    operator: Exists
  # AWS Batch node taint - allows scheduling on Batch-managed nodes.
  - key: batch.amazonaws.com/batch-node
    operator: Exists
    effect: NoSchedule
  - key: batch.amazonaws.com/batch-node
    operator: Exists
    effect: NoExecute
```

See the [eks-batch-nodes example](../examples/eks-batch-nodes/README.md) for a
ready-to-use values file.
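The reason the defaults must be repeated is Helm's value coalescing: nested maps are merged key by key, but a list value supplied by the user replaces the chart's default list wholesale. A minimal Python sketch of that behavior (illustrative names and a simplified merge, not chart code):

```python
# Hypothetical sketch of Helm's value coalescing: nested dicts merge
# key by key, but list values from the user replace defaults outright.

CHART_DEFAULTS = {
    "cloudProvider": "aws",
    "tolerations": [
        {"key": "node-role.kubernetes.io/master", "operator": "Exists", "effect": "NoSchedule"},
        {"key": "node-role.kubernetes.io/control-plane", "operator": "Exists", "effect": "NoSchedule"},
    ],
}

USER_VALUES = {
    "tolerations": [
        {"key": "batch.amazonaws.com/batch-node", "operator": "Exists", "effect": "NoSchedule"},
    ],
}

def coalesce(defaults, overrides):
    """Merge dicts recursively; any non-dict override (including a list)
    replaces the default value, mirroring Helm's list handling."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = coalesce(merged[key], value)
        else:
            merged[key] = value  # lists are NOT concatenated
    return merged

result = coalesce(CHART_DEFAULTS, USER_VALUES)
# Only the single Batch toleration survives; the control-plane defaults
# are gone unless the user's list repeats them.
print(len(result["tolerations"]))  # 1
```

This is why supplying only the two Batch tolerations would silently drop the agent from control-plane and infra nodes.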

## EKS Fargate support

If you want to run the Splunk OpenTelemetry Collector in [Amazon Elastic Kubernetes Service
Expand Down
32 changes: 32 additions & 0 deletions examples/eks-batch-nodes/README.md
@@ -0,0 +1,32 @@
# Example: Collector on EKS with AWS Batch nodes

This example shows how to configure the Splunk OpenTelemetry Collector to
collect logs and metrics from EKS nodes managed by
[AWS Batch on EKS](https://docs.aws.amazon.com/batch/latest/userguide/jobs_eks.html).

## Background

AWS Batch taints its managed EKS nodes with `batch.amazonaws.com/batch-node` to
prevent general workloads from scheduling there. The Collector agent daemonset
must explicitly tolerate this taint to run on those nodes.

The chart's top-level `tolerations` value is a list. Helm replaces lists
entirely during upgrades, so your values file must include **both** the chart's
built-in default tolerations and the new AWS Batch entries. Omitting the
defaults would stop the agent from scheduling on control-plane and infra nodes.
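Before installing, it can help to sanity-check that the combined list still covers every taint key the agent needs. A small illustrative check (not part of the chart; the required-key set is an assumption you should adapt to your cluster):

```python
# Illustrative pre-flight check: confirm the tolerations list in your
# values covers every taint key the agent daemonset must tolerate.

# Taint keys this example expects; extend for your cluster's node taints.
REQUIRED_TAINT_KEYS = {
    "node-role.kubernetes.io/control-plane",
    "batch.amazonaws.com/batch-node",
}

# The combined list from the values file: chart defaults plus Batch entries.
tolerations = [
    {"key": "node-role.kubernetes.io/master", "operator": "Exists", "effect": "NoSchedule"},
    {"key": "node-role.kubernetes.io/control-plane", "operator": "Exists", "effect": "NoSchedule"},
    {"key": "kubernetes.io/system-node", "operator": "Exists", "effect": "NoSchedule"},
    {"key": "node-role.kubernetes.io/infra", "operator": "Exists", "effect": "NoSchedule"},
    {"key": "batch.amazonaws.com/batch-node", "operator": "Exists", "effect": "NoSchedule"},
    {"key": "batch.amazonaws.com/batch-node", "operator": "Exists", "effect": "NoExecute"},
]

missing = REQUIRED_TAINT_KEYS - {t["key"] for t in tolerations}
assert not missing, f"tolerations missing taint keys: {sorted(missing)}"
print("all required taint keys tolerated")
```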

## Usage

```bash
helm install my-splunk-otel-collector \
  --values eks-batch-nodes-values.yaml \
splunk-otel-collector-chart/splunk-otel-collector
```

Replace the `CHANGEME` placeholders before running.
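A quick way to catch forgotten placeholders is to scan the values file before installing. A minimal sketch (the inline values text stands in for reading the file from disk):

```python
# Illustrative pre-install check (not part of the chart): report any
# CHANGEME placeholder still present in the values file.

# Stand-in for the contents of eks-batch-nodes-values.yaml.
values_text = """\
splunkObservability:
  realm: CHANGEME
  accessToken: CHANGEME
"""

leftover = [
    (lineno, line.strip())
    for lineno, line in enumerate(values_text.splitlines(), start=1)
    if "CHANGEME" in line
]
for lineno, line in leftover:
    print(f"line {lineno}: {line}")
# line 2: realm: CHANGEME
# line 3: accessToken: CHANGEME
```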

## See also

- [Advanced configuration - EKS: Running on AWS Batch nodes](../../docs/advanced-configuration.md#eks-running-on-aws-batch-nodes)
- [Run a DaemonSet on AWS Batch managed nodes](https://docs.aws.amazon.com/batch/latest/userguide/daemonset-on-batch-eks-nodes.html)
- [Kubernetes taints and tolerations](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/)
33 changes: 33 additions & 0 deletions examples/eks-batch-nodes/eks-batch-nodes-values.yaml
@@ -0,0 +1,33 @@
splunkObservability:
  realm: CHANGEME
  accessToken: CHANGEME

# The cluster name is auto-discovered for eks and eks/auto-mode.
# Set to eks/auto-mode if using EKS Auto Mode.
distribution: eks
cloudProvider: aws

# Helm replaces list values entirely on upgrade, so include all tolerations you
# need, both the chart defaults below and the AWS Batch additions at the end.
tolerations:
  # Chart defaults - keep these to collect from control-plane and infra nodes.
  - key: node-role.kubernetes.io/master
    effect: NoSchedule
    operator: Exists
  - key: node-role.kubernetes.io/control-plane
    effect: NoSchedule
    operator: Exists
  - key: kubernetes.io/system-node
    effect: NoSchedule
    operator: Exists
  - key: node-role.kubernetes.io/infra
    effect: NoSchedule
    operator: Exists
  # AWS Batch node taint - allows the agent daemonset to schedule on
  # Batch-managed nodes.
  - key: batch.amazonaws.com/batch-node
    operator: Exists
    effect: NoSchedule
  - key: batch.amazonaws.com/batch-node
    operator: Exists
    effect: NoExecute
99 changes: 99 additions & 0 deletions examples/eks-batch-nodes/rendered_manifests/clusterRole.yaml
@@ -0,0 +1,99 @@
---
# Source: splunk-otel-collector/templates/clusterRole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: default-splunk-otel-collector
  labels:
    app.kubernetes.io/name: splunk-otel-collector
    helm.sh/chart: splunk-otel-collector-0.150.0
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/instance: default
    app.kubernetes.io/version: "0.150.0"
    app: splunk-otel-collector
    chart: splunk-otel-collector-0.150.0
    release: default
rules:
- apiGroups:
  - ""
  resources:
  - events
  - namespaces
  - namespaces/status
  - nodes
  - nodes/spec
  - nodes/stats
  - nodes/proxy
  - pods
  - pods/status
  - persistentvolumeclaims
  - persistentvolumes
  - replicationcontrollers
  - replicationcontrollers/status
  - resourcequotas
  - services
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - daemonsets
  - deployments
  - replicasets
  - statefulsets
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - extensions
  resources:
  - daemonsets
  - deployments
  - replicasets
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - jobs
  - cronjobs
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - get
  - list
  - watch
- nonResourceURLs:
  - /metrics
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - get
  resourceNames:
  - aws-auth
- apiGroups:
  - events.k8s.io
  resources:
  - events
  - namespaces
  verbs:
  - get
  - list
  - watch
@@ -0,0 +1,23 @@
---
# Source: splunk-otel-collector/templates/clusterRoleBinding.yaml
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: default-splunk-otel-collector
  labels:
    app.kubernetes.io/name: splunk-otel-collector
    helm.sh/chart: splunk-otel-collector-0.150.0
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/instance: default
    app.kubernetes.io/version: "0.150.0"
    app: splunk-otel-collector
    chart: splunk-otel-collector-0.150.0
    release: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: default-splunk-otel-collector
subjects:
- kind: ServiceAccount
  name: default-splunk-otel-collector
  namespace: default