
Commit 737e495

OTL-3707 Add documentation for aws batch node support
1 parent 8abb526 commit 737e495

4 files changed

Lines changed: 123 additions & 0 deletions


.chloggen/eks-batch-nodes.yaml

Lines changed: 12 additions & 0 deletions
```yaml
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement
# The name of the component, or a single word describing the area of concern, (e.g. agent, clusterReceiver, gateway, operator, chart, other)
component: agent
# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Document how to run the Collector agent daemonset on AWS Batch-managed EKS nodes
# One or more tracking issues related to the change
issues: [2398]
# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:
```

docs/advanced-configuration.md

Lines changed: 46 additions & 0 deletions
`````diff
@@ -348,6 +348,52 @@ aws eks create-pod-identity-association \
 --region $AWS_REGION
 ````
 
+## EKS: Running on AWS Batch nodes
+
+AWS Batch on EKS taints its managed nodes with `batch.amazonaws.com/batch-node`
+to prevent general workloads from scheduling there. The Collector agent daemonset
+must tolerate this taint to collect logs and metrics from those nodes.
+This configuration is for EKS clusters that run the agent daemonset; it does not
+apply to `eks/fargate`.
+
+The top-level [`tolerations`](../helm-charts/splunk-otel-collector/values.yaml)
+value controls the agent daemonset tolerations only (not the cluster receiver,
+gateway, or operator; each has its own `tolerations` sub-key). Because Helm
+replaces list values entirely on upgrade, your custom `tolerations` list must
+include **both** the chart's default tolerations and the new Batch entries:
+
+```yaml
+# Set distribution to match your cluster type (eks or eks/auto-mode).
+distribution: eks
+cloudProvider: aws
+
+tolerations:
+  # Chart defaults - keep these to continue collecting from control-plane and
+  # infra nodes.
+  - key: node-role.kubernetes.io/master
+    effect: NoSchedule
+    operator: Exists
+  - key: node-role.kubernetes.io/control-plane
+    effect: NoSchedule
+    operator: Exists
+  - key: kubernetes.io/system-node
+    effect: NoSchedule
+    operator: Exists
+  - key: node-role.kubernetes.io/infra
+    effect: NoSchedule
+    operator: Exists
+  # AWS Batch node taint - allows scheduling on Batch-managed nodes.
+  - key: batch.amazonaws.com/batch-node
+    operator: Exists
+    effect: NoSchedule
+  - key: batch.amazonaws.com/batch-node
+    operator: Exists
+    effect: NoExecute
+```
+
+See the [eks-batch-nodes example](../examples/eks-batch-nodes/README.md) for a
+ready-to-use values file.
+
 ## EKS Fargate support
 
 If you want to run the Splunk OpenTelemetry Collector in [Amazon Elastic Kubernetes Service
`````
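The docs section and the example both rest on one Helm rule: a list you supply in your values replaces the chart's default list instead of being merged with it, and this holds for `helm install` as well as `helm upgrade`. The Python sketch below models that rule under a simplifying assumption (nested maps merge key by key; lists and scalars are replaced whole). It is not Helm's implementation, and the helper names (`merge_values`, `has_toleration`) and the trimmed `CHART_DEFAULTS` dict are hypothetical, chosen only to illustrate the point.

```python
"""Simplified model of how Helm combines chart defaults with user values.

Assumption: nested maps are merged key by key, while any other value
(including a list) is taken from the user's values as a whole. This is a
sketch, not Helm's implementation; it ignores details such as deleting a
default key by setting it to null.
"""
import copy
from typing import Any


def merge_values(defaults: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict: defaults overlaid with overrides."""
    result = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Maps merge recursively, so default keys you did not set survive.
            result[key] = merge_values(result[key], value)
        else:
            # Lists and scalars replace the default outright.
            result[key] = copy.deepcopy(value)
    return result


def has_toleration(values: dict[str, Any], key: str) -> bool:
    """True if any entry in values["tolerations"] uses the given taint key."""
    return any(t.get("key") == key for t in values.get("tolerations", []))


# Hypothetical, trimmed-down defaults (two of the chart's four default tolerations).
CHART_DEFAULTS: dict[str, Any] = {
    "splunkObservability": {"realm": "", "accessToken": ""},
    "tolerations": [
        {"key": "node-role.kubernetes.io/master", "effect": "NoSchedule", "operator": "Exists"},
        {"key": "node-role.kubernetes.io/control-plane", "effect": "NoSchedule", "operator": "Exists"},
    ],
}

BATCH_TOLERATIONS = [
    {"key": "batch.amazonaws.com/batch-node", "operator": "Exists", "effect": "NoSchedule"},
    {"key": "batch.amazonaws.com/batch-node", "operator": "Exists", "effect": "NoExecute"},
]

# Pitfall: an override listing only the Batch entries silently drops the defaults.
batch_only = merge_values(
    CHART_DEFAULTS,
    {"splunkObservability": {"realm": "us0"}, "tolerations": BATCH_TOLERATIONS},
)

# Fix: repeat the defaults alongside the Batch entries.
full_list = merge_values(
    CHART_DEFAULTS,
    {"tolerations": CHART_DEFAULTS["tolerations"] + BATCH_TOLERATIONS},
)

if __name__ == "__main__":
    print(batch_only["splunkObservability"])  # realm overridden, accessToken kept
    print(has_toleration(batch_only, "node-role.kubernetes.io/master"))  # False
    print(has_toleration(full_list, "node-role.kubernetes.io/master"))   # True
```

Running the script shows the `realm` override merging into the `splunkObservability` map, while the Batch-only `tolerations` list discards the control-plane defaults entirely.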

examples/eks-batch-nodes/README.md

Lines changed: 32 additions & 0 deletions
````markdown
# Example: Collector on EKS with AWS Batch nodes

This example shows how to configure the Splunk OpenTelemetry Collector to
collect logs and metrics from EKS nodes managed by
[AWS Batch on EKS](https://docs.aws.amazon.com/batch/latest/userguide/jobs_eks.html).

## Background

AWS Batch taints its managed EKS nodes with `batch.amazonaws.com/batch-node` to
prevent general workloads from scheduling there. The Collector agent daemonset
must explicitly tolerate this taint to run on those nodes.

The chart's top-level `tolerations` value is a list. Helm replaces lists
entirely during upgrades, so your values file must include **both** the chart's
built-in default tolerations and the new AWS Batch entries. Omitting the
defaults would stop the agent from scheduling on control-plane and infra nodes.

## Usage

```bash
helm install my-splunk-otel-collector \
  --values eks-batch-nodes-values.norender.yaml \
  splunk-otel-collector-chart/splunk-otel-collector
```

Replace the `CHANGEME` placeholders before running.

## See also

- [Advanced configuration - EKS: Running on AWS Batch nodes](../../docs/advanced-configuration.md#eks-running-on-aws-batch-nodes)
- [Run a DaemonSet on AWS Batch managed nodes](https://docs.aws.amazon.com/batch/latest/userguide/daemonset-on-batch-eks-nodes.html)
- [Kubernetes taints and tolerations](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/)
````
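The README's Background section says the agent must tolerate the Batch taint, and the values in this commit list it twice, once per effect. A `NoSchedule` taint keeps new pods off a node, while a `NoExecute` taint also evicts running pods that do not tolerate it, so a daemonset pod needs a matching toleration for each effect present. The following Python sketch models the matching rules of the Kubernetes `Toleration` API; the `Taint`/`Toleration` classes and `can_run_on` are hypothetical helpers, and the example node is assumed (as the two tolerations in this commit imply) to carry `batch.amazonaws.com/batch-node` with both effects.

```python
"""Minimal model of Kubernetes taint/toleration matching.

Follows the rules documented for the Toleration API: an empty effect
matches every effect, an empty key with operator Exists matches every key,
Exists ignores the taint value, and Equal (the default) compares values.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str  # "NoSchedule", "PreferNoSchedule", or "NoExecute"
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"
    value: str = ""
    effect: str = ""


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if a single toleration matches a single taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    if tol.operator == "Equal":
        return tol.value == taint.value
    return False


def can_run_on(tolerations: list[Toleration], taints: list[Taint]) -> bool:
    """True if every hard taint (NoSchedule or NoExecute) is tolerated.

    PreferNoSchedule is only a scheduler preference, so it is skipped here.
    """
    hard = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)


BATCH_KEY = "batch.amazonaws.com/batch-node"

# Hypothetical Batch node carrying the taint with both effects.
batch_node = [Taint(BATCH_KEY, "NoSchedule"), Taint(BATCH_KEY, "NoExecute")]

no_schedule_only = [Toleration(BATCH_KEY, "Exists", effect="NoSchedule")]
both_effects = no_schedule_only + [Toleration(BATCH_KEY, "Exists", effect="NoExecute")]
any_effect = [Toleration(BATCH_KEY, "Exists")]  # empty effect matches all effects

if __name__ == "__main__":
    print(can_run_on(no_schedule_only, batch_node))  # False: NoExecute not tolerated
    print(can_run_on(both_effects, batch_node))      # True
    print(can_run_on(any_effect, batch_node))        # True
```

Under these rules a single entry with `operator: Exists` and no `effect` would also cover both taints; listing both effects explicitly, as the commit does, keeps the intent visible in the values file.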
Lines changed: 33 additions & 0 deletions
```yaml
splunkObservability:
  realm: CHANGEME
  accessToken: CHANGEME

# The cluster name is auto-discovered for eks and eks/auto-mode.
# Set to eks/auto-mode if using EKS Auto Mode.
distribution: eks
cloudProvider: aws

# Helm replaces list values entirely on upgrade, so include all tolerations you
# need, both the chart defaults below and the AWS Batch additions at the end.
tolerations:
  # Chart defaults - keep these to collect from control-plane and infra nodes.
  - key: node-role.kubernetes.io/master
    effect: NoSchedule
    operator: Exists
  - key: node-role.kubernetes.io/control-plane
    effect: NoSchedule
    operator: Exists
  - key: kubernetes.io/system-node
    effect: NoSchedule
    operator: Exists
  - key: node-role.kubernetes.io/infra
    effect: NoSchedule
    operator: Exists
  # AWS Batch node taint - allows the agent daemonset to schedule on
  # Batch-managed nodes.
  - key: batch.amazonaws.com/batch-node
    operator: Exists
    effect: NoSchedule
  - key: batch.amazonaws.com/batch-node
    operator: Exists
    effect: NoExecute
```
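Because a values file that omits the chart defaults still installs without error, a quick pre-flight check can catch the mistake the comments in this values file warn about. The Python helper below is a hypothetical sketch: it assumes the `tolerations` list has already been parsed into dicts (for example with a YAML library, not shown so the sketch stays stdlib-only), and it hardcodes the six entries from this values file as the required set.

```python
"""Hypothetical pre-flight check for a tolerations override.

Assumes the values file has already been parsed into Python dicts (for
example with a YAML loader); parsing is not shown to keep this stdlib-only.
"""

# Required (key, operator, effect) entries: the four chart defaults plus the
# two AWS Batch entries, copied from the values file in this commit.
REQUIRED = [
    ("node-role.kubernetes.io/master", "Exists", "NoSchedule"),
    ("node-role.kubernetes.io/control-plane", "Exists", "NoSchedule"),
    ("kubernetes.io/system-node", "Exists", "NoSchedule"),
    ("node-role.kubernetes.io/infra", "Exists", "NoSchedule"),
    ("batch.amazonaws.com/batch-node", "Exists", "NoSchedule"),
    ("batch.amazonaws.com/batch-node", "Exists", "NoExecute"),
]


def missing_tolerations(tolerations: list[dict[str, str]]) -> list[tuple[str, str, str]]:
    """Return the required entries that are absent from the supplied list."""
    # Kubernetes defaults an omitted operator to Equal and an omitted effect to "".
    present = {
        (t.get("key", ""), t.get("operator", "Equal"), t.get("effect", ""))
        for t in tolerations
    }
    return [entry for entry in REQUIRED if entry not in present]


# A values override that forgot the chart defaults.
batch_only = [
    {"key": "batch.amazonaws.com/batch-node", "operator": "Exists", "effect": "NoSchedule"},
    {"key": "batch.amazonaws.com/batch-node", "operator": "Exists", "effect": "NoExecute"},
]

if __name__ == "__main__":
    for key, operator, effect in missing_tolerations(batch_only):
        print(f"missing: {key} ({operator}, {effect})")
```

The comparison is an exact match on `(key, operator, effect)`, so a broader toleration that would still match (for instance one without `effect`) is reported as missing; treat the output as a prompt to review the list rather than a verdict.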
