Merged
Changes from 10 commits
22 changes: 22 additions & 0 deletions .chloggen/fix-non-eks-aws-ec2-detector.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: bug_fix
# The name of the component, or a single word describing the area of concern, (e.g. agent, clusterReceiver, gateway, operator, chart, other)
component: agent, clusterReceiver, gateway
# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Fix resource detection for non-EKS Kubernetes clusters on AWS
# One or more tracking issues related to the change
issues: [2330]
# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext: |
Two issues were fixed for non-EKS Kubernetes clusters running on AWS (cloudProvider=aws):
1. The chart was using the eks detector, which only activates on actual EKS clusters
(it checks for EKS-specific signals like IRSA/Pod Identity token paths, the OIDC
issuer, and the cluster version string). On non-EKS clusters these checks all fail
and the detector returns empty, so EC2 metadata (host.id, cloud.account.id,
cloud.availability_zone, etc.) was never collected. Switched to the ec2 detector,
which works on any AWS instance with IMDS access.
2. The condition for detecting non-EKS AWS clusters only matched when distribution was
unset (empty string), excluding the OpenShift distribution that can run on AWS
but is not EKS. Broadened the condition to match any non-EKS distribution on AWS.
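Item 2 of the changelog entry above can be illustrated with a small Go sketch that compares the old and the broadened condition. The function names `oldIsNonEKSonAWS` and `newIsNonEKSonAWS` are hypothetical; the chart itself expresses this logic as a Helm template helper, not Go.

```go
package main

import (
	"fmt"
	"strings"
)

// oldIsNonEKSonAWS models the previous condition (hypothetical name):
// AWS as cloud provider and no distribution set at all.
func oldIsNonEKSonAWS(cloudProvider, distribution string) bool {
	return cloudProvider == "aws" && distribution == ""
}

// newIsNonEKSonAWS models the broadened condition (hypothetical name):
// AWS as cloud provider and any distribution that does not start with "eks".
func newIsNonEKSonAWS(cloudProvider, distribution string) bool {
	return cloudProvider == "aws" && !strings.HasPrefix(distribution, "eks")
}

func main() {
	cases := []struct{ cloudProvider, distribution string }{
		{"aws", ""},            // vanilla Kubernetes on EC2
		{"aws", "openshift"},   // OpenShift on AWS
		{"aws", "eks"},         // EKS
		{"aws", "eks/fargate"}, // EKS on Fargate
		{"gcp", "openshift"},   // not on AWS at all
	}
	for _, c := range cases {
		fmt.Printf("cloudProvider=%q distribution=%q old=%t new=%t\n",
			c.cloudProvider, c.distribution,
			oldIsNonEKSonAWS(c.cloudProvider, c.distribution),
			newIsNonEKSonAWS(c.cloudProvider, c.distribution))
	}
}
```

Only the `aws` plus `openshift` case changes between the two versions: the old condition rejects it and the broadened one accepts it.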
21 changes: 21 additions & 0 deletions .chloggen/openshift-resource-detection.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement
# The name of the component, or a single word describing the area of concern, (e.g. agent, clusterReceiver, gateway, operator, chart, other)
component: agent, clusterReceiver, gateway
# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Add OpenShift resource detection support
# One or more tracking issues related to the change
issues: [2330]
# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext: |
When distribution is set to "openshift", the openshift detector is enabled regardless of
cloudProvider. It auto-discovers k8s.cluster.name, cloud.provider, cloud.platform, and
cloud.region via the OpenShift API (config.openshift.io/v1/infrastructures). The
cloud.platform value reflects the actual hosting environment (e.g., aws_openshift,
gcp_openshift, azure.openshift). Cloud-provider-specific detectors are also enabled
alongside the openshift detector to collect instance metadata: ec2 on AWS, gcp on GCP,
and azure on Azure. Distribution-specific detectors (openshift, aks) now take priority
over the cloud-provider catch-all in detector selection, ensuring the correct
cloud.platform is always set. The clusterName field is now optional for OpenShift.
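The detector priority described in this entry can be sketched in Go as follows. This is a simplified model, not the chart's implementation: `selectDetectors` is a hypothetical name, and `isNonFargateEKS` is assumed here to mean "distribution starts with `eks` but is not `eks/fargate`", based only on that helper's description in the templates.

```go
package main

import (
	"fmt"
	"strings"
)

// isNonFargateEKS is an assumed model of the chart helper of the same name.
func isNonFargateEKS(distribution string) bool {
	return strings.HasPrefix(distribution, "eks") && distribution != "eks/fargate"
}

// isNonEKSonAWS models the broadened helper: AWS with a non-EKS distribution.
func isNonEKSonAWS(cloudProvider, distribution string) bool {
	return cloudProvider == "aws" && !strings.HasPrefix(distribution, "eks")
}

// selectDetectors (hypothetical) returns the ordered detector list.
// Distribution-specific detectors are checked before the cloud-provider
// catch-all, and "system" always goes last.
func selectDetectors(cloudProvider, distribution string) []string {
	detectors := []string{"env"}
	switch {
	case distribution == "openshift":
		detectors = append(detectors, "openshift")
	case distribution == "aks":
		detectors = append(detectors, "aks")
	case strings.HasPrefix(distribution, "gke") || cloudProvider == "gcp":
		detectors = append(detectors, "gcp")
	case isNonFargateEKS(distribution):
		detectors = append(detectors, "eks")
	}
	// Cloud-provider detectors that collect instance metadata.
	if cloudProvider == "azure" {
		detectors = append(detectors, "azure")
	}
	if distribution == "openshift" && cloudProvider == "gcp" {
		detectors = append(detectors, "gcp")
	}
	if isNonEKSonAWS(cloudProvider, distribution) {
		detectors = append(detectors, "ec2")
	}
	return append(detectors, "system")
}

func main() {
	fmt.Println(selectDetectors("aws", "openshift")) // [env openshift ec2 system]
	fmt.Println(selectDetectors("azure", "aks"))     // [env aks azure system]
	fmt.Println(selectDetectors("aws", "eks"))       // [env eks system]
	fmt.Println(selectDetectors("", "openshift"))    // [env openshift system]
}
```

Because the `openshift` case is evaluated first, it contributes `cloud.platform` before any cloud-provider detector runs, which is the behavior the entry describes.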
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,15 @@ rules:
- get
- list
- watch
- apiGroups:
- config.openshift.io
resources:
- infrastructures
- infrastructures/status
verbs:
- get
- list
- watch
- apiGroups:
- ""
resources:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -142,6 +142,7 @@ data:
resourcedetection:
detectors:
- env
- openshift
- system
override: true
timeout: 15s
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,7 @@ data:
resourcedetection:
detectors:
- env
- openshift
- system
override: true
timeout: 15s
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ spec:
component: otel-collector-agent
release: default
annotations:
checksum/config: 86b9e237230bff778e5a9261f569f61444c70c0b8bbca028a6309255f08670a2
checksum/config: dc83bd8c776d052e99a56059e662b006ea49495536000ae73aed876f0f80157d
kubectl.kubernetes.io/default-container: otel-collector
spec:
hostNetwork: true
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@ spec:
component: otel-k8s-cluster-receiver
release: default
annotations:
checksum/config: 5763ea975670663f85a27d6986d38a869f1863c466dc6e8f4c2f2123074fff45
checksum/config: 51aef421dc69c2e928b632dc6fad058e1003ee654fbdf24425661e5ad2ac5af2
spec:
serviceAccountName: default-splunk-otel-collector
nodeSelector:
Expand Down
4 changes: 2 additions & 2 deletions functional_tests/functional/functional_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -676,8 +676,8 @@ func validateResourceAttributes(t *testing.T, clientset *kubernetes.Clientset, k

internal.CopyFileFromPod(t, clientset, kubeConfig, internal.DefaultNamespace, podName, "otel-collector", podPathFile, tmpFile.Name())

actualResourceAttributes := readAndNormalizeMetrics(t, tmpFile.Name(), "k8s.cluster.name").ResourceMetrics().At(0).Resource().Attributes()
expectedResourceAttributes := readAndNormalizeMetrics(t, expectedResourceAttributesFile, "k8s.cluster.name").ResourceMetrics().At(0).Resource().Attributes()
actualResourceAttributes := readAndNormalizeMetrics(t, tmpFile.Name(), "k8s.cluster.name", "cloud.platform").ResourceMetrics().At(0).Resource().Attributes()
expectedResourceAttributes := readAndNormalizeMetrics(t, expectedResourceAttributesFile, "k8s.cluster.name", "cloud.platform").ResourceMetrics().At(0).Resource().Attributes()

require.True(t, expectedResourceAttributes.Equal(actualResourceAttributes), "Resource Attributes comparison failed for %s , expected values %s , actual values %s", collectorType, internal.FormatAttributes(expectedResourceAttributes), internal.FormatAttributes(actualResourceAttributes))

Expand Down
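The test change above passes `cloud.platform` as an extra key to `readAndNormalizeMetrics`. Judging by the expected-result files, normalization replaces attribute values with the placeholder `abcd` while leaving the listed keys untouched. The sketch below illustrates that idea on a plain `map[string]string`; `normalizeAttributes` is a hypothetical stand-in, since the real helper's body is not part of this diff and operates on collector metric data rather than a map.

```go
package main

import "fmt"

// normalizeAttributes (hypothetical) replaces every attribute value with the
// placeholder "abcd", except for keys listed in keep, whose real values are
// preserved so the test can assert on them.
func normalizeAttributes(attrs map[string]string, keep ...string) map[string]string {
	preserved := make(map[string]struct{}, len(keep))
	for _, k := range keep {
		preserved[k] = struct{}{}
	}
	out := make(map[string]string, len(attrs))
	for k, v := range attrs {
		if _, ok := preserved[k]; ok {
			out[k] = v
		} else {
			out[k] = "abcd"
		}
	}
	return out
}

func main() {
	// Illustrative input values only; they are not taken from a real cluster.
	attrs := map[string]string{
		"cloud.platform":   "aws_eks",
		"cloud.provider":   "aws",
		"k8s.cluster.name": "ci-k8s-cluster",
	}
	normalized := normalizeAttributes(attrs, "k8s.cluster.name", "cloud.platform")
	fmt.Println(normalized["cloud.platform"], normalized["cloud.provider"]) // aws_eks abcd
}
```

Keeping `cloud.platform` out of normalization lets the comparison fail when the detector order produces the wrong platform value.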
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ resourceMetrics:
stringValue: abcd
- key: cloud.platform
value:
stringValue: abcd
stringValue: azure.aks
- key: cloud.provider
value:
stringValue: abcd
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ resourceMetrics:
stringValue: abcd
- key: cloud.platform
value:
stringValue: abcd
stringValue: azure.aks
- key: cloud.provider
value:
stringValue: abcd
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ resourceMetrics:
stringValue: abcd
- key: cloud.platform
value:
stringValue: abcd
stringValue: aws_eks
- key: cloud.provider
value:
stringValue: abcd
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ resourceMetrics:
stringValue: abcd
- key: cloud.platform
value:
stringValue: abcd
stringValue: aws_eks
- key: cloud.provider
value:
stringValue: abcd
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ resourceMetrics:
stringValue: abcd
- key: cloud.platform
value:
stringValue: abcd
stringValue: aws_eks
- key: cloud.provider
value:
stringValue: abcd
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ resourceMetrics:
stringValue: abcd
- key: cloud.platform
value:
stringValue: abcd
stringValue: aws_eks
Collaborator Author

I excluded this attribute from the normalization done in the test; it is a good attribute value for ascertaining that the order of detectors is correct.

- key: cloud.provider
value:
stringValue: abcd
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ resourceMetrics:
stringValue: abcd
- key: cloud.platform
value:
stringValue: abcd
stringValue: gcp_compute_engine
- key: cloud.provider
value:
stringValue: abcd
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ resourceMetrics:
stringValue: abcd
- key: cloud.platform
value:
stringValue: abcd
stringValue: gcp_compute_engine
- key: cloud.provider
value:
stringValue: abcd
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ resourceMetrics:
stringValue: abcd
- key: cloud.platform
value:
stringValue: abcd
stringValue: gcp_kubernetes_engine
- key: cloud.provider
value:
stringValue: abcd
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ resourceMetrics:
stringValue: abcd
- key: cloud.platform
value:
stringValue: abcd
stringValue: gcp_kubernetes_engine
- key: cloud.provider
value:
stringValue: abcd
Expand Down
Original file line number Diff line number Diff line change
@@ -1,6 +1,21 @@
resourceMetrics:
- resource:
attributes:
- key: cloud.account.id
value:
stringValue: abcd
- key: cloud.availability_zone
value:
stringValue: abcd
- key: cloud.platform
value:
stringValue: aws_openshift
- key: cloud.provider
value:
stringValue: abcd
- key: cloud.region
value:
stringValue: abcd
- key: cluster_name
value:
stringValue: abcd
Expand All @@ -10,12 +25,21 @@ resourceMetrics:
- key: customfield2
value:
stringValue: abcd
- key: host.id
value:
stringValue: abcd
- key: host.image.id
value:
stringValue: abcd
- key: host.name
value:
stringValue: abcd
- key: host.type
value:
stringValue: abcd
- key: k8s.cluster.name
value:
stringValue: ci-k8s-cluster
stringValue: rosa-test-htcpw
- key: k8s.namespace.name
value:
stringValue: abcd
Expand Down
Original file line number Diff line number Diff line change
@@ -1,6 +1,15 @@
resourceMetrics:
- resource:
attributes:
- key: cloud.platform
Collaborator Author

The ec2 detector's resource attributes are missing in the test cluster because OpenShift restricts IMDS access from pods that are not on the host network. Users can run the clusterReceiver with hostNetwork or grant IAM permissions to the pod.

value:
stringValue: aws_openshift
- key: cloud.provider
value:
stringValue: abcd
- key: cloud.region
value:
stringValue: abcd
- key: cluster_name
value:
stringValue: abcd
Expand All @@ -15,7 +24,7 @@ resourceMetrics:
stringValue: abcd
- key: k8s.cluster.name
value:
stringValue: ci-k8s-cluster
stringValue: rosa-test-htcpw
- key: k8s.namespace.name
value:
stringValue: abcd
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -83,7 +83,6 @@ extraAttributes:
- name: "customfield2"
value: "customvalue2"

clusterName: ci-k8s-cluster
Collaborator Author

Removed to test that the cluster name is retrieved correctly.

environment: dev
cloudProvider: aws
distribution: openshift
Expand Down
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
clusterName: ci-k8s-cluster
cloudProvider: aws
distribution: openshift
clusterName: rosa-test-htcpw
environment: dev
splunkObservability:
realm: us0
Expand Down
10 changes: 5 additions & 5 deletions helm-charts/splunk-otel-collector/templates/_helpers.tpl
Original file line number Diff line number Diff line change
Expand Up @@ -341,7 +341,7 @@ Build the securityContext for Linux and Windows
Whether the clusterName configuration option is optional
*/}}
{{- define "splunk-otel-collector.clusterNameOptional" -}}
{{- or (hasPrefix "gke" .Values.distribution) (eq (include "splunk-otel-collector.isNonFargateEKS" .) "true") }}
{{- or (hasPrefix "gke" .Values.distribution) (eq (include "splunk-otel-collector.isNonFargateEKS" .) "true") (eq .Values.distribution "openshift") }}
{{- end -}}

{{/*
Expand Down Expand Up @@ -396,12 +396,12 @@ Returns true if the distribution is eks but not eks/fargate.
{{- end -}}

{{/*
Identifies K8s clutser running on AWS but they are not EKS.
Returns true if the cloud provider is aws and distribution is not set.
example: Vanilla K8s on AWS EC2
Identifies K8s cluster running on AWS but not EKS.
Returns true if the cloud provider is aws and the distribution is not EKS-based.
Examples: Vanilla K8s on AWS EC2, OpenShift on AWS (ROSA)
*/}}
{{- define "splunk-otel-collector.isNonEKSonAWS" -}}
{{- and (eq .Values.cloudProvider "aws") (eq .Values.distribution "") -}}
{{- and (eq .Values.cloudProvider "aws") (not (hasPrefix "eks" .Values.distribution)) -}}
{{- end -}}

{{/*
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,15 @@ rules:
- get
- list
- watch
- apiGroups:
- config.openshift.io
resources:
- infrastructures
- infrastructures/status
verbs:
- get
- list
- watch
{{- end }}
- apiGroups:
- ""
Expand Down
31 changes: 26 additions & 5 deletions helm-charts/splunk-otel-collector/templates/config/_common.tpl
Original file line number Diff line number Diff line change
Expand Up @@ -62,16 +62,24 @@ resourcedetection:
# Note: Kubernetes distro detectors need to come first so they set the proper cloud.platform
# before it gets set later by the cloud provider detector.
- env
{{- if or (hasPrefix "gke" .Values.distribution) (eq .Values.cloudProvider "gcp") }}
- gcp
{{- else if or (eq (include "splunk-otel-collector.isNonEKSonAWS" .) "true") (eq (include "splunk-otel-collector.isNonFargateEKS" .) "true") }}
- eks
{{- if eq .Values.distribution "openshift" }}
- openshift
{{- else if eq .Values.distribution "aks" }}
- aks
{{- else if or (hasPrefix "gke" .Values.distribution) (eq .Values.cloudProvider "gcp") }}
- gcp
{{- else if eq (include "splunk-otel-collector.isNonFargateEKS" .) "true" }}
- eks
{{- end }}
{{- if eq .Values.cloudProvider "azure" }}
- azure
{{- end }}
{{- if and (eq .Values.distribution "openshift") (eq .Values.cloudProvider "gcp") }}
- gcp
{{- end }}
{{- if eq (include "splunk-otel-collector.isNonEKSonAWS" .) "true" }}
- ec2
{{- end }}
# The `system` detector goes last so it can't preclude cloud detectors from setting host/os info.
# https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/resourcedetectionprocessor#ordering
- system
Expand All @@ -80,7 +88,7 @@ resourcedetection:
resource_attributes:
k8s.cluster.name:
enabled: true
{{- else if or (eq (include "splunk-otel-collector.isNonEKSonAWS" .) "true") (eq (include "splunk-otel-collector.isNonFargateEKS" .) "true") }}
{{- else if eq (include "splunk-otel-collector.isNonFargateEKS" .) "true" }}
eks:
node_from_env_var: K8S_NODE_NAME
resource_attributes:
Expand Down Expand Up @@ -117,6 +125,8 @@ resourcedetection/k8s_cluster_name:
- gcp
{{- else if eq (include "splunk-otel-collector.isNonFargateEKS" .) "true" }}
- eks
{{- else if eq .Values.distribution "openshift" }}
- openshift
{{- end }}
{{- if hasPrefix "gke" .Values.distribution }}
gcp:
Expand Down Expand Up @@ -163,6 +173,17 @@ resourcedetection/k8s_cluster_name:
enabled: false
cloud.platform:
enabled: false
{{- else if eq .Values.distribution "openshift" }}
openshift:
resource_attributes:
k8s.cluster.name:
enabled: true
cloud.provider:
enabled: false
cloud.platform:
enabled: false
cloud.region:
enabled: false
{{- end }}
override: true
timeout: 15s
Expand Down