[extension/opampextension] Enrich AgentDescription with OS, Kubernetes, deployment mode, and user-configurable identifying attributes #47279
### Component(s)
extension/opamp
### Is your feature request related to a problem? Please describe.
When managing a fleet of OpenTelemetry Collectors via OpAMP, the AgentDescription reported by the upstream opampextension lacks several attributes that are critical for effective fleet management:
- **No OS version:** The extension reports `os.description` (a free-form string like `"Ubuntu 22.04.3 LTS"`) but not a machine-parseable `os.version` (e.g. `"22.04"`). This makes it difficult to programmatically filter, sort, or alert on OS versions across the fleet — for example, identifying all agents running on a kernel version affected by a CVE.
- **No Kubernetes context:** When collectors run as DaemonSets or Deployments in Kubernetes, the OpAMP server receives no information about the pod name, namespace, node, cluster, or owning workload. Operators are forced to cross-reference agent IDs against Kubernetes metadata in a separate system, which defeats the purpose of having a centralized fleet management protocol.
- **No deployment mode:** There is no way to distinguish whether a collector is running as an `"agent"` (per-node) or a `"gateway"` (centralized). This makes it impossible to apply mode-specific configuration policies or to get accurate counts of agents vs. gateways in a mixed fleet.
- **No executable path:** In environments with multiple collector binaries (e.g. side-by-side upgrades, canary deployments), there is no attribute to identify which binary is actually running, making rollout verification harder than it needs to be.
- **No user-configurable identifying attributes:** The identifying attributes are fixed to `service.instance.id`, `service.name`, and `service.version`. Operators who need to include custom identifiers — such as a tenant ID, fleet group, or deployment region — at the identifying level have no mechanism to do so. This forces workarounds like overloading `service.name` with encoded metadata, which breaks semantic conventions and downstream tooling.
Collectively, these gaps mean that the OpAMP server receives a minimal, inflexible description of each agent, requiring operators to build and maintain external enrichment pipelines to get the context they need for day-to-day fleet operations.
### Describe the solution you'd like
Enrich the AgentDescription reported by opampextension with the following additions:
### New non-identifying attributes

| Attribute | Source | Notes |
|---|---|---|
| `os.version` | `host.Info().PlatformVersion` | Machine-parseable OS version (e.g. `"15.3"`, `"22.04"`) |
| `k8s.node.name` | `K8S_NODE_NAME` env var | Auto-detected when `KUBERNETES_SERVICE_HOST` is present |
| `k8s.pod.name` | `K8S_POD_NAME` env var | Same as above |
| `k8s.namespace.name` | `K8S_NAMESPACE` env var | Same as above |
| `k8s.pod.uid` | `K8S_POD_UID` env var | Same as above |
| `k8s.cluster.name` | `OTEL_K8S_CLUSTER_NAME` env var | Same as above |
| `k8s.daemonset.name` | `K8S_DAEMONSET_NAME` env var | Same as above |
| `k8s.deployment.name` | `K8S_DEPLOYMENT_NAME` env var | Same as above |
| `otelcol.deployment.mode` | `OTEL_COLLECTOR_MODE` env var → config → default `"agent"` | Values: `"agent"` or `"gateway"` |
| `process.executable.path` | `os.Executable()` | Silently omitted if detection fails |
### New config fields

```yaml
extensions:
  opamp:
    agent_description:
      # Explicit deployment mode; overridden by the OTEL_COLLECTOR_MODE env var.
      # Valid values: "agent", "gateway". Default: "agent".
      deployment_mode: "gateway"
      # Custom key-value pairs merged into the identifying attributes.
      # These override auto-detected values and are excluded from the
      # non-identifying attributes.
      identifying_attributes:
        fleet.id: "us-west-2-prod"
        custom.tenant: "acme-corp"
```
### Describe alternatives you've considered
_No response_
### Additional context
### Implementation reference
This feature set has been implemented, tested, and is running in production within the [Splunk OpAMP fork](https://github.com/signalfx/splunk-otel-collector). The changes in this PR port and refine that work for upstream contribution, ensuring alignment with OTel semantic conventions and upstream code style.
### Example: agent description before and after
**Before (upstream today):**
```json
{
  "identifying_attributes": {
    "service.instance.id": "f47ac10b-58cc",
    "service.name": "otelcol-contrib",
    "service.version": "0.102.0"
  },
  "non_identifying_attributes": {
    "os.type": "linux",
    "os.description": "Ubuntu 22.04.3 LTS",
    "host.arch": "amd64",
    "host.name": "worker-node-3"
  }
}
```

**After (with this feature):**

```json
{
  "identifying_attributes": {
    "service.instance.id": "f47ac10b-58cc",
    "service.name": "otelcol-contrib",
    "service.version": "0.102.0",
    "fleet.id": "us-west-2-prod"
  },
  "non_identifying_attributes": {
    "os.type": "linux",
    "os.description": "Ubuntu 22.04.3 LTS",
    "os.version": "22.04",
    "host.arch": "amd64",
    "host.name": "worker-node-3",
    "process.executable.path": "/usr/bin/otelcol-contrib",
    "otelcol.deployment.mode": "agent",
    "k8s.node.name": "worker-node-3",
    "k8s.pod.name": "otelcol-agent-7b9f4",
    "k8s.namespace.name": "monitoring",
    "k8s.pod.uid": "a1b2c3d4-e5f6-7890",
    "k8s.cluster.name": "prod-us-west-2",
    "k8s.daemonset.name": "otelcol-agent"
  }
}
```

### Example config

```yaml
extensions:
  opamp:
    server:
      ws:
        endpoint: wss://opamp-server.example.com/v1/opamp
    agent_description:
      deployment_mode: "gateway"
      identifying_attributes:
        fleet.id: "us-west-2-prod"
        custom.tenant: "acme-corp"
```

### Kubernetes Downward API setup
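The merge semantics behind the `fleet.id` example above (user-configured keys override auto-detected values and are kept out of the non-identifying set) could be sketched as follows. `mergeIdentifying` is a hypothetical helper, not the extension's actual API.

```go
package main

import "fmt"

// mergeIdentifying overlays user-configured identifying attributes on top
// of the default ones, and removes any user-promoted key from the
// non-identifying set so no attribute appears in both maps.
// Hypothetical sketch of the behavior described above.
func mergeIdentifying(defaults, userConfigured, nonIdentifying map[string]string) (map[string]string, map[string]string) {
	identifying := make(map[string]string, len(defaults)+len(userConfigured))
	for k, v := range defaults {
		identifying[k] = v
	}
	for k, v := range userConfigured {
		identifying[k] = v        // user-configured value wins
		delete(nonIdentifying, k) // keep the key out of the other map
	}
	return identifying, nonIdentifying
}

func main() {
	id, nonID := mergeIdentifying(
		map[string]string{"service.name": "otelcol-contrib"},
		map[string]string{"fleet.id": "us-west-2-prod"},
		map[string]string{"fleet.id": "stale", "os.type": "linux"},
	)
	fmt.Println(id["fleet.id"], len(nonID)) // us-west-2-prod 1
}
```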
For K8s attribute auto-detection to work, operators need to expose the standard Downward API env vars in their pod spec. Example:
```yaml
env:
  - name: K8S_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
  - name: K8S_POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: K8S_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
  - name: K8S_POD_UID
    valueFrom:
      fieldRef:
        fieldPath: metadata.uid
  - name: OTEL_K8S_CLUSTER_NAME
    value: "prod-us-west-2"
  - name: K8S_DAEMONSET_NAME
    value: "otelcol-agent"
```