Component(s)
processor/resourcedetection
What happened?
Description
We currently run an OTel Collector as a container on an embedded device to collect host metrics from the host and telemetry from local services/applications. We use resourcedetection to add host.name and the OS attributes to the telemetry without extra configuration.
Today we saw launch failures in our latest integration tests for updating our components:
"error": "failed getting container info: failed to fetch container information: Error response from daemon: No such container: <hostname>"
We think the issue comes from PR #44898, which was released in v0.145, specifically https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/44898/changes#diff-371aa58476e039eed8820a6869713c6778bc1074551a0435d6628c604bb3ae37R58-R62
The issue is that we use network_mode: host so that the hostmetrics receiver has access to the host's network information, but this has the side effect of setting the container's hostname to the host's hostname.
This leads the code from #44898 to look up the wrong container, because the container's name doesn't match the hostname. Note: this PR would also cause problems for anyone who configures the hostname to differ from the container name.
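To make the failure mode concrete, here is a toy model in Python. It is not the collector's code: `FakeDockerDaemon` and `container_hostname` are hypothetical names invented for this sketch, and the container ID is made up. It only assumes that Docker's default hostname is the short container ID and that host networking makes the container report the host's hostname.

```python
class FakeDockerDaemon:
    """Hypothetical stand-in for the daemon's container inspect call."""

    def __init__(self, containers):
        # Maps full container ID -> container name.
        self._containers = containers

    def inspect(self, ref):
        # Like `docker inspect`, accept a container name or an ID prefix.
        for cid, name in self._containers.items():
            if ref and (ref == name or cid.startswith(ref)):
                return {"Id": cid, "Name": name}
        raise LookupError(f"No such container: {ref}")


def container_hostname(container_id, network_mode, host_hostname):
    # Default hostname is the 12-character short ID; with host networking
    # the container sees the host's hostname instead.
    return host_hostname if network_mode == "host" else container_id[:12]


CONTAINER_ID = "4f5c1e9ab2d07788aabbccddeeff0011"  # made-up ID
HOST = "xxx-00-01-c0-ac-0d-00"
daemon = FakeDockerDaemon({CONTAINER_ID: "otel-collector"})

# Default (bridge) networking: the hostname is an ID prefix, lookup succeeds.
bridge_ref = container_hostname(CONTAINER_ID, "bridge", HOST)
bridge_result = daemon.inspect(bridge_ref)

# Host networking: the hostname is the host's name, lookup fails.
host_ref = container_hostname(CONTAINER_ID, "host", HOST)
try:
    daemon.inspect(host_ref)
    host_error = None
except LookupError as exc:
    host_error = str(exc)

print(bridge_result["Name"])
print(host_error)
```

In this model the second lookup fails with the same shape of message we see at startup, because nothing registered with the daemon is named after the host.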
Given that these properties are optional, it would be good to make these lookups optional as well, or to be able to turn off this behavior.
Alternatively, let us explicitly set the container name used for the container inspect call instead of assuming it is the hostname. From my reading of the code, that may already be possible? It's not well documented.
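A rough sketch of the two behaviors requested above, written in Python for brevity. None of these names exist in the collector: `resolve_container_ref`, `detect_container_attrs`, `explicit_name` and `required` are hypothetical, and the `container.name` key is used only for illustration.

```python
import socket


def resolve_container_ref(explicit_name=None, hostname=None):
    """Prefer an explicitly configured container name, else use the hostname."""
    if explicit_name:
        return explicit_name
    return hostname if hostname is not None else socket.gethostname()


def detect_container_attrs(inspect, ref, required=False):
    """Return container attributes, or {} if the lookup fails and is optional."""
    try:
        info = inspect(ref)
    except LookupError:
        if required:
            raise  # current behavior as we observe it: startup fails
        return {}  # requested behavior: skip the optional attributes
    return {"container.name": info["Name"]}


def fake_inspect(ref):
    # Hypothetical daemon that only knows a container named "otel-collector".
    if ref == "otel-collector":
        return {"Name": "otel-collector"}
    raise LookupError(f"No such container: {ref}")


# Host networking: the hostname does not identify the container.
by_hostname = resolve_container_ref(hostname="xxx-00-01-c0-ac-0d-00")
skipped = detect_container_attrs(fake_inspect, by_hostname)

# An explicit name takes precedence over the hostname.
by_name = resolve_container_ref("otel-collector", "xxx-00-01-c0-ac-0d-00")
found = detect_container_attrs(fake_inspect, by_name)

print(skipped, found)
```

Either option would unblock our setup: an optional lookup lets the processor start without the container attributes, and an explicit name lets the lookup succeed.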
For now we will have to pin to v0.144 until there is a fix for our workflow. I've attached our config in case it helps.
docker-compose.yaml
...
  otel-collector:
    image: ${OTEL_IMAGE:-otel/opentelemetry-collector-contrib}
    restart: unless-stopped
    user: "0" # Run as root to access host resources
    ...
    logging:
      driver: journald
      options:
        tag: "xxx-otel-collector"
    network_mode: host # Required for host metrics collection
    volumes:
      # Host filesystem for metrics collection
      - ${OTEL_HOST_ROOT:-/}:/hostfs:ro
      - ${OTEL_HOST_DOCKER_SOCK:-/var/run/docker.sock}:/var/run/docker.sock:ro
      # OTEL configuration
      - ${OTEL_HOST_CONFIG:-./otel-collector/config.yaml}:/etc/otelcol-contrib/config.yaml:ro
Expected Result
The OTel Collector runs inside a container, collects host metrics for the host, and uses resourcedetection to detect the host's hostname.
Actual Result
Collector version
0.146.1
Environment information
Environment
OS: (e.g., "custom yocto")
OpenTelemetry Collector configuration
receivers:
  hostmetrics:
    collection_interval: 30s # Adjust as needed
    root_path: /hostfs # Path where the host filesystem is mounted
    scrapers:
      cpu:
      memory:
      disk:
      filesystem:
      network:
      process:
        mute_process_all_errors: true
      processes:
      paging:
      load:
      system:
  # journalctl is not available in the Docker image by default
  # The host filesystem is mounted to /hostfs so we just need to point to the correct path
  # journald:
  #   path: /hostfs/var/log/journal
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  # Collect own metrics
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          scrape_interval: 10s
          static_configs:
            - targets: ['0.0.0.0:8888']
processors:
  batch:
  resourcedetection/docker:
    # Modify the list of detectors to match the environment
    detectors: [env, docker]
    timeout: 2s
    override: false
exporters:
  # Change this to your desired endpoint
  otlp/traces:
    endpoint: "xx:443"
  otlp/logs:
    endpoint: "xx:443"
  otlp/metrics:
    endpoint: "xxx:443"
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [resourcedetection/docker, batch]
      exporters: [otlp/traces]
    metrics:
      receivers: [otlp, prometheus, hostmetrics]
      processors: [resourcedetection/docker, batch]
      exporters: [otlp/metrics]
    logs:
      receivers: [otlp]
      processors: [resourcedetection/docker, batch]
      exporters: [otlp/logs]
Log output
Error: cannot start pipelines: failed to start "resourcedetection/docker" processor: failed getting container info: failed to fetch container information: Error response from daemon: No such container: xxx-00-01-c0-ac-0d-00
Additional context
No response