
resourcedetection/docker - Error response from daemon: No such container #46275

@jahead

Component(s)

processor/resourcedetection

What happened?

Description

We currently run an otel collector as a container on an embedded device. It collects hostmetrics from the host and telemetry from local services/applications, and we use resourcedetection to append the host.name/os to the telemetry without extra configuration.
Today our latest integration tests, which cover updating our components, started failing at launch with:

"error": "failed getting container info: failed to fetch container information: Error response from daemon: No such container: <hostname>"

We think the issue comes from PR #44898, which was released in 0.145, specifically https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/44898/changes#diff-371aa58476e039eed8820a6869713c6778bc1074551a0435d6628c604bb3ae37R58-R62

The issue is that we use network_mode: host so that receivers/hostmetrics has access to the host network information. This has the side effect of setting the container's hostname to the host's hostname.

This leads the code from #44898 to look up a container that does not exist, because the hostname matches neither the container's ID nor its name. Note: this PR would also cause problems for anyone who configures the hostname to differ from the container name.
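
To illustrate the failure mode, here is a minimal sketch of how the lookup behaves. It is a simplified model of identifier resolution in the Docker daemon (full ID, name, or unique ID prefix), not the collector's or the daemon's actual code. The container ID and the name `otel-collector` are hypothetical placeholders; the hostname is the one from our log output.

```python
from typing import Optional


def resolve_container(identifier: str, containers: dict[str, str]) -> Optional[str]:
    """Simplified model of how the Docker daemon resolves an identifier.

    `containers` maps full container ID -> container name.
    Returns the full ID on a match, or None ("No such container").
    """
    if not identifier:
        return None
    # 1. Exact match on the full container ID.
    if identifier in containers:
        return identifier
    # 2. Exact match on the container name.
    for cid, name in containers.items():
        if name == identifier:
            return cid
    # 3. Unambiguous prefix of a container ID.
    prefix_matches = [cid for cid in containers if cid.startswith(identifier)]
    if len(prefix_matches) == 1:
        return prefix_matches[0]
    return None


# Hypothetical 64-character container ID and container name.
FULL_ID = "3f4e8a1b9c0d" + "a" * 52
containers = {FULL_ID: "otel-collector"}

# Default networking: the hostname is the short (12-character) container ID.
default_hostname = FULL_ID[:12]
# network_mode: host: the hostname is the host's own hostname.
host_mode_hostname = "xxx-00-01-c0-ac-0d-00"

# The short ID resolves; the host's hostname does not.
print(resolve_container(default_hostname, containers) is not None)
print(resolve_container(host_mode_hostname, containers) is not None)
```

Under this model the lookup only succeeds while the hostname happens to be the container ID, a prefix of it, or the container name, which is exactly what network_mode: host (or a custom hostname) breaks.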

Given that these properties are optional, it would be good to make these lookups optional as well, or to be able to turn off this behavior.
Alternatively, let us explicitly set the container name used for the container inspect call instead of assuming it is the hostname. From my reading of the code, that may already be possible, but it is not well documented.
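
As one hedged idea for a lookup that does not depend on the hostname: a container can often find its own ID in /proc/self/mountinfo, because Docker typically bind-mounts /etc/hostname from /var/lib/docker/containers/<id>/. The sketch below assumes that layout and that the mount is still present with network_mode: host, which I have not verified on our device. The function name and the sample line are hypothetical, and this is not what the detector currently does.

```python
import re
from typing import Optional

# Assumption: Docker keeps per-container files under .../containers/<64-hex-id>/
_CONTAINER_PATH = re.compile(r"/containers/([0-9a-f]{64})/")


def container_id_from_mountinfo(mountinfo: str) -> Optional[str]:
    """Return the first 64-hex container ID found in mountinfo text, if any."""
    for line in mountinfo.splitlines():
        match = _CONTAINER_PATH.search(line)
        if match:
            return match.group(1)
    return None


# Hypothetical mountinfo line for illustration only.
SAMPLE_ID = "ab" * 32
SAMPLE = (
    "729 718 8:1 /var/lib/docker/containers/" + SAMPLE_ID
    + "/hostname /etc/hostname rw,relatime - ext4 /dev/sda1 rw\n"
)

print(container_id_from_mountinfo(SAMPLE) == SAMPLE_ID)
```

Inside a real container the input would come from reading /proc/self/mountinfo, and the resulting ID could be passed to container inspect instead of the hostname.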

For now we have to pin to 0.144 until there is a fix for our workflow. I've attached our config for reference.

docker-compose.yaml

...
  otel-collector:
    image: ${OTEL_IMAGE:-otel/opentelemetry-collector-contrib}
    restart: unless-stopped
    user: "0"  # Run as root to access host resources
...
    logging:
      driver: journald
      options:
        tag: "xxx-otel-collector"
    network_mode: host  # Required for host metrics collection
    volumes:
      # Host filesystem for metrics collection
      - ${OTEL_HOST_ROOT:-/}:/hostfs:ro
      - ${OTEL_HOST_DOCKER_SOCK:-/var/run/docker.sock}:/var/run/docker.sock:ro
      # OTEL configuration
      - ${OTEL_HOST_CONFIG:-./otel-collector/config.yaml}:/etc/otelcol-contrib/config.yaml:ro

Expected Result

The otel collector runs inside a container, collects host metrics for the host, and uses resourcedetection to detect the host's hostname.

Actual Result

Collector version

0.146.1

Environment information

Environment

OS: (e.g., "custom yocto")

OpenTelemetry Collector configuration

receivers:
  hostmetrics:
    collection_interval: 30s # Adjust as needed
    root_path: /hostfs      # Path where the host filesystem is mounted
    scrapers:
      cpu:
      memory:
      disk:
      filesystem:
      network:
      process:
        mute_process_all_errors: true
      processes:
      paging:
      load:
      system:
  # journalctl is not available in the docker image by default
  # The host filesystem is mounted to /hostfs so we just need to point to the correct path
  # journald:
  #   path: /hostfs/var/log/journal

  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

  # Collect own metrics
  prometheus:
    config:
      scrape_configs:
      - job_name: 'otel-collector'
        scrape_interval: 10s
        static_configs:
        - targets: ['0.0.0.0:8888']

processors:
  batch:

  resourcedetection/docker:
    # Modify the list of detectors to match the environment
    detectors: [env, docker]
    timeout: 2s
    override: false

exporters:
# Change this to your desired endpoint
  otlp/traces:
    endpoint: "xx:443"
  otlp/logs:
    endpoint: "xx:443"
  otlp/metrics:
    endpoint: "xxx:443"
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [resourcedetection/docker, batch]
      exporters: [otlp/traces]
    metrics:
      receivers: [otlp, prometheus, hostmetrics]
      processors: [resourcedetection/docker, batch]
      exporters: [otlp/metrics]
    logs:
      receivers: [otlp]
      processors: [resourcedetection/docker, batch]
      exporters: [otlp/logs]

Log output

Error: cannot start pipelines: failed to start "resourcedetection/docker" processor: failed getting container info: failed to fetch container information: Error response from daemon: No such container: xxx-00-01-c0-ac-0d-00

Additional context

No response
