@henrikno

Description

Collecting Docker stats sometimes fails with "error": "context canceled".

This happens because the context was being cancelled concurrently with reading the response. If reading the response took even a little time, the read was cancelled before it could finish.

Move the context and its cancellation out to the calling function so that the response is parsed before the context is cancelled.

Link to tracking issue

Fixes #34194
#34320
open-telemetry/opentelemetry-demo#1677

Testing

Added a reproduction integration test that simulates a slightly slow response.

Move context and cancellation outside of parsing.

It was cancelling the context concurrently with reading the response.
If reading the response takes a little bit of time it would cancel
instead of reading the response.

Fixes open-telemetry#34194
@henrikno henrikno requested review from a team and MovieStoreGuy as code owners January 26, 2026 22:48
@linux-foundation-easycla

linux-foundation-easycla bot commented Jan 26, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

@github-actions github-actions bot added the first-time contributor PRs made by new contributors label Jan 26, 2026
@github-actions
Contributor

Welcome, contributor! Thank you for your contribution to opentelemetry-collector-contrib.

Important reminders:

A maintainer will review your pull request soon. Thank you for helping make OpenTelemetry better!

Contributor

@MovieStoreGuy MovieStoreGuy left a comment

Thank you for submitting the change,

I am not sure that this change will actually fix the issue you're seeing. If the Docker daemon is under significant pressure, it will be slow to respond and will block the scrape request, meaning that data from the host will arrive late.

In this scenario, I believe the best case is to fail fast and fail the request so we can move to the next time window.

@henrikno
Author

The behaviour I'm currently seeing: I downloaded otelcol, enabled docker_stats, and get this error very frequently:

2026-01-27T21:09:30.489Z	error	[email protected]/docker.go:196	Could not parse docker containerStats for container id	{"resource": {"service.instance.id": "bedf9234-92c5-4dac-8988-66660892bf7e", "service.name": "./elastic-agent", "service.version": "9.2.4"}, "otelcol.component.id": "docker_stats", "otelcol.component.kind": "receiver", "otelcol.signal": "metrics", "id": "e6af820f51811ab3cd81ca6f71935f0c7faac82d045b06ccddbe6f19c83be4e3", "error": "context canceled"}
github.com/open-telemetry/opentelemetry-collector-contrib/internal/docker.(*Client).toStatsJSON
	github.com/open-telemetry/opentelemetry-collector-contrib/internal/[email protected]/docker.go:196
github.com/open-telemetry/opentelemetry-collector-contrib/internal/docker.(*Client).FetchContainerStatsAsJSON
	github.com/open-telemetry/opentelemetry-collector-contrib/internal/[email protected]/docker.go:146
github.com/open-telemetry/opentelemetry-collector-contrib/receiver/dockerstatsreceiver.(*metricsReceiver).scrapeV2.func1
	github.com/open-telemetry/opentelemetry-collector-contrib/receiver/[email protected]/receiver.go:97

The host isn't that busy: around 50% CPU, with occasional small spikes, but it runs ~40 containers, so metrics are missing for a lot of containers.
I tried curling the stats endpoint and the response times seem bimodal: sometimes it returns in 12 milliseconds, sometimes in 900 ms. I briefly looked at the Docker code, and it seems there's a background process that collects the stats and refreshes every 1 second, so I think the time taken depends on how long the request has to wait for the next stats to be published. I've never seen it take more than 1 s, though.

I'm now running it with my patch, and the errors went away.
The Docker API being slower than the collection interval is a valid concern, but that should be handled by the timeout (I did a quick test for it and can add it if we want).

@ChrsMark
Member

@jamesmoessis please take a look too as code-owner.

Development

Successfully merging this pull request may close these issues.

[docker_stats] - Could not parse docker containerStats for container id

3 participants