-
Notifications
You must be signed in to change notification settings - Fork 3.3k
[receiver/docker_stats] Fix context cancelled bug #45655
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Move context and cancellation outside of parsing. It was cancelling the context concurrently with reading the resonse. If reading the response takes a little bit of time it would cancel instead of read the response. Fixes open-telemetry#34194
|
Welcome, contributor! Thank you for your contribution to opentelemetry-collector-contrib. Important reminders:
A maintainer will review your pull request soon. Thank you for helping make OpenTelemetry better! |
MovieStoreGuy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for submitting the change,
I am not sure that this change will actually fix the issue you're seeing. If the Docker daemon is under significant pressure, it will be delayed in responding, and will block the scrape request meaning that data from the host will be late.
In this scenario, I believe the best case is to fail fast and fail the request so we can move to the next time window.
|
Current behaviour that I'm seeing is that I downloaded otelcol, enabled The host isn't that busy, around 50% CPU, spikes up a little bit, but it runs ~40 containers. So it's missing metrics for a lot of containers. I'm testing running it with my patch now and the errors went away. |
|
@jamesmoessis please take a look too as code-owner. |
Description
Sometimes collecting docker stats would fail with
"error": "context canceled"It's because it was cancelling the context concurrently with reading the response. If reading the response takes a little bit of time it would cancel instead of reading the response.
Move context and cancellation outside to the calling function so that it can do parsing before cancelling the timeout.
Link to tracking issue
Fixes #34194
#34320
open-telemetry/opentelemetry-demo#1677
Testing
Added a reproduction integration test that simulates a slightly slow response.