
[retry_sender] Message on retryable error is causing concern #13943

@atoulme

Description


Component(s)

No response

What happened?

Describe the bug
When an export fails because of a timeout, users see an info-level log such as:

retry_sender.go:126 Exporting failed. Will retry the request after interval. {"error": "rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "9.640533959s"}

This error is transient, and the request will be retried. However, it worries users, who have a hard time interpreting what it means.

The error can be caused by a variety of factors:

  • The connection timed out
  • The backend we talk to is busy and timed out
  • There is a network bandwidth constraint
  • There is an intermediate actor such as a proxy, firewall, or load balancer that is somehow dropping the connection
  • We are sending a large payload

This issue is meant to discuss how to surface more information about the source of the error and reduce user anxiety when they see "Exporting failed". We need a way to show that an occasional occurrence of this error is benign, while a recurring pattern can eventually lead to dropped data if the connection is bad.
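One possible way to separate "benign once" from "a pattern that will drop data" is to track consecutive retryable failures and escalate only when the streak grows. This is a minimal sketch under assumptions, not a proposal of the Collector's API: `failureTracker`, its `threshold` field, and the returned "info"/"warn" strings are all hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// failureTracker is a hypothetical sketch (not Collector code) that counts
// consecutive retryable export failures, so a single timeout can be reported
// as benign while a sustained streak is escalated.
type failureTracker struct {
	mu          sync.Mutex
	consecutive int
	threshold   int // streak length at which failures are escalated
}

// RecordFailure registers a retryable failure and returns the log level a
// caller might use for it.
func (t *failureTracker) RecordFailure() string {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.consecutive++
	if t.consecutive >= t.threshold {
		return "warn"
	}
	return "info"
}

// RecordSuccess resets the streak after a successful export.
func (t *failureTracker) RecordSuccess() {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.consecutive = 0
}

func main() {
	t := &failureTracker{threshold: 3}
	fmt.Println(t.RecordFailure(), t.RecordFailure()) // info info
	t.RecordSuccess()
	fmt.Println(t.RecordFailure(), t.RecordFailure(), t.RecordFailure()) // info info warn
}
```

Resetting on success keeps an occasional timeout quiet, while a connection that keeps failing surfaces at a higher level before the retry budget runs out and data is dropped.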

Collector version

v0.137.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

Log output

Additional context

No response


Metadata


Assignees

No one assigned

    Labels

    bug (Something isn't working)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
