
fix(kafkareceiver): enforce a backoff mechanism on exporterhelper.ErrQueueIsFull error #39581


Conversation

@an-mmx an-mmx (Contributor) commented Apr 23, 2025

Description

In the current implementation, the BackOff mechanism is triggered only by the memory limiter error. However, the same behavior is expected for the exporterhelper.ErrQueueIsFull error, which occurs when the sending queue overflows (both the memory queue and the persistent queue).
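
For illustration, here is a minimal sketch of how a consumer loop could back off on `exporterhelper.ErrQueueIsFull` using the same `cenkalti/backoff` exponential backoff already used for the memory limiter error. This is not the receiver's actual consume loop; `consumeWithBackoff` and the `consume` callback are hypothetical names.

```go
package example

import (
	"context"
	"errors"
	"time"

	"github.com/cenkalti/backoff/v4"
	"go.opentelemetry.io/collector/exporter/exporterhelper"
)

// consumeWithBackoff retries a single message when the downstream pipeline
// reports a queue-full error, instead of dropping it and moving on.
// consume is a hypothetical stand-in for handing the message to the next consumer.
func consumeWithBackoff(ctx context.Context, consume func(context.Context) error) error {
	expBackoff := backoff.NewExponentialBackOff()
	for {
		err := consume(ctx)
		if err == nil {
			return nil
		}
		// Only back off on the sending-queue overflow error; other errors
		// are returned to the caller unchanged.
		if !errors.Is(err, exporterhelper.ErrQueueIsFull) {
			return err
		}
		wait := expBackoff.NextBackOff()
		if wait == backoff.Stop { // backoff budget exhausted
			return err
		}
		select {
		case <-time.After(wait):
			// retry the same message
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}
```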

Link to tracking issue

#39580

Testing

Unit coverage added

Documentation

No documentation updated

@MovieStoreGuy MovieStoreGuy (Contributor) left a comment

Please also resolve the conflicts.

@@ -851,7 +852,8 @@ func newExponentialBackOff(config configretry.BackOffConfig) *backoff.ExponentialBackOff
 }
 
 func errorRequiresBackoff(err error) bool {
-	return err.Error() == errMemoryLimiterDataRefused.Error()
+	return err.Error() == errMemoryLimiterDataRefused.Error() ||
Contributor:

Can this use either errors.Is or errors.As instead of comparing the strings?

Contributor Author:

Unfortunately, I don't see how this can be utilized without changes in opentelemetry-collector, since the memory limiter error is in an internal package: https://github.com/open-telemetry/opentelemetry-collector/blob/main/internal/memorylimiter/memorylimiter.go#L28
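
To make the constraint concrete, here is a hedged sketch: the exported `exporterhelper.ErrQueueIsFull` sentinel can be matched with `errors.Is`, but the memory limiter sentinel is unexported in the collector's internal package, so contrib code can only compare its message. The message string below is assumed for illustration only.

```go
package example

import (
	"errors"

	"go.opentelemetry.io/collector/exporter/exporterhelper"
)

// Assumed copy of the memory limiter's refusal message; the real sentinel
// lives in opentelemetry-collector's internal/memorylimiter package.
var errMemoryLimiterDataRefused = errors.New("data refused due to high memory usage")

func errorRequiresBackoff(err error) bool {
	// errors.Is works for the exported queue-full sentinel...
	return errors.Is(err, exporterhelper.ErrQueueIsFull) ||
		// ...but the memory limiter error can only be compared by message.
		err.Error() == errMemoryLimiterDataRefused.Error()
}
```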

@atoulme atoulme marked this pull request as draft April 24, 2025 05:47
@atoulme atoulme (Contributor) commented Apr 24, 2025

Moving to draft while this is being worked on

@axw axw (Contributor) left a comment

@an-mmx did you consider updating the logic to retry on any error that is not considered permanent, like the exporterhelper retry sender? https://github.com/open-telemetry/opentelemetry-collector/blob/d020c9074f873c54e8ae7b8eaa4a08e13157cb76/exporter/exporterhelper/internal/retry_sender.go#L96-L99

I think that would be the ideal solution

@an-mmx an-mmx (Contributor, Author) commented Apr 24, 2025

> @an-mmx did you consider updating the logic to retry on any error that is not considered permanent, like the exporterhelper retry sender? https://github.com/open-telemetry/opentelemetry-collector/blob/d020c9074f873c54e8ae7b8eaa4a08e13157cb76/exporter/exporterhelper/internal/retry_sender.go#L96-L99
>
> I think that would be the ideal solution

I hadn’t considered this approach before.
It sounds reasonable, and I don’t see any negative side effects so far. If there are no objections from the codeowners, I’m okay with modifying this change in the proposed way.

@axw axw (Contributor) commented Apr 28, 2025

@MovieStoreGuy @pavolloffay any concerns with the suggestion above to retry on any non-permanent errors? I expect we'll need to go through and classify some errors as permanent, but I don't think that needs to block?

@an-mmx an-mmx force-pushed the fix/kafkareceiver-backoff-on-queue-batch branch from b18be1c to e5dcad8 on May 6, 2025 11:05
@an-mmx an-mmx (Contributor, Author) commented May 6, 2025

Changed to enforce a backoff retry on any non-permanent error.
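
A minimal sketch of what a non-permanent-error check along these lines might look like, mirroring the exporterhelper retry sender's classification; the function name comes from the diff above, but the body here is illustrative rather than the PR's exact code.

```go
package example

import (
	"go.opentelemetry.io/collector/consumer/consumererror"
)

// errorRequiresBackoff reports whether an error from the downstream consumer
// should trigger the backoff/retry path. Errors wrapped with
// consumererror.NewPermanent are treated as non-retryable and are not retried.
func errorRequiresBackoff(err error) bool {
	return !consumererror.IsPermanent(err)
}
```

Under this scheme, both queue-full and memory-limiter refusals are retried automatically, since neither is marked permanent.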

@an-mmx an-mmx marked this pull request as ready for review May 6, 2025 11:09
@an-mmx an-mmx force-pushed the fix/kafkareceiver-backoff-on-queue-batch branch 4 times, most recently from b8a43f0 to 8b699d3, on May 12, 2025 13:59
@an-mmx an-mmx force-pushed the fix/kafkareceiver-backoff-on-queue-batch branch from 8b699d3 to 36bf22d on May 15, 2025 12:00
@axw axw (Contributor) left a comment

LGTM, thanks! Just needs an update on the changelog

@an-mmx an-mmx force-pushed the fix/kafkareceiver-backoff-on-queue-batch branch from 774127e to f5bb920 on May 19, 2025 14:36
@atoulme atoulme (Contributor) commented May 29, 2025

Would you please rebase your branch and update the collector module versions? See the failing check; you can apply the git diff.

@an-mmx an-mmx force-pushed the fix/kafkareceiver-backoff-on-queue-batch branch from f5bb920 to 933dec4 on May 29, 2025 17:51
@MovieStoreGuy MovieStoreGuy merged commit 9b68993 into open-telemetry:main May 30, 2025
177 checks passed
@github-actions github-actions bot added this to the next release milestone May 30, 2025
dd-jasminesun pushed a commit to DataDog/opentelemetry-collector-contrib that referenced this pull request Jun 23, 2025
…QueueIsFull error (open-telemetry#39581)

#### Description

In the current implementation, the BackOff mechanism is triggered only by the [memory limiter error](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/kafkareceiver/kafka_receiver.go#L562-L564). However, the same behavior is expected for the `exporterhelper.ErrQueueIsFull` error, which occurs when the sending queue overflows (both the [memory queue](https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/internal/queuebatch/memory_queue.go#L109-L112) and the [persistent queue](https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/internal/queuebatch/persistent_queue.go#L244-L247)).

#### Link to tracking issue
open-telemetry#39580

#### Testing
Unit coverage added

#### Documentation
No documentation updated

---------

Co-authored-by: Andrew Wilkins <[email protected]>
Co-authored-by: Sean Marciniak <[email protected]>