@Sanchit2662

Summary

This PR adds support for partial success reporting in receivers by allowing them to explicitly report how many items in a batch failed, instead of treating any error as a full batch failure.

A new consumererror.PartialError is introduced to carry failed item counts, and receiverhelper is updated to correctly compute accepted, failed, and refused metrics when partial failures occur.
This fixes the current all-or-nothing behavior in receiver self-observability and enables accurate metrics for receivers such as Prometheus, where translation or parsing failures may affect only part of a batch.

Closes: #14440


Impact

  • Incorrect observability metrics for receivers that partially succeed
  • Makes it difficult to distinguish:
    • Internal receiver failures
    • Downstream pipeline rejections
  • Forces receivers to invent custom metrics instead of using standard collector self-observability


Fix

1. New API: consumererror.PartialError

A new error type is introduced to represent partial failures by carrying a failed item count:

// Create a partial error indicating some items failed
err := consumererror.NewPartialError(originalErr, 20)

It composes cleanly with existing errors, including downstream rejections:

err := consumererror.NewDownstream(
    consumererror.NewPartialError(originalErr, 30),
)

The failed count can be extracted safely from the error chain:

if pe := consumererror.GetPartialError(err); pe != nil {
    failedCount := pe.FailedCount()
}

2. Updated receiverhelper Behavior

receiverhelper.endOp() is updated to detect PartialError and correctly compute metrics:

Before (all-or-nothing):

if err != nil {
    numAccepted = 0
    numRefused = numReceivedItems
}

After (partial success supported):

if err != nil {
    failedCount := numReceivedItems
    if pe := consumererror.GetPartialError(err); pe != nil {
        failedCount = pe.FailedCount()
    }
    numAccepted = numReceivedItems - failedCount
    // failedCount is then reported as failed (internal receiver error)
    // or refused (downstream rejection)
}

This preserves existing behavior for receivers that do not use PartialError.


Result

  • receiver_accepted_* now reflects the number of successfully processed items
  • receiver_failed_* reports internal receiver failures accurately
  • receiver_refused_* correctly tracks downstream rejections
  • No new metrics or feature gates required
  • Fully backward compatible

This enables receivers (notably the Prometheus receiver) to report accurate self-observability metrics for partial success scenarios using standard Collector mechanisms.

Signed-off-by: Sanchit2662 <sanchit2662@gmail.com>
@Sanchit2662 Sanchit2662 requested a review from a team as a code owner January 18, 2026 22:01
@linux-foundation-easycla

linux-foundation-easycla bot commented Jan 18, 2026

CLA Signed
The committers listed above are authorized under a signed CLA.

  • ✅ login: Sanchit2662 / name: SANCHIT KUMAR (abca3e9)

@Sanchit2662
Author

Hi @jade-guiton-dd ,
This PR fixes receiver self-observability by allowing receivers to report partial success, so accepted, failed, and refused metrics are counted accurately instead of treating any error as a full batch failure.
Would appreciate a review when you have time. Thanks!

@dashpole
Contributor

Hey @Sanchit2662, thanks for working on this. I think we still need to get some feedback on the design before moving forward with the implementation. Can you leave your "New API: consumererror.PartialError" section as a comment on the issue so we make sure it is considered?

Contributor

@jade-guiton-dd jade-guiton-dd left a comment

Blocking until a single design has been agreed upon by all relevant parties.

@mx-psi
Member

mx-psi commented Jan 19, 2026

Hi @Sanchit2662, thanks for your PR. I would be interested in knowing whether you have used a generative AI tool such as ChatGPT to create this PR. Could you clarify if you have done so, and, if so, would you mind providing details on how you were involved and how you reviewed the tool output?

@Sanchit2662
Author

Hi @dashpole @jade-guiton-dd ,
I’ve added the consumererror.PartialError design proposal as a comment on the issue for broader feedback. Happy to iterate based on the discussion there.

@Sanchit2662
Author

Hi @mx-psi, thanks for asking.

Yes, I did use a generative AI tool as a supporting aid to help think through the implementation approach and explore possible design options.

The actual code, API shape, and integration with the existing OpenTelemetry Collector components were implemented and reviewed by me. All suggestions were manually evaluated, adapted to the project’s conventions, and validated through local testing and review.

I’m happy to clarify or walk through any part of the implementation if that would be helpful.

Member

@mx-psi mx-psi left a comment

Thanks for the answer. I encourage you to read the Generative AI contribution policy which is available here: https://github.com/open-telemetry/community/blob/main/policies/genai.md before moving forward with this PR.

As @jade-guiton-dd said, this needs discussion before moving forward to get to an agreement, since there are multiple PRs trying to solve the same issue in different ways.

If you want to participate in such discussion, check the Generative AI contribution policy carefully and please do not copy/paste LLM output directly: review it, summarize it, discard anything that is unnecessary or superfluous, and correct anything that is wrong.

Development

Successfully merging this pull request may close these issues.

Self-observability: Allow receivers to report partial success

4 participants