Add retry dropped item metrics and an exhausted retry error marker for exporter helper retries #13957
base: main
| Codecov Report❌ Patch coverage is  
 Additional details and impacted files
@@            Coverage Diff             @@
##             main   #13957      +/-   ##
==========================================
- Coverage   92.27%   92.26%   -0.02%     
==========================================
  Files         657      657              
  Lines       41111    41188      +77     
==========================================
+ Hits        37936    38001      +65     
- Misses       2173     2181       +8     
- Partials     1002     1006       +4
 | 
869d3f8 to ddfd2b6
    | @open-telemetry/collector-approvers can you take a look? | 
…r exporter helper retries Signed-off-by: Israel Blancas <[email protected]>
Non-blocking feedback (cc @jade-guiton-dd @axw).
Question 1: The universal telemetry RFC describes the use of an attribute otelcol.component.outcome=failure to indicate when an export fails. Why would we need a separate counter to indicate when retry fails?
Question 2: If the exporterhelper is configured with wait_for_result=true then it's difficult to call these failures "drops". Wouldn't the same sort of "drop" happen if the queue is configured (without wait_for_result=true) but also without the retry processor?
I guess these questions lead me to suspect that it's the queue (not the retry sender) that should count drops, meaning requests that fail and have no upstream response returned because wait_for_result=false. Otherwise, failures are failures, and I see no reason to count them in a new way.
| Thanks for your always valuable feedback @jmacd :D 
 The RFC attribute only tells you whether a single export span ended in success or failure. It doesn’t say why it failed or how many items were lost. Before this change, the obsreport sender only knew that  By having the retry sender wrap the terminal error with  
 
 The queue already accounts for the situations it is responsible for ( In the configuration you mentioned (queue enabled,  So the queue doesn’t have enough context to produce a “retry exhausted” metric, while the retry sender does. That’s why the new counters live alongside the retry logic instead of inside the queue. | 
| (For the record, the type of failure that occurred is already visible in logs. Of course, that doesn't mean we can't also surface it as metrics.) | 
Description
Link to tracking issue
Fixes #13956