[receiver/prometheus] add telemetry metrics for translation phase (metrics attempted and metrics dropped) #45233
+520
−64
Add `prometheus_receiver_translation_metric_points_attempted` and `prometheus_receiver_translation_metric_points_dropped` counters to track OTLP data point translation success/failure rates.
Description
Adds two internal telemetry metrics to observe the Prometheus-to-OTLP translation phase:
- `prometheus_receiver_translation_metric_points_attempted`: Number of OTLP metric points the receiver attempted to create
- `prometheus_receiver_translation_metric_points_dropped`: Number of OTLP metric points that failed to be created (e.g., incomplete histograms missing `_count`)

Counting semantics: These metrics count in OTLP terms, not Prometheus series. A histogram with labels `{method="GET"}` counts as 1 attempted data point regardless of how many Prometheus series (buckets, sum, count) comprise it.

As explained in #44196, this makes it easier to correlate with the existing `receiver_accepted_metric_points` metric from ObsReport and to understand the actual output of the receiver.

Also, note that even though the metric names use the term `metric_points`, their unit is `datapoints`, so that it stays consistent with the collector's own metrics terminology, as in this example.
Link to tracking issue
Resolves #44196
Testing
I tested this PR manually and added unit tests.
The automated tests cover:
- [...] `_count`) succeeding
- [...] `_count`) being dropped

For my manual test, I did the following:
- [...] `_count` (2 metric points that should be dropped): [...]
- [...] `:8888`: `curl -s http://localhost:8888/metrics | grep prometheus_receiver_translation`

`otelcol_prometheus_receiver_translation_metric_points_attempted` = 7 (+5 for Prometheus' own metrics: `up`, `scrape_duration_seconds`, `scrape_samples_scraped`, `scrape_series_added`, `scrape_samples_post_metric_relabeling`)
`otelcol_prometheus_receiver_translation_metric_points_dropped` = 2

If you want to verify my manual testing yourselves, I created an example repository here.
Documentation
The docs I added for this PR were autogenerated by `mdatagen` using the description and spec included in `metadata.yaml`.

Question for maintainers: Should we add a brief section to the README mentioning these new internal telemetry metrics? I considered doing that, but I wasn't sure if this was important enough to be there.