
Inject component-identifying scope attributes #12617


Merged: 53 commits, Mar 28, 2025

Conversation

jade-guiton-dd (Contributor) commented Mar 12, 2025

Description

Fork of #12384 to showcase how component attributes can be injected into scope attributes instead of log/metric/span attributes. See that PR for more context.

To see the diff from the previous PR, filter changes starting from the "Prototype using scope attributes" commit.

Link to tracking issue

Resolves #12217
Also incidentally resolves #12213 and resolves #12117

Testing

I updated the existing tests to check for scope attributes, and did some manual testing with a debug exporter to check that the scope attributes are added/removed properly.

mx-psi (Member) commented Mar 27, 2025

We discussed offline adding a feature gate for this and all other internal-telemetry-related changes. I intend to merge this once the comments are addressed and the feature gate has been added.

@mx-psi mx-psi added this pull request to the merge queue Mar 28, 2025
Merged via the queue into open-telemetry:main with commit 54c13a9 Mar 28, 2025
52 of 56 checks passed
github-merge-queue bot pushed a commit that referenced this pull request Apr 28, 2025
…behind feature gate (#12933)

#### Context

PR #12617 introduced logic to inject new instrumentation scope
attributes into all internal telemetry to identify which Collector
component the telemetry came from. These attributes had already been added to
internal logs as regular log attributes, and this PR switched them to
scope attributes for consistency. The new logic was placed behind an
Alpha stage feature gate, `telemetry.newPipelineTelemetry`.

Unfortunately, the default "off" state of the feature gate disabled the
injection of component-identifying attributes entirely, which was a
regression since they had been present in internal logs in previous
releases. See issue #12870 for an in-depth discussion of this issue.

To correct this, PR #12856 was filed, which stabilized the feature gate
(making it on by default, with no way to disable it) and removed the
logic that the gate used to toggle. This was thought to be the simplest
way to mitigate the regression in the "off" state, since we planned to
stabilize the feature eventually anyway.

Unfortunately, it was found that the "on" state of the feature gate
causes a different issue: [the Prometheus
exporter](https://github.com/open-telemetry/opentelemetry-go/tree/main/exporters/prometheus)
is the default way of exporting the Collector's internal metrics,
accessible at `collector:8888/metrics`. This exporter does not currently
have any support for instrumentation scope attributes, meaning that
metric streams differentiated by said attributes but not by any other
identifying property will appear as aliases to Prometheus, which causes
an error. This completely breaks the export of Collector metrics through
Prometheus under some simple configurations, which is a release blocker.
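As an illustration of the aliasing problem, here is a minimal sketch (with assumed instrument and attribute names, not the Collector's actual instruments): two meters that differ only in their instrumentation scope attributes emit an instrument with the same name, which an exporter without scope-attribute support cannot tell apart.

```go
package main

import (
	"context"

	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

// emitAliasedStreams records the same counter from two scopes that differ only
// by the otelcol.signal scope attribute; without scope-attribute support, the
// Prometheus exporter sees two identical series and reports an error.
func emitAliasedStreams(mp *sdkmetric.MeterProvider) {
	for _, sig := range []string{"logs", "traces"} {
		m := mp.Meter(
			"go.opentelemetry.io/collector/processor/batchprocessor",
			metric.WithInstrumentationAttributes(attribute.String("otelcol.signal", sig)),
		)
		c, _ := m.Int64Counter("otelcol_processor_batch_send_size") // assumed metric name
		c.Add(context.Background(), 1)
	}
}
```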

#### Description

To fix this issue, this PR sets the `telemetry.newPipelineTelemetry`
feature gate back to "Alpha" (off by default), and reintroduces logic to
disable the injection of the new instrumentation scope attributes when
the gate is off, but only in internal metrics. Note that the new logic
is still used unconditionally for logs and traces, to avoid
reintroducing the logs issue (#12870).

This should avoid breaking the Collector in its default configuration
while we try to get a fix in the Prometheus exporter.
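For context, here is a minimal sketch (not the actual code from this PR) of how an Alpha-stage gate like `telemetry.newPipelineTelemetry` is registered and consulted through the Collector's `featuregate` package; the helper function name is illustrative only.

```go
package telemetry

import "go.opentelemetry.io/collector/featuregate"

// Registered at Alpha stage: off by default, enabled with
// --feature-gates=telemetry.newPipelineTelemetry.
var newPipelineTelemetryGate = featuregate.GlobalRegistry().MustRegister(
	"telemetry.newPipelineTelemetry",
	featuregate.StageAlpha,
	featuregate.WithRegisterDescription("Injects component-identifying scope attributes into internal telemetry"),
)

// injectScopeAttributesIntoMetrics is a hypothetical helper name. Per this PR,
// only internal metrics consult the gate; logs and traces keep the new scope
// attributes unconditionally.
func injectScopeAttributesIntoMetrics() bool {
	return newPipelineTelemetryGate.IsEnabled()
}
```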

#### Link to tracking issue
No tracking issue currently, will probably file one later.

#### Testing

I performed some simple manual testing with a config file like the
following:

```yaml
receivers:
  otlp: [...]
processors:
  batch:
exporters:
  debug: [...]
service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
  telemetry:
    metrics:
      level: detailed    
    traces: [...]
    logs: [...]
```

The two batch processors create aliased metric streams, which are only
differentiated by the new component attributes. I checked that:
1. this config causes an error in the Prometheus exporter on main;
2. the error is resolved by default after applying this PR;
3. the error reappears when enabling the feature gate (this is expected);
4. scope attributes are added to traces and logs regardless of the state of the gate.
github-merge-queue bot pushed a commit that referenced this pull request May 12, 2025
…peline components (#12812)

Depends on
#12856

Resolves #12676

This is a reboot of #11311, incorporating metrics defined in the
[component telemetry
RFC](https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/rfcs/component-universal-telemetry.md)
and attributes added in #12617.

The basic pattern is:
- When building any pipeline component which produces data, wrap the
"next consumer" with instrumentation to measure the number of items
being passed. This wrapped consumer is then passed into the constructor
of the component.
- When building any pipeline component which consumes data, wrap the
component itself. This wrapped consumer is saved onto the graph node so
that it can be retrieved during graph assembly.
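As a rough illustration of the first point (a sketch under assumed type and metric names, not the actual instrumentation added by this PR), a wrapped logs consumer could look like this:

```go
package pipeline

import (
	"context"

	"go.opentelemetry.io/collector/consumer"
	"go.opentelemetry.io/collector/pdata/plog"
	"go.opentelemetry.io/otel/metric"
)

// obsLogsConsumer wraps the next logs consumer and counts items passed to it.
type obsLogsConsumer struct {
	next    consumer.Logs
	counter metric.Int64Counter
}

func (c *obsLogsConsumer) Capabilities() consumer.Capabilities {
	return c.next.Capabilities()
}

func (c *obsLogsConsumer) ConsumeLogs(ctx context.Context, ld plog.Logs) error {
	// Record how many log records are handed to the downstream component; the
	// component-identifying attributes live on the instrumentation scope rather
	// than on every data point.
	c.counter.Add(ctx, int64(ld.LogRecordCount()))
	return c.next.ConsumeLogs(ctx, ld)
}
```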

---------

Co-authored-by: Pablo Baeyens <[email protected]>
@@ -54,7 +54,7 @@ type otlpReceiver struct {
// responsibility to invoke the respective Start*Reception methods as well
// as the various Stop*Reception methods to end it.
func newOtlpReceiver(cfg *Config, set *receiver.Settings) (*otlpReceiver, error) {
set.Logger = telemetry.LoggerWithout(set.TelemetrySettings, componentattribute.SignalKey)
set.TelemetrySettings = telemetry.WithoutAttributes(set.TelemetrySettings, componentattribute.SignalKey)
vigneshshanmugam commented Jun 3, 2025

@jade-guiton-dd Apologies for asking on an old PR. Was there any specific reason for not including the signal info on the metrics exposed by the OTLP receiver? This would be really useful for seeing the amount of data ingested by signal type.

We already had this info exposed in the older metrics via `otelcol_receiver_accepted_log_records`, `otelcol_receiver_accepted_metric_points`, and `otelcol_receiver_accepted_spans`. Trying to understand whether there is a plan to add them down the line? Thanks.

jade-guiton-dd (Contributor, Author) commented Jun 3, 2025

The OTLP receiver is internally a single object, even when configured in multiple pipelines for multiple signals. For that reason, the telemetry it emits can't easily be associated with a single signal, so it removes the "otelcol.signal" attribute from its set of attributes on startup. If we didn't do that, all telemetry from the component would be associated with whichever signal pipeline happened to be created first, which would not be helpful.

However, the OTLP receiver could manually add back a signal attribute on specific metric points which are associated with a specific signal. But I don't believe this is currently needed:

  • The older otelcol_receiver_X metrics (which aren't going anywhere for the foreseeable future) already differentiate between signals in their name
  • The new metrics emitted by pipeline auto-instrumentation (implemented in a later PR) use the original attribute set of the component before startup, which includes "otelcol.signal".

Do you have any examples of internal metrics emitted by the OTLP receiver which are lacking association with a specific signal (and which could be associated with one despite the singleton architecture)?

vigneshshanmugam replied:

Thanks for the detailed answer. I was going off this PR, and upon testing the different metrics/logs I do see otelcol.signal present in all of the emitted telemetry, even the custom ones generated via mdatagen, which is super cool. Thanks for making that happen 👍🏽

One thing I noticed: since we are currently treating middlewares as part of the Extension interface, some of the pipeline attributes like signal and outcome are missing from the reported metrics. Should we treat them similarly to receivers, since they run as part of the receiver?

jade-guiton-dd (Contributor, Author) replied:

I'm not extremely familiar with the middleware interface considering how new it is (are there even any implementations of it yet?), but I think this would be of questionable use and difficult to accomplish:

  • Because middlewares act at the HTTP/gRPC request level, there's no generic reliable way to know which signal (if any) they're processing. This is determined later by the receiver, after the request has been processed by the middleware. The only case where I think this would be doable is if the receiver only handles a single type of signal, in which case otelcol.signal is much less useful anyway.
  • We only use the outcome attribute on auto-instrumented pipeline metrics, not arbitrary receiver telemetry, because the auto-instrumentation layer is at the right place to know whether the next component succeeded or not. I think adding a similar instrumentation layer inside middlewares would be difficult, but I'm not familiar enough with the middleware API to tell for sure.

However, I think we could make a stronger case for adding an attribute to middleware telemetry to know which receiver instance it's used in. This would be doable, though we would need to redesign the middleware API to pass in a new TelemetrySettings with the appropriate attributes for each call to GetHTTPHandler / GetGRPCServerOptions.
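To make that concrete, here is a purely hypothetical sketch of what such an API change could look like; neither this package, interface name, nor the extra parameter exists in the Collector today.

```go
// Hypothetical only: sketches a middleware hook that receives a
// TelemetrySettings scoped to the receiver it is attached to, so that
// middleware telemetry can carry a receiver-identifying attribute.
package extensionmiddlewarex

import (
	"net/http"

	"go.opentelemetry.io/collector/component"
)

type HTTPServerMiddleware interface {
	// The extra TelemetrySettings parameter is the hypothetical addition.
	GetHTTPHandler(set component.TelemetrySettings, base http.Handler) (http.Handler, error)
}
```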

vigneshshanmugam replied:

Thanks for the details again. Just to give some context, I am working with the middleware-based extension to add additional attributes to custom telemetry records without relying on some of the proposed alternatives like baggage propagation, processors, etc. (#12316).

> This is determined later by the receiver, after the request has been processed by the middleware.

Makes sense 👍🏽.

> However, I think we could make a stronger case for adding an attribute to middleware telemetry to know which receiver instance it's used in

That would be useful, since the middleware extension can basically run on any HTTP/gRPC receiver.

Do you think it's worth creating an issue for this so we can move the discussion there?

jade-guiton-dd (Contributor, Author) commented Jun 5, 2025

Yes, I think that would make sense. It would be good to make the contributors working on the middleware interface aware that this need exists.

github-merge-queue bot pushed a commit that referenced this pull request Jun 11, 2025
#### Context

PR #12617, which implemented the injection of component-identifying
attributes into the `zap.Logger` provided to components, introduced
significant additional memory use when the Collector's pipelines contain
many components (#13014). This was because we would call
`zapcore.NewSamplerWithOptions` to wrap the specialized logger core of
each Collector component, which allocates half a megabyte's worth of
sampling counters.
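To illustrate where the memory went, here is a minimal sketch (with assumed numbers and wiring, not the Collector's actual setup) of how wrapping each component's core allocates a separate bank of sampling counters per component:

```go
package main

import (
	"time"

	"go.uber.org/zap/zapcore"
)

// wrapPerComponent wraps each component's logger core with its own sampler.
func wrapPerComponent(cores []zapcore.Core) []zapcore.Core {
	wrapped := make([]zapcore.Core, 0, len(cores))
	for _, c := range cores {
		// Each call allocates its own counter table (roughly half a megabyte),
		// so N components cost N tables; the fix described below reuses a
		// single allocation instead.
		wrapped = append(wrapped, zapcore.NewSamplerWithOptions(c, time.Second, 100, 100))
	}
	return wrapped
}
```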

This problem was mitigated in #13015 by moving the sampling layer to a
different location in the logger core hierarchy. This meant that
Collector users that do not export their logs through OTLP and only use
stdout-based logs no longer saw the memory increase.

#### Description

This PR aims to provide a better solution to this issue, by using the
`reflect` library to clone zap's sampler core and set a new inner core,
while reusing the counter allocation.

(This may also be "more correct" from a sampling point of view, i.e., we
only have one global instance of the counters instead of one for console
logs and one for each component's OTLP-exported logs, but I'm not sure
anyone noticed the difference anyway.)

#### Link to tracking issue
Fixes #13014

#### Testing
A new test was added which checks that the log counters are shared
between two sampler cores with different attributes.