Commit b5703ca

Merge pull request #23144 from newrelic/collector-observability-docs
Collector observability docs
2 parents f17bca9 + 1a61fb7 commit b5703ca

6 files changed

Lines changed: 222 additions & 0 deletions

Lines changed: 220 additions & 0 deletions
@@ -0,0 +1,220 @@
---
title: OpenTelemetry Collector monitoring
metaDescription: Monitor your OpenTelemetry collector's health and performance using its internal telemetry in a dedicated APM UI experience.
freshnessValidatedDate: 2025-05-27
tags:
  - Integrations
  - Open source telemetry integrations
  - OpenTelemetry
---
10+
11+
<Callout title="Preview">
12+
We're still working on this feature, but we'd love for you to try it out!
13+
14+
This feature is currently provided as part of a preview pursuant to our [pre-release policies](/docs/licenses/license-information/referenced-policies/new-relic-pre-release-policy/).
15+
</Callout>
16+
17+
Monitor the health and performance of your [OpenTelemetry collector](https://opentelemetry.io/docs/collector/) with a dedicated APM UI experience. When your collector fails or misbehaves, it can cause a blackout of observability data, permanent data loss, or distorted insights. That's why we built Collector Observability, an APM experience tailored to the streaming work collectors perform. It uses the collector's [internal telemetry](https://opentelemetry.io/docs/collector/internal-telemetry/) to show at a glance how each component of your collector is performing, so you can spot issues before they impact your observability pipeline.
18+
19+
## Set up collector monitoring [#setup]
20+
21+
<Callout variant="important" title="Billing">
22+
Your use of Collector Observability is billable during preview in accordance with your Order as applicable to the pricing model associated with your Account and as defined below.
23+
24+
The costs associated with this feature are determined by the following factors, as applicable to the pricing model associated with your Account:
25+
26+
**Core Compute**: The Summary page, measured in Core CCU, is billable during preview. The Process page and HTTP/RPC page are not billable during preview.
27+
28+
**Data Ingest**: Additional data from the internal telemetry, measured in GB Ingested, is billable during preview.
29+
30+
If this feature becomes generally available, your use will be billable in accordance with your Order.
31+
</Callout>
32+
33+
### Enable internal telemetry for your collector [#enable-telemetry]
34+
35+
By default, the collector doesn't emit its [internal telemetry](https://opentelemetry.io/docs/collector/internal-telemetry/), so you'll need to enable it first.
36+
37+
#### Download the configuration file
38+
39+
```bash
# Download the example internal telemetry configuration, pinned to release 1.10.0.
curl -L https://raw.githubusercontent.com/newrelic/nrdot-collector-releases/refs/tags/1.10.0/examples/internal-telemetry-config.yaml \
  --silent --output internal-telemetry-config.yaml
```
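
The download URL and the image tag used later both pin release 1.10.0, so it can help to keep the version in one place. The following is a minimal sketch, not part of NRDOT: the `NRDOT_VERSION` variable and the `nrdot_config_url` helper are hypothetical names introduced only for illustration.

```bash
# Hypothetical variable: the release tag to pin (taken from the command above).
NRDOT_VERSION="1.10.0"

# Hypothetical helper: build the raw GitHub URL of the example config for a release tag.
nrdot_config_url() {
  printf 'https://raw.githubusercontent.com/newrelic/nrdot-collector-releases/refs/tags/%s/examples/internal-telemetry-config.yaml\n' "$1"
}

# Print the URL; pass it to curl as in the command above.
nrdot_config_url "$NRDOT_VERSION"
```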
43+
44+
#### Set environment variables
45+
46+
* <DNT>`INTERNAL_TELEMETRY_NEW_RELIC_LICENSE_KEY`</DNT>: Ingest license key for the account that should receive the internal telemetry. This key can differ from the key the collector uses to send regular data to New Relic (<DNT>`NEW_RELIC_LICENSE_KEY`</DNT> in the example below).
* <DNT>`INTERNAL_TELEMETRY_SERVICE_NAME`</DNT>: Determines the entity name in New Relic. Defaults to `otel-collector`.
* <DNT>`INTERNAL_TELEMETRY_OTLP_ENDPOINT`</DNT>: Defaults to the US endpoint, `https://otlp.nr-data.net`. If your account is in the EU, set this to `https://otlp.eu01.nr-data.net`.
49+
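
As a rough illustration of the variables above, the sketch below selects the OTLP endpoint by region before exporting it. The `nr_otlp_endpoint` helper and its `us`/`eu` argument are hypothetical and exist only for this example; the two endpoint URLs are the ones listed above.

```bash
# Hypothetical helper: map a region name to the OTLP endpoint listed above.
nr_otlp_endpoint() {
  case "$1" in
    eu) echo "https://otlp.eu01.nr-data.net" ;;
    *)  echo "https://otlp.nr-data.net" ;;   # US is the documented default
  esac
}

# Example values; replace them with your own service name and region.
export INTERNAL_TELEMETRY_SERVICE_NAME="demo-collector"
export INTERNAL_TELEMETRY_OTLP_ENDPOINT="$(nr_otlp_endpoint eu)"
```

The license key is left out here on purpose so that it doesn't end up in a script; set it through your usual secret handling.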
50+
#### Run collector with merged configuration
51+
52+
In addition to the normal configuration for components and pipelines (`--config=/etc/nrdot-collector/config.yaml` in the following example), add a second `--config` argument. The collector merges both configurations:
53+
54+
```bash
55+
docker run \
56+
-e INTERNAL_TELEMETRY_NEW_RELIC_LICENSE_KEY='...' \
57+
-e NEW_RELIC_LICENSE_KEY='...' \
58+
-e INTERNAL_TELEMETRY_SERVICE_NAME='demo-collector' \
59+
-v './internal-telemetry-config.yaml:/etc/nrdot-collector/config-internal.yaml' \
60+
newrelic/nrdot-collector:1.10.0 --config=/etc/nrdot-collector/config.yaml \
61+
--config='/etc/nrdot-collector/config-internal.yaml'
62+
```
63+
64+
<Callout variant="important">
65+
The order of the `--config` arguments matters if you already have configuration under the `service::telemetry` node.
When merging configurations, the collector applies a 'last one wins' strategy at the node level. Parts of the config that can't be merged (for example, lists and leaf nodes) are overwritten by the value from the last `--config` argument.
67+
</Callout>
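
To find out whether the ordering caveat applies to you, you can check your existing configuration for a `telemetry` node under `service`. The sketch below is a rough heuristic, not a YAML parser: the `has_service_telemetry` helper is hypothetical, it assumes block-style YAML with `service:` at the top level, and it would also match a deeper nested `telemetry:` key inside `service`.

```bash
# Hypothetical helper: succeed if the file has a "telemetry:" key nested under
# the top-level "service:" key. Heuristic only; assumes block-style YAML.
has_service_telemetry() {
  awk '
    /^[^ \t#]/ { in_service = ($0 ~ /^service:/) }    # track the current top-level key
    in_service && /^[ \t]+telemetry:/ { found = 1 }   # indented key inside service
    END { exit found ? 0 : 1 }
  ' "$1"
}

# Usage (the path is an example):
# has_service_telemetry /etc/nrdot-collector/config.yaml && echo "check the --config order"
```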
68+
69+
#### Alternative (not recommended for production)
70+
71+
For testing purposes, you can also reference the configuration by URL, and the collector pulls it on startup. We don't recommend this in production for reliability reasons:
72+
73+
```bash
74+
docker run \
75+
-e INTERNAL_TELEMETRY_NEW_RELIC_LICENSE_KEY='...' \
76+
-e NEW_RELIC_LICENSE_KEY='...' \
77+
-e INTERNAL_TELEMETRY_SERVICE_NAME='demo-collector' \
78+
newrelic/nrdot-collector:1.10.0 --config=/etc/nrdot-collector/config.yaml \
79+
--config='https://raw.githubusercontent.com/newrelic/nrdot-collector-releases/refs/tags/1.10.0/examples/internal-telemetry-config.yaml'
80+
```
81+
82+
### Add entity tag [#add-tag]
83+
84+
The tag <DNT>`newrelic.service.type: otel_collector`</DNT> acts as an opt-in to the experience at the UI level. Choose one of the following options:
85+
86+
* **Option 1**: Use the example configuration provided above, which already contains the setting from Option 2.
* **Option 2**: Add the argument <DNT>`--config=yaml:service::telemetry::resource::newrelic.service.type: otel_collector`</DNT> to the collector. This adds the attribute as a resource attribute, and New Relic applies the tag for you on ingest. If you remove this argument, the tag takes one day to expire.
* **Option 3**: Add the tag in the APM UI (top of the page, next to the entity name). You can also remove it in the UI to toggle back.
89+
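
The Option 2 argument contains a space after the colon, so it has to be quoted when passed on a command line. The sketch below only prints the arguments to make the quoting visible; the `print_collector_args` helper is hypothetical, and the first `--config` path is the example path used earlier on this page.

```bash
# Hypothetical helper: print each collector argument on its own line.
print_collector_args() {
  # Single quotes keep the tag argument together despite the embedded space.
  set -- \
    --config=/etc/nrdot-collector/config.yaml \
    '--config=yaml:service::telemetry::resource::newrelic.service.type: otel_collector'
  printf '%s\n' "$@"
}

print_collector_args
```

Without the quotes, the shell would split the argument at the space and the collector would receive `otel_collector` as a separate argument.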
90+
### Customizing configuration [#customize]
91+
92+
The default configuration exposes additional environment variables of the form `INTERNAL_TELEMETRY_...` to tweak common options such as detail levels and sampling. Refer to the [configuration itself](https://github.com/newrelic/nrdot-collector-releases/blob/main/examples/internal-telemetry-config.yaml) for more details.
93+
94+
## View your collector in the UI [#view-ui]
95+
96+
### Explore the internal telemetry in the APM UI [#explore-ui]
97+
98+
To view your collector's internal telemetry, go to <DNT>**APM & Services > Services - OpenTelemetry > your_collector_name**</DNT> and open the collector's entity.
99+
100+
Depending on the components your collector uses, some charts might not be populated. For example, if you don't receive or export OTLP data via gRPC, the RPC charts will be empty.
101+
102+
### Summary page [#summary-page]
103+
104+
The Summary page gives you an overview of your collector's health and performance:
105+
106+
* Overall collector health metrics
107+
* Charts for receiver, processor, exporter, and batching behavior (the batching charts require the [batchprocessor](https://github.com/open-telemetry/opentelemetry-collector/tree/v0.128.0/processor/batchprocessor))
* Dedicated chart for the [memorylimiter](https://github.com/open-telemetry/opentelemetry-collector/tree/v0.128.0/processor/memorylimiterprocessor) because of its unique failure mode
109+
110+
<img
111+
title="Screenshot showing new Summary Page"
112+
alt="Screenshot showing new Summary Page"
113+
src="/images/otel_collector_o11y-summary.webp"
114+
/>
115+
116+
* Infrastructure relationships and metrics (if configured, see [configuration examples](#examples))
117+
118+
<img
119+
title="Screenshot showing related infrastructure telemetry in the new Summary Page"
120+
alt="Screenshot showing related infrastructure telemetry in the new Summary Page"
121+
src="/images/otel_collector_o11y-infra.webp"
122+
/>
123+
124+
### Process page [#process-page]
125+
126+
Track system-level resource consumption with the Process page:
127+
128+
* CPU utilization and trends
129+
* Memory usage and patterns
130+
* Process-level performance indicators
131+
132+
<img
133+
title="Screenshot showing new Process Page"
134+
alt="Screenshot showing new Process Page"
135+
src="/images/otel_collector_o11y-process.webp"
136+
/>
137+
138+
### HTTP/RPC page [#http-rpc-page]
139+
140+
Monitor network communication with the HTTP/RPC page:
141+
142+
* HTTP client and server request metrics
143+
* gRPC communication statistics
144+
* Network latency and error rates
145+
146+
<img
147+
title="Screenshot showing new HTTP/RPC Page"
148+
alt="Screenshot showing new HTTP/RPC Page"
149+
src="/images/otel_collector_o11y-httprpc.webp"
150+
/>
151+
152+
Note: These charts are only populated if your collector uses components that communicate via HTTP or gRPC (like OTLP receivers or exporters).
153+
154+
## Configuration examples [#examples]
155+
156+
For detailed configuration examples and customization options, refer to the [internal telemetry configuration](https://github.com/newrelic/nrdot-collector-releases/blob/main/examples/internal-telemetry-config.yaml) in the NRDOT Collector releases repository.
157+
158+
The configuration exposes environment variables of the form `INTERNAL_TELEMETRY_...` to customize options such as detail levels and sampling.
159+
160+
### Deployment examples [#deployment-examples]
161+
162+
For real-world deployment scenarios, see these example configurations:
163+
164+
* **[Kubernetes deployment](https://github.com/newrelic/helm-charts/tree/master/charts/nr-k8s-otel-collector)**: Example of running a collector in Kubernetes with internal telemetry enabled
165+
* **[Docker deployment](https://github.com/newrelic/nrdot-collector-releases/tree/main/examples)**: Example Docker configurations for various collector setups with monitoring enabled
166+
167+
## Billing and data overhead [#billing-overhead]
168+
169+
Collector Observability generates additional telemetry data that counts toward your data ingest. Understanding the data volume helps you plan for costs and optimize your configuration.
170+
171+
### Data generated [#data-generated]
172+
173+
Collector Observability generates the following types of telemetry:
174+
175+
* **Metrics**: Component-level insights into receivers, processors, and exporters
176+
* **Logs**: Sampled logs with minimal overhead during normal operation
177+
* **Traces**: Optional experimental support for deep pipeline analysis
178+
179+
### Expected data volume [#expected-volume]
180+
181+
The amount of data generated depends on your collector's configuration:
182+
183+
* **Baseline metrics**: A collector with standard components (receiver, processor, exporter) generates approximately **1-2 KB/minute** of metric data
184+
* **Logs**: With default sampling (1% of log statements), expect **0.5-1 KB/minute** during normal operation. During errors or high verbosity, this can increase significantly
185+
* **Traces** (optional): If enabled, expect **5-10 KB/minute** depending on pipeline complexity
186+
187+
For a collector that runs 24/7, the rates above add up over a 30-day month (43,200 minutes) to roughly:
* **Monthly metric ingest**: ~43-86 MB
* **Monthly log ingest**: ~22-43 MB (normal operation)
* **Monthly trace ingest**: ~216-432 MB (if enabled)
191+
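
To estimate monthly ingest from a rate you measure yourself, you can extrapolate a steady per-minute rate. This is a minimal sketch under stated assumptions: the `monthly_mb` helper is hypothetical, it assumes a 30-day month and 1 MB = 1000 KB, and the rate of 3 KB/minute in the usage line is an invented example value, not a measurement.

```bash
# Hypothetical helper: convert a steady ingest rate in KB/minute to MB/month.
# Assumes a 30-day month and 1 MB = 1000 KB.
monthly_mb() {
  awk -v rate="$1" 'BEGIN { printf "%.1f\n", rate * 60 * 24 * 30 / 1000 }'
}

monthly_mb 3   # example rate of 3 KB/minute; prints 129.6
```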
192+
### Controlling data overhead [#controlling-overhead]
193+
194+
You can adjust telemetry generation using environment variables:
195+
196+
* <DNT>**`INTERNAL_TELEMETRY_METRICS_LEVEL`**</DNT>: Defaults to `normal`. Set to `detailed` for more metrics or to `none` to disable metrics.
* <DNT>**`INTERNAL_TELEMETRY_LOGS_LEVEL`**</DNT>: Defaults to `info`. Set to `warn` or `error` to reduce log volume.
* <DNT>**`INTERNAL_TELEMETRY_LOGS_SAMPLING`**</DNT>: Adjusts the log sampling rate (default: 1%).
* <DNT>**`INTERNAL_TELEMETRY_TRACES_ENABLED`**</DNT>: Controls trace generation. Traces are disabled by default; keep this set to `false` to leave them off.
200+
201+
For configuration details, see the [customizing configuration](#customize) section above.
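
One way to apply these settings is to collect them in an env file and pass that file to the container. The sketch below is an illustration only: the `write_low_overhead_env` helper and the file name are hypothetical, the variable names come from the list above, and the values are example choices aimed at lower volume.

```bash
# Hypothetical helper: write example low-overhead settings to an env file.
write_low_overhead_env() {
  cat > "$1" <<'EOF'
INTERNAL_TELEMETRY_METRICS_LEVEL=normal
INTERNAL_TELEMETRY_LOGS_LEVEL=warn
INTERNAL_TELEMETRY_TRACES_ENABLED=false
EOF
}

write_low_overhead_env ./internal-telemetry.env
# Then pass the file to the container, for example:
#   docker run --env-file ./internal-telemetry.env ...
```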
202+
203+
## Limitations [#limitations]
204+
205+
* **Certain common APM features** don't seamlessly translate to the collector and its stream-processing nature and have been hidden from the Summary page and side navigation. We'll take a look at each of them for General Availability to determine if and how we can best integrate them. Examples are:
206+
  * Apdex score
  * Meaningful `apm.%` metrics
  * Error Rate
  * Transactions
210+
211+
* **Collector telemetry isn't stable yet**:
212+
  * The latest supported version of telemetry emitted by collector components is the component version in our NRDOT distributions (see [manifest](https://github.com/newrelic/nrdot-collector-releases/blob/main/distributions/nrdot-collector/manifest.yaml) for reference).
  * If the emitted telemetry changes during Public Preview, we reserve the right to only support the most recent version. For example, we support <DNT>`http.client.request.duration`</DNT> (collector version >=0.128.0, NRDOT version >=1.2.0, nr-k8s-otel-collector >=0.8.37) but not <DNT>`http.request.duration`</DNT>.
  * With General Availability, we'll start providing backwards compatibility if metric names continue to change.
215+
216+
* **Export format requirements**: The collector UI expects telemetry in the format exported by the <DNT>`otlpexporter`</DNT>. It doesn't support metrics exported via Prometheus. For example, <DNT>`http.client.request.duration`</DNT> is supported, but <DNT>`http_client_request_duration`</DNT> isn't.
217+
218+
* **Custom component metrics**: Internal telemetry that isn't listed in the [internal telemetry documentation](https://opentelemetry.io/docs/collector/internal-telemetry/) isn't supported yet. Custom or [contrib components](https://github.com/open-telemetry/opentelemetry-collector-contrib) emit standard metrics but can also define their own metrics. We're still working on a way to help you get insights from those without writing custom dashboards.
219+
220+
* **Golden metrics for OTel containers**: Not fully supported yet, which means some columns in the infrastructure panel might not be populated for containers.
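
The export format limitation above comes down to metric naming: the same metric arrives under a dotted name via OTLP and under an underscore-separated name via Prometheus. The sketch below reproduces only the separator change from the example above. The `otlp_to_prometheus_style` helper is hypothetical, and real Prometheus exporters may apply further name changes that this sketch doesn't model.

```bash
# Hypothetical helper: show the underscore-separated form of a dotted OTLP metric name.
# Only the separator change is modeled here.
otlp_to_prometheus_style() {
  printf '%s\n' "$1" | tr '.' '_'
}

otlp_to_prometheus_style http.client.request.duration   # prints http_client_request_duration
```

If your collector's metrics show up under the second form, they were exported via Prometheus and the collector UI won't pick them up.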

src/nav/opentelemetry.yml

Lines changed: 2 additions & 0 deletions
@@ -19,6 +19,8 @@ pages:
   path: /docs/opentelemetry/get-started/apm-monitoring/opentelemetry-apm-intro
 - title: OpenTelemetry APM UI
   path: /docs/opentelemetry/get-started/apm-monitoring/opentelemetry-apm-ui
+- title: Collector observability
+  path: /docs/opentelemetry/collector-observability/collector-observability
 - title: Collector for infrastructure monitoring
   path: /docs/opentelemetry/get-started/collector-infra-monitoring/opentelemetry-collector-infra-intro
 - title: Collector for data processing