| 1 | +--- |
| 2 | +title: OpenTelemetry Collector monitoring |
| 3 | +metaDescription: Monitor your OpenTelemetry collector's health and performance using its internal telemetry in a dedicated APM UI experience. |
| 4 | +freshnessValidatedDate: 2025-05-27 |
| 5 | +tags: |
| 6 | + - Integrations |
| 7 | + - Open source telemetry integrations |
| 8 | + - OpenTelemetry |
| 9 | +--- |
| 10 | + |
| 11 | +<Callout title="Preview"> |
| 12 | +We're still working on this feature, but we'd love for you to try it out! |
| 13 | + |
| 14 | +This feature is currently provided as part of a preview pursuant to our [pre-release policies](/docs/licenses/license-information/referenced-policies/new-relic-pre-release-policy/). |
| 15 | +</Callout> |
| 16 | + |
| 17 | +Monitor the health and performance of your [OpenTelemetry collector](https://opentelemetry.io/docs/collector/) with a dedicated APM UI experience. When your collector fails or misbehaves, it can cause a blackout of observability data, permanent data loss, or distorted insights. That's why we built Collector Observability, an APM experience tailored to the streaming work collectors perform. It uses the collector's [internal telemetry](https://opentelemetry.io/docs/collector/internal-telemetry/) to show at a glance how each component of your collector is performing, so you can spot issues before they impact your observability pipeline. |
| 18 | + |
| 19 | +## Set up collector monitoring [#setup] |
| 20 | + |
| 21 | +<Callout variant="important" title="Billing"> |
| 22 | +Your use of Collector Observability is billable during preview in accordance with your Order as applicable to the pricing model associated with your Account and as defined below. |
| 23 | + |
| 24 | +The costs associated with this feature are determined by the following factors, as applicable to the pricing model associated with your Account: |
| 25 | + |
| 26 | +**Core Compute**: The Summary page, measured in Core CCU, is billable during preview. The Process page and HTTP/RPC page are not billable during preview. |
| 27 | + |
| 28 | +**Data Ingest**: Additional data from the internal telemetry, measured in GB Ingested, is billable during preview. |
| 29 | + |
| 30 | +If this feature becomes generally available, your use will be billable in accordance with your Order. |
| 31 | +</Callout> |
| 32 | + |
| 33 | +### Enable internal telemetry for your collector [#enable-telemetry] |
| 34 | + |
| 35 | +By default, the collector doesn't emit its [internal telemetry](https://opentelemetry.io/docs/collector/internal-telemetry/), so you'll need to enable it first. |
| 36 | + |
| 37 | +#### Download the configuration file |
| 38 | + |
| 39 | +```bash |
| 40 | +curl -L https://raw.githubusercontent.com/newrelic/nrdot-collector-releases/refs/tags/1.10.0/examples/internal-telemetry-config.yaml \ |
| 41 | + --silent --output internal-telemetry-config.yaml |
| 42 | +``` |
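Because `--silent` hides download errors, it can help to confirm that the file arrived before you mount it. The following is a minimal sketch, not part of the official setup. The function name `check_config_file` is hypothetical, and the check for a top-level `service:` key is an assumption about the file's layout, based on the `service::telemetry` node mentioned in this guide.

```bash
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if the file exists, is non-empty,
# and contains a top-level "service:" key; returns 1 otherwise.
check_config_file() {
  local file="$1"
  if [ ! -s "$file" ]; then
    echo "missing or empty: $file" >&2
    return 1
  fi
  # Assumption: the internal telemetry config defines a top-level "service:" key.
  if ! grep -q '^service:' "$file"; then
    echo "no top-level 'service:' key in: $file" >&2
    return 1
  fi
  return 0
}
```

For example, `check_config_file internal-telemetry-config.yaml && echo "config looks usable"` prints the message only when both checks pass.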
| 43 | + |
| 44 | +#### Set environment variables |
| 45 | + |
| 46 | +* <DNT>`INTERNAL_TELEMETRY_NEW_RELIC_LICENSE_KEY`</DNT>: Ingest license key for the account that should receive the internal telemetry. This key can differ from the key the collector uses to send regular data to New Relic (<DNT>`NEW_RELIC_LICENSE_KEY`</DNT> in the example below). |
| 47 | +* <DNT>`INTERNAL_TELEMETRY_SERVICE_NAME`</DNT>: Determines the entity name in New Relic. Defaults to `otel-collector`. |
| 48 | +* <DNT>`INTERNAL_TELEMETRY_OTLP_ENDPOINT`</DNT>: Defaults to the US endpoint, `https://otlp.nr-data.net`. If you're in the EU region, set this to `https://otlp.eu01.nr-data.net`. |
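One way to set these variables in a shell session is sketched below. The helper `otlp_endpoint_for_region` is hypothetical and only maps a region name to the two endpoints listed above. The `'...'` value is a placeholder for your own ingest license key, and `demo-collector` is just an example name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: maps a region name to the OTLP endpoints listed above.
otlp_endpoint_for_region() {
  case "$1" in
    eu|EU) echo "https://otlp.eu01.nr-data.net" ;;
    *)     echo "https://otlp.nr-data.net" ;;  # US is the documented default
  esac
}

# Placeholder: replace '...' with your ingest license key.
export INTERNAL_TELEMETRY_NEW_RELIC_LICENSE_KEY='...'
# Example entity name; if unset, the default is otel-collector.
export INTERNAL_TELEMETRY_SERVICE_NAME='demo-collector'
# Only needed outside the US; shown here for an EU account.
export INTERNAL_TELEMETRY_OTLP_ENDPOINT="$(otlp_endpoint_for_region eu)"
```

With the variables exported, `docker run -e INTERNAL_TELEMETRY_OTLP_ENDPOINT ...` (a name without a value) passes the value through from the current shell.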
| 49 | + |
| 50 | +#### Run collector with merged configuration |
| 51 | + |
| 52 | +In addition to the normal configuration for components and pipelines (`--config=/etc/nrdot-collector/config.yaml` in the following example), add a second `--config` argument that points to the internal telemetry configuration. The collector merges both configurations: |
| 53 | + |
| 54 | +```bash |
| 55 | +docker run \ |
| 56 | + -e INTERNAL_TELEMETRY_NEW_RELIC_LICENSE_KEY='...' \ |
| 57 | + -e NEW_RELIC_LICENSE_KEY='...' \ |
| 58 | + -e INTERNAL_TELEMETRY_SERVICE_NAME='demo-collector' \ |
| 59 | + -v './internal-telemetry-config.yaml:/etc/nrdot-collector/config-internal.yaml' \ |
| 60 | + newrelic/nrdot-collector:1.10.0 --config=/etc/nrdot-collector/config.yaml \ |
| 61 | + --config='/etc/nrdot-collector/config-internal.yaml' |
| 62 | +``` |
| 63 | + |
| 64 | +<Callout variant="important"> |
| 65 | +The order of the `--config` arguments matters if you already have configuration under the `service::telemetry` node. |
| 66 | +When merging configurations, the collector applies a 'last one wins' strategy at the node level. Parts of the configuration that can't be merged (for example, lists and leaf nodes) are overwritten by the value from the last `--config` argument that sets them. |
| 67 | +</Callout> |
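To make the precedence explicit, one option is to assemble the `--config` arguments from a list ordered from lowest to highest precedence. This is only a sketch: `config_args` is a hypothetical helper, and the paths are the ones used in the example above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints one --config argument per line, in the order given.
# Pass files from lowest to highest precedence; for nodes that can't be merged,
# the last file wins.
config_args() {
  for file in "$@"; do
    printf '%s%s\n' '--config=' "$file"
  done
}

# Internal telemetry config last, so its service::telemetry settings take precedence.
config_args /etc/nrdot-collector/config.yaml /etc/nrdot-collector/config-internal.yaml
```

If your own `service::telemetry` settings should win instead, swap the order of the two paths.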
| 68 | + |
| 69 | +#### Alternative (not recommended for production) |
| 70 | + |
| 71 | +For testing purposes, you can also reference the configuration by URL, and the collector pulls it on startup. We don't recommend this for production, for reliability reasons: |
| 72 | + |
| 73 | +```bash |
| 74 | +docker run \ |
| 75 | + -e INTERNAL_TELEMETRY_NEW_RELIC_LICENSE_KEY='...' \ |
| 76 | + -e NEW_RELIC_LICENSE_KEY='...' \ |
| 77 | + -e INTERNAL_TELEMETRY_SERVICE_NAME='demo-collector' \ |
| 78 | + newrelic/nrdot-collector:1.10.0 --config=/etc/nrdot-collector/config.yaml \ |
| 79 | + --config='https://raw.githubusercontent.com/newrelic/nrdot-collector-releases/refs/tags/1.10.0/examples/internal-telemetry-config.yaml' |
| 80 | +``` |
| 81 | + |
| 82 | +### Add entity tag [#add-tag] |
| 83 | + |
| 84 | +The tag <DNT>`newrelic.service.type: otel_collector`</DNT> acts as an opt-in to the experience at the UI level. Choose one of the following options: |
| 85 | + |
| 86 | +* **Option 1**: Use the example configuration provided above, which already includes the setting from Option 2. |
| 87 | +* **Option 2**: Add the argument <DNT>`--config=yaml:service::telemetry::resource::newrelic.service.type: otel_collector`</DNT> to the collector's command line. This sets the tag as a resource attribute, and New Relic applies the tag for you on ingest. If you remove this argument, the tag takes one day to expire. |
| 88 | +* **Option 3**: Add the tag in the APM UI (top of the page, next to the entity name). You can also remove it in the UI to switch the experience off again. |
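For Option 2, note that the argument contains a space, so it needs to be quoted to reach the collector as a single argument. A minimal sketch, assuming the same image and paths as the earlier examples; the variable name `TAG_CONFIG_ARG` is hypothetical:

```bash
#!/usr/bin/env bash
# Single quotes keep the space after "newrelic.service.type:" inside one argument.
TAG_CONFIG_ARG='--config=yaml:service::telemetry::resource::newrelic.service.type: otel_collector'

# Example invocation, commented out because it needs Docker and a real license key:
# docker run \
#   -e NEW_RELIC_LICENSE_KEY='...' \
#   newrelic/nrdot-collector:1.10.0 --config=/etc/nrdot-collector/config.yaml \
#   "$TAG_CONFIG_ARG"
echo "$TAG_CONFIG_ARG"
```

Expanding the variable without the surrounding double quotes would split the argument at the space.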
| 89 | + |
| 90 | +### Customizing configuration [#customize] |
| 91 | + |
| 92 | +The default configuration exposes additional environment variables of the form `INTERNAL_TELEMETRY_...` to tweak common options such as detail levels and sampling. Refer to the [configuration itself](https://github.com/newrelic/nrdot-collector-releases/blob/main/examples/internal-telemetry-config.yaml) for more details. |
| 93 | + |
| 94 | +## View your collector in the UI [#view-ui] |
| 95 | + |
| 96 | +### Explore the internal telemetry in the APM UI [#explore-ui] |
| 97 | + |
| 98 | +To view your collector's internal telemetry, navigate to <DNT>**APM & Services > Services - OpenTelemetry > your_collector_name**</DNT> to explore the collector's entity. |
| 99 | + |
| 100 | +Depending on the components your collector uses, some charts might not be populated. For example, if you don't receive or export OTLP data via gRPC, the RPC charts will be empty. |
| 101 | + |
| 102 | +### Summary page [#summary-page] |
| 103 | + |
| 104 | +The Summary page gives you an overview of your collector's health and performance: |
| 105 | + |
| 106 | +* Overall collector health metrics |
| 107 | +* Charts for receiver, processor, exporter, and batching behavior (batching charts require the [batchprocessor](https://github.com/open-telemetry/opentelemetry-collector/tree/v0.128.0/processor/batchprocessor)) |
| 108 | +* Dedicated chart for the [memorylimiter](https://github.com/open-telemetry/opentelemetry-collector/tree/v0.128.0/processor/memorylimiterprocessor), due to its unique failure mode |
| 109 | + |
| 110 | +<img |
| 111 | + title="Screenshot showing new Summary Page" |
| 112 | + alt="Screenshot showing new Summary Page" |
| 113 | + src="/images/otel_collector_o11y-summary.webp" |
| 114 | +/> |
| 115 | + |
| 116 | +* Infrastructure relationships and metrics (if configured, see [configuration examples](#examples)) |
| 117 | + |
| 118 | +<img |
| 119 | + title="Screenshot showing related infrastructure telemetry in the new Summary Page" |
| 120 | + alt="Screenshot showing related infrastructure telemetry in the new Summary Page" |
| 121 | + src="/images/otel_collector_o11y-infra.webp" |
| 122 | +/> |
| 123 | + |
| 124 | +### Process page [#process-page] |
| 125 | + |
| 126 | +Track system-level resource consumption with the Process page: |
| 127 | + |
| 128 | +* CPU utilization and trends |
| 129 | +* Memory usage and patterns |
| 130 | +* Process-level performance indicators |
| 131 | + |
| 132 | +<img |
| 133 | + title="Screenshot showing new Process Page" |
| 134 | + alt="Screenshot showing new Process Page" |
| 135 | + src="/images/otel_collector_o11y-process.webp" |
| 136 | +/> |
| 137 | + |
| 138 | +### HTTP/RPC page [#http-rpc-page] |
| 139 | + |
| 140 | +Monitor network communication with the HTTP/RPC page: |
| 141 | + |
| 142 | +* HTTP client and server request metrics |
| 143 | +* gRPC communication statistics |
| 144 | +* Network latency and error rates |
| 145 | + |
| 146 | +<img |
| 147 | + title="Screenshot showing new HTTP/RPC Page" |
| 148 | + alt="Screenshot showing new HTTP/RPC Page" |
| 149 | + src="/images/otel_collector_o11y-httprpc.webp" |
| 150 | +/> |
| 151 | + |
| 152 | +Note: These charts are only populated if your collector uses components that communicate via HTTP or gRPC (like OTLP receivers or exporters). |
| 153 | + |
| 154 | +## Configuration examples [#examples] |
| 155 | + |
| 156 | +For detailed configuration examples and customization options, refer to the [internal telemetry configuration](https://github.com/newrelic/nrdot-collector-releases/blob/main/examples/internal-telemetry-config.yaml) in the NRDOT Collector releases repository. |
| 157 | + |
| 158 | +The configuration exposes environment variables of the form `INTERNAL_TELEMETRY_...` to customize options such as detail levels and sampling. |
| 159 | + |
| 160 | +### Deployment examples [#deployment-examples] |
| 161 | + |
| 162 | +For real-world deployment scenarios, see these example configurations: |
| 163 | + |
| 164 | +* **[Kubernetes deployment](https://github.com/newrelic/helm-charts/tree/master/charts/nr-k8s-otel-collector)**: Example of running a collector in Kubernetes with internal telemetry enabled |
| 165 | +* **[Docker deployment](https://github.com/newrelic/nrdot-collector-releases/tree/main/examples)**: Example Docker configurations for various collector setups with monitoring enabled |
| 166 | + |
| 167 | +## Billing and data overhead [#billing-overhead] |
| 168 | + |
| 169 | +Collector Observability generates additional telemetry data that counts toward your data ingest. Understanding the data volume helps you plan for costs and optimize your configuration. |
| 170 | + |
| 171 | +### Data generated [#data-generated] |
| 172 | + |
| 173 | +Collector Observability generates the following types of telemetry: |
| 174 | + |
| 175 | +* **Metrics**: Component-level insights into receivers, processors, and exporters |
| 176 | +* **Logs**: Sampled logs with minimal overhead during normal operation |
| 177 | +* **Traces**: Optional experimental support for deep pipeline analysis |
| 178 | + |
| 179 | +### Expected data volume [#expected-volume] |
| 180 | + |
| 181 | +The amount of data generated depends on your collector's configuration: |
| 182 | + |
| 183 | +* **Baseline metrics**: A collector with standard components (receiver, processor, exporter) generates approximately **1-2 KB/minute** of metric data |
| 184 | +* **Logs**: With default sampling (1% of log statements), expect **0.5-1 KB/minute** during normal operation. During errors or high verbosity, this can increase significantly |
| 185 | +* **Traces** (optional): If enabled, expect **5-10 KB/minute** depending on pipeline complexity |
| 186 | + |
| 187 | +For a collector running 24/7 at these rates, a 30-day month (43,200 minutes) works out to: |
| 188 | +* **Monthly metric ingest**: ~43-86 MB |
| 189 | +* **Monthly log ingest**: ~22-43 MB (normal operation) |
| 190 | +* **Monthly trace ingest**: ~216-432 MB (if enabled) |
| 191 | + |
| 192 | +### Controlling data overhead [#controlling-overhead] |
| 193 | + |
| 194 | +You can adjust telemetry generation using environment variables: |
| 195 | + |
| 196 | +* <DNT>**`INTERNAL_TELEMETRY_METRICS_LEVEL`**</DNT>: Set to `none` to disable metrics, `normal` (default), or `detailed` |
| 197 | +* <DNT>**`INTERNAL_TELEMETRY_LOGS_LEVEL`**</DNT>: Set to `info` (default), `warn`, or `error` to reduce log volume |
| 198 | +* <DNT>**`INTERNAL_TELEMETRY_LOGS_SAMPLING`**</DNT>: Adjust sampling rate (default: 1%) |
| 199 | +* <DNT>**`INTERNAL_TELEMETRY_TRACES_ENABLED`**</DNT>: Set to `false` to disable traces (disabled by default) |
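As an illustration, a lower-overhead setup could be sketched as follows. The function name `apply_low_overhead_profile` is hypothetical, and the variable names and values are taken from the list above, so check them against the configuration file for your collector version before relying on them.

```bash
#!/usr/bin/env bash
# Hypothetical "low overhead" profile built from the variables listed above.
apply_low_overhead_profile() {
  export INTERNAL_TELEMETRY_METRICS_LEVEL='normal'  # the documented default
  export INTERNAL_TELEMETRY_LOGS_LEVEL='warn'       # less verbose than the default 'info'
  export INTERNAL_TELEMETRY_TRACES_ENABLED='false'  # keep traces disabled
}

apply_low_overhead_profile
```

To hand these to a containerized collector, pass each name to `docker run` with `-e`, for example `-e INTERNAL_TELEMETRY_LOGS_LEVEL`.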
| 200 | + |
| 201 | +For configuration details, see the [customizing configuration](#customize) section above. |
| 202 | + |
| 203 | +## Limitations [#limitations] |
| 204 | + |
| 205 | +* **Certain common APM features** don't seamlessly translate to the collector and its stream-processing nature and have been hidden from the Summary page and side navigation. We'll take a look at each of them for General Availability to determine if and how we can best integrate them. Examples are: |
| 206 | + * Apdex score |
| 207 | + * Meaningful `apm.%` metrics |
| 208 | + * Error Rate |
| 209 | + * Transactions |
| 210 | + |
| 211 | +* **Collector telemetry isn't stable yet**: |
| 212 | + * The latest supported version of telemetry emitted by collector components is the component version in our NRDOT distributions (see [manifest](https://github.com/newrelic/nrdot-collector-releases/blob/main/distributions/nrdot-collector/manifest.yaml) for reference). |
| 213 | + * If the emitted telemetry changes during Public Preview, we reserve the right to only support the most recent version. For example, we support <DNT>`http.client.request.duration`</DNT> (collector version >=0.128.0, NRDOT version >=1.2.0, nr-k8s-otel-collector >=0.8.37) but not <DNT>`http.request.duration`</DNT>. |
| 214 | + * With General Availability, we'll start providing backwards compatibility if metric names continue to change. |
| 215 | + |
| 216 | +* **Export format requirements**: The collector UI expects telemetry in the format exported by the <DNT>`otlpexporter`</DNT>. It doesn't support metrics exported via Prometheus. For example, <DNT>`http.client.request.duration`</DNT> is supported, but <DNT>`http_client_request_duration`</DNT> isn't. |
| 217 | + |
| 218 | +* **Custom component metrics**: Internal telemetry that isn't listed in the [internal telemetry documentation](https://opentelemetry.io/docs/collector/internal-telemetry/) isn't supported yet. Custom or [contrib components](https://github.com/open-telemetry/opentelemetry-collector-contrib) emit standard metrics but can also define their own metrics. We're still working on a way to help you get insights from those without writing custom dashboards. |
| 219 | + |
| 220 | +* **Golden metrics for OTel containers**: Not fully supported yet, which means some columns in the infrastructure panel might not be populated for containers. |