
Commit f356fdc

chore: Add docs for core distribution
1 parent d65b26c commit f356fdc

5 files changed: +160 -5 lines changed

README.md

Lines changed: 1 addition & 0 deletions

Current list of distributions:

- [nrdot-collector](./distributions/nrdot-collector/): comprehensive core distribution with full OTLP gateway capabilities, host monitoring, and Prometheus scraping.
- [nrdot-collector-host](./distributions/nrdot-collector-host/): distribution focused on monitoring host metrics and logs
- [nrdot-collector-k8s](./distributions/nrdot-collector-k8s/): distribution focused on monitoring a Kubernetes cluster

distributions/README.md

Lines changed: 1 addition & 0 deletions

# Collector Distributions

This README covers topics that apply to all distributions. For distribution-specific information, please refer to:

- [nrdot-collector](./nrdot-collector/README.md)
- [nrdot-collector-host](./nrdot-collector-host/README.md)
- [nrdot-collector-k8s](./nrdot-collector-k8s/README.md)

distributions/core-components.md

Lines changed: 43 additions & 4 deletions

# Core Components

This document describes the core components of the NRDOT distribution, which should be included in all distributions.

## Receivers

| Component                | Reason                                                 |
|--------------------------|--------------------------------------------------------|
| `otlpreceiver`           | Basic OTLP-based gateway capabilities                  |
| `filelogreceiver`        | Local file log collection for host monitoring          |
| `hostmetricsreceiver`    | System metrics collection (CPU, memory, disk, network) |
| `prometheusreceiver`     | Prometheus metrics scraping for application monitoring |

## Processors

| Component                    | Reason                                                         |
|------------------------------|----------------------------------------------------------------|
| `batchprocessor`             | Performance optimization                                       |
| `memorylimiterprocessor`     | Reliability - control over resource usage                      |
| `attributesprocessor`        | Attribute manipulation and enrichment                          |
| `cumulativetodeltaprocessor` | Convert cumulative metrics to delta for proper aggregation     |
| `filterprocessor`            | Filter out unwanted telemetry data                             |
| `groupbyattrsprocessor`      | Group and aggregate telemetry data by attributes               |
| `metricstransformprocessor`  | Transform metric names and attributes for compatibility        |
| `resourcedetectionprocessor` | Automatic detection of resource attributes (cloud, host, etc.) |
| `resourceprocessor`          | Resource attribute manipulation and standardization            |
| `spanprocessor`              | Span attribute manipulation and sampling decisions             |
| `tailsamplingprocessor`      | Intelligent sampling based on trace content and patterns       |
| `transformprocessor`         | Advanced telemetry data transformation using OTTL              |

## Exporters

| Component                | Reason                                                |
|--------------------------|-------------------------------------------------------|
| `debugexporter`          | Debugging, testing, config validation                 |
| `otlpexporter`           | Required to write to the NR OTLP endpoint via gRPC    |
| `otlphttpexporter`       | Required to write to the NR OTLP endpoint via HTTP    |
| `loadbalancingexporter`  | Load balancing and failover across multiple backends  |

## Connectors

| Component          | Reason                                                                                        |
|--------------------|-----------------------------------------------------------------------------------------------|
| `routingconnector` | Reduce config redundancy for complex pipelines, e.g. multiple NR accounts based on attributes |

## Extensions

| Component              | Reason                                        |
|------------------------|-----------------------------------------------|
| `healthcheckextension` | Reliability - basic health check capabilities |

## Providers

| Component      | Reason                                   |
|----------------|------------------------------------------|
| `envprovider`  | Configuration from environment variables |
| `fileprovider` | Configuration from local files           |
| `httpprovider` | Configuration from HTTP endpoints        |
| `httpsprovider`| Configuration from HTTPS endpoints       |
| `yamlprovider` | Configuration from YAML sources          |
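The providers determine which URI schemes the collector accepts for configuration. As an illustration, a file-provided config can pull secrets from the environment via the collector's `${env:...}` substitution syntax (a minimal sketch; the endpoint and variable name are illustrative):

```yaml
# config.yaml, loaded by the fileprovider; the envprovider resolves ${env:...}
exporters:
  otlphttp:
    endpoint: https://otlp.nr-data.net
    headers:
      api-key: ${env:NEW_RELIC_LICENSE_KEY}
```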

distributions/nrdot-collector/README.md

Lines changed: 47 additions & 1 deletion

| Status    |                   |
|-----------|-------------------|
| Distro    | `nrdot-collector` |
| Stability | `alpha`           |

A distribution of the NRDOT collector focused on

- monitoring the host the collector is deployed on via `hostmetricsreceiver` and `filelogreceiver`
- enriching other OTLP data with host metadata via the `otlpreceiver` and `resourcedetectionprocessor`
- facilitating gateway mode deployments with additional components for centralized telemetry collection and processing

This distribution includes all the capabilities of `nrdot-collector-host` plus additional components to support gateway mode deployments, allowing it to act as a central collection point for telemetry data from multiple sources.

Note: See the [general README](../README.md) for information that applies to all distributions.

## Installation

The following instructions assume you have read and understood the [general installation instructions](../README.md#installation).

### Containerized Environments

If you're deploying the `nrdot-collector` distribution as a container, make sure to configure the [root_path](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/hostmetricsreceiver/README.md#collecting-host-metrics-from-inside-a-container-linux-only) and mount the host's file system accordingly; otherwise NRDOT will not be able to collect host metrics properly.
See also [our troubleshooting guide](./TROUBLESHOOTING.md) for more details.

### Gateway Mode Deployment

When deploying in gateway mode, the collector acts as a central aggregation point for telemetry data. This mode is particularly useful for:

- Reducing the number of direct connections to backend services
- Centralizing telemetry processing and transformation
- Implementing sampling and filtering policies
- Buffering and batching telemetry data
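The points above map onto a collector pipeline roughly as follows (a minimal sketch, not a complete production config; endpoints and limits are illustrative):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  memory_limiter:   # bound resource usage
    check_interval: 1s
    limit_mib: 512
  batch:            # buffer and batch before export
    timeout: 5s

exporters:
  otlphttp:
    endpoint: https://otlp.nr-data.net

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlphttp]
```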
## Configuration

Note: See the [general README](../README.md) for information that applies to all distributions.

### Distribution-specific configuration

| Environment Variable | Description | Default |
|---|---|---|
| `OTEL_RESOURCE_ATTRIBUTES` | Key-value pairs to be used as resource attributes, see [OTel Docs](https://opentelemetry.io/docs/languages/sdk-configuration/general/#otel_resource_attributes) | N/A |
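For example (a hypothetical illustration — the attribute keys follow OTel semantic conventions and the values are placeholders):

```shell
# Set resource attributes before launching the collector (values are placeholders)
export OTEL_RESOURCE_ATTRIBUTES='host.id=my-host-id,deployment.environment=production'
echo "$OTEL_RESOURCE_ATTRIBUTES"
```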
#### Enable process metrics

Process metrics are disabled by default as they are quite noisy. If you want to enable them, you can do so by reconfiguring the `hostmetricsreceiver`; see also the [receiver docs](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/hostmetricsreceiver#getting-started). Note that there is a [processesscraper (`system.processes.*` metrics)](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/hostmetricsreceiver/internal/scraper/processesscraper/documentation.md) and a [processscraper (`process.*` metrics)](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/hostmetricsreceiver/internal/scraper/processscraper/documentation.md) with separate options. An example configuration would look like this:
```shell
newrelic/nrdot-collector --config /etc/nrdot-collector/config.yaml \
  --config='yaml:receivers::hostmetrics::scrapers::processes: ' \
  --config='yaml:receivers::hostmetrics::scrapers::process: { metrics: { process.cpu.utilization: { enabled: true }, process.cpu.time: { enabled: false } } }'
```
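The `::`-delimited keys in the `--config` flags above correspond to nested YAML; expressed directly in the config file, the same override would look roughly like this (a sketch based on the receiver's documented scraper schema):

```yaml
receivers:
  hostmetrics:
    scrapers:
      processes:   # enable system.processes.* with defaults
      process:     # enable process.* with per-metric overrides
        metrics:
          process.cpu.utilization:
            enabled: true
          process.cpu.time:
            enabled: false
```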
## Troubleshooting

Please refer to our [troubleshooting guide](./TROUBLESHOOTING.md).
distributions/nrdot-collector/TROUBLESHOOTING.md

Lines changed: 68 additions & 0 deletions

# Troubleshooting for nrdot-collector

For general NRDOT troubleshooting, see [this guide](../TROUBLESHOOTING.md). This document assumes you are familiar with the troubleshooting tools mentioned there.

## Known issues

### Missing host entity in New Relic UI due to missing `host.id`

If you are [seeing telemetry getting ingested into New Relic](../TROUBLESHOOTING.md#user-content-stablelink-telemetry-not-reaching-new-relic) but even after a few minutes of waiting the Host UI does not show any host entities, you might be running into the limitations of the [resourcedetectionprocessor](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/resourcedetectionprocessor/README.md) NRDOT uses to determine the `host.id` attribute required to [synthesize a host entity](https://github.com/newrelic/entity-definitions/blob/main/entity-types/infra-host/definition.yml#L62-L63). An example log message indicating this issue looks like this:
```
# example 1
2025-01-01T22:49:09.110Z warn system/system.go:143 failed to get host ID {"otelcol.component.id": "resourcedetection", "otelcol.component.kind" : "Processor", "otelcol.pipeline.id": "logs/host", "otelcol.signal": "logs", "error": "failed to obtain \"host.id\": error detecting resource: host id not found in: /etc/machine-id or /var/lib/dbus/machine-id"}
# example 2
2025-01-01T23:07:27.866Z warn system/system.go:143 failed to get host ID {"otelcol.component.id": "resourcedetection", "otelcol.component.kind": "Processor", "otelcol.pipeline.id": "metrics/host", "otelcol.signal": "metrics", "error": "empty \"host.id\""}
```
To resolve this, you can set the `host.id` attribute manually via the [environment variable](./README.md#configuration) `OTEL_RESOURCE_ATTRIBUTES`, e.g. `export OTEL_RESOURCE_ATTRIBUTES='host.id=my-custom-host-id'`.

### No `root_path` in containerized environments

The `hostmetricsreceiver` auto-detects the files to scrape system metrics from. When running in a container, this causes issues, as the receiver would then scrape metrics of the container instead of the host system, which you most likely want to monitor. To bridge this gap, the receiver provides the `root_path` option, which allows you to specify the path where the host file system is available to the collector, most commonly by mounting it into the container. The warning indicating this issue looks like this:
```
2025-01-01T21:08:21.097Z warn filesystemscraper/factory.go:48 No `root_path` config set when running in docker environment, will report container filesystem stats. See https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/hostmetricsreceiver#collecting-host-metrics-from-inside-a-container-linux-only {"otelcol.component.id": "hostmetrics", "otelcol.component.kind": "Receiver", "otelcol.signal": "metrics"}
```
To resolve this, follow the [receiver's docs](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/hostmetricsreceiver/README.md#collecting-host-metrics-from-inside-a-container-linux-only) to mount the host file system into the container at the `root_path` and configure the `root_path` accordingly, e.g.
```bash
docker run -v /:/hostfs \
  -e NEW_RELIC_LICENSE_KEY='license-key' newrelic/nrdot-collector \
  --config /etc/nrdot-collector/config.yaml \
  --config 'yaml:receivers::hostmetrics::root_path: /hostfs'
```
## Gateway mode specific issues

### High memory usage in gateway deployments

When running in gateway mode with high throughput, the collector may experience elevated memory usage. This is typically due to buffering and batching of telemetry data from multiple sources. To mitigate this:

1. **Adjust batch processor settings**: Configure the batch processor with appropriate timeout and batch size limits:
```bash
newrelic/nrdot-collector \
  --config /etc/nrdot-collector/config.yaml \
  --config 'yaml:processors::batch::timeout: 5s' \
  --config 'yaml:processors::batch::send_batch_size: 1000'
```

2. **Configure memory limiter**: Use the memory limiter processor to prevent out-of-memory conditions:
```bash
newrelic/nrdot-collector \
  --config /etc/nrdot-collector/config.yaml \
  --config 'yaml:processors::memory_limiter::limit_mib: 512' \
  --config 'yaml:processors::memory_limiter::spike_limit_mib: 128'
```
### Connection issues from remote collectors

When using the collector in gateway mode, remote collectors may have trouble connecting. Common causes include:

1. **Firewall or network policies**: Ensure the OTLP receiver ports (default: 4317 for gRPC, 4318 for HTTP) are accessible from remote collectors.

2. **TLS configuration**: If using TLS, ensure certificates are properly configured and trusted by remote collectors.

3. **Incorrect endpoint configuration**: Verify remote collectors are configured with the correct endpoint URL and protocol (gRPC vs HTTP).

### Load balancing considerations

For high-availability gateway deployments with multiple collector instances:

1. **Use consistent hashing**: When load balancing, use consistent hashing based on resource attributes to ensure related telemetry data is routed to the same collector instance.

2. **Configure health checks**: Set up appropriate health check endpoints for load balancers to detect unhealthy collector instances.

3. **Monitor queue sizes**: Keep an eye on internal queue sizes to detect backpressure issues early.
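On the sending side, consistent routing is what the `loadbalancingexporter` provides. A sketch of such a config (the hostnames are placeholders, and available `routing_key` values depend on the exporter version):

```yaml
exporters:
  loadbalancing:
    routing_key: traceID   # route all spans of a trace to the same gateway
    protocol:
      otlp:
        tls:
          insecure: true   # illustrative only; use proper TLS in production
    resolver:
      static:
        hostnames:
          - gateway-1.internal:4317
          - gateway-2.internal:4317
```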
