chore: Add docs for core distribution

mailo-nr · mailo-nr · commit f356fdc409b9 · 2025-08-18T13:22:45.000-07:00
diff --git a/README.md b/README.md
@@ -8,6 +8,7 @@ Generated assets are available in the corresponding Github release page and as d
 
 Current list of distributions:
 
+- [nrdot-collector](./distributions/nrdot-collector/): comprehensive core distribution with full OTLP gateway capabilities, host monitoring, and Prometheus scraping
 - [nrdot-collector-host](./distributions/nrdot-collector-host/): distribution focused on monitoring host metrics and logs
 - [nrdot-collector-k8s](./distributions/nrdot-collector-k8s/): distribution focused on monitoring a Kubernetes cluster
 
diff --git a/distributions/README.md b/distributions/README.md
@@ -1,6 +1,7 @@
 # Collector Distributions
 
 This README covers topics that apply to all distributions. For distribution-specific information please refer to:
+- [nrdot-collector](./nrdot-collector/README.md)
 - [nrdot-collector-host](./nrdot-collector-host/README.md)
 - [nrdot-collector-k8s](./nrdot-collector-k8s/README.md)
 
diff --git a/distributions/core-components.md b/distributions/core-components.md
@@ -1,14 +1,53 @@
 # Core Components
 This document describes the core components of the NRDOT distribution which should be included in all distributions.
 
+## Receivers
 | Component                                | Reason                                                                                       |
 |------------------------------------------|----------------------------------------------------------------------------------------------|
 | `otlpreceiver`                           | Basic OTLP-based gateway capabilities                                                        |
+| `filelogreceiver`                        | Local file log collection for host monitoring                                               |
+| `hostmetricsreceiver`                    | System metrics collection (CPU, memory, disk, network)                                      |
+| `prometheusreceiver`                     | Prometheus metrics scraping for application monitoring                                      |
+
+## Processors
+| Component                                | Reason                                                                                       |
+|------------------------------------------|----------------------------------------------------------------------------------------------|
 | `batchprocessor`                         | Performance optimization                                                                     |
 | `memorylimiterprocessor`                 | Reliability - Control over resource usage                                                    |
-| `routingconnector`                       | Reduce config redundancy for complex pipelines, e.g. multiple NR accounts based on attributes |
-| `otlpexporter`                           | Required to write to NR OTLP endpoint via HTTP                                               |
-| `otlphttpexporter`                       | Required to write to NR OTLP endpoint via gRPC                                               |
+| `attributesprocessor`                    | Attribute manipulation and enrichment                                                       |
+| `cumulativetodeltaprocessor`            | Convert cumulative metrics to delta for proper aggregation                                 |
+| `filterprocessor`                        | Filter out unwanted telemetry data                                                         |
+| `groupbyattrsprocessor`                  | Regroup telemetry records under resources that match the specified attributes              |
+| `metricstransformprocessor`              | Transform metric names and attributes for compatibility                                     |
+| `resourcedetectionprocessor`             | Automatic detection of resource attributes (cloud, host, etc.)                             |
+| `resourceprocessor`                      | Resource attribute manipulation and standardization                                         |
+| `spanprocessor`                          | Span renaming and attribute extraction from span names                                      |
+| `tailsamplingprocessor`                  | Intelligent sampling based on trace content and patterns                                   |
+| `transformprocessor`                     | Advanced telemetry data transformation using OTTL                                          |
+
+## Exporters
+| Component                                | Reason                                                                                       |
+|------------------------------------------|----------------------------------------------------------------------------------------------|
 | `debugexporter`                          | Debugging, testing, config validation                                                        |
+| `otlpexporter`                           | Required to write to NR OTLP endpoint via gRPC                                              |
+| `otlphttpexporter`                       | Required to write to NR OTLP endpoint via HTTP                                              |
+| `loadbalancingexporter`                  | Consistent routing of telemetry (e.g. by trace ID or service) across multiple backends     |
+
+## Connectors
+| Component                                | Reason                                                                                       |
+|------------------------------------------|----------------------------------------------------------------------------------------------|
+| `routingconnector`                       | Reduce config redundancy for complex pipelines, e.g. multiple NR accounts based on attributes |
+
+## Extensions
+| Component                                | Reason                                                                                       |
+|------------------------------------------|----------------------------------------------------------------------------------------------|
 | `healthcheckextension`                   | Reliability - Basic health check capabilities                                                |
-| `[env\|file\|http\|https\|yaml]provider` | Configuration from various sources |
+
+## Providers
+| Component                                | Reason                                                                                       |
+|------------------------------------------|----------------------------------------------------------------------------------------------|
+| `envprovider`                            | Configuration from environment variables                                                     |
+| `fileprovider`                           | Configuration from local files                                                              |
+| `httpprovider`                           | Configuration from HTTP endpoints                                                           |
+| `httpsprovider`                          | Configuration from HTTPS endpoints                                                          |
+| `yamlprovider`                           | Configuration from inline YAML strings (`yaml:` scheme)                                     |
diff --git a/distributions/nrdot-collector/README.md b/distributions/nrdot-collector/README.md
@@ -3,4 +3,50 @@
 | Status    |                                                                                                                                                                                                             |
 |-----------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | Distro    | `nrdot-collector`                                                                                                                                                                                      |
-| Stability | `alpha`                                                                            
+| Stability | `alpha`                                                                                                                                                                                                    |
+
+A distribution of the NRDOT collector focused on
+- monitoring the host the collector is deployed on via `hostmetricsreceiver` and `filelogreceiver`
+- enriching other OTLP data with host metadata via the `otlpreceiver` and `resourcedetectionprocessor`
+- facilitating gateway mode deployments with additional components for centralized telemetry collection and processing
+
+This distribution includes all the capabilities of `nrdot-collector-host` plus additional components to support gateway mode deployments, allowing it to act as a central collection point for telemetry data from multiple sources.
+
+Note: See [general README](../README.md) for information that applies to all distributions.
+
+## Installation
+
+The following instructions assume you have read and understood the [general installation instructions](../README.md#installation).
+
+### Containerized Environments
+If you're deploying the `nrdot-collector` distribution as a container, make sure to configure the [root_path](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/hostmetricsreceiver/README.md#collecting-host-metrics-from-inside-a-container-linux-only) and mount the host's file system accordingly. Otherwise, NRDOT will not be able to collect host metrics properly.
+See also [our troubleshooting guide](./TROUBLESHOOTING.md) for more details.
+
+### Gateway Mode Deployment
+When deploying in gateway mode, the collector acts as a central aggregation point for telemetry data. This mode is particularly useful for:
+- Reducing the number of direct connections to backend services
+- Centralizing telemetry processing and transformation
+- Implementing sampling and filtering policies
+- Buffering and batching telemetry data
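+
+As a rough illustration of this mode, a gateway pipeline could be wired up as sketched below. This is a minimal sketch under stated assumptions, not the configuration shipped with the distribution: the `0.0.0.0` bind addresses, the percentage-based memory limits, and the exporter endpoint and `api-key` header are illustrative values to adapt to your environment.
+```yaml
+receivers:
+  otlp:
+    protocols:
+      grpc:
+        endpoint: 0.0.0.0:4317   # assumption: accept OTLP/gRPC from remote collectors
+      http:
+        endpoint: 0.0.0.0:4318   # assumption: accept OTLP/HTTP from remote collectors
+
+processors:
+  memory_limiter:                # protects the gateway from out-of-memory conditions
+    check_interval: 1s
+    limit_percentage: 80         # illustrative value
+    spike_limit_percentage: 20   # illustrative value
+  batch: {}                      # buffer and batch with the processor's defaults
+
+exporters:
+  otlphttp:
+    endpoint: https://otlp.nr-data.net        # assumption: US OTLP endpoint
+    headers:
+      api-key: ${env:NEW_RELIC_LICENSE_KEY}
+
+service:
+  pipelines:
+    traces:
+      receivers: [otlp]
+      processors: [memory_limiter, batch]
+      exporters: [otlphttp]
+    metrics:
+      receivers: [otlp]
+      processors: [memory_limiter, batch]
+      exporters: [otlphttp]
+    logs:
+      receivers: [otlp]
+      processors: [memory_limiter, batch]
+      exporters: [otlphttp]
+```
+Sampling and filtering policies would be added as further processors between `memory_limiter` and `batch`.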
+
+## Configuration
+
+Note: See [general README](../README.md) for information that applies to all distributions.
+
+### Distribution-specific configuration
+
+| Environment Variable | Description | Default |
+|---|---|---|
+| `OTEL_RESOURCE_ATTRIBUTES` | Key-value pairs to be used as resource attributes, see [OTel Docs](https://opentelemetry.io/docs/languages/sdk-configuration/general/#otel_resource_attributes) | N/A |
+
+#### Enable process metrics
+Process metrics are disabled by default because they are quite noisy. To enable them, reconfigure the `hostmetricsreceiver` (see the [receiver docs](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/hostmetricsreceiver#getting-started)). Note that the [processesscraper (`system.processes.*` metrics)](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/hostmetricsreceiver/internal/scraper/processesscraper/documentation.md) and the [processscraper (`process.*` metrics)](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/hostmetricsreceiver/internal/scraper/processscraper/documentation.md) are separate scrapers with separate options. An example configuration looks like this:
+```shell
+newrelic/nrdot-collector --config /etc/nrdot-collector/config.yaml \
+--config='yaml:receivers::hostmetrics::scrapers::processes: ' \
+--config='yaml:receivers::hostmetrics::scrapers::process: { metrics: { process.cpu.utilization: { enabled: true }, process.cpu.time: { enabled: false } } }'
+```
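+
+The same overrides can also be kept in a file rather than on the command line. The following is a sketch of the equivalent YAML, assuming it is passed as an additional `--config` source so that it is merged on top of the default configuration:
+```yaml
+receivers:
+  hostmetrics:
+    scrapers:
+      # an empty value enables the scraper with its default settings
+      processes:
+      process:
+        metrics:
+          process.cpu.utilization:
+            enabled: true
+          process.cpu.time:
+            enabled: false
+```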
+
+## Troubleshooting
+
+Please refer to our [troubleshooting guide](./TROUBLESHOOTING.md).
diff --git a/distributions/nrdot-collector/TROUBLESHOOTING.md b/distributions/nrdot-collector/TROUBLESHOOTING.md
@@ -0,0 +1,68 @@
+# Troubleshooting for nrdot-collector
+
+For general NRDOT troubleshooting, see [this guide](../TROUBLESHOOTING.md). This document assumes you are familiar with
+the troubleshooting tools mentioned.
+
+## Known issues
+
+### Missing host entity in New Relic UI due to missing `host.id`
+If you are [seeing telemetry getting ingested into New Relic](../TROUBLESHOOTING.md#user-content-stablelink-telemetry-not-reaching-new-relic) but the Host UI still shows no host entities after a few minutes, you might be running into the limitations of the [resourcedetectionprocessor](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/resourcedetectionprocessor/README.md), which NRDOT uses to determine the `host.id` attribute required to [synthesize a host entity](https://github.com/newrelic/entity-definitions/blob/main/entity-types/infra-host/definition.yml#L62-L63). Example log messages indicating this issue look like this:
+```
+# example 1
+2025-01-01T22:49:09.110Z        warn    system/system.go:143    failed to get host ID   {"otelcol.component.id": "resourcedetection", "otelcol.component.kind" : "Processor", "otelcol.pipeline.id": "logs/host", "otelcol.signal": "logs", "error": "failed to obtain \"host.id\": error detecting resource: host id not found in: /etc/machine-id or /var/lib/dbus/machine-id"}
+# example 2
+2025-01-01T23:07:27.866Z        warn    system/system.go:143    failed to get host ID   {"otelcol.component.id": "resourcedetection", "otelcol.component.kind": "Processor", "otelcol.pipeline.id": "metrics/host", "otelcol.signal": "metrics", "error": "empty \"host.id\""}
+```
+To resolve this, you can set the `host.id` attribute manually via the [environment variable](./README.md#configuration) `OTEL_RESOURCE_ATTRIBUTES`, e.g. `export OTEL_RESOURCE_ATTRIBUTES='host.id=my-custom-host-id'`.
+
+### No `root_path` in containerized environments
+The `hostmetricsreceiver` auto-detects the files to scrape system metrics from. When running in a container, this causes the receiver to scrape the metrics of the container instead of those of the host system that you most likely want to monitor. To bridge this gap, the receiver provides the `root_path` option, which lets you specify the path where the host file system is available to the collector, most commonly by mounting it into the container. The warning indicating this issue looks like this:
+```
+2025-01-01T21:08:21.097Z	warn	filesystemscraper/factory.go:48	No `root_path` config set when running in docker environment, will report container filesystem stats. See https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/hostmetricsreceiver#collecting-host-metrics-from-inside-a-container-linux-only	{"otelcol.component.id": "hostmetrics", "otelcol.component.kind": "Receiver", "otelcol.signal": "metrics"}
+```
+In order to resolve this, make sure to follow the [receiver's docs](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/hostmetricsreceiver/README.md#collecting-host-metrics-from-inside-a-container-linux-only) to mount the host file system into the container at the `root_path` and configure the `root_path` accordingly, e.g.
+```bash
+docker run -v /:/hostfs \
+-e NEW_RELIC_LICENSE_KEY='license-key' newrelic/nrdot-collector \
+--config /etc/nrdot-collector/config.yaml \
+--config 'yaml:receivers::hostmetrics::root_path: /hostfs'
+```
+
+## Gateway mode specific issues
+
+### High memory usage in gateway deployments
+When running in gateway mode with high throughput, the collector may experience elevated memory usage. This is typically due to buffering and batching of telemetry data from multiple sources. To mitigate this:
+
+1. **Adjust batch processor settings**: Configure the batch processor with appropriate timeout and batch size limits:
+```bash
+newrelic/nrdot-collector \
+--config /etc/nrdot-collector/config.yaml \
+--config 'yaml:processors::batch::timeout: 5s' \
+--config 'yaml:processors::batch::send_batch_size: 1000'
+```
+
+2. **Configure memory limiter**: Use the memory limiter processor to prevent out-of-memory conditions:
+```bash
+newrelic/nrdot-collector \
+--config /etc/nrdot-collector/config.yaml \
+--config 'yaml:processors::memory_limiter::limit_mib: 512' \
+--config 'yaml:processors::memory_limiter::spike_limit_mib: 128'
+```
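+
+If you prefer to keep these settings in a file, the two overrides above correspond roughly to the YAML below. This is a sketch: the `check_interval` value is an assumption added so that the fragment is complete on its own, and the numbers are the illustrative ones used above, not tuned recommendations.
+```yaml
+processors:
+  memory_limiter:
+    check_interval: 1s      # assumption: how often memory usage is checked
+    limit_mib: 512          # hard limit on memory usage
+    spike_limit_mib: 128    # headroom; soft limit = limit_mib - spike_limit_mib
+  batch:
+    timeout: 5s             # flush a batch after at most 5s
+    send_batch_size: 1000   # or once 1000 items have accumulated
+```
+In the pipeline definition, `memory_limiter` is usually listed before `batch` so that data can be refused before it is buffered.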
+
+### Connection issues from remote collectors
+When using the collector in gateway mode, remote collectors may have trouble connecting. Common causes include:
+
+1. **Firewall or network policies**: Ensure the OTLP receiver ports (default: 4317 for gRPC, 4318 for HTTP) are accessible from remote collectors.
+
+2. **TLS configuration**: If using TLS, ensure certificates are properly configured and trusted by remote collectors.
+
+3. **Incorrect endpoint configuration**: Verify remote collectors are configured with the correct endpoint URL and protocol (gRPC vs HTTP).
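+
+To make the endpoint and protocol distinction concrete, the exporter section of a remote collector could look like the sketch below. The hostname `gateway.example.internal` and the CA file path are hypothetical placeholders, and only one of the two exporters is normally needed.
+```yaml
+exporters:
+  # OTLP over gRPC: host:port without a URL scheme, default port 4317
+  otlp:
+    endpoint: gateway.example.internal:4317
+    tls:
+      ca_file: /etc/ssl/certs/gateway-ca.pem   # hypothetical path to the CA that signed the gateway certificate
+  # OTLP over HTTP: full URL including scheme, default port 4318
+  otlphttp:
+    endpoint: https://gateway.example.internal:4318
+    tls:
+      ca_file: /etc/ssl/certs/gateway-ca.pem
+```
+Pointing the gRPC exporter at the HTTP port (or vice versa) is a common cause of connection errors that look like network problems.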
+
+### Load balancing considerations
+For high-availability gateway deployments with multiple collector instances:
+
+1. **Use consistent hashing**: When load balancing, use consistent hashing based on resource attributes to ensure related telemetry data is routed to the same collector instance.
+
+2. **Configure health checks**: Set up appropriate health check endpoints for load balancers to detect unhealthy collector instances.
+
+3. **Monitor queue sizes**: Keep an eye on internal queue sizes to detect backpressure issues early.
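+
+One possible way to apply the first two points is a routing tier that uses the `loadbalancingexporter` in front of the gateway instances, with the `healthcheckextension` enabled for load balancer probes. The fragment below is a sketch under assumptions: the DNS name `nrdot-gateways.example.internal` is hypothetical, `tls.insecure` is only suitable for trusted networks, and the `otlp` receiver is assumed to be defined elsewhere in the configuration.
+```yaml
+extensions:
+  health_check:
+    endpoint: 0.0.0.0:13133   # probe target for external load balancers
+
+exporters:
+  loadbalancing:
+    routing_key: service      # spans of the same service go to the same backend
+    protocol:
+      otlp:
+        tls:
+          insecure: true      # assumption: plaintext traffic inside a trusted network
+    resolver:
+      dns:
+        hostname: nrdot-gateways.example.internal   # hypothetical name resolving to all gateway instances
+
+service:
+  extensions: [health_check]
+  pipelines:
+    traces:
+      receivers: [otlp]       # assumed to be defined elsewhere
+      exporters: [loadbalancing]
+```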