|
1 | | -# Profiling metrics connector |
| 1 | +# Profiling Metrics Connector |
2 | 2 |
|
3 | | -The Profiling metrics connector is an opinionated OTel connector that generates OTel metrics from selected OTel profiling data. |
| 3 | +<!-- status autogenerated section --> |
| 4 | +| Status | | |
| 5 | +| ------------- |-----------| |
| 6 | +| Distributions | [] | |
| 7 | +| Issues | [Open issues](https://github.com/elastic/opentelemetry-collector-components/issues?q=is%3Aopen+is%3Aissue+label%3Aconnector%2Fprofilingmetrics) [Closed issues](https://github.com/elastic/opentelemetry-collector-components/issues?q=is%3Aclosed+is%3Aissue+label%3Aconnector%2Fprofilingmetrics) | |
| 8 | +| Code coverage | [Codecov report](https://app.codecov.io/gh/elastic/opentelemetry-collector-components/tree/main/?components%5B0%5D=connector_profilingmetrics&displayType=list) | |
| 9 | + |
| 10 | +[development]: https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/component-stability.md#development |
| 11 | + |
| 12 | +## Supported Pipeline Types |
| 13 | + |
| 14 | +| [Exporter Pipeline Type] | [Receiver Pipeline Type] | [Stability Level] | |
| 15 | +| ------------------------ | ------------------------ | ----------------- | |
| 16 | +| profiles | metrics | [development] | |
| 17 | + |
| 18 | +[Exporter Pipeline Type]: https://github.com/open-telemetry/opentelemetry-collector/blob/main/connector/README.md#exporter-pipeline-type |
| 19 | +[Receiver Pipeline Type]: https://github.com/open-telemetry/opentelemetry-collector/blob/main/connector/README.md#receiver-pipeline-type |
| 20 | +[Stability Level]: https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/component-stability.md#stability-levels |
| 21 | +<!-- end autogenerated section --> |
| 22 | + |
| 23 | +## Overview |
| 24 | + |
| 25 | +The Profiling Metrics connector is an opinionated OpenTelemetry connector that |
| 26 | +transforms [OTel Profiles data](https://opentelemetry.io/docs/specs/otel/profiles/) |
| 27 | +into OpenTelemetry metrics. It analyzes stack traces from Profiles samples and |
| 28 | +produces per-resource delta metrics that break down exclusive CPU time by frame type |
| 29 | +(kernel, native, JVM, Go, Python, etc.), shared library, system call, kernel subsystem, and |
| 30 | +more. |
| 31 | + |
| 32 | +These metrics are designed to power the |
| 33 | +[Elastic OTel Profiling Metrics integration](https://www.elastic.co/docs/reference/integrations/profilingmetrics_otel) |
| 34 | +dashboards, giving you an at-a-glance view of where your application spends |
| 35 | +its time without requiring you to query raw profiling data. |
| 36 | + |
| 37 | +### What it does |
| 38 | + |
| 39 | +For every batch of Profiles data the connector receives, it walks each |
| 40 | +sample's stack trace and: |
| 41 | + |
| 42 | +1. **Classifies the leaf frame** into one of the supported runtime/frame types |
| 43 | + (kernel, native/C, JVM, Go, Python, Ruby, PHP, Perl, .NET, Rust, Beam, V8 |
| 44 | + JS) and increments the corresponding `samples.<type>.count` metric. |
| 45 | +2. **Counts userspace vs. kernel frames** (`samples.user.count` and |
| 46 | + `samples.kernel.count`) so downstream consumers can compare the two and |
| 47 | + compute the total sample count as their sum. |
| 48 | +3. **Extracts shared library names** for native frames and attaches them as the |
| 49 | + `shlib_name` attribute on `samples.native.count`. |
| 50 | +4. **Classifies kernel stack traces** into subsystem categories |
| 51 | + (network/tcp, network/udp, ipc, disk, memory, synchronization) with |
| 52 | + read/write direction and protocol breakdown, exposed via `kernel_area`, |
| 53 | + `kernel_proto`, and `kernel_io` attributes on `samples.kernel.count`. |
| 54 | +5. **Extracts system call names** from kernel frames (e.g. `write`, `read`, |
| 55 | + `futex`) and attaches them as the `syscall_name` attribute on |
| 56 | + `samples.kernel.count`, enabling per-syscall analysis. |
| 57 | +6. **(Optional) Generates `samples.frame_type`** — a gauge that counts profiling |
| 58 | + frames grouped by their `frame_type` attribute. |
| 59 | +7. **(Optional) Generates `samples.classification`** — a gauge with |
| 60 | + language-specific classification (e.g. Go package or JVM class) for Go |
| 61 | + and JVM frames. |
| 62 | +8. **Supports custom aggregations** — user-defined regex patterns matched |
| 63 | + against function names, producing `samples.custom_aggregation` metrics with |
| 64 | + a user-chosen label. |
| 65 | + |
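The two optional metrics in the list above are disabled by default. A minimal sketch of switching them on, using only the `metrics` keys this README documents (whether the extra data points are worth their volume is a per-deployment judgment):

```yaml
connectors:
  profilingmetrics:
    metrics:
      # Gauge counting frames grouped by the frame_type attribute.
      samples.frame_type:
        enabled: true
      # Gauge with language-specific classification for Go and JVM frames.
      samples.classification:
        enabled: true
```

Metrics that are not listed keep their defaults, so this fragment does not need to repeat the per-runtime counters.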
| 66 | +### How it works |
| 67 | + |
| 68 | +``` |
| 69 | +┌────────────┐ ┌───────────────────────┐ ┌──────────────┐ |
| 70 | +│ Profiles │──────▶│ profilingmetrics │──────▶│ Metrics │ |
| 71 | +│ Pipeline │ │ connector │ │ Pipeline │ |
| 72 | +└────────────┘ └───────────────────────┘ └──────────────┘ |
| 73 | +``` |
| 74 | + |
| 75 | +The connector sits between a **Profiles** exporter pipeline and a **Metrics** |
| 76 | +receiver pipeline. It consumes `pprofile.Profiles`, walks stack frames using |
| 77 | +the shared dictionary (string table, location table, function table, mapping |
| 78 | +table), and emits `pmetric.Metrics` to the next consumer. |
| 79 | + |
| 80 | +When `flush_interval` is greater than `0s` (default: `30s`), an internal |
| 81 | +aggregation consumer buffers and merges delta metrics in memory, flushing them |
| 82 | +collectively at each interval. This reduces metric volume and aligns data |
| 83 | +points to regular time boundaries. |
| 84 | + |
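As an illustration of the `flush_interval` behavior described above, the fragment below turns off in-memory aggregation. Only the `flush_interval` key and its `0s` meaning come from this README. Whether a backend copes well with the resulting higher data-point volume is an assumption to validate per deployment.

```yaml
connectors:
  profilingmetrics:
    # 0s disables the in-memory aggregation step, so delta metrics are
    # forwarded without being merged over a time window.
    flush_interval: 0s
```

Leaving the setting out entirely keeps the default `30s` window.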
| 85 | +### Use cases |
| 86 | + |
| 87 | +- **Profiling cost reduction**: distill high-volume profiling data into |
| 88 | + compact, queryable metrics that are also suitable for LLM processing. |
| 89 | +- **Infrastructure dashboards**: visualize CPU time distribution across |
| 90 | + runtimes, kernel subsystems, and shared libraries. |
| 91 | +- **Alerting**: set threshold alerts on kernel synchronization time, network |
| 92 | + I/O, or specific function patterns via custom aggregations. |
| 93 | + |
| 94 | +## Requirements |
| 95 | + |
| 96 | +- An OpenTelemetry Collector build that includes the `profilingmetrics` |
| 97 | + connector. The connector is shipped as part of the |
| 98 | + [Elastic Distribution of the OpenTelemetry Collector (EDOT)](https://www.elastic.co/docs/reference/opentelemetry). |
| 99 | +- A profiling data source sending OTel profiles to the collector (e.g. the |
| 100 | + [OpenTelemetry eBPF profiler](https://github.com/open-telemetry/opentelemetry-ebpf-profiler)). |
4 | 101 |
|
5 | 102 | ## Configuration |
6 | 103 |
|
7 | | -Any [generated metric](./metadata.yaml) can be disabled through the configuration. For example: |
| 104 | +A minimal configuration that converts profiling data into metrics and exports |
| 105 | +them over OTLP: |
| 106 | + |
| 107 | +```yaml |
| 108 | +receivers: |
| 109 | +  profiling: |
| 110 | + |
| 111 | +connectors: |
| 112 | +  profilingmetrics: |
8 | 113 |
|
| 114 | +exporters: |
| 115 | +  otlphttp: |
| 116 | +    endpoint: https://my-backend:4318 |
| 117 | + |
| 118 | +service: |
| 119 | +  pipelines: |
| 120 | +    profiles: |
| 121 | +      receivers: [profiling] |
| 122 | +      exporters: [profilingmetrics] |
| 123 | +    metrics: |
| 124 | +      receivers: [profilingmetrics] |
| 125 | +      exporters: [otlphttp] |
9 | 126 | ``` |
10 | | -metrics: |
11 | | -  samples.classification: |
12 | | -    enabled: false |
13 | | -  samples.dotnet.count: |
14 | | -    enabled: false |
| 127 | +
|
| 128 | +### Full configuration reference |
| 129 | +
|
| 130 | +The following settings can be configured: |
| 131 | +
|
| 132 | +```yaml |
| 133 | +connectors: |
| 134 | +  profilingmetrics: |
| 135 | +    # Time window for aggregating delta metrics in memory before flushing. |
| 136 | +    # Set to 0s to disable aggregation and forward metrics immediately. |
| 137 | +    # Default: 30s |
| 138 | +    flush_interval: 30s |
| 139 | + |
| 140 | +    # Toggle individual metrics on or off. |
| 141 | +    # See the "Metrics" section below for the full list. |
| 142 | +    metrics: |
| 143 | +      samples.user.count: |
| 144 | +        enabled: true |
| 145 | +      samples.kernel.count: |
| 146 | +        enabled: true |
| 147 | +      samples.native.count: |
| 148 | +        enabled: true |
| 149 | +      samples.go.count: |
| 150 | +        enabled: true |
| 151 | +      samples.jvm.count: |
| 152 | +        enabled: true |
| 153 | +      samples.cpython.count: |
| 154 | +        enabled: true |
| 155 | +      samples.dotnet.count: |
| 156 | +        enabled: true |
| 157 | +      samples.ruby.count: |
| 158 | +        enabled: true |
| 159 | +      samples.php.count: |
| 160 | +        enabled: true |
| 161 | +      samples.perl.count: |
| 162 | +        enabled: true |
| 163 | +      samples.v8js.count: |
| 164 | +        enabled: true |
| 165 | +      samples.rust.count: |
| 166 | +        enabled: true |
| 167 | +      samples.beam.count: |
| 168 | +        enabled: true |
| 169 | +      # Disabled by default; enable explicitly if needed: |
| 170 | +      samples.frame_type: |
| 171 | +        enabled: false |
| 172 | +      samples.classification: |
| 173 | +        enabled: false |
| 174 | + |
| 175 | +    # Custom aggregations let you define regex patterns matched against |
| 176 | +    # function names in stack traces. Each match increments a |
| 177 | +    # samples.custom_aggregation metric with the given label. |
| 178 | +    aggregations: |
| 179 | +      - match: "^com\\.example\\.payments\\." |
| 180 | +        label: "payments" |
| 181 | +      - match: "^com\\.example\\.auth\\." |
| 182 | +        label: "authentication" |
15 | 183 | ``` |
16 | 184 |
|
17 | | -**⚠️ Configuration Warning: Metric Dependencies** |
| 185 | +| Setting | Type | Default | Description | |
| 186 | +| ----------------- | ---------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | |
| 187 | +| `flush_interval` | `duration` | `30s` | Time window for aggregating delta metrics before flushing. Set to `0s` to disable aggregation and forward metrics on every received profile. | |
| 188 | +| `metrics` | `object` | — | Per-metric toggle. Each key is a metric name (see below) with an `enabled` boolean. Unspecified metrics use their default. | |
| 189 | +| `aggregations` | `list` | `[]` | List of custom aggregation rules. Each entry has a `match` (regex applied to function names) and a `label` (value used in the `frame_type` attribute of the output metric). | |
| 190 | + |
| 191 | +### Metric dependencies warning |
| 192 | + |
| 193 | +To ensure data integrity and accurate ratio calculations: |
| 194 | + |
| 195 | +- **Required combination**: `samples.kernel.count` and `samples.user.count` |
| 196 | + must both be enabled. Their sum is the only reliable way to compute the total |
| 197 | + sample count. |
| 198 | +- **Frame metrics**: avoid disabling specific frame metrics like |
| 199 | + `samples.native.count`. Disabling them results in a loss of information |
| 200 | + regarding shared libraries or runtime breakdown. |
| 201 | + |
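If some metrics have to be switched off despite the guidance above, a cautious sketch is to disable only per-runtime counters for runtimes that are known to be absent, and to leave the user/kernel pair alone. The choice of .NET and Perl here is purely hypothetical.

```yaml
connectors:
  profilingmetrics:
    metrics:
      # Hypothetical choice: no .NET or Perl workloads run on these hosts.
      samples.dotnet.count:
        enabled: false
      samples.perl.count:
        enabled: false
      # samples.user.count and samples.kernel.count are intentionally not
      # listed, so they keep their defaults and their sum still gives the
      # total sample count.
```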
| 202 | +## Metrics |
| 203 | + |
| 204 | +The full list of emitted metrics is documented in [documentation.md](./documentation.md). |
| 205 | + |
| 206 | +Any metric can be toggled via the `metrics` configuration block (see above). |
18 | 207 |
|
19 | | -To ensure data integrity and accurate ratio calculations, adhere to the following rules: |
20 | | - - Required Combination: You must enable `samples.kernel.count` and `samples.user.count`. Their sum is the only reliable way to calculate the total sample count. |
21 | | - - Frame metrics: Avoid disabling specific frame metrics like `samples.native.count`. Disabling these results in a loss of information regarding shared libraries. |
| 208 | +## Elastic integration |
22 | 209 |
|
| 210 | +The metrics produced by this connector are designed to be consumed by the |
| 211 | +[**Elastic OTel Profiling Metrics integration**](https://www.elastic.co/docs/reference/integrations/profilingmetrics_otel). |
| 212 | +Refer to that page for dashboard setup and field mappings. |
23 | 213 |
|
24 | | -[Quickstart guide](https://www.elastic.co/docs/reference/edot-collector/config/configure-profiles-collection) to use this connector as part of [EDOT](https://www.elastic.co/docs/reference/opentelemetry). |
| 214 | +For a quickstart guide on using this connector as part of the Elastic |
| 215 | +Distribution of the OpenTelemetry Collector (EDOT), see the |
| 216 | +[EDOT profiling collection guide](https://www.elastic.co/docs/reference/edot-collector/config/configure-profiles-collection). |