
Commit 83acc5a

[n8n] Fix metric mappings and add full v2 metric coverage (#23635) (#23733)
* Fix n8n metric mappings and add full v2 metric coverage
  - Drop fabricated metric names that n8n never emitted; map only what is empirically present.
  - Add the n8n 2.x metric families: workflow.execution.duration histogram, audit.workflow.*, embed.login.*, token.exchange.*, process.pss.bytes, runner.task.requested, and the workflow_statistics gauges.
  - Add worker-only families (node.started, node.finished, queue.job.dequeued, runner.task.requested) by introducing a worker-scrape instance.
  - Stop gating the OpenMetrics scrape on /healthz/readiness; emit n8n.readiness.check unconditionally so metrics still flow when the readiness endpoint is unhealthy.
  - Replace the custom Dockerfile with a direct n8nio/n8n image reference and parameterise the version via hatch.toml so the test matrix can run against both 1.118.1 and 2.19.5.
  - Allocate free host ports via datadog_checks.dev.utils.find_free_ports and forward them through docker_run env_vars to avoid port collisions on re-runs.

* Add changelog for PR #23635

* Refine n8n metric coverage and e2e setup

* Document raw_metric_prefix requirement when customizing N8N_METRICS_PREFIX

* Reformat changelog so towncrier renders sub-bullets correctly

* Add tests/lab traffic generator for n8n

  A long-running n8n simulation that layers on top of the integration test environment so a real Datadog Agent can ship metrics to a Datadog org for dashboard / monitor iteration.
  - tests/lab/workflows/: five lab-only workflow JSONs covering distinct shapes (fast, slow Wait node, always-fail Code, flaky 30%, four-step chain).
  - tests/lab/traffic_generator.py: click CLI (start/generate/stop) that runs ddev env start --base, copies + imports + activates the lab workflows, restarts n8n, and drives a configurable async traffic mix against the webhooks and REST API.
  - tests/lab/config.yaml: webhook + REST probabilities and tick / reload intervals; hot-reloaded while the generator runs.
  - tests/lab/.ddev.toml: pins the lab to an `n8nlab` ddev org.
  - tests/lab/run_lab.sh: bash entrypoint with an EXIT trap so Ctrl+C always runs lab:stop.
  - hatch.toml: new [envs.lab] env with click/httpx/pyyaml/rich and start/generate/stop scripts.

* Add missing n8n event metric mappings

* Add VM-isolated expression engine metrics (n8n 2.x)

  n8n 2.x ships a new VM-isolated expression engine in @n8n/expression-runtime that registers its own Prometheus metrics under the n8n_expression_* prefix. The metrics are gated on N8N_EXPRESSION_ENGINE=vm and N8N_EXPRESSION_ENGINE_OBSERVABILITY_ENABLED=true, neither of which defaults to on, so live containers do not emit samples unless explicitly opted in.

  Map the family in datadog_checks/n8n/metrics.py, add metadata.csv rows with the version + flag requirements documented, add synthetic samples to the unit fixtures so check_symmetric_inclusion stays green, list the metrics in the V2_ONLY / RARE_EVENT sets used by the integration assertions, and call out the env vars in the README's version-specific block.

* Split lab into its own compose, mount workflows by bind

  The lab previously shared tests/docker/docker-compose.yaml with the test env and drove workflow import through docker exec in traffic_generator.py. That coupled two consumers with different port expectations (the test env uses find_free_ports for parallel safety; the lab needs a fixed URL for docs, agent config, and the traffic generator) and put workflow lifecycle in two places.

  Add tests/lab/docker-compose.yaml that hardcodes 5678/5680 and bind-mounts both the test fixtures and the lab workflows under /workflows/. Gate the compose + port selection in tests/common.py on N8N_IS_LAB so the same conftest serves both modes. Move workflow import/activate into conftest (scanning the bind-mount, reading stable ids from JSON), and add a lab-only logs block + docker_volumes yield so the Datadog Agent picks up n8n stdout via autodiscovery and the event-bus log files via the data volume.

  Drop the docker-exec workflow import from traffic_generator.py now that conftest owns it. Update the README log-collection section to reflect that event-bus logs live under the n8n user folder rather than N8N_LOG_FILE_LOCATION.

* Address PR review feedback
  - Tighten _generate_workflow_traffic success check to == 200 so a webhook that responds 4xx (e.g. not yet registered after restart) does not falsely count as a healthy workflow run; capture last_status / last_exc and surface them in the RuntimeError so CI failures point at the real cause.
  - Replace the bespoke time.monotonic() wait loops with WaitFor + raise predicates (the dominant pattern across integrations-core). Restructure dd_environment conditions so the docker_run condition chain runs: wait for /healthz, activate workflows, then assert /metrics reachable on main + worker. Workflow-started wait stays inline since _generate_workflow_traffic is not idempotent.
  - Drop drop_rare_event_metrics; pass the public exclude= parameter to assert_metrics_using_metadata so we don't reach into AggregatorStub internals.
  - Replace bare try/except RequestException: pass blocks with contextlib.suppress.
  - Parametrize the two unit-fixture metadata tests; add the missing pytestmark = pytest.mark.unit and a comment explaining why the unit assertion is version-pinned to major=2.
  - Re-word the lab traffic generator reload-failure messages so it's clear the lab keeps running with the previous config.
  - Add N8N_METRICS_INCLUDE_WORKFLOW_EXECUTION_DURATION to the README's version-specific block and to the changelog flag list; indent the changelog sub-bullets so towncrier nests them under the wrapping bullet.

* Fix e2e test referencing removed drop_rare_event_metrics helper

  Use the public exclude= parameter on assert_metrics_using_metadata, matching test_integration.py. test_e2e.py was missed in the earlier review-feedback commit.

* Proofread n8n README against the Datadog style guide
  - Remove stray scratch notes accidentally committed at the end of the file (numbered questions and a changelog-process note that didn't belong in the public README).
  - Sentence-case the 'Data collected' and 'Service checks' headings.
  - Replace hyphen-as-em-dash usage (' - ') by splitting into separate sentences.
  - Replace slash-as-and/or in lists and tag descriptions: 'enqueued/dequeued/completed/failed/stalled counters' -> spelled-out list; 'result:success/failure' -> 'result:success or result:failure'; 'stdout/stderr' -> 'stdout and stderr'.

* Move workflow setup back into docker_run conditions to fix e2e

  In the previous refactor _generate_workflow_traffic and the _workflow_started_non_zero wait were moved into the body of the dd_environment context manager. That made them vulnerable to fixture re-invocation paths (e.g. session teardown or flaky-plugin retry) that fired the body code against torn-down containers, producing a setup error after the e2e test had already passed.

  Put both back into conditions=[...]. That keeps them inside docker_run's set_up() retry envelope (attempts=2 in CI), and they are no longer exposed to the post-yield teardown path. The post-restart /healthz wait moves back inside _activate_imported_workflows so the function stays self-contained as a condition. Restore the (instances, E2E_METADATA) tuple yield for non-lab mode so the e2e Agent container still gets the docker_volumes mount it expects.

* Address second-round PR review feedback
  - conftest.py: parse n8n_workflow_started_total samples as floats instead of string-matching ' 0', so '0.0' / '0e+0' counter values are not treated as non-zero and OpenMetrics '# HELP'/'# TYPE' comment lines that share the prefix are skipped.
  - common.py: collapse the get_all_metadata_metrics passthrough into get_metadata_metrics_for_version (update integration + e2e call sites) and document the intentional V2_ONLY / RARE_EVENT overlap so future contributors do not assume the duplication is accidental.
  - check.py: cache the readiness endpoint with functools.cached_property (it is derived from immutable config) and parameterise the dict return / argument types as dict[str, Any].
  - traffic_generator.py: scope the asyncio.Event and current config to _run_traffic instead of holding them at module level, threading both through _config_reloader. Switches the SIGINT/SIGTERM hook to loop.add_signal_handler so a second 'generate' invocation in the same process starts from a clean state.

* Address third-round PR review feedback
  - conftest.py: move the worker CheckEndpoints to after _activate_imported_workflows so any cascade from the n8n main restart is caught before downstream conditions scrape the worker.
  - test_unit.py: import the requests module and reference requests.ConnectionError at the call site so the builtin ConnectionError name is not shadowed for the rest of the module.
  - traffic_generator.py: extract _make_output_table() so the table schema lives in one place and _print_row() only owns row data.

* Wait for webhook registration after n8n restart on v2

  On n8n 2.x, /healthz comes back after `docker compose restart n8n` before n8n has finished re-registering the active workflows' webhook routes. The existing WaitFor(_n8n_healthy) inside _activate_imported_workflows was satisfied while /webhook/test still returned 404, so _generate_workflow_traffic raced the registration and failed with last_status=404. Add a second WaitFor poll on the integration-test webhook itself so the registration is observed before downstream conditions run. v1 happens to register fast enough that the gap is not observable there, but the extra check costs at most one poll on the happy path.

* Map n8n event-bus dynamic counters

  Map the broader n8n event-bus surface (~45 dynamic counters) covering audit (user, credentials, package, variable, execution data), AI node, runner, and workflow cancellation events, plus execution throttling. Counter names rejected by n8n's own prom-client validation (hyphenated families such as external-secrets, token-exchange, role-mapping, and cluster) are intentionally not mapped and called out in metrics.py.

  The integration test environment cannot realistically exercise these families end to end, so each new metric is documented as best-effort in metadata.csv and added to RARE_EVENT_METRIC_NAMES. The unit fixture carries synthetic samples so the metric map stays validated. README covers the dynamic-counter scope and shows an extra_metrics example for users to add events from future n8n releases.

* Drop technical hyphen-rejection paragraph from n8n README

* Tighten n8n changelog to one-line themes

* Tone down n8n changelog lead-in

* Reframe n8n changelog from user perspective

* Treat any 2xx response as ready, bump n8n to a major release

  The readiness gauge now reports 1 for any HTTP 2xx response on /healthz/readiness, not only 200. Rename the changelog entry from .added to .changed so the next release is a major bump, reflecting the breadth of the integration overhaul.

(cherry picked from commit 57659b4)

Co-authored-by: Juanpe Araque <juanpedro.araque@datadoghq.com>
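The second-round review item about `n8n_workflow_started_total` describes parsing sample values as floats and skipping comment lines. A minimal sketch of that idea follows. It is not the code in `conftest.py`: the function name `workflow_started_non_zero` and the sample payloads are assumptions made for illustration, and only the metric name and the intended behavior come from the commit message.

```python
def workflow_started_non_zero(metrics_text, name="n8n_workflow_started_total"):
    """Return True if any sample of `name` has a value greater than zero.

    Hypothetical helper: the real conftest.py predicate may differ.
    """
    for raw_line in metrics_text.splitlines():
        line = raw_line.strip()
        # '# HELP' and '# TYPE' lines mention the metric name but carry no sample.
        if not line or line.startswith("#"):
            continue
        if not line.startswith(name):
            continue
        rest = line[len(name):]
        if rest.startswith("{"):
            # Label values may contain spaces, so cut at the closing brace.
            closing = rest.rfind("}")
            if closing == -1:
                continue
            rest = rest[closing + 1:]
        elif not rest[:1].isspace():
            # A different metric that merely shares the prefix.
            continue
        fields = rest.split()
        if not fields:
            continue
        try:
            value = float(fields[0])
        except ValueError:
            continue
        # Numeric comparison treats '0', '0.0', and '0e+0' identically.
        if value > 0:
            return True
    return False
```

Comparing numerically rather than matching the string `' 0'` is what makes `0.0` and `0e+0` count as zero.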
1 parent ff792eb commit 83acc5a

30 files changed

Lines changed: 2817 additions & 857 deletions

n8n/README.md

Lines changed: 106 additions & 21 deletions
@@ -2,15 +2,15 @@

 ## Overview

-This check monitors [n8n][1] through the Datadog Agent.
+This check monitors [n8n][1] through the Datadog Agent.

 Collect n8n metrics including:
-- Cache metrics: Hit and miss statistics.
-- Message event bus metrics: Event-related metrics.
-- Workflow metrics: Can include workflow ID labels.
-- Node metrics: Can include node type labels.
-- Credential metrics: Can include credential type labels.
-- Queue metrics
+- Cache metrics: hit, miss, and update counts.
+- Workflow metrics: started, success, failed counters, audit workflow lifecycle counters; in n8n 2.x, an execution-duration histogram.
+- Node metrics: per-node started and finished counters emitted by worker processes in queue mode.
+- Queue metrics: queue depth; enqueued, dequeued, completed, failed, and stalled counters; and scaling-mode worker gauges.
+- HTTP metrics: request duration histograms tagged with status code.
+- Process and Node.js runtime metrics.


 ## Setup
@@ -40,13 +40,79 @@ N8N_METRICS_INCLUDE_CACHE_METRICS=true
 N8N_METRICS_INCLUDE_MESSAGE_EVENT_BUS_METRICS=true
 N8N_METRICS_INCLUDE_WORKFLOW_ID_LABEL=true
 N8N_METRICS_INCLUDE_API_ENDPOINTS=true
+N8N_METRICS_INCLUDE_QUEUE_METRICS=true
+
+# Optional: n8n 2.x adds workflow_statistics gauges (workflows, users, executions, ...) - opt in
+N8N_METRICS_INCLUDE_WORKFLOW_STATISTICS=true

 # Optional: Customize the metric prefix (default is 'n8n_')
 N8N_METRICS_PREFIX=n8n_
 ```

 For more details, see the n8n documentation on [enabling Prometheus metrics][10].

+If you change `N8N_METRICS_PREFIX` from its default of `n8n_`, you **must** also set `raw_metric_prefix` in the integration's `conf.yaml` to the same value. Otherwise the check will not recognize the exposed metric names and will silently submit nothing:
+
+```yaml
+instances:
+  - openmetrics_endpoint: http://localhost:5678/metrics
+    raw_metric_prefix: my_custom_prefix_
+```
+
+#### Event-driven counters
+
+Most n8n counters are registered dynamically the first time their underlying event fires. The integration ships mappings for around 70 of these event-bus counters, including:
+
+- Workflow lifecycle: `n8n.workflow.started.count`, `n8n.workflow.success.count`, `n8n.workflow.failed.count`, `n8n.workflow.cancelled.count`
+- Audit (workflow, user, credentials, package, variable, execution data): `n8n.audit.workflow.executed.count`, `n8n.audit.user.login.success.count`, `n8n.audit.user.credentials.created.count`, and similar
+- AI nodes: `n8n.ai.tool.called.count`, `n8n.ai.llm.generated.count`, `n8n.ai.vector.store.searched.count`, and similar
+- Runner, queue, and node lifecycle: `n8n.runner.task.requested.count`, `n8n.queue.job.completed.count`, `n8n.node.started.count`, `n8n.node.finished.count`
+
+These counters do not appear on the `/metrics` endpoint until the corresponding event has occurred. A healthy idle deployment will not produce data points for them until that activity fires. The complete list is in [`metadata.csv`][7].
+
+If a future n8n release exposes a new event-driven counter that is not yet covered by this integration, add it to the `extra_metrics` option in your instance configuration:
+
+```yaml
+instances:
+  - openmetrics_endpoint: http://n8n:5678/metrics
+    extra_metrics:
+      - some_new_n8n_event_total: some.new.n8n.event
+```
+
+The left-hand side is the Prometheus counter name as n8n exposes it (keep the `_total` suffix). The right-hand side is the dotted Datadog metric name to submit it as.
+
+#### Queue mode and workers
+
+In queue mode, n8n runs separate worker processes that execute jobs picked up from a Redis-backed queue. Each worker exposes its own `/metrics` endpoint and emits a different subset of metrics than the main process. Worker-observed metrics include `n8n.queue.job.dequeued.count`, `n8n.queue.job.stalled.count`, `n8n.node.started.count`, `n8n.node.finished.count`, and `n8n.runner.task.requested.count`. Main-only metrics include `n8n.instance.role.leader` and the `n8n.scaling.mode.queue.jobs.*` family.
+
+To expose worker metrics, set `QUEUE_HEALTH_CHECK_ACTIVE=true` and `QUEUE_HEALTH_CHECK_PORT=<port>` on each worker. **In n8n 2.x, port `5679` is reserved for the task runner broker, so pick a different port (for example `5680`).**
+
+For full coverage in queue deployments, configure one Datadog instance per n8n process exposing `/metrics`, including main and worker processes:
+
+```yaml
+instances:
+  - openmetrics_endpoint: http://n8n-main:5678/metrics
+  - openmetrics_endpoint: http://n8n-worker:5680/metrics
+```
+
+#### Version-specific metrics
+
+Several metric families were introduced in n8n 2.x and are not emitted on n8n 1.x:
+
+- `n8n.workflow.execution.duration.seconds.*` (histogram). Gated by `N8N_METRICS_INCLUDE_WORKFLOW_EXECUTION_DURATION`, which defaults to `true` in n8n 2.x.
+- `n8n.audit.workflow.activated.count`, `n8n.audit.workflow.deactivated.count`, `n8n.audit.workflow.executed.count`, `n8n.audit.workflow.resumed.count`, `n8n.audit.workflow.version.updated.count`, and `n8n.audit.workflow.waiting.count`
+- `n8n.embed.login.requests.count` (tagged with `result:success` or `result:failure`), `n8n.embed.login.failures.count` (tagged with `reason`)
+- `n8n.token.exchange.requests.count` (tagged with `result:success` or `result:failure`), `n8n.token.exchange.failures.count` (tagged with `reason`), `n8n.token.exchange.identity.linked.count`, `n8n.token.exchange.jit.provisioning.count`
+- `n8n.process.pss.bytes` (Linux only)
+- The `n8n.{production,manual,production.root}.executions`, `n8n.users.total`, `n8n.enabled.users`, `n8n.workflows.total`, and `n8n.credentials.total` family. Only emitted when `N8N_METRICS_INCLUDE_WORKFLOW_STATISTICS=true` is set.
+- The `n8n.expression.*` family (`evaluation.duration.seconds`, `code.cache.{hit,miss,eviction,size}`, `pool.{acquired,replenish.failed,scaled.up,scaled.to.zero}`). Only emitted when n8n is running the new VM-isolated expression engine *and* observability for it is on. Set `N8N_EXPRESSION_ENGINE=vm` and `N8N_EXPRESSION_ENGINE_OBSERVABILITY_ENABLED=true` on the n8n process; both default to off (the engine defaults to `legacy`). These metrics surface the per-expression evaluation latency, the compiled-expression LRU cache hit and miss rates, and the V8-isolate pool's idle scaling behavior. They are most useful for troubleshooting workflow latency that traces back to slow `{{ ... }}` evaluation.
+
+Some metrics only emit samples after the corresponding runtime event occurs. For example, failures-only counters (`*.failures.count`) need an authentication failure, audit workflow counters need the matching workflow state transition, and the libuv `n8n.nodejs.active.requests` gauge needs an in-flight libuv request. A healthy idle deployment may not produce data points for these metrics until that activity occurs.
+
+#### Tag cardinality
+
+When `N8N_METRICS_INCLUDE_WORKFLOW_ID_LABEL=true`, http and workflow execution histograms are tagged with `workflow_id` (and similar labels for nodes). On deployments with many distinct workflows or nodes, this can produce high-cardinality metrics. Drop the label via `exclude_labels` or omit `N8N_METRICS_INCLUDE_WORKFLOW_ID_LABEL` to keep tag cardinality bounded.
+
 #### Configure the Datadog Agent

 1. Edit the `n8n.d/conf.yaml` file, in the `conf.d/` folder at the root of your Agent's configuration directory to start collecting your n8n performance data. See the [sample n8n.d/conf.yaml][4] for all available configuration options.
@@ -59,27 +125,32 @@ _Available for Agent versions >6.0_

 #### Enable n8n logging

-Configure n8n to output logs by setting the following environment variables:
+Configure n8n application logs by setting the following environment variables:

 ```bash
 # Set the log level (error, warn, info, debug)
 N8N_LOG_LEVEL=info

-# Output logs to console (for containerized environments) or file
+# Output application logs to console or file
 N8N_LOG_OUTPUT=console

-# If using file output, specify the log file location
+# Use JSON formatting so Datadog can parse n8n application log attributes
+N8N_LOG_FORMAT=json
+
+# If using file output, specify the application log file location
 N8N_LOG_FILE_LOCATION=/var/log/n8n/n8n.log
 ```

 #### Structured event logs

-n8n can output structured JSON logs to `n8nEventLog.log` containing detailed workflow execution events. Enable this by setting the log output to file:
+n8n also writes structured event bus logs to `n8nEventLog*.log`. These logs contain workflow, node, queue, runner, and audit events and are separate from the application logs controlled by `N8N_LOG_OUTPUT` and `N8N_LOG_FILE_LOCATION`.

-```bash
-N8N_LOG_OUTPUT=file
-N8N_LOG_FILE_LOCATION=/var/log/n8n/
-```
+By default, event bus log files are written under the n8n user folder, for example:
+
+- Host installations: `~/.n8n/n8nEventLog*.log`
+- Official Docker image: `/home/node/.n8n/n8nEventLog*.log`
+
+If you use a custom n8n user folder, collect the event bus logs from that folder instead. If you customize the event bus log file base name with `N8N_EVENTBUS_LOGWRITER_LOGBASENAME`, update the Datadog log path to match.

 The event log includes the following event types:

@@ -102,32 +173,46 @@ Each event contains rich metadata including `executionId`, `workflowId`, `workfl
 logs_enabled: true
 ```

-2. Add this configuration block to your `n8n.d/conf.yaml` file to start collecting your n8n logs:
+2. Add log collection entries to your `n8n.d/conf.yaml` file.
+
+For a host-based n8n installation where the Agent can read local files, collect the application log file and the event bus log files:

 ```yaml
 logs:
   - type: file
     path: /var/log/n8n/*.log
     source: n8n
-    service: n8n
+    service: <SERVICE>
+  - type: file
+    path: /home/n8n/.n8n/n8nEventLog*.log
+    source: n8n
+    service: <SERVICE>
 ```

-For containerized environments using Docker, use the following configuration instead:
+Adjust `/home/n8n/.n8n/n8nEventLog*.log` to the n8n user folder on your host.
+
+For a containerized n8n deployment, collect stdout and stderr from the n8n container for application logs, and make the n8n user folder available to the Agent for event bus file logs. For example, if the n8n data directory is mounted on the host at `/var/lib/n8n`, configure:

 ```yaml
 logs:
   - type: docker
     source: n8n
-    service: n8n
+    service: <SERVICE>
+  - type: file
+    path: /var/lib/n8n/n8nEventLog*.log
+    source: n8n
+    service: <SERVICE>
 ```

+If the Agent runs in a container, mount the n8n data volume or host directory into the Agent container and use the path as seen from inside the Agent container.
+
 3. [Restart the Agent][5].

 ### Validation

 [Run the Agent's status subcommand][6] and look for `n8n` under the Checks section.

-## Data Collected
+## Data collected

 ### Metrics

@@ -137,7 +222,7 @@ See [metadata.csv][7] for a list of metrics provided by this integration.

 The n8n integration does not include any events.

-### Service Checks
+### Service checks

 See [service_checks.json][8] for a list of service checks provided by this integration.
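The README's warning about `raw_metric_prefix` can be illustrated with a toy model of prefix handling. This is a sketch under stated assumptions, not the OpenMetrics base check's actual implementation: `TOY_METRIC_MAP` and `resolve` are hypothetical names, and the raw exposed names are assumed to be the configured prefix followed by the underscored form of the Datadog names listed in the README.

```python
# Toy stand-in for the integration's metric map (illustrative entries only).
TOY_METRIC_MAP = {
    "instance_role_leader": "instance.role.leader",
    "process_pss_bytes": "process.pss.bytes",
}


def resolve(exposed_name, raw_metric_prefix="n8n_", namespace="n8n"):
    """Return the Datadog metric name for an exposed name, or None if unmapped.

    Hypothetical helper: models only the prefix-then-lookup idea.
    """
    if not exposed_name.startswith(raw_metric_prefix):
        # The name never reaches a form that matches a map key.
        return None
    stripped = exposed_name[len(raw_metric_prefix):]
    mapped = TOY_METRIC_MAP.get(stripped)
    return f"{namespace}.{mapped}" if mapped is not None else None


# Default prefix on both sides: the metric resolves.
print(resolve("n8n_instance_role_leader"))
# n8n uses a custom prefix but the check still expects 'n8n_': nothing matches.
print(resolve("my_custom_prefix_instance_role_leader"))
# Both sides agree on the custom prefix: the metric resolves again.
print(resolve("my_custom_prefix_instance_role_leader", raw_metric_prefix="my_custom_prefix_"))
```

In this model a mismatch raises no error. Every lookup simply misses, which matches the "silently submit nothing" behavior the README warns about.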

n8n/assets/configuration/spec.yaml

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ files:
 openmetrics_endpoint.required: true
 openmetrics_endpoint.hidden: false
 openmetrics_endpoint.display_priority: 1
-openmetrics_endpoint.value.example: http://localhost:5678
+openmetrics_endpoint.value.example: http://localhost:5678/metrics
 openmetrics_endpoint.description: |
   Endpoint exposing the n8n's metrics in the OpenMetrics format. For more information, refer to:
   https://docs.n8n.io/hosting/logging-monitoring/monitoring/

n8n/changelog.d/23635.changed

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+Improve the n8n metric coverage:
+
+- Correct missing or incorrect metrics.
+- Add metrics introduced in n8n 2.x (workflow execution duration, audit events, authentication, workflow and user statistics, expression engine, and process memory).
+- Track n8n's dynamic events (workflow cancellations, audit activity, AI nodes, user and credential changes, package and variable changes).
+- Add support for monitoring n8n worker processes alongside the main process.

n8n/datadog_checks/n8n/check.py

Lines changed: 33 additions & 36 deletions
@@ -2,58 +2,55 @@
 # All rights reserved
 # Licensed under a 3-clause BSD style license (see LICENSE)

-from urllib.parse import urljoin
+from functools import cached_property
+from typing import Any
+from urllib.parse import urljoin, urlparse
+
+from requests.exceptions import RequestException

 from datadog_checks.base import OpenMetricsBaseCheckV2
 from datadog_checks.n8n.metrics import METRIC_MAP, RENAME_LABELS_MAP

 from .config_models import ConfigMixin

-DEFAULT_READY_ENDPOINT = '/healthz/readiness'
+DEFAULT_READY_PATH = '/healthz/readiness'


 class N8nCheck(OpenMetricsBaseCheckV2, ConfigMixin):
     __NAMESPACE__ = 'n8n'
     DEFAULT_METRIC_LIMIT = 0

-    def __init__(self, name, init_config, instances=None):
-        super(N8nCheck, self).__init__(
-            name,
-            init_config,
-            instances,
-        )
-        self.openmetrics_endpoint = self.instance["openmetrics_endpoint"]
-        self.tags = self.instance.get('tags', [])
-        self._ready_endpoint = DEFAULT_READY_ENDPOINT
-
-    def get_default_config(self):
+    def get_default_config(self) -> dict[str, Any]:
         return {
             'metrics': [METRIC_MAP],
             'rename_labels': RENAME_LABELS_MAP,
             'raw_metric_prefix': 'n8n_',
         }

-    def _check_n8n_readiness(self):
-        endpoint = urljoin(self.openmetrics_endpoint, self._ready_endpoint)
-        response = self.http.get(endpoint)
-
-        # Determine metric value and status_code tag
-        if response.status_code is None:
-            self.log.warning("The readiness endpoint did not return a status code")
-            metric_value = 0
-            metric_tags = self.tags + ['status_code:null']
-        elif response.status_code == 200:
-            # Ready - submit 1
-            metric_value = 1
-            metric_tags = self.tags + [f'status_code:{response.status_code}']
-        else:
-            # Not ready - submit 0
-            metric_value = 0
-            metric_tags = self.tags + [f'status_code:{response.status_code}']
-
-        # Submit metric with appropriate value and status_code tag
-        self.gauge('readiness.check', metric_value, tags=metric_tags)
-
-    def check(self, instance):
-        super().check(instance)
+    @cached_property
+    def _readiness_endpoint(self) -> str:
+        parsed = urlparse(self.config.openmetrics_endpoint)
+        base = f'{parsed.scheme}://{parsed.netloc}'
+        return urljoin(base, DEFAULT_READY_PATH)
+
+    def _check_n8n_readiness(self) -> None:
+        endpoint = self._readiness_endpoint
+        tags = list(self.config.tags or ())
+
+        try:
+            response = self.http.get(endpoint)
+        except RequestException as e:
+            self.log.warning("Could not reach n8n readiness endpoint %s: %s", endpoint, e)
+            self.gauge('readiness.check', 0, tags=tags + ['status_code:none'])
+            return
+
+        is_ready = 200 <= response.status_code < 300
+        self.gauge(
+            'readiness.check',
+            1 if is_ready else 0,
+            tags=tags + [f'status_code:{response.status_code}'],
+        )
+
+    def check(self, instance: dict[str, Any]) -> None:
         self._check_n8n_readiness()
+        super().check(instance)
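The readiness logic in this hunk can be exercised in isolation. The sketch below lifts the URL derivation and the 2xx classification into two standalone functions. The names `readiness_endpoint` and `readiness_sample` are assumptions for illustration and do not exist in the check, which performs the same steps on `self.config` and submits the gauge directly.

```python
from urllib.parse import urljoin, urlparse

DEFAULT_READY_PATH = "/healthz/readiness"


def readiness_endpoint(openmetrics_endpoint):
    """Derive the readiness URL from the scheme and host of the metrics URL."""
    parsed = urlparse(openmetrics_endpoint)
    base = f"{parsed.scheme}://{parsed.netloc}"
    return urljoin(base, DEFAULT_READY_PATH)


def readiness_sample(status_code):
    """Return (gauge value, status tag); None stands for 'request failed'."""
    if status_code is None:
        return 0, "status_code:none"
    is_ready = 200 <= status_code < 300
    return (1 if is_ready else 0), f"status_code:{status_code}"


print(readiness_endpoint("http://localhost:5678/metrics"))
print(readiness_sample(204))
print(readiness_sample(503))
print(readiness_sample(None))
```

Because the gauge is emitted before `super().check(instance)` runs and connection errors are caught, an unreachable readiness endpoint yields a `0` sample without preventing the metrics scrape from being attempted.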

n8n/datadog_checks/n8n/data/conf.yaml.example

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ instances:
 ## https://docs.n8n.io/hosting/logging-monitoring/monitoring/
 ## https://docs.n8n.io/hosting/configuration/environment-variables/endpoints/
 #
-- openmetrics_endpoint: http://localhost:5678
+- openmetrics_endpoint: http://localhost:5678/metrics

 ## @param raw_metric_prefix - string - optional - default: n8n_
 ## The prefix prepended to all metrics from n8n.
