Commit bdf939c

Merge pull request #320 from NillionNetwork/feat/otel
feat: OpenTelemetry logs, metrics, and tracing
2 parents 7a086fc + 5d22b10 commit bdf939c

16 files changed, +4265 -1632 lines changed

docs/admin-guide.md

Lines changed: 72 additions & 15 deletions
@@ -18,18 +18,18 @@ This section provides task-oriented instructions for node administrators.

 The following environment variables are required:

-| Variable | Description | Example |
-|--------------------------|-----------------------------------------------------------|-------------------------------------|
-| APP_DB_NAME_BASE | Database name prefix | nildb_data |
-| APP_DB_URI | MongoDB connection string | mongodb://node-xxxx-db:27017 |
-| APP_ENABLED_FEATURES | Enable features | openapi-spec,metrics,migrations |
-| APP_LOG_LEVEL | Logging verbosity | debug |
-| APP_METRICS_PORT | Prometheus metrics port | 9091 |
-| APP_NILAUTH_BASE_URL | The nilauth service url for subscriptions and revocations | http://127.0.0.1:30921 |
-| APP_NILAUTH_PUBLIC_KEY | The nilauth service's secp256k1 public key | [hex encoded secp256k1 public key] |
-| APP_NODE_PUBLIC_ENDPOINT | Public URL of node | https://nildb-xxxx.domain.com |
-| APP_NODE_SECRET_KEY | Node's private key | [hex encoded secp256k1 private key] |
-| APP_PORT | API service port | 8080 |
+| Variable | Description | Example |
+|--------------------------|-----------------------------------------------------------|--------------------------------------|
+| APP_DB_NAME_BASE | Database name prefix | nildb_data |
+| APP_DB_URI | MongoDB connection string | mongodb://node-xxxx-db:27017 |
+| APP_ENABLED_FEATURES | Enable features | openapi-spec,metrics,migrations,otel |
+| APP_LOG_LEVEL | Logging verbosity | debug |
+| APP_METRICS_PORT | Prometheus metrics port | 9091 |
+| APP_NILAUTH_BASE_URL | The nilauth service url for subscriptions and revocations | http://127.0.0.1:30921 |
+| APP_NILAUTH_PUBLIC_KEY | The nilauth service's secp256k1 public key | [hex encoded secp256k1 public key] |
+| APP_NODE_PUBLIC_ENDPOINT | Public URL of node | https://nildb-xxxx.domain.com |
+| APP_NODE_SECRET_KEY | Node's private key | [hex encoded secp256k1 private key] |
+| APP_PORT | API service port | 8080 |

 ### Rate Limiting

@@ -41,6 +41,52 @@ The following variables control the IP-based rate limiting feature.
 | APP_RATE_LIMIT_WINDOW_SECONDS | The duration of the time window in seconds. | `60` |
 | APP_RATE_LIMIT_MAX_REQUESTS | Max requests per IP within the time window. | `60` |

+
+### OpenTelemetry
+
+When the `otel` feature is enabled in `APP_ENABLED_FEATURES`, the following variables configure OpenTelemetry instrumentation:
+
+| Variable | Description | Default | Required |
+|---------------------------------|-----------------------------------------|------------------|----------|
+| OTEL_ENDPOINT | OTLP endpoint URL | http://localhost | No |
+| OTEL_SERVICE_NAME | Service name for telemetry | nildb | No |
+| OTEL_TEAM_NAME | Team responsible for the service | nildb | No |
+| OTEL_DEPLOYMENT_ENV | Deployment environment | local | No |
+| OTEL_METRICS_EXPORT_INTERVAL_MS | Metrics export interval in milliseconds | 60000 | No |
+| OTEL_RESOURCE_ATTRIBUTES | Additional resource attributes | (not set) | No |
+| OTEL_SDK_DISABLED | Disable OpenTelemetry SDK | (not set) | No |
+
+**Setting Custom Resource Attributes:**
+
+You can set or override any OpenTelemetry resource attributes using `OTEL_RESOURCE_ATTRIBUTES`:
+
+```bash
+# Set single attribute
+OTEL_RESOURCE_ATTRIBUTES=service.instance.id=nildb-r5nw
+
+# Set multiple attributes (comma-separated)
+OTEL_RESOURCE_ATTRIBUTES=service.instance.id=nildb-r5nw,custom.key=value
+```
+
+Values set via `OTEL_RESOURCE_ATTRIBUTES` take precedence over programmatically set values. This is useful for setting deployment-specific identifiers like `service.instance.id` without modifying code.
+
+**Feature Flag Behavior:**
+
+- **`metrics` only**: Metrics are served on `:9091/metrics` endpoint using OpenTelemetry PrometheusExporter. No traces or logs sent to OTLP.
+- **`otel` only**: Metrics, traces, and logs are pushed to OTLP endpoint. No `/metrics` endpoint is exposed.
+
+> [!NOTE]
+> The `metrics` and `otel` feature flags are **mutually exclusive**. Enabling both will cause the server to exit with an error. Choose one observability mode.
+
+**Disabling OpenTelemetry SDK:**
+
+To disable OpenTelemetry SDK and prevent telemetry emission while keeping the `otel` feature flag enabled, set the following environment variable:
+```bash
+OTEL_SDK_DISABLED=true
+```
+
+This is useful for local development where you want to use Pino stdout logging instead of sending telemetry to an OTLP endpoint.
+
 ## Start the node

 ### Local Development
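
Review note (not part of the commit): the feature-flag rules documented in the OpenTelemetry section of this hunk can be modeled as a small function. This is a minimal sketch under stated assumptions: the name `resolveObservabilityMode` and the mode labels are hypothetical, the error is thrown rather than exiting the process, and `OTEL_SDK_DISABLED` is assumed to be compared against the literal string `true`. Only the variable names and the mutual-exclusion rule come from the guide.

```typescript
// Hypothetical helper; names and mode labels are illustrative only.
type Env = Record<string, string | undefined>;
type ObservabilityMode = "none" | "prometheus" | "otlp" | "otlp-disabled";

function resolveObservabilityMode(env: Env): ObservabilityMode {
  // APP_ENABLED_FEATURES is a comma-separated list, e.g. "openapi,otel,migrations".
  const features = new Set(
    (env["APP_ENABLED_FEATURES"] ?? "")
      .split(",")
      .map((feature) => feature.trim())
      .filter((feature) => feature.length > 0),
  );
  const metrics = features.has("metrics");
  const otel = features.has("otel");

  // The guide states that enabling both flags is a startup error.
  if (metrics && otel) {
    throw new Error("`metrics` and `otel` are mutually exclusive; enable only one.");
  }
  if (otel) {
    // Assumption: only the exact value "true" disables the SDK.
    return env["OTEL_SDK_DISABLED"] === "true" ? "otlp-disabled" : "otlp";
  }
  return metrics ? "prometheus" : "none";
}
```

Under these assumptions, `openapi,otel,migrations` resolves to `otlp` (push only, no `/metrics` endpoint), and `openapi,metrics,migrations` resolves to `prometheus`.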
@@ -53,15 +99,22 @@ docker compose -f local/docker-compose.yaml up -d
 ```

 This stack includes:
-- **nilDB**: The main API service (port 40080)
+- **nilDB**: The main API service (port 40080, metrics port 40091)
 - **MongoDB**: Database backend (port 40017)
 - **nilauth**: Authentication service for NUC tokens (port 40921)
 - **nilchain**: Local blockchain for testing payments (JSON-RPC port 40648)
 - **PostgreSQL**: Database for nilauth (port 40432)
 - **token-price-api**: Mock token pricing service (port 40923)
+- **otel-collector**: OpenTelemetry collector with debug exporter (OTLP port 40318)

 The nilDB API will be available at `http://localhost:40080`.

+The local stack is configured with `APP_ENABLED_FEATURES=openapi,otel,migrations`, which means:
+- Metrics, traces, and logs are sent to the OTel Collector (visible in `docker compose logs otel-collector`)
+- No `/metrics` endpoint is exposed (OTLP push only)
+
+To switch to metrics-only mode (serve metrics on `:40091/metrics`), change `otel` to `metrics` in the docker-compose.yaml file.
+
 ### Production Deployment

 A nilDB node consists of a MongoDB instance and a RESTful API service. Below is a basic Docker Compose configuration:
@@ -105,10 +158,14 @@ The following endpoints provide operational information:

 - `GET /health` - Service health check
 - `GET /about` - Node configuration
-- `GET :9091/metrics` - Prometheus metrics (internal access only)
+- `GET :9091/metrics` - Prometheus metrics (internal access only, available when `metrics` feature flag is enabled without `otel`)

 > [!NOTE]
-> `/metrics` shouldn't be exposed publicly.
+> The `/metrics` endpoint behavior depends on feature flags:
+> - **`metrics` only**: Serves metrics at `:9091/metrics` using OpenTelemetry PrometheusExporter
+> - **`otel` enabled**: No `/metrics` endpoint; all telemetry pushed to OTLP collector
+>
+> The `/metrics` endpoint should never be exposed publicly - use firewall rules or network policies to restrict access to internal monitoring systems only.

 ## Logging

justfile

Lines changed: 12 additions & 6 deletions
@@ -19,8 +19,14 @@ install:
 # --- Quality
 # ------------------

+# Build workspace dependencies (required for type checking and tests)
+build-deps:
+    pnpm --filter @nillion/nildb-types build
+    pnpm --filter @nillion/nildb-shared build
+    pnpm --filter @nillion/nildb-client build
+
 # Check for formatting, lint, and type errors
-check:
+check: build-deps
     pnpm exec tsc -b && pnpm exec biome ci && pnpm exec tsc -b --noEmit

 # Format, fix, and type check all files
@@ -40,27 +46,27 @@ dev:
     pnpm --filter @nillion/nildb dev

 # Build nildb
-build:
+build: build-deps
     pnpm --filter @nillion/nildb build

 # ------------------
 # --- Testing
 # ------------------

 # Run all tests (unit & integration)
-test:
+test: build-deps
     pnpm exec vitest run

 # Run unit tests
-test-unit:
+test-unit: build-deps
     pnpm exec vitest run --project=unit

 # Run integration tests
-test-integration:
+test-integration: build-deps
     pnpm exec vitest run --project=integration

 # Run tests with coverage
-test-coverage:
+test-coverage: build-deps
     pnpm exec vitest run --coverage

 # ------------------

local/docker-compose.yaml

Lines changed: 16 additions & 1 deletion
@@ -5,19 +5,26 @@ services:
       dockerfile: packages/api/Dockerfile
     ports:
       - "40080:8080"
+      - "40091:9091"
     depends_on:
       - mongodb
+      - otel-collector
     environment:
       - APP_DB_NAME_BASE=nildb
       - APP_DB_URI=mongodb://mongodb:27017
-      - APP_ENABLED_FEATURES=openapi,metrics,migrations
+      - APP_ENABLED_FEATURES=openapi,otel,migrations
       - APP_LOG_LEVEL=debug
       - APP_METRICS_PORT=9091
       - APP_NILAUTH_BASE_URL=http://nilauth:8080
       - APP_NILAUTH_PUBLIC_KEY=03520e70bd97a5fa6d70c614d50ee47bf445ae0b0941a1d61ddd5afa022b97ab14
       - APP_NODE_PUBLIC_ENDPOINT=http://localhost:40080
       - APP_NODE_SECRET_KEY=6cab2d10ac21886404eca7cbd40f1777071a243177eae464042885b391412b4e
       - APP_PORT=8080
+      - OTEL_ENDPOINT=http://otel-collector:4318
+      - OTEL_SERVICE_NAME=nildb
+      - OTEL_TEAM_NAME=nildb
+      - OTEL_DEPLOYMENT_ENV=local
+      - OTEL_METRICS_EXPORT_INTERVAL_MS=10000

   mongodb:
     image: mongo:latest
@@ -56,3 +63,11 @@ services:
       - ./nilchaind/config/genesis.json:/opt/nilchain/config/genesis.json
     ports:
       - "40648:26648" # JSON RPC
+
+  otel-collector:
+    image: otel/opentelemetry-collector:latest
+    command: ["--config=/etc/otel-collector-config.yaml"]
+    volumes:
+      - ./otel-collector/config.yaml:/etc/otel-collector-config.yaml
+    ports:
+      - "40318:4318" # OTLP HTTP receiver

local/otel-collector/config.yaml

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+receivers:
+  otlp:
+    protocols:
+      http:
+        endpoint: 0.0.0.0:4318
+
+processors:
+  batch:
+
+exporters:
+  # Debug exporter - logs all telemetry to stdout
+  debug:
+    verbosity: detailed
+
+service:
+  pipelines:
+    traces:
+      receivers: [otlp]
+      processors: [batch]
+      exporters: [debug]
+
+    metrics:
+      receivers: [otlp]
+      processors: [batch]
+      exporters: [debug]
+
+    logs:
+      receivers: [otlp]
+      processors: [batch]
+      exporters: [debug]
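
Review note (not part of the commit): the collector config above enables only the OTLP/HTTP receiver on port 4318 and defines one pipeline per signal. In OTLP/HTTP each signal is posted to its own default path (`/v1/traces`, `/v1/metrics`, `/v1/logs`). The sketch below derives per-signal URLs from an `OTEL_ENDPOINT` value. It assumes the application appends these default paths to the base endpoint itself; the exporter wiring is not visible in this excerpt, and the helper name `otlpSignalUrl` is hypothetical.

```typescript
// Hypothetical helper: builds the OTLP/HTTP URL for one signal from a base endpoint.
type Signal = "traces" | "metrics" | "logs";

function otlpSignalUrl(endpoint: string, signal: Signal): string {
  // Drop trailing slashes so the result never contains "//v1/...".
  const base = endpoint.replace(/\/+$/, "");
  return `${base}/v1/${signal}`;
}

// From inside the compose network the collector is reached by service name;
// from the host it is reached through the published port 40318.
const insideCompose = otlpSignalUrl("http://otel-collector:4318", "traces");
const fromHost = otlpSignalUrl("http://localhost:40318/", "logs");
```

Here `insideCompose` is `http://otel-collector:4318/v1/traces` and `fromHost` is `http://localhost:40318/v1/logs`.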

packages/api/env.example

Lines changed: 7 additions & 0 deletions
@@ -8,3 +8,10 @@ APP_NILAUTH_PUBLIC_KEY=03520e70bd97a5fa6d70c614d50ee47bf445ae0b0941a1d61ddd5afa0
 APP_NODE_SECRET_KEY=d5cf0b58964c516465d228be9330a047cb09bcdc7aaabea6485e1152182967fa
 APP_NODE_PUBLIC_ENDPOINT=http://localhost:8080
 APP_PORT=8080
+OTEL_ENDPOINT=http://localhost
+OTEL_SERVICE_NAME=nildb
+OTEL_TEAM_NAME=platform
+OTEL_DEPLOYMENT_ENV=local
+OTEL_METRICS_EXPORT_INTERVAL_MS=60000
+# OTEL_RESOURCE_ATTRIBUTES=service.instance.id=nildb-foo # Set custom resource attributes (comma-separated)
+# OTEL_SDK_DISABLED=true # Set to 'true' to disable OpenTelemetry SDK
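
Review note (not part of the commit): the admin guide says `OTEL_RESOURCE_ATTRIBUTES` is a comma-separated list of `key=value` pairs whose values take precedence over programmatically set ones. A minimal sketch of that precedence rule follows. The function names are hypothetical, and percent-decoding of values is intentionally left out; in the commit this merging is presumably handled by the OpenTelemetry SDK rather than hand-written code.

```typescript
// Hypothetical helpers illustrating the documented precedence rule.
type Attributes = Record<string, string>;

function parseResourceAttributes(raw: string | undefined): Attributes {
  const attributes: Attributes = {};
  if (!raw) return attributes;
  for (const pair of raw.split(",")) {
    const separator = pair.indexOf("=");
    if (separator <= 0) continue; // skip entries with no "=" or an empty key
    const key = pair.slice(0, separator).trim();
    const value = pair.slice(separator + 1).trim();
    if (key.length > 0) attributes[key] = value;
  }
  return attributes;
}

function mergeResourceAttributes(programmatic: Attributes, raw: string | undefined): Attributes {
  // Later spread wins, so environment-provided values override code defaults.
  return { ...programmatic, ...parseResourceAttributes(raw) };
}

const merged = mergeResourceAttributes(
  { "service.name": "nildb", "service.instance.id": "set-in-code" },
  "service.instance.id=nildb-r5nw,custom.key=value",
);
```

In this example `merged["service.instance.id"]` is `nildb-r5nw`, while `service.name` keeps its programmatic value.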

packages/api/package.json

Lines changed: 20 additions & 2 deletions
@@ -18,7 +18,24 @@
   "dependencies": {
     "@nillion/nildb-types": "workspace:*",
     "@nillion/nildb-shared": "workspace:*",
-    "mongodb": "^6.20.0"
+    "mongodb": "^6.20.0",
+    "@opentelemetry/api": "^1.9.0",
+    "@opentelemetry/api-logs": "^0.54.0",
+    "@opentelemetry/auto-instrumentations-node": "^0.51.0",
+    "@opentelemetry/exporter-logs-otlp-http": "^0.54.0",
+    "@opentelemetry/exporter-metrics-otlp-http": "^0.54.0",
+    "@opentelemetry/exporter-prometheus": "^0.54.0",
+    "@opentelemetry/exporter-trace-otlp-http": "^0.54.0",
+    "@opentelemetry/host-metrics": "^0.36.0",
+    "@opentelemetry/instrumentation": "^0.54.0",
+    "@opentelemetry/instrumentation-http": "^0.54.0",
+    "@opentelemetry/instrumentation-runtime-node": "^0.7.0",
+    "@opentelemetry/resources": "^1.28.0",
+    "@opentelemetry/sdk-logs": "^0.54.0",
+    "@opentelemetry/sdk-metrics": "^1.28.0",
+    "@opentelemetry/sdk-node": "^0.54.0",
+    "@opentelemetry/sdk-trace-node": "^1.28.0",
+    "@opentelemetry/semantic-conventions": "^1.28.0"
   },
   "devDependencies": {
     "@hono/node-server": "^1.19.5",
@@ -72,7 +89,8 @@
     "clean": true,
     "minify": false,
     "external": [
-      "mongodb"
+      "mongodb",
+      "@opentelemetry/*"
     ]
   }
 }

packages/api/src/app.ts

Lines changed: 5 additions & 5 deletions
@@ -9,16 +9,15 @@ import { corsMiddleware } from "./middleware/cors.middleware.js";
 import { limitRequestBodySizeMiddleware } from "./middleware/limit-body.middleware.js";
 import { loggerMiddleware } from "./middleware/logger.middleware.js";
 import { maintenanceMiddleware } from "./middleware/maintenance.middleware.js";
+import { metricsMiddleware } from "./middleware/metrics.middleware.js";
 import { rateLimitMiddleware } from "./middleware/rate-limit.middleware.js";
 import { buildQueriesRouter } from "./queries/queries.router.js";
 import { buildSystemRouter } from "./system/system.router.js";
 import { buildUserRouter } from "./users/users.router.js";

 export type App = Hono<AppEnv>;

-export async function buildApp(
-  bindings: AppBindings,
-): Promise<{ app: App; metrics: Hono | undefined }> {
+export async function buildApp(bindings: AppBindings): Promise<{ app: App }> {
   const app = new Hono<AppEnv>();
   const options: ControllerOptions = { app, bindings };

@@ -35,19 +34,20 @@ export async function buildApp(
     return next();
   });

+  metricsMiddleware(options);
   corsMiddleware(options);
   rateLimitMiddleware(options);
   limitRequestBodySizeMiddleware(options);
   loggerMiddleware(options);
   maintenanceMiddleware(options);

   // Setup controllers
-  const { metrics } = buildSystemRouter(options);
+  buildSystemRouter(options);
   buildBuildersRouter(options);
   buildCollectionsRouter(options);
   buildQueriesRouter(options);
   buildDataRouter(options);
   buildUserRouter(options);

-  return { app, metrics };
+  return { app };
 }
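
Review note (not part of the commit): `metricsMiddleware` is registered before the CORS, rate-limit, body-size, logger, and maintenance middleware. One plausible reason is that an outermost middleware also observes requests that a later middleware rejects. The sketch below illustrates that ordering effect with a simplified, synchronous, dependency-free model. Every name in it is hypothetical; it is not the contents of `metrics.middleware.ts`, and Hono's real middleware is asynchronous.

```typescript
// Simplified, synchronous model of an onion-style middleware chain.
type Ctx = { path: string; log: string[] };
type Next = () => void;
type Middleware = (ctx: Ctx, next: Next) => void;

function run(middlewares: Middleware[], ctx: Ctx): void {
  const dispatch = (index: number): void => {
    const current = middlewares[index];
    if (current === undefined) return; // end of the chain
    current(ctx, () => dispatch(index + 1));
  };
  dispatch(0);
}

// Records every request it wraps, including ones rejected further in.
const metrics: Middleware = (ctx, next) => {
  try {
    next();
  } finally {
    ctx.log.push(`metrics:${ctx.path}`);
  }
};

// Short-circuits by not calling next() for a blocked path.
const rateLimit: Middleware = (ctx, next) => {
  if (ctx.path === "/blocked") {
    ctx.log.push("rate-limit:rejected");
    return;
  }
  next();
};

const handler: Middleware = (ctx) => {
  ctx.log.push("handler");
};

// Runs one request through the given middleware order and returns its log.
function trace(order: Middleware[], path: string): string[] {
  const ctx: Ctx = { path, log: [] };
  run(order, ctx);
  return ctx.log;
}
```

With `metrics` first, `trace([metrics, rateLimit, handler], "/blocked")` still contains a `metrics:/blocked` entry. With the order `[rateLimit, metrics, handler]`, the rejected request never reaches `metrics` and goes unrecorded.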
