feat: enable OTEL Winston integration for trace ID correlation (PE-8803) #568

djwhitt · 2025-12-19T23:19:18Z

Summary

Enable automatic trace ID injection into Winston logs to correlate log entries with OTEL traces
Wrap request handlers with active OTEL context so child spans and logs share the same trace ID
Add tracing to ArNS middleware for unified traces from name resolution through data delivery

Changes

Core tracing (src/tracing.ts):

Enable WinstonInstrumentation with disableLogSending: true (injects trace context into log records without sending the logs to the OTEL pipeline)
Update startChildSpan() to auto-detect active span when no parent is provided
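
The parent fallback described for startChildSpan() can be sketched without the OTEL SDK. This is a dependency-free model, not the code in src/tracing.ts: `SpanLike`, `hexId`, `withActiveSpan`, and the module-level `activeSpan` variable are hypothetical stand-ins. In the real module the active span would presumably come from `trace.getActiveSpan()` in `@opentelemetry/api`.

```typescript
// Dependency-free model of parent selection; not the code from src/tracing.ts.
interface SpanLike {
  name: string;
  traceId: string;
  spanId: string;
  parentSpanId: string | undefined;
}

// Stand-in for the span that the OTEL context manager reports as active.
let activeSpan: SpanLike | undefined;

let nextId = 1;
// Hypothetical helper: produce a fixed-width lowercase hex ID.
function hexId(width: number): string {
  return (nextId++).toString(16).padStart(width, '0');
}

// Start a span under the explicit parent, else under the active span,
// else as the root of a new trace.
function startChildSpan(name: string, parent?: SpanLike): SpanLike {
  const effectiveParent = parent ?? activeSpan;
  return {
    name,
    traceId: effectiveParent?.traceId ?? hexId(32),
    spanId: hexId(16),
    parentSpanId: effectiveParent?.spanId,
  };
}

// Hypothetical helper: run fn with the span marked active, then restore.
function withActiveSpan<T>(span: SpanLike, fn: () => T): T {
  const previous = activeSpan;
  activeSpan = span;
  try {
    return fn();
  } finally {
    activeSpan = previous;
  }
}

const root = startChildSpan('handler');
// No parent passed: the child picks up the active span automatically.
const child = withActiveSpan(root, () => startChildSpan('fetch-attributes'));
// No parent and no active span: a new trace is started.
const orphan = startChildSpan('background-job');
```

An explicit parent still wins over the active span, so call sites that already pass one keep their existing behavior in this model.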

Request handlers:

Wrap createRawDataHandler and createDataHandler with context.with() to make spans active
Wrap createChunkOffsetHandler, createChunkOffsetDataHandler, and createChunkPostHandler with context.with()
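
Why wrapping a handler makes later log lines share a trace ID can be illustrated with Node's `AsyncLocalStorage`, the same `async_hooks` machinery that OTEL's Node context managers build on. This is a simplified sketch, not the handlers' actual code: `withSpan`, `logLine`, and `handleRequest` are hypothetical names, and `withSpan` only plays the role of `context.with(trace.setSpan(...))`.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Simplified stand-in for an OTEL context that carries the active span.
interface ActiveSpan {
  traceId: string;
  spanId: string;
}

const storage = new AsyncLocalStorage<ActiveSpan>();

// Plays the role of context.with(trace.setSpan(context.active(), span), fn).
function withSpan<T>(span: ActiveSpan, fn: () => T): T {
  return storage.run(span, fn);
}

// Plays the role of a logger that injects trace fields from the active context.
function logLine(message: string): Record<string, string> {
  const span = storage.getStore();
  return span === undefined
    ? { message }
    : { message, trace_id: span.traceId, span_id: span.spanId };
}

// Hypothetical handler body: logs before and after an await.
async function handleRequest(): Promise<Record<string, string>[]> {
  const before = logLine('request received');
  await new Promise<void>((resolve) => setTimeout(resolve, 5));
  const after = logLine('response sent');
  return [before, after];
}

const span: ActiveSpan = {
  traceId: 'dd6865926001cffc90211e5a405ccf60',
  spanId: '5707693643686e4b',
};

// Inside withSpan, both log lines carry the trace ID,
// including the one emitted after the await.
const traced = withSpan(span, () => handleRequest());
// Outside any active span, no trace fields are added.
const untraced = logLine('startup');
```

The point of the sketch is that the handler never passes the span to the logger explicitly; the context follows the async call chain.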

ArNS middleware (src/middleware/arns.ts):

Add tracing span for ArNS resolution with attributes: subdomain, resolved_id, ttl, resolution_duration_ms
Wrap resolution and data handler call with active context

Startup scripts:

Update scripts/service, package.json, and docker-entrypoint.sh to import tracing module before app so Winston instrumentation patches the logger before it's created

Result

Log entries now include trace_id, span_id, and trace_flags fields when within an active span context:

{
  "class": "CompositeDataAttributesSource",
  "id": "lyNVBNs79dPDiVC7IwxWRvjVmIAocCy4i3IKa_RHrnQ",
  "level": "debug",
  "message": "Fetching data attributes from source",
  "span_id": "5707693643686e4b",
  "timestamp": "2025-12-19T23:15:47.890Z",
  "trace_flags": "01",
  "trace_id": "dd6865926001cffc90211e5a405ccf60"
}

Test plan

Start service and make data requests
Verify trace_id, span_id, trace_flags appear in log entries
Verify the same trace_id appears across all logs for a single request (29 entries shared the same trace ID)
Verify trace ID matches entries in otel-spans.jsonl
Test ArNS requests to verify traces span from resolution through data delivery
Verify Docker deployment works with the entrypoint change
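
The log checks in the test plan could be scripted along these lines. This is only a sketch: `summarizeTraceIds` and `TraceSummary` are hypothetical names, the field formats (32 and 16 lowercase hex characters) are assumed from the sample entry above, and the second and third sample lines are made up for illustration.

```typescript
// Field shapes assumed from the sample log entry: lowercase hex strings.
const TRACE_ID = /^[0-9a-f]{32}$/;
const SPAN_ID = /^[0-9a-f]{16}$/;

interface TraceSummary {
  traceIds: string[]; // distinct well-formed trace IDs seen
  correlated: number; // entries with well-formed trace_id and span_id
  uncorrelated: number; // entries where either field is missing or malformed
}

// Summarize trace correlation for JSONL log text (one JSON object per line).
function summarizeTraceIds(jsonl: string): TraceSummary {
  const seen = new Set<string>();
  let correlated = 0;
  let uncorrelated = 0;
  for (const line of jsonl.split('\n')) {
    if (line.trim() === '') continue; // skip blank lines
    const entry = JSON.parse(line) as Record<string, unknown>;
    const traceId = entry['trace_id'];
    const spanId = entry['span_id'];
    if (
      typeof traceId === 'string' && TRACE_ID.test(traceId) &&
      typeof spanId === 'string' && SPAN_ID.test(spanId)
    ) {
      seen.add(traceId);
      correlated++;
    } else {
      uncorrelated++;
    }
  }
  return { traceIds: Array.from(seen), correlated, uncorrelated };
}

// First line mirrors the sample entry; the other two are hypothetical.
const sample = [
  '{"message":"Fetching data attributes from source","span_id":"5707693643686e4b","trace_flags":"01","trace_id":"dd6865926001cffc90211e5a405ccf60"}',
  '{"message":"hypothetical second entry","span_id":"0000000000000001","trace_flags":"01","trace_id":"dd6865926001cffc90211e5a405ccf60"}',
  '{"message":"hypothetical entry logged outside a span"}',
].join('\n');

const summary = summarizeTraceIds(sample);
```

For a single request, a passing check would show exactly one entry in `traceIds`; entries counted as uncorrelated point at code paths that log outside an active span.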

Closes #567

🤖 Generated with Claude Code

Enable automatic trace ID injection into Winston logs to correlate log entries with OTEL traces, allowing operators to trace requests through all log entries.

Changes:
- Enable WinstonInstrumentation in tracing.ts with disableLogSending=true
- Update startChildSpan() to auto-detect active span when no parent provided
- Wrap data handlers (createRawDataHandler, createDataHandler) with context.with() to make spans active
- Wrap chunk handlers (createChunkOffsetHandler, createChunkOffsetDataHandler, createChunkPostHandler) with context.with()
- Add tracing to ArNS middleware with resolution timing and attributes
- Update startup scripts to import tracing.ts before app.ts so Winston instrumentation patches the logger before it's created

Log entries now include trace_id, span_id, and trace_flags fields when within an active span context, enabling correlation with OTEL traces.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>

coderabbitai · 2025-12-19T23:22:30Z

Walkthrough

Adds OpenTelemetry tracing and Winston log correlation: tracing is initialized at startup, Winston instrumentation is enabled, and request flows (ArNS, data, chunk) are wrapped in context-bound spans that record attributes and errors and are always ended.

Changes

Cohort / File(s)	Summary
Startup / Launch scripts: `docker-entrypoint.sh`, `package.json`, `scripts/service`	Add Node `--import` for tracing (`--import ./src/tracing.ts` / `--import ./dist/tracing.js`) so tracing initializes before app code; update start/start:prod/service commands and add comment about tracing-before-logger ordering.
Tracing configuration: `src/tracing.ts`	Enable `WinstonInstrumentation` (with `disableLogSending: true`), preserve log correlation, and update `startChildSpan()` to derive parent from active context when no parentSpan is supplied.
ArNS middleware: `src/middleware/arns.ts`	Wrap ArNS resolution in an OTEL span (`ArNSMiddleware.resolve`) via `context.with`, record resolution timing/metadata and attributes, handle blocked subdomains/limits inside span, record errors on span, and ensure span.end() in finally.
Chunk handlers: `src/routes/chunk/handlers.ts`	Wrap GET/HEAD/POST flows with `context.with(trace.setSpan(...))` to propagate spans across async boundaries; add span attributes for retrieval/merkle parsing; move payment/rate-limit checks earlier; compute/set ETag and chunk-related headers; support If-None-Match/HEAD handling and ensure span lifecycle.
Data handlers: `src/routes/data/handlers.ts`	Execute handler logic inside `context.with` active span: ID validation, blocklist checks, attribute retrieval, range/manifest handling, streaming, header/ETag management; set data-related span attributes, centralize error handling, and finalize spans.
Tests: `src/discovery/cdb64-root-tx-index.test.ts`	Increase filesystem-watcher wait time from 500ms to 1500ms in a runtime-watch test.
Changelog: `CHANGELOG.md`	Document OTEL + Winston integration, trace context injection into logs, and summary of handlers/middleware wrapped with active OTEL context.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant App as Application (HTTP handlers)
  participant Tracing as OTEL SDK
  participant NameRes as ArNS / NameResolver
  participant Store as Storage / Data Layer
  participant Logger as Winston

  Client->>App: HTTP request
  App->>Tracing: start root span
  Note right of Tracing: context made active
  App->>NameRes: resolve name (if applicable) [within active context]
  NameRes-->>App: resolution result / metadata
  App->>Store: fetch chunk/data (within active context)
  Store-->>App: data + metadata
  App->>Logger: logs (trace_id/span_id injected)
  App-->>Client: HTTP response
  App->>Tracing: span.end()

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Inspect async context propagation in src/routes/data/handlers.ts and src/routes/chunk/handlers.ts to ensure context.with covers awaited code and all early returns/errors call span.end().
Verify span attribute consistency and naming across ArNS, data, and chunk handlers.
Confirm startup import order ensures tracing/Winston instrumentation initialize before any logger creation.

Possibly related PRs

refactor(arns): pass ArNS data via request context instead of headers #469 — Modifies ArNS resolution and sets req.dataId/req.manifestPath; overlaps with ArNS middleware changes here.
feat: add comprehensive OTEL tracing for chunk retrieval pipeline (PE-8446) #471 — Adds OTEL tracing to chunk/data handlers and tracing initialization; closely related to this PR's handler instrumentation.
fix: apply chunk rate limits before expensive txResult lookup (PE-8685) #524 — Reorders payment/rate-limit checks in chunk handlers; intersects with moved checks and tracing-instrumented flow.

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'feat: enable OTEL Winston integration for trace ID correlation (PE-8803)' clearly describes the main change: enabling Winston instrumentation for automatic trace ID injection into logs.
Description check	✅ Passed	The description is comprehensive and directly related to the changeset, detailing the summary, changes across multiple files, expected results, and test plan with specific examples.
Linked Issues check	✅ Passed	The PR successfully implements all must-have requirements from issue #567: enables WinstonInstrumentation with disableLogSending: true, wraps request handlers with context.with(), updates startChildSpan() for auto-detection, and adds tracing to ArNS middleware.
Out of Scope Changes check	✅ Passed	All changes are aligned with the requirements in issue #567. The startup script updates and test timing adjustment are necessary supporting changes for enabling Winston instrumentation before app initialization.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

📜 Recent review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 917dee7 and d51862b.

📒 Files selected for processing (1)

src/discovery/cdb64-root-tx-index.test.ts (1 hunks)

🧰 Additional context used

📓 Path-based instructions (2)

src/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.{ts,tsx,js,jsx}: Support dry-run mode for upload testing by accepting ARWEAVE_POST_DRY_RUN environment variable to simulate transaction and chunk uploads without posting to Arweave network on ports 3000 and 4000
Run yarn lint:check after making changes and use yarn lint:fix to automatically fix linting issues
Check for code duplication using yarn duplicate:check and generate HTML report with yarn duplicate:report; use yarn duplicate:ci for CI duplicate checks
Check for circular dependencies using yarn deps:check, generate dependency graph with yarn deps:graph, find orphan modules with yarn deps:orphans, find leaf modules with yarn deps:leaves, show dependency summary with yarn deps:summary, and use yarn deps:ci for CI checks

Files:

src/discovery/cdb64-root-tx-index.test.ts

src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Add or improve TSDoc comments in code when modifying files to enhance documentation

Files:

src/discovery/cdb64-root-tx-index.test.ts

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: test (macos-latest)
GitHub Check: test (ubuntu-latest)

🔇 Additional comments (1)

src/discovery/cdb64-root-tx-index.test.ts (1)

449-449: LGTM! Test timing adjustment improves reliability.

The increased wait time aligns with the watcher's 1000ms stability threshold (as noted in the comment at line 412) and is now consistent with other watcher tests in this file (lines 412, 486, 534, 560). This change should improve test reliability, especially with the added tracing instrumentation from this PR.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

src/middleware/arns.ts (1)

130-270: Well-structured tracing integration.

The context.with() wrapper correctly makes the span active for the duration of resolution, enabling downstream code (including dataHandler) to inherit the trace context. The try/catch/finally pattern ensures:

Exceptions are recorded on the span with proper ERROR status

The span is always ended regardless of control flow path

The original error is re-thrown to preserve existing error handling behavior

One optional enhancement: consider setting http.status_code as a span attribute for non-blocked success/failure cases (e.g., 402, 404, redirect) for improved observability consistency.
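
The try/catch/finally lifecycle praised above can be modeled as follows. This is a sketch under assumptions, not the middleware's code: `RecordingSpan` and `runInSpan` are hypothetical, and with the real `@opentelemetry/api` the equivalent calls would be `span.recordException(...)`, `span.setStatus({ code: SpanStatusCode.ERROR })`, and `span.end()`.

```typescript
// Minimal span model; the real code would use a Span from @opentelemetry/api.
class RecordingSpan {
  status: 'UNSET' | 'ERROR' = 'UNSET';
  ended = false;
  exceptions: string[] = [];
  attributes: Record<string, string | number> = {};

  setAttribute(key: string, value: string | number): void {
    this.attributes[key] = value;
  }
  recordException(err: Error): void {
    this.exceptions.push(err.message);
  }
  setErrorStatus(): void {
    this.status = 'ERROR';
  }
  end(): void {
    this.ended = true;
  }
}

// Run fn inside the span lifecycle: record and re-throw errors, always end.
async function runInSpan<T>(span: RecordingSpan, fn: () => Promise<T>): Promise<T> {
  try {
    return await fn();
  } catch (err) {
    const error = err instanceof Error ? err : new Error(String(err));
    span.recordException(error);
    span.setErrorStatus();
    throw err; // re-throw so existing error handling still runs
  } finally {
    span.end(); // runs on success, early return, and failure alike
  }
}

const okSpan = new RecordingSpan();
const failSpan = new RecordingSpan();

const outcome = Promise.all([
  runInSpan(okSpan, async () => {
    // The optional enhancement: record the response status on the span.
    okSpan.setAttribute('http.status_code', 200);
    return 'resolved';
  }),
  runInSpan(failSpan, async () => {
    throw new Error('resolution failed');
  }).catch((err: Error) => err.message),
]);
```

Using `return await fn()` rather than `return fn()` matters here: without the `await`, a rejected promise would bypass the catch block and the span would end before the work finishes.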

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5e9697f and cccd781.

📒 Files selected for processing (7)

docker-entrypoint.sh (1 hunks)
package.json (1 hunks)
scripts/service (1 hunks)
src/middleware/arns.ts (2 hunks)
src/routes/chunk/handlers.ts (4 hunks)
src/routes/data/handlers.ts (5 hunks)
src/tracing.ts (3 hunks)

🧰 Additional context used

📓 Path-based instructions (2)

src/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.{ts,tsx,js,jsx}: Support dry-run mode for upload testing by accepting ARWEAVE_POST_DRY_RUN environment variable to simulate transaction and chunk uploads without posting to Arweave network on ports 3000 and 4000
Run yarn lint:check after making changes and use yarn lint:fix to automatically fix linting issues
Check for code duplication using yarn duplicate:check and generate HTML report with yarn duplicate:report; use yarn duplicate:ci for CI duplicate checks
Check for circular dependencies using yarn deps:check, generate dependency graph with yarn deps:graph, find orphan modules with yarn deps:orphans, find leaf modules with yarn deps:leaves, show dependency summary with yarn deps:summary, and use yarn deps:ci for CI checks

Files:

src/tracing.ts
src/middleware/arns.ts
src/routes/chunk/handlers.ts
src/routes/data/handlers.ts

src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Add or improve TSDoc comments in code when modifying files to enhance documentation

Files:

src/tracing.ts
src/middleware/arns.ts
src/routes/chunk/handlers.ts
src/routes/data/handlers.ts

🧠 Learnings (10)

📓 Common learnings

Learnt from: CR
Repo: ar-io/ar-io-node PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T22:20:55.385Z
Learning: Control services using `yarn service:start`, `yarn service:stop`, and `yarn service:logs` commands; service logs are stored in `logs/service.log` (JSONL format) and OTEL spans in `logs/otel-spans.jsonl`