Skip to content

Conversation

@djwhitt
Copy link
Collaborator

@djwhitt djwhitt commented Dec 19, 2025

Summary

  • Enable automatic trace ID injection into Winston logs to correlate log entries with OTEL traces
  • Wrap request handlers with active OTEL context so child spans and logs share the same trace ID
  • Add tracing to ArNS middleware for unified traces from name resolution through data delivery

Changes

Core tracing (src/tracing.ts):

  • Enable WinstonInstrumentation with disableLogSending: true (inject trace context without sending logs to OTEL pipeline)
  • Update startChildSpan() to auto-detect active span when no parent is provided

Request handlers:

  • Wrap createRawDataHandler and createDataHandler with context.with() to make spans active
  • Wrap createChunkOffsetHandler, createChunkOffsetDataHandler, and createChunkPostHandler with context.with()

ArNS middleware (src/middleware/arns.ts):

  • Add tracing span for ArNS resolution with attributes: subdomain, resolved_id, ttl, resolution_duration_ms
  • Wrap resolution and data handler call with active context

Startup scripts:

  • Update scripts/service, package.json, and docker-entrypoint.sh to import tracing module before app so Winston instrumentation patches the logger before it's created

Result

Log entries now include trace_id, span_id, and trace_flags fields when within an active span context:

{
  "class": "CompositeDataAttributesSource",
  "id": "lyNVBNs79dPDiVC7IwxWRvjVmIAocCy4i3IKa_RHrnQ",
  "level": "debug",
  "message": "Fetching data attributes from source",
  "span_id": "5707693643686e4b",
  "timestamp": "2025-12-19T23:15:47.890Z",
  "trace_flags": "01",
  "trace_id": "dd6865926001cffc90211e5a405ccf60"
}

Test plan

  • Start service and make data requests
  • Verify trace_id, span_id, trace_flags appear in log entries
  • Verify the same trace_id appears across all logs for a single request (29 entries shared same trace ID)
  • Verify trace ID matches entries in otel-spans.jsonl
  • Test ArNS requests to verify traces span from resolution through data delivery
  • Verify Docker deployment works with the entrypoint change

Closes #567

🤖 Generated with Claude Code

Enable automatic trace ID injection into Winston logs to correlate log
entries with OTEL traces, allowing operators to trace requests through
all log entries.

Changes:
- Enable WinstonInstrumentation in tracing.ts with disableLogSending=true
- Update startChildSpan() to auto-detect active span when no parent provided
- Wrap data handlers (createRawDataHandler, createDataHandler) with
  context.with() to make spans active
- Wrap chunk handlers (createChunkOffsetHandler, createChunkOffsetDataHandler,
  createChunkPostHandler) with context.with()
- Add tracing to ArNS middleware with resolution timing and attributes
- Update startup scripts to import tracing.ts before app.ts so Winston
  instrumentation patches the logger before it's created

Log entries now include trace_id, span_id, and trace_flags fields when
within an active span context, enabling correlation with OTEL traces.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 19, 2025

📝 Walkthrough

Walkthrough

Adds OpenTelemetry tracing and Winston log correlation: tracing initialized at startup, Winston instrumentation enabled, and request flows (ArNS, data, chunk) wrapped in context-bound spans that record attributes, errors, and ensure proper span lifecycle.

Changes

Cohort / File(s) Summary
Startup / Launch scripts
\docker-entrypoint.sh`, `package.json`, `scripts/service``
Add Node --import for tracing (--import ./src/tracing.ts / --import ./dist/tracing.js) so tracing initializes before app code; update start/start:prod/service commands and add comment about tracing-before-logger ordering.
Tracing configuration
\src/tracing.ts``
Enable WinstonInstrumentation (with disableLogSending: true), preserve log correlation, and update startChildSpan() to derive parent from active context when no parentSpan is supplied.
ArNS middleware
\src/middleware/arns.ts``
Wrap ArNS resolution in an OTEL span (ArNSMiddleware.resolve) via context.with, record resolution timing/metadata and attributes, handle blocked subdomains/limits inside span, record errors on span, and ensure span.end() in finally.
Chunk handlers
\src/routes/chunk/handlers.ts``
Wrap GET/HEAD/POST flows with context.with(trace.setSpan(...)) to propagate spans across async boundaries; add span attributes for retrieval/merkle parsing; move payment/rate-limit checks earlier; compute/set ETag and chunk-related headers; support If-None-Match/HEAD handling and ensure span lifecycle.
Data handlers
\src/routes/data/handlers.ts``
Execute handler logic inside context.with active span: ID validation, blocklist checks, attribute retrieval, range/manifest handling, streaming, header/ETag management; set data-related span attributes, centralize error handling, and finalize spans.
Tests
\src/discovery/cdb64-root-tx-index.test.ts``
Increase filesystem-watcher wait time from 500ms to 1500ms in a runtime-watch test.
Changelog
\CHANGELOG.md``
Document OTEL + Winston integration, trace context injection into logs, and summary of handlers/middleware wrapped with active OTEL context.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant App as Application (HTTP handlers)
  participant Tracing as OTEL SDK
  participant NameRes as ArNS / NameResolver
  participant Store as Storage / Data Layer
  participant Logger as Winston

  Client->>App: HTTP request
  App->>Tracing: start root span
  Note right of Tracing: context made active
  App->>NameRes: resolve name (if applicable) [within active context]
  NameRes-->>App: resolution result / metadata
  App->>Store: fetch chunk/data (within active context)
  Store-->>App: data + metadata
  App->>Logger: logs (trace_id/span_id injected)
  App-->>Client: HTTP response
  App->>Tracing: span.end()
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Inspect async context propagation in src/routes/data/handlers.ts and src/routes/chunk/handlers.ts to ensure context.with covers awaited code and all early returns/errors call span.end().
  • Verify span attribute consistency and naming across ArNS, data, and chunk handlers.
  • Confirm startup import order ensures tracing/Winston instrumentation initialize before any logger creation.

Possibly related PRs

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat: enable OTEL Winston integration for trace ID correlation (PE-8803)' clearly describes the main change: enabling Winston instrumentation for automatic trace ID injection into logs.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, detailing the summary, changes across multiple files, expected results, and test plan with specific examples.
Linked Issues check ✅ Passed The PR successfully implements all must-have requirements from issue #567: enables WinstonInstrumentation with disableLogSending: true, wraps request handlers with context.with(), updates startChildSpan() for auto-detection, and adds tracing to ArNS middleware.
Out of Scope Changes check ✅ Passed All changes are aligned with the requirements in issue #567. The startup script updates and test timing adjustment are necessary supporting changes for enabling Winston instrumentation before app initialization.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch PE-8803-otel-winston-trace-correlation

📜 Recent review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 917dee7 and d51862b.

📒 Files selected for processing (1)
  • src/discovery/cdb64-root-tx-index.test.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
src/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.{ts,tsx,js,jsx}: Support dry-run mode for upload testing by accepting ARWEAVE_POST_DRY_RUN environment variable to simulate transaction and chunk uploads without posting to Arweave network on ports 3000 and 4000
Run yarn lint:check after making changes and use yarn lint:fix to automatically fix linting issues
Check for code duplication using yarn duplicate:check and generate HTML report with yarn duplicate:report; use yarn duplicate:ci for CI duplicate checks
Check for circular dependencies using yarn deps:check, generate dependency graph with yarn deps:graph, find orphan modules with yarn deps:orphans, find leaf modules with yarn deps:leaves, show dependency summary with yarn deps:summary, and use yarn deps:ci for CI checks

Files:

  • src/discovery/cdb64-root-tx-index.test.ts
src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Add or improve TSDoc comments in code when modifying files to enhance documentation

Files:

  • src/discovery/cdb64-root-tx-index.test.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: test (macos-latest)
  • GitHub Check: test (ubuntu-latest)
🔇 Additional comments (1)
src/discovery/cdb64-root-tx-index.test.ts (1)

449-449: LGTM! Test timing adjustment improves reliability.

The increased wait time aligns with the watcher's 1000ms stability threshold (as noted in the comment at line 412) and is now consistent with other watcher tests in this file (lines 412, 486, 534, 560). This change should improve test reliability, especially with the added tracing instrumentation from this PR.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/middleware/arns.ts (1)

130-270: Well-structured tracing integration.

The context.with() wrapper correctly makes the span active for the duration of resolution, enabling downstream code (including dataHandler) to inherit the trace context. The try/catch/finally pattern ensures:

  • Exceptions are recorded on the span with proper ERROR status
  • The span is always ended regardless of control flow path
  • The original error is re-thrown to preserve existing error handling behavior

One optional enhancement: consider setting http.status_code as a span attribute for non-blocked success/failure cases (e.g., 402, 404, redirect) for improved observability consistency.

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5e9697f and cccd781.

📒 Files selected for processing (7)
  • docker-entrypoint.sh (1 hunks)
  • package.json (1 hunks)
  • scripts/service (1 hunks)
  • src/middleware/arns.ts (2 hunks)
  • src/routes/chunk/handlers.ts (4 hunks)
  • src/routes/data/handlers.ts (5 hunks)
  • src/tracing.ts (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
src/**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.{ts,tsx,js,jsx}: Support dry-run mode for upload testing by accepting ARWEAVE_POST_DRY_RUN environment variable to simulate transaction and chunk uploads without posting to Arweave network on ports 3000 and 4000
Run yarn lint:check after making changes and use yarn lint:fix to automatically fix linting issues
Check for code duplication using yarn duplicate:check and generate HTML report with yarn duplicate:report; use yarn duplicate:ci for CI duplicate checks
Check for circular dependencies using yarn deps:check, generate dependency graph with yarn deps:graph, find orphan modules with yarn deps:orphans, find leaf modules with yarn deps:leaves, show dependency summary with yarn deps:summary, and use yarn deps:ci for CI checks

Files:

  • src/tracing.ts
  • src/middleware/arns.ts
  • src/routes/chunk/handlers.ts
  • src/routes/data/handlers.ts
src/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Add or improve TSDoc comments in code when modifying files to enhance documentation

Files:

  • src/tracing.ts
  • src/middleware/arns.ts
  • src/routes/chunk/handlers.ts
  • src/routes/data/handlers.ts
🧠 Learnings (10)
📓 Common learnings
Learnt from: CR
Repo: ar-io/ar-io-node PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T22:20:55.385Z
Learning: Control services using `yarn service:start`, `yarn service:stop`, and `yarn service:logs` commands; service logs are stored in `logs/service.log` (JSONL format) and OTEL spans in `logs/otel-spans.jsonl`
📚 Learning: 2025-11-24T22:20:55.385Z
Learnt from: CR
Repo: ar-io/ar-io-node PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T22:20:55.385Z
Learning: Install dependencies with `yarn install`, start development server with `yarn start`, use watch mode with `yarn watch`, build for production with `yarn build`, and run production build with `yarn start:prod`

Applied to files:

  • scripts/service
  • package.json
📚 Learning: 2025-11-24T22:20:55.385Z
Learnt from: CR
Repo: ar-io/ar-io-node PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T22:20:55.385Z
Learning: Control services using `yarn service:start`, `yarn service:stop`, and `yarn service:logs` commands; service logs are stored in `logs/service.log` (JSONL format) and OTEL spans in `logs/otel-spans.jsonl`

Applied to files:

  • scripts/service
📚 Learning: 2025-11-24T22:20:55.385Z
Learnt from: CR
Repo: ar-io/ar-io-node PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T22:20:55.385Z
Learning: Applies to src/**/*.test.{ts,tsx} test/**/*.{ts,tsx} : Run all tests with `yarn test`; run individual test files with `yarn test:file src/path/to/test.ts`; run with coverage using `yarn test:file:coverage src/path/to/test.ts` or `yarn test:coverage`; run e2e tests with `yarn test:e2e`; mock functions use `mock.fn()` and reset with `mock.restoreAll()` in afterEach; database schemas come from `test/*.sql` files

Applied to files:

  • package.json
📚 Learning: 2025-11-24T22:20:55.385Z
Learnt from: CR
Repo: ar-io/ar-io-node PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T22:20:55.385Z
Learning: Applies to src/**/*.{ts,tsx,js,jsx} : Check for circular dependencies using `yarn deps:check`, generate dependency graph with `yarn deps:graph`, find orphan modules with `yarn deps:orphans`, find leaf modules with `yarn deps:leaves`, show dependency summary with `yarn deps:summary`, and use `yarn deps:ci` for CI checks

Applied to files:

  • package.json
📚 Learning: 2025-11-24T22:20:55.385Z
Learnt from: CR
Repo: ar-io/ar-io-node PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T22:20:55.385Z
Learning: Applies to migrations/**/*.sql : Create new database migrations with `yarn db:migrate create --folder migrations --name <schema>.description.sql`; apply migrations with `yarn db:migrate up`; revert with `yarn db:migrate down --step N` or by specific name; check migration status in `migrations` table in `data/sqlite/core.db`; after applying migrations, update test schemas with `./test/dump-test-schemas`

Applied to files:

  • package.json
📚 Learning: 2025-11-24T22:20:55.385Z
Learnt from: CR
Repo: ar-io/ar-io-node PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T22:20:55.385Z
Learning: Applies to src/**/*.{ts,tsx,js,jsx} : Run `yarn lint:check` after making changes and use `yarn lint:fix` to automatically fix linting issues

Applied to files:

  • package.json
📚 Learning: 2025-11-24T22:20:55.385Z
Learnt from: CR
Repo: ar-io/ar-io-node PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T22:20:55.385Z
Learning: Applies to src/**/*[Rr]ate*[Ll]imit*.{ts,tsx} src/**/*402*.{ts,tsx} : When modifying rate limiting functionality, update `docs/x402-and-rate-limiting.md`: add/remove rate limited endpoints in 'Rate Limited Endpoints' section, update environment variables in both the guide and `docs/envs.md`, update payment flow in 'How They Work Together' section, and update configuration options in reference tables

Applied to files:

  • src/routes/chunk/handlers.ts
  • src/routes/data/handlers.ts
📚 Learning: 2025-02-11T14:27:05.076Z
Learnt from: hlolli
Repo: ar-io/ar-io-node PR: 315
File: src/arweave/composite-client.ts:1186-1186
Timestamp: 2025-02-11T14:27:05.076Z
Learning: In ArweaveCompositeClient's broadcastChunk method, secondary chunk posting is intentionally implemented as fire-and-forget (not awaited) to avoid delaying the primary chunk posting response.

Applied to files:

  • src/routes/chunk/handlers.ts
📚 Learning: 2025-07-30T18:32:44.622Z
Learnt from: djwhitt
Repo: ar-io/ar-io-node PR: 457
File: src/routes/data/handlers.ts:429-433
Timestamp: 2025-07-30T18:32:44.622Z
Learning: In src/routes/data/handlers.ts, the ID validation logic uses AND operators intentionally to only reject IDs that match the regex pattern /^[a-zA-Z0-9-_]{43}$/ but fail base64url round-trip validation. IDs that are null or don't match the regex pattern are allowed to proceed further in the pipeline rather than being immediately rejected with "Invalid ID", indicating different error handling is desired for different types of invalid IDs.

Applied to files:

  • src/routes/data/handlers.ts
🧬 Code graph analysis (2)
src/middleware/arns.ts (3)
src/tracing.ts (4)
  • tracer (118-120)
  • context (123-123)
  • trace (123-123)
  • SpanStatusCode (123-123)
src/routes/data/handlers.ts (2)
  • sendNotFound (656-662)
  • sendPaymentRequired (664-670)
src/constants.ts (1)
  • headerNames (27-66)
src/routes/chunk/handlers.ts (8)
src/tracing.ts (2)
  • context (123-123)
  • trace (123-123)
src/data/ar-io-data-source.ts (1)
  • request (100-137)
src/routes/data/handlers.ts (1)
  • getRequestAttributes (424-465)
src/data/chunk-retrieval-service.ts (2)
  • ChunkNotFoundError (101-109)
  • hasTxId (81-85)
src/lib/encoding.ts (1)
  • toB64Url (29-31)
src/routes/chunk/response-utils.ts (1)
  • setCommonChunkHeaders (38-79)
src/lib/merkle-path-parser.ts (1)
  • parseDataPath (485-549)
src/constants.ts (1)
  • headerNames (27-66)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: test (macos-latest)
  • GitHub Check: test (ubuntu-latest)
🔇 Additional comments (14)
docker-entrypoint.sh (1)

19-21: LGTM!

The comment clearly explains the initialization order requirement, and using --import ensures tracing.js executes before app.js, allowing Winston instrumentation to patch the logger before it's created.

scripts/service (1)

27-29: LGTM!

The import order is correct: register.js enables ts-node for TypeScript execution, then tracing.ts patches Winston before the logger is created in app.ts. The comment clearly documents this dependency.

package.json (1)

146-147: LGTM!

Both development and production start scripts correctly import tracing before the application, ensuring consistent behavior across all environments. The ordering aligns with docker-entrypoint.sh and scripts/service.

src/tracing.ts (2)

93-96: LGTM!

The WinstonInstrumentation configuration is correct:

  • disableLogSending: true prevents logs from being sent to the OTEL pipeline (as specified in PR requirements)
  • disableLogCorrelation: false enables automatic injection of trace_id, span_id, and trace_flags into Winston log entries

125-141: LGTM!

The updated startChildSpan now auto-detects the parent span from the active context when parentSpan is not provided, enabling gradual migration from explicit parent passing to context-based propagation. The nullish coalescing on line 132 correctly handles both undefined and omitted arguments.

src/middleware/arns.ts (2)

18-18: LGTM!

Correct imports from the tracing module to enable span creation and context propagation.


120-128: LGTM!

Good choice of initial span attributes capturing HTTP context and the ArNS subdomain. This provides useful context for debugging and trace analysis.

src/routes/chunk/handlers.ts (4)

23-23: LGTM!

The import of context and trace from the tracing module is appropriate for enabling OTEL context propagation in the handlers.


65-266: LGTM!

The context propagation pattern is correctly implemented:

  • Span is created before the context.with() wrapper
  • All handler logic runs within the active span context
  • span.end() is properly placed in the finally block ensuring cleanup on all code paths
  • Error handling sets appropriate span attributes and records exceptions

This enables Winston logs emitted within the handler to automatically receive trace context (trace_id, span_id).


295-516: LGTM!

The context propagation follows the same correct pattern as createChunkOffsetHandler. The merkle path parsing, header setting, and conditional request handling all execute within the active span context, ensuring trace correlation for any logs emitted during these operations.


599-663: LGTM!

The chunk post handler correctly implements context propagation. The broadcast operation and its success/failure tracking all occur within the active span context, enabling trace correlation for logs produced during chunk broadcasting.

src/routes/data/handlers.ts (3)

15-15: LGTM!

Import of context and trace aligns with the tracing pattern used in the handlers.


702-929: LGTM on context propagation pattern.

The handler correctly wraps all logic within context.with(), ensuring trace context is active for Winston logs. The comprehensive span attributes (data.size, data.hash, data.cached, retrieval durations) provide excellent observability.


1138-1472: LGTM on context propagation pattern.

The data handler correctly implements context propagation with comprehensive span attributes for manifest resolution, data retrieval, and caching status. The try/catch/finally structure ensures proper span lifecycle management.

@codecov
Copy link

codecov bot commented Dec 19, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 75.92%. Comparing base (5e9697f) to head (d51862b).
⚠️ Report is 5 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #568      +/-   ##
===========================================
+ Coverage    75.89%   75.92%   +0.02%     
===========================================
  Files           96       96              
  Lines        30885    30888       +3     
  Branches      2161     2163       +2     
===========================================
+ Hits         23441    23451      +10     
+ Misses        7423     7416       -7     
  Partials        21       21              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

djwhitt and others added 3 commits December 19, 2025 17:51
Add parentSpan: span parameter to handleRangeRequest calls in
createRawDataHandler and createDataItemHandler to ensure proper
trace hierarchy for range requests.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
The test "should detect .cdb files removed at runtime" was flaky on
macOS because it only waited 500ms for the file watcher to detect the
removal. Increased to 1500ms to match other watcher tests.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@djwhitt djwhitt merged commit 3bde0d0 into develop Dec 20, 2025
5 of 7 checks passed
@djwhitt djwhitt deleted the PE-8803-otel-winston-trace-correlation branch December 20, 2025 00:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enable OTEL Winston integration for trace ID correlation in logs

2 participants