-
Notifications
You must be signed in to change notification settings - Fork 74
feat: enable OTEL Winston integration for trace ID correlation (PE-8803) #568
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Enable automatic trace ID injection into Winston logs to correlate log entries with OTEL traces, allowing operators to trace requests through all log entries. Changes: - Enable WinstonInstrumentation in tracing.ts with disableLogSending=true - Update startChildSpan() to auto-detect active span when no parent provided - Wrap data handlers (createRawDataHandler, createDataHandler) with context.with() to make spans active - Wrap chunk handlers (createChunkOffsetHandler, createChunkOffsetDataHandler, createChunkPostHandler) with context.with() - Add tracing to ArNS middleware with resolution timing and attributes - Update startup scripts to import tracing.ts before app.ts so Winston instrumentation patches the logger before it's created Log entries now include trace_id, span_id, and trace_flags fields when within an active span context, enabling correlation with OTEL traces. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
📝 WalkthroughWalkthroughAdds OpenTelemetry tracing and Winston log correlation: tracing initialized at startup, Winston instrumentation enabled, and request flows (ArNS, data, chunk) wrapped in context-bound spans that record attributes, errors, and ensure proper span lifecycle. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant App as Application (HTTP handlers)
participant Tracing as OTEL SDK
participant NameRes as ArNS / NameResolver
participant Store as Storage / Data Layer
participant Logger as Winston
Client->>App: HTTP request
App->>Tracing: start root span
Note right of Tracing: context made active
App->>NameRes: resolve name (if applicable) [within active context]
NameRes-->>App: resolution result / metadata
App->>Store: fetch chunk/data (within active context)
Store-->>App: data + metadata
App->>Logger: logs (trace_id/span_id injected)
App-->>Client: HTTP response
App->>Tracing: span.end()
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related PRs
Pre-merge checks and finishing touches✅ Passed checks (5 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: Repository UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used📓 Path-based instructions (2)src/**/*.{ts,tsx,js,jsx}📄 CodeRabbit inference engine (CLAUDE.md)
Files:
src/**/*.{ts,tsx}📄 CodeRabbit inference engine (CLAUDE.md)
Files:
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
🔇 Additional comments (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/middleware/arns.ts (1)
130-270: Well-structured tracing integration.The
context.with()wrapper correctly makes the span active for the duration of resolution, enabling downstream code (includingdataHandler) to inherit the trace context. The try/catch/finally pattern ensures:
- Exceptions are recorded on the span with proper ERROR status
- The span is always ended regardless of control flow path
- The original error is re-thrown to preserve existing error handling behavior
One optional enhancement: consider setting
http.status_codeas a span attribute for non-blocked success/failure cases (e.g., 402, 404, redirect) for improved observability consistency.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
docker-entrypoint.sh(1 hunks)package.json(1 hunks)scripts/service(1 hunks)src/middleware/arns.ts(2 hunks)src/routes/chunk/handlers.ts(4 hunks)src/routes/data/handlers.ts(5 hunks)src/tracing.ts(3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
src/**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.{ts,tsx,js,jsx}: Support dry-run mode for upload testing by acceptingARWEAVE_POST_DRY_RUNenvironment variable to simulate transaction and chunk uploads without posting to Arweave network on ports 3000 and 4000
Runyarn lint:checkafter making changes and useyarn lint:fixto automatically fix linting issues
Check for code duplication usingyarn duplicate:checkand generate HTML report withyarn duplicate:report; useyarn duplicate:cifor CI duplicate checks
Check for circular dependencies usingyarn deps:check, generate dependency graph withyarn deps:graph, find orphan modules withyarn deps:orphans, find leaf modules withyarn deps:leaves, show dependency summary withyarn deps:summary, and useyarn deps:cifor CI checks
Files:
src/tracing.tssrc/middleware/arns.tssrc/routes/chunk/handlers.tssrc/routes/data/handlers.ts
src/**/*.{ts,tsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Add or improve TSDoc comments in code when modifying files to enhance documentation
Files:
src/tracing.tssrc/middleware/arns.tssrc/routes/chunk/handlers.tssrc/routes/data/handlers.ts
🧠 Learnings (10)
📓 Common learnings
Learnt from: CR
Repo: ar-io/ar-io-node PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T22:20:55.385Z
Learning: Control services using `yarn service:start`, `yarn service:stop`, and `yarn service:logs` commands; service logs are stored in `logs/service.log` (JSONL format) and OTEL spans in `logs/otel-spans.jsonl`
📚 Learning: 2025-11-24T22:20:55.385Z
Learnt from: CR
Repo: ar-io/ar-io-node PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T22:20:55.385Z
Learning: Install dependencies with `yarn install`, start development server with `yarn start`, use watch mode with `yarn watch`, build for production with `yarn build`, and run production build with `yarn start:prod`
Applied to files:
scripts/servicepackage.json
📚 Learning: 2025-11-24T22:20:55.385Z
Learnt from: CR
Repo: ar-io/ar-io-node PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T22:20:55.385Z
Learning: Control services using `yarn service:start`, `yarn service:stop`, and `yarn service:logs` commands; service logs are stored in `logs/service.log` (JSONL format) and OTEL spans in `logs/otel-spans.jsonl`
Applied to files:
scripts/service
📚 Learning: 2025-11-24T22:20:55.385Z
Learnt from: CR
Repo: ar-io/ar-io-node PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T22:20:55.385Z
Learning: Applies to src/**/*.test.{ts,tsx} test/**/*.{ts,tsx} : Run all tests with `yarn test`; run individual test files with `yarn test:file src/path/to/test.ts`; run with coverage using `yarn test:file:coverage src/path/to/test.ts` or `yarn test:coverage`; run e2e tests with `yarn test:e2e`; mock functions use `mock.fn()` and reset with `mock.restoreAll()` in afterEach; database schemas come from `test/*.sql` files
Applied to files:
package.json
📚 Learning: 2025-11-24T22:20:55.385Z
Learnt from: CR
Repo: ar-io/ar-io-node PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T22:20:55.385Z
Learning: Applies to src/**/*.{ts,tsx,js,jsx} : Check for circular dependencies using `yarn deps:check`, generate dependency graph with `yarn deps:graph`, find orphan modules with `yarn deps:orphans`, find leaf modules with `yarn deps:leaves`, show dependency summary with `yarn deps:summary`, and use `yarn deps:ci` for CI checks
Applied to files:
package.json
📚 Learning: 2025-11-24T22:20:55.385Z
Learnt from: CR
Repo: ar-io/ar-io-node PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T22:20:55.385Z
Learning: Applies to migrations/**/*.sql : Create new database migrations with `yarn db:migrate create --folder migrations --name <schema>.description.sql`; apply migrations with `yarn db:migrate up`; revert with `yarn db:migrate down --step N` or by specific name; check migration status in `migrations` table in `data/sqlite/core.db`; after applying migrations, update test schemas with `./test/dump-test-schemas`
Applied to files:
package.json
📚 Learning: 2025-11-24T22:20:55.385Z
Learnt from: CR
Repo: ar-io/ar-io-node PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T22:20:55.385Z
Learning: Applies to src/**/*.{ts,tsx,js,jsx} : Run `yarn lint:check` after making changes and use `yarn lint:fix` to automatically fix linting issues
Applied to files:
package.json
📚 Learning: 2025-11-24T22:20:55.385Z
Learnt from: CR
Repo: ar-io/ar-io-node PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T22:20:55.385Z
Learning: Applies to src/**/*[Rr]ate*[Ll]imit*.{ts,tsx} src/**/*402*.{ts,tsx} : When modifying rate limiting functionality, update `docs/x402-and-rate-limiting.md`: add/remove rate limited endpoints in 'Rate Limited Endpoints' section, update environment variables in both the guide and `docs/envs.md`, update payment flow in 'How They Work Together' section, and update configuration options in reference tables
Applied to files:
src/routes/chunk/handlers.tssrc/routes/data/handlers.ts
📚 Learning: 2025-02-11T14:27:05.076Z
Learnt from: hlolli
Repo: ar-io/ar-io-node PR: 315
File: src/arweave/composite-client.ts:1186-1186
Timestamp: 2025-02-11T14:27:05.076Z
Learning: In ArweaveCompositeClient's broadcastChunk method, secondary chunk posting is intentionally implemented as fire-and-forget (not awaited) to avoid delaying the primary chunk posting response.
Applied to files:
src/routes/chunk/handlers.ts
📚 Learning: 2025-07-30T18:32:44.622Z
Learnt from: djwhitt
Repo: ar-io/ar-io-node PR: 457
File: src/routes/data/handlers.ts:429-433
Timestamp: 2025-07-30T18:32:44.622Z
Learning: In src/routes/data/handlers.ts, the ID validation logic uses AND operators intentionally to only reject IDs that match the regex pattern /^[a-zA-Z0-9-_]{43}$/ but fail base64url round-trip validation. IDs that are null or don't match the regex pattern are allowed to proceed further in the pipeline rather than being immediately rejected with "Invalid ID", indicating different error handling is desired for different types of invalid IDs.
Applied to files:
src/routes/data/handlers.ts
🧬 Code graph analysis (2)
src/middleware/arns.ts (3)
src/tracing.ts (4)
tracer(118-120)context(123-123)trace(123-123)SpanStatusCode(123-123)src/routes/data/handlers.ts (2)
sendNotFound(656-662)sendPaymentRequired(664-670)src/constants.ts (1)
headerNames(27-66)
src/routes/chunk/handlers.ts (8)
src/tracing.ts (2)
context(123-123)trace(123-123)src/data/ar-io-data-source.ts (1)
request(100-137)src/routes/data/handlers.ts (1)
getRequestAttributes(424-465)src/data/chunk-retrieval-service.ts (2)
ChunkNotFoundError(101-109)hasTxId(81-85)src/lib/encoding.ts (1)
toB64Url(29-31)src/routes/chunk/response-utils.ts (1)
setCommonChunkHeaders(38-79)src/lib/merkle-path-parser.ts (1)
parseDataPath(485-549)src/constants.ts (1)
headerNames(27-66)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: test (macos-latest)
- GitHub Check: test (ubuntu-latest)
🔇 Additional comments (14)
docker-entrypoint.sh (1)
19-21: LGTM!The comment clearly explains the initialization order requirement, and using
--importensurestracing.jsexecutes beforeapp.js, allowing Winston instrumentation to patch the logger before it's created.scripts/service (1)
27-29: LGTM!The import order is correct:
register.jsenables ts-node for TypeScript execution, thentracing.tspatches Winston before the logger is created inapp.ts. The comment clearly documents this dependency.package.json (1)
146-147: LGTM!Both development and production start scripts correctly import tracing before the application, ensuring consistent behavior across all environments. The ordering aligns with
docker-entrypoint.shandscripts/service.src/tracing.ts (2)
93-96: LGTM!The WinstonInstrumentation configuration is correct:
disableLogSending: trueprevents logs from being sent to the OTEL pipeline (as specified in PR requirements)disableLogCorrelation: falseenables automatic injection oftrace_id,span_id, andtrace_flagsinto Winston log entries
125-141: LGTM!The updated
startChildSpannow auto-detects the parent span from the active context whenparentSpanis not provided, enabling gradual migration from explicit parent passing to context-based propagation. The nullish coalescing on line 132 correctly handles bothundefinedand omitted arguments.src/middleware/arns.ts (2)
18-18: LGTM!Correct imports from the tracing module to enable span creation and context propagation.
120-128: LGTM!Good choice of initial span attributes capturing HTTP context and the ArNS subdomain. This provides useful context for debugging and trace analysis.
src/routes/chunk/handlers.ts (4)
23-23: LGTM!The import of
contextandtracefrom the tracing module is appropriate for enabling OTEL context propagation in the handlers.
65-266: LGTM!The context propagation pattern is correctly implemented:
- Span is created before the
context.with()wrapper- All handler logic runs within the active span context
span.end()is properly placed in thefinallyblock ensuring cleanup on all code paths- Error handling sets appropriate span attributes and records exceptions
This enables Winston logs emitted within the handler to automatically receive trace context (trace_id, span_id).
295-516: LGTM!The context propagation follows the same correct pattern as
createChunkOffsetHandler. The merkle path parsing, header setting, and conditional request handling all execute within the active span context, ensuring trace correlation for any logs emitted during these operations.
599-663: LGTM!The chunk post handler correctly implements context propagation. The broadcast operation and its success/failure tracking all occur within the active span context, enabling trace correlation for logs produced during chunk broadcasting.
src/routes/data/handlers.ts (3)
15-15: LGTM!Import of
contextandtracealigns with the tracing pattern used in the handlers.
702-929: LGTM on context propagation pattern.The handler correctly wraps all logic within
context.with(), ensuring trace context is active for Winston logs. The comprehensive span attributes (data.size, data.hash, data.cached, retrieval durations) provide excellent observability.
1138-1472: LGTM on context propagation pattern.The data handler correctly implements context propagation with comprehensive span attributes for manifest resolution, data retrieval, and caching status. The
try/catch/finallystructure ensures proper span lifecycle management.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #568 +/- ##
===========================================
+ Coverage 75.89% 75.92% +0.02%
===========================================
Files 96 96
Lines 30885 30888 +3
Branches 2161 2163 +2
===========================================
+ Hits 23441 23451 +10
+ Misses 7423 7416 -7
Partials 21 21 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Add parentSpan: span parameter to handleRangeRequest calls in createRawDataHandler and createDataItemHandler to ensure proper trace hierarchy for range requests. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
The test "should detect .cdb files removed at runtime" was flaky on macOS because it only waited 500ms for the file watcher to detect the removal. Increased to 1500ms to match other watcher tests. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>
Summary
Changes
Core tracing (
src/tracing.ts):WinstonInstrumentationwithdisableLogSending: true(inject trace context without sending logs to OTEL pipeline)startChildSpan()to auto-detect active span when no parent is providedRequest handlers:
createRawDataHandlerandcreateDataHandlerwithcontext.with()to make spans activecreateChunkOffsetHandler,createChunkOffsetDataHandler, andcreateChunkPostHandlerwithcontext.with()ArNS middleware (
src/middleware/arns.ts):Startup scripts:
scripts/service,package.json, anddocker-entrypoint.shto import tracing module before app so Winston instrumentation patches the logger before it's createdResult
Log entries now include
trace_id,span_id, andtrace_flagsfields when within an active span context:{ "class": "CompositeDataAttributesSource", "id": "lyNVBNs79dPDiVC7IwxWRvjVmIAocCy4i3IKa_RHrnQ", "level": "debug", "message": "Fetching data attributes from source", "span_id": "5707693643686e4b", "timestamp": "2025-12-19T23:15:47.890Z", "trace_flags": "01", "trace_id": "dd6865926001cffc90211e5a405ccf60" }Test plan
trace_id,span_id,trace_flagsappear in log entriestrace_idappears across all logs for a single request (29 entries shared same trace ID)otel-spans.jsonlCloses #567
🤖 Generated with Claude Code