Improve structured logging across SDK and container #456
Scattered start/end log pairs and breadcrumb debug logs made it hard to query operations as single events. This replaces ~40 scattered logs with single wide events emitted at completion, following the canonical log lines pattern from Stripe. Environment characteristics (serviceVersion, instanceId) are now injected into the base logger so every log line carries them automatically without callers needing to pass them.
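A minimal sketch of the base-logger idea described above. The factory name `createBaseLogger`, the `Sink` callback, and the event shape are assumptions for illustration; only `serviceVersion`, `instanceId`, `outcome`, and `durationMs` are names taken from this PR.

```typescript
// Hypothetical types: not the SDK's actual logger API.
type LogLevel = 'debug' | 'info' | 'warn' | 'error';
type LogContext = Record<string, unknown>;
type Sink = (line: string) => void;

interface WideLogger {
  emit(level: LogLevel, message: string, context?: LogContext): void;
}

// Binds the environment fields once so call sites never have to pass them.
function createBaseLogger(
  env: { serviceVersion: string; instanceId: string },
  sink: Sink
): WideLogger {
  return {
    emit(level, message, context = {}) {
      // One JSON object per line. Environment fields are spread last so a
      // call site cannot accidentally overwrite them.
      sink(JSON.stringify({ level, message, ...context, ...env }));
    }
  };
}

// Usage: a single wide event emitted when the operation completes.
const collected: string[] = [];
const baseLogger = createBaseLogger(
  { serviceVersion: '1.2.3', instanceId: 'instance-a' },
  (line) => collected.push(line)
);
baseLogger.emit('info', 'file.read', {
  outcome: 'success',
  durationMs: 12,
  path: '/tmp/a.txt'
});
```

Spreading the environment fields last is one possible design choice; the actual logger may resolve conflicts differently.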
Cloudflare log viewers split multi-line pretty output into separate entries, which made request traces hard to read. Emit each pretty log as one line with inline context so events stay grouped and scannable.
The container's stdout/stderr goes through Cloudflare's log pipeline, which cannot render ANSI color codes and splits multi-line output into separate log entries. The container defaults to JSON format when the env var is absent, which is correct for production.
The container has two wide event layers that capture all request outcomes:
1. LoggingMiddleware: logs every HTTP request with method, path, status, duration
2. session.ts exec/execStream: logs every command execution with full context

Service layers (file, git, port, process, interpreter, session-manager) were duplicating these by logging errors in every catch block, producing 3-5 log lines per failed request instead of 1-2. This violates the wide events principle: one context-rich event per service hop, not scattered fragments.

Changes:
- Remove duplicate error logs from all service catch blocks
- Fold PID pipe fallback reason into the execStream wide event as pidFallback
- Remove unused logger parameter from GitService and PortService
- Update container.ts constructor calls and test files accordingly
Pretty-print mode embedded raw newlines from stack traces and error messages, causing Cloudflare's log pipeline to fragment single events into multiple entries. Escape \n and \r in all pretty-print string values. Also add a Bun.serve() error handler to catch unhandled fetch errors before Bun's default handler writes raw stack traces.
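The escaping step can be sketched as follows. Both function names and the `key=value` layout are hypothetical; the PR only states that `\n` and `\r` are escaped in pretty-print string values.

```typescript
// Hypothetical helper: replaces raw CR/LF characters with their two-character
// escape sequences so a value can never span more than one output line.
function escapeLineBreaks(value: string): string {
  return value.replace(/\r/g, '\\r').replace(/\n/g, '\\n');
}

// Assumed pretty layout: message followed by inline key=value pairs.
function formatPrettyLine(
  message: string,
  context: Record<string, unknown>
): string {
  const parts = Object.entries(context).map(([key, value]) => {
    // Strings are used as-is; everything else is serialized first.
    const text =
      typeof value === 'string' ? value : String(JSON.stringify(value));
    return `${key}=${escapeLineBreaks(text)}`;
  });
  return [escapeLineBreaks(message), ...parts].join(' ');
}
```

With this shape, a stack trace passed as a context value stays on the same line as the event it belongs to.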
Accept main's removal of streamProcessLogs from ProcessService, which was deleted in PR #447.
HTTP middleware now uses "POST /api/contexts" instead of "HTTP request" so the dashboard list is scannable. Container-side wide events adopt the same noun.verb convention already used on the DO side.
The dashboard header now reads "POST /api/contexts 501" so you can see method, path, and outcome without expanding. Removed startedAt, method, pathname, userAgent, contentLength, and outcome from the wide event context since they were either in the message, derivable from level, or noise for internal SDK calls.
Pass captured Error objects to logger.error() in finally blocks instead of undefined, preserving stack traces for debugging. Standardize timing fields to durationMs across all wide events. Add truncation indicator to normalizeLogLine so developers know when log context was lost. Add missing changeset.
Co-authored-by: whoiskatrin <whoiskatrin@users.noreply.github.com>
Resolve conflicts in session-manager.ts by keeping the classifyCommandError logic from main (session-destroyed detection) while removing explicit service-level logging per the canonical log lines approach.
Emit a single structured event per operation (domain.operation naming) with outcome, durationMs, and relevant context fields in every major service so developers can debug sandboxes from logs alone.
- FileService: file.read/write/delete/mkdir with sizeBytes and path
- ProcessService: process.start (with pid) and process.exit (with exitCode)
- PortService: port.expose/unexpose with logger injection via constructor
- GitService: git.clone/checkout with logger injection via constructor
- SessionManager: session.create/destroy with cwd and sessionId
- LoggingMiddleware: include sandboxId from X-Sandbox-Id request header
- BaseHttpClient: support defaultHeaders merged into every outgoing request
- Sandbox DO: pass X-Sandbox-Id header to container; debug logs for port ops
- Logger pretty-printer: show HH:MM:SS.mmm timestamp prefix for readability
- Tests: new LoggingMiddleware tests; defaultHeaders tests; fix constructor signatures for PortService and GitService in existing test files
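One way to picture the "single event per operation" shape is a try/catch/finally wrapper. This is a sketch under assumptions: `withWideEvent`, `WideEvent`, and the `emit` callback are hypothetical names, while `outcome`, `durationMs`, and the `caughtError` normalization mirror what this PR describes.

```typescript
type Outcome = 'success' | 'error';

// Hypothetical event shape; extra context fields are allowed.
interface WideEvent {
  event: string;
  outcome: Outcome;
  durationMs: number;
  error?: Error;
  [field: string]: unknown;
}

async function withWideEvent<T>(
  event: string,
  context: Record<string, unknown>,
  emit: (e: WideEvent) => void,
  operation: () => Promise<T>
): Promise<T> {
  const startTime = Date.now();
  let caughtError: Error | undefined;
  try {
    return await operation();
  } catch (error) {
    // Normalize non-Error throwables so the stack trace field is always an Error.
    caughtError = error instanceof Error ? error : new Error(String(error));
    throw caughtError;
  } finally {
    // Exactly one event, emitted on both the success and the error path.
    const outcome: Outcome = caughtError ? 'error' : 'success';
    emit({
      ...context,
      event,
      outcome,
      durationMs: Date.now() - startTime,
      ...(caughtError ? { error: caughtError } : {})
    });
  }
}
```

A call such as `withWideEvent('file.read', { path }, emit, () => readFile(path))` would then produce one `file.read` event whether the read succeeds or throws; `readFile` here stands in for whatever the service actually calls.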
🦋 Changeset detected. Latest commit: 320da09. The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
🐳 Docker Images Published
Usage: FROM cloudflare/sandbox:0.0.0-pr-456-8cc7935
Version:
📦 Standalone Binary
For arbitrary Dockerfiles:
COPY --from=cloudflare/sandbox:0.0.0-pr-456-8cc7935 /container-server/sandbox /sandbox
ENTRYPOINT ["/sandbox"]
Download via GitHub CLI: gh run download 22732170705 -n sandbox-binary
Extract from Docker: docker run --rm cloudflare/sandbox:0.0.0-pr-456-8cc7935 cat /container-server/sandbox > sandbox && chmod +x sandbox
Log background stream errors instead of silently swallowing them in ProcessService. Rename snapshot.unmount event to backup.unmount for consistency with the domain.operation naming convention. Add method and pathname as discrete fields in the HTTP logging middleware for JSON log querying. Preserve original error cause in InterpreterService deleteContext. Remove unrelated skills-lock.json.
All fixes are committed. Here's a summary of what was addressed: Fixes applied
Conflicts resolved. Here's a summary of what was merged and how each conflict was handled:
Conflicts resolved (3 files)
1.
2.
3.
Verification
🐳 Docker Images Published
Usage: FROM cloudflare/sandbox:0.0.0-pr-456-320da09
Version:
📦 Standalone Binary
For arbitrary Dockerfiles:
COPY --from=cloudflare/sandbox:0.0.0-pr-456-320da09 /container-server/sandbox /sandbox
ENTRYPOINT ["/sandbox"]
Download via GitHub CLI: gh run download 23595389767 -n sandbox-binary
Extract from Docker: docker run --rm cloudflare/sandbox:0.0.0-pr-456-320da09 cat /container-server/sandbox > sandbox && chmod +x sandbox
{ port, name }
);

caughtError = error instanceof Error ? error : new Error(String(error));
nit: Not for this PR, but I noticed we have loads of error instanceof Error ? error : new Error(String(error)); and const errorMessage = error instanceof Error ? error.message : 'Unknown error';. Could be nice to have a helper func like asError() which handles all this just for deduplication? (I've also implemented this a couple times in my PR)
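The helper suggested in this comment could look roughly like the sketch below. `asError()` is the name proposed by the reviewer; `errorMessageOf()` and its `fallback` parameter are hypothetical additions that keep the existing `'Unknown error'` behavior for non-Error values.

```typescript
// Returns the value unchanged when it is already an Error, otherwise wraps
// its string form in a new Error.
function asError(value: unknown): Error {
  return value instanceof Error ? value : new Error(String(value));
}

// Hypothetical companion for the second repeated pattern: returns the
// message of an Error, or a fallback for anything else.
function errorMessageOf(value: unknown, fallback = 'Unknown error'): string {
  return value instanceof Error ? value.message : fallback;
}
```

Catch blocks would then read `caughtError = asError(error);` instead of repeating the ternary.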
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
* Add sanitize helpers, relocate redactCredentials
  redactCredentials() is a general-purpose string sanitizer with no git-specific logic. Moved to logger/sanitize.ts alongside new helpers for presigned URL params and command truncation.
* Anchor sensitive param regex to query strings
  Require [?&] prefix so path segments like /api/token/ are not matched. Stop value matching at URL delimiters to avoid consuming trailing command arguments. Clamp truncateForLog for tiny maxLen.
* Add OutputMode to replace boolean pretty flag
  The logger now supports three output modes: 'structured' emits raw objects for Workers Logs auto-indexing, 'json-line' emits JSON strings for container stdout, and 'pretty' emits ANSI-formatted strings for local dev. Container component always uses json-line regardless of SANDBOX_LOG_FORMAT. Renames msg to message for Cloudflare dashboard compatibility, and removes deprecated duration/operation fields from LogContext.
* Add canonical event helpers for structured logging
  buildMessage() constructs scannable one-line messages from event payloads. logCanonicalEvent() selects log level from outcome and sanitizes commands before emitting structured context.
* Remove logSuccess/logError from all DO clients
  These methods produced unqueryable string blobs in Workers Observability. The container already logs all operations with properly structured, queryable canonical events. DO-specific canonical events (sandbox.exec, backup.*, etc.) are added in the next task.
* Migrate DO canonical events to logCanonicalEvent()
  All DO-side events now emit structured objects with scannable messages, consistent field names, and automatic credential redaction.
* Fix command outcome semantics: non-zero exitCode is error
  The outcome field was hardcoded to 'success' even when commands exited with non-zero codes (e.g., exitCode 127 for command not found). This made outcome unreliable for filtering failed commands in logs. Both exec() and execStream() now derive outcome from the exit code. Log level selection uses caughtError to distinguish exceptions from normal non-zero exits.
* Migrate container canonical events to logCanonicalEvent()
  Migrates command.exec, command.stream, session.create, session.destroy, file.read, file.write, file.delete, file.mkdir, port.expose, port.unexpose, git.clone, and git.checkout to the shared canonical event helper. Fixes error-at-info-level pattern across all services. Adds stderrPreview redaction to close credential leak path in failed backup commands.
* Demote HTTP middleware logs to debug, migrate process events
  Successful middleware requests (2xx) now log at debug since domain events already capture every operation at info. Client errors (4xx) log at warn, server errors (5xx) remain at error. Process start and exit events now use logCanonicalEvent() for consistent structure.
* Fix double-redaction and improve CanonicalEventPayload type safety
  Sanitize command once in logCanonicalEvent, pass to buildMessage. Add common domain fields (commandId, processId, stdoutLen, etc.) to the typed interface for autocomplete and typo detection. Document the priority ordering in buildMessage's if/else chain.
* Auto-derive errorMessage, extend buildMessage context
  errorMessage is now resolved from error.message before buildMessage so error reasons appear in dashboard messages without requiring every call site to set errorMessage explicitly. buildMessage extended for backup, git.checkout, and bucket mount events via backupId, repoPath, and mountPath fields.
* Clean up canonical event fields and add error coverage
  Remove r2Key from backup.create (leaks internal R2 layout), replace mountResults array with scalar counts on sandbox.destroy, and strip internal PID detection debug fields from command.stream. Wrap port.expose, port.unexpose, backup.create, backup.restore, and sandbox.destroy in try/finally so the canonical event fires on both success and error paths.
* Migrate bucket.mount/unmount to logCanonicalEvent
  Replace this.logger.info() calls with logCanonicalEvent() and wrap both operations in try/finally for error path coverage.
* Type exec state accumulator, migrate backup events, add errorMessage tracking
  Eliminates 12 type casts in session.ts by typing the exec state accumulator. Migrates container backup.create/restore/unmount to logCanonicalEvent. Adds errorMessage tracking on non-throwing error paths across container services.
* Fix credential leak in errorMessage and logging gaps
  Sanitize errorMessage with redactSensitiveParams before logging to prevent presigned URL credentials from appearing in log output. Fix const/let errorMessage shadowing in catch blocks across five container services so finally blocks see the correct value. Populate stderrPreview in the exec path. Move DO validation inside try/finally for exposePort, unexposePort, mountBucketFuse, and unmountBucket so validation errors get canonical logging. Add path context to backup.create events. Replace raw logger call with canonical event for process stream errors.
* Fix credential leak in error objects, outcome semantics, and coverage gaps
  Sanitizes Error objects (message + stack) before logging to prevent presigned URLs from appearing in log output. Uses redactCommand for errorMessage and stderrPreview instead of redactSensitiveParams. Fixes sandbox.exec outcome to check exitCode for non-throwing failures. Moves backup validation inside try/finally at both DO and container layers. Adds errorMessage on non-throwing error paths in file and git services. Adds canonical event for local bucket mounts. Removes duplicate backup info log. Includes branch in git.checkout buildMessage with repoPath.
* Fix git token leak, demote middleware duplicates, clean up stale logs
  Apply redactCommand to git URLs in security-service, git-handler, and sanitizeGitData to cover query-style tokens (?token=...) that redactCredentials alone missed. Demote all middleware HTTP logs to debug since canonical events already cover operational failures. Remove leftover info logs for backup git-check and bucket unmount. Add mountsProcessed/mountFailures to sandbox.destroy message.
* Close remaining git token leak and duplicate log
  Use redactCommand instead of redactCredentials in git clone error construction so query-param tokens are also stripped. Remove duplicate handler-level error log for failed clones. Remove redundant dir field from container backup.create.
* Log command completions at info, not error
  Non-zero exit codes from normal shell commands (grep no match, test -d, diff) are expected behavior, not infrastructure errors. The exitCode field is queryable for anyone who needs to filter by it. Only actual exceptions (container unreachable, timeout) produce error-level logs.
* Respect SANDBOX_LOG_FORMAT across all components, forward to container
  All components (DO, container, executor) use pretty mode when SANDBOX_LOG_FORMAT=pretty is set, giving readable terminal output during local wrangler dev. Without the var, DOs default to structured and containers/executors default to json-line for their respective observability pipelines. Forwards SANDBOX_LOG_FORMAT to the container environment so the setting propagates. Removes the var from the E2E wrangler template so CI deployments default to production behavior.
* Fix backup cleanup session ID and version.check log level
  Backup catch blocks used a derived session ID instead of the hardcoded '__sandbox_backup__' from ensureBackupSession(), making archive cleanup silently fail on backup errors. Version mismatches during rolling deployments are expected and should not produce error-level logs. The versionOutcome field carries the granular status for filtering.
* Demote background process output to debug, fix session noise
  Desktop helper stdout/stderr (VNC, X11, xfwm4) demoted from info/warn to debug: diagnostic output, not operational events. User CMD process stdout/stderr demoted from info/warn to debug: user-spawned server access logs shouldn't pollute container logs. Session "already exists" outcome changed to success: an existing session is usable, not an error. The 409 API response is preserved.
* Reduce log noise with origin-based severity and centralized policy (#521)
* Use unique backup session IDs to prevent collision
  The fixed __sandbox_backup__ name allowed user-created sessions to collide with internal backup commands, poisoning cwd/env/PATH. Unique per-operation IDs with cleanup in finally prevent leaks.
* Centralize severity policy in resolveLogLevel
  Errors always stay at error. Internal commands demote to debug. Session lifecycle and file operations demote to debug on success. successLevel option for special cases like version.check.
* Add origin metadata to execute paths on DO side
  Internal commands (backup, bucket mount, env setup) are marked with origin: 'internal' so resolveLogLevel can demote them to debug. User commands default to origin: 'user'.
* Propagate origin through container execute paths
  Refactors executeInSession to options object. execute-handler reads origin from request and forwards through SessionManager to Session. Canonical events include origin for automatic debug demotion.
* Add version.check severity matrix
  compatible→debug, unknown→info, mismatch→warn, failed→warn. Non-compatible cases include versionOutcome in the message.
* Mark container shell-outs as internal origin
  File, backup, git, session-manager, and process-service shell-outs all pass origin: 'internal' so resolveLogLevel demotes them to debug automatically.
* Add origin to process-client startProcess for completeness
* Test logCanonicalEvent debug/warn dispatch and precedence
* Keep 5xx middleware logs at warn, fix process-client pattern
  5xx responses need visibility at the HTTP summary level even when canonical domain events capture the operation failure; they answer different questions. 2xx/3xx/4xx stay at debug. Fix origin field in process-client to use conditional spread pattern consistent with all other optional fields.
* Fix middleware test expectations for 5xx at warn level
* Remove SANDBOX_LOG_FORMAT from perf wrangler template
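The centralized severity policy from the commits above might be sketched like this. `resolveLogLevel`, `origin`, `successLevel`, and the version.check matrix are named in the commit notes; the input shape, the prefix checks for session and file events, and the precedence of `successLevel` over `origin` are assumptions, not the confirmed implementation.

```typescript
type LogLevel = 'debug' | 'info' | 'warn' | 'error';
type Origin = 'user' | 'internal';

// Hypothetical input shape for the policy function.
interface LevelInput {
  event: string; // domain.operation name, e.g. 'file.read'
  outcome: 'success' | 'error';
  origin?: Origin;
  successLevel?: LogLevel; // override for special cases such as version.check
}

function resolveLogLevel(input: LevelInput): LogLevel {
  // Errors always stay at error, regardless of origin.
  if (input.outcome === 'error') return 'error';
  // Assumed precedence: an explicit successLevel wins over origin demotion.
  if (input.successLevel) return input.successLevel;
  // Internal commands demote to debug.
  if (input.origin === 'internal') return 'debug';
  // Session lifecycle and file operations demote to debug on success.
  if (input.event.startsWith('session.') || input.event.startsWith('file.')) {
    return 'debug';
  }
  return 'info';
}

// Severity matrix for version.check as listed in the commit note.
type VersionOutcome = 'compatible' | 'unknown' | 'mismatch' | 'failed';
const VERSION_CHECK_LEVELS: Record<VersionOutcome, LogLevel> = {
  compatible: 'debug',
  unknown: 'info',
  mismatch: 'warn',
  failed: 'warn'
};
```

Keeping the policy in one function means call sites only describe what happened (event, outcome, origin) and never choose a level themselves.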
🟡 HTTP streaming GET requests don't receive defaultHeaders when transport bypasses doFetch
In doStreamFetch HTTP mode for GET requests, the method calls this.doFetch() which correctly merges defaultHeaders. However, the streamHeaders variable (which correctly includes defaultHeaders for the WebSocket path) is computed but never used in the HTTP path. This works correctly today because doFetch applies defaultHeaders automatically.
However, the inconsistency means the HTTP streaming path for GET requests sends a Content-Type: application/json header (hardcoded at base-client.ts:234) even though there's no body. For WebSocket GET streaming, streamHeaders is just this.options.defaultHeaders without Content-Type. While functionally harmless now, the HTTP path for GET streaming should not force Content-Type: application/json when there's no body — it should match the WebSocket path's behavior. This is pre-existing behavior not introduced by this PR, so I'm noting it's non-severe.
(Refers to lines 232-236)
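A sketch of the header merge the comment argues for, where `Content-Type` is only set when a body is present. `buildRequestHeaders` and its parameters are hypothetical; this is not the actual `BaseHttpClient` code.

```typescript
// Merge order (assumed): defaultHeaders first, then Content-Type when a body
// exists, then per-request headers so callers can still override either.
function buildRequestHeaders(
  defaultHeaders: Record<string, string> | undefined,
  body: unknown,
  perRequest: Record<string, string> = {}
): Record<string, string> {
  const headers: Record<string, string> = { ...defaultHeaders };
  if (body !== undefined) {
    // Only requests that carry a JSON body declare one.
    headers['Content-Type'] = 'application/json';
  }
  return { ...headers, ...perRequest };
}
```

With this shape a bodyless GET would carry only the default headers (for example `X-Sandbox-Id`), which matches the WebSocket path described in the comment.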
} finally {
  const statusCode = response?.status ?? 500;
  const durationMs = Date.now() - startTime;
  const isError = statusCode >= 500 || Boolean(requestError);
🟡 Unused isError variable in LoggingMiddleware
The variable isError at line 31 is computed (statusCode >= 500 || Boolean(requestError)) but never referenced again. The subsequent log-level decision at line 44 redundantly recomputes statusCode >= 500. This dead code makes the logic harder to reason about — a reader might expect isError to govern the log level selection, but it doesn't.
const isError = statusCode >= 500 || Boolean(requestError);
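One alternative to deleting the variable is to let it drive the level decision. The sketch below assumes the severity split from the later commit notes in this PR (5xx at warn, everything else at debug); treating a thrown request error the same as a 5xx is an additional assumption, and `resolveRequestLevel` is a hypothetical name.

```typescript
// Computes the status code and level once, so isError is the single source
// of truth for the log-level decision.
function resolveRequestLevel(
  status: number | undefined,
  requestError?: unknown
): { statusCode: number; level: 'debug' | 'warn' } {
  // A missing response is treated as a 500, as in the snippet under review.
  const statusCode = status ?? 500;
  const isError = statusCode >= 500 || Boolean(requestError);
  return { statusCode, level: isError ? 'warn' : 'debug' };
}
```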
What
Replaces scattered start/end log pairs with a single structured event per operation, following the canonical log lines pattern. Each operation emits exactly one log event at completion with outcome, durationMs, and relevant context fields.

Changes
Container services: canonical log events added to:
- file.read, file.write, file.delete, file.mkdir (with sizeBytes, path)
- process.start (with pid), process.exit (with exitCode, durationMs)
- port.expose, port.unexpose
- git.clone, git.checkout
- session.create, session.destroy
- command.exec, command.stream
- backup.create, backup.restore, bucket.mount, bucket.unmount, sandbox.destroy

SDK / Durable Object: debug-level logs for port.expose, port.unexpose; X-Sandbox-Id header propagated to container so every log event includes sandboxId

Logger: pretty printer now shows HH:MM:SS.mmm timestamp prefix; serviceVersion and instanceId auto-injected into base context

Infrastructure: BaseHttpClient supports defaultHeaders merged into every outgoing request

Event naming
All events follow domain.operation: file.read, process.exit, port.expose, git.clone, session.create, command.exec, bucket.mount, etc.

Testing
Bun.Terminal unavailable on macOS, unrelated to this PR)