Add disconnected file change checks#493
Conversation
Keep file watches usable for hibernating Durable Objects by separating live SSE delivery from retained watch state in the container. Add owner-scoped acknowledgement and idle expiry so background consumers can reconcile safely without sharing a global dirty bit.
🦋 Changeset detectedLatest commit: e736ac1 The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
/bonk review this PR |
Pass ownerId through Sandbox.ensureWatch so persistent watches keep their ownership metadata and the reconnect workflow can validate the same consumer across ack and stop operations.
commit: |
🐳 Docker Images Published
Usage: FROM cloudflare/sandbox:0.0.0-pr-493-e736ac1Version: 📦 Standalone BinaryFor arbitrary Dockerfiles: COPY --from=cloudflare/sandbox:0.0.0-pr-493-e736ac1 /container-server/sandbox /sandbox
ENTRYPOINT ["/sandbox"]Download via GitHub CLI: gh run download 23848723310 -n sandbox-binaryExtract from Docker: docker run --rm cloudflare/sandbox:0.0.0-pr-493-e736ac1 cat /container-server/sandbox > sandbox && chmod +x sandbox |
|
@whoiskatrin Bonk workflow was cancelled. View workflow run · To retry, trigger Bonk again. |
Replace the initial ownership-flavoured watch API with a cleaner checkpoint and lease model for background consumers. Use `changed`, `checkpointWatch()`, and returned lease tokens for the public flow, while `resumeToken` keeps `ensureWatch()` retryable without exposing another consumer's lease.
There was a problem hiding this comment.
Review: Persistent file watch state
The design is well-thought-out. The watch-as-invalidation-signal model (cursor + changed + overflowed) is the right abstraction for DO-based consumers that wake up periodically — much better than trying to replay an event log. The lease/resume token split is clean, and the expiry-on-idle approach avoids leaking watches.
I found several issues worth addressing, ranging from a concurrency bug to some cleanup items.
Issues
1. Race condition in stopWatch lease verification (watch-service.ts:208-236)
stopWatch calls verifyPersistentWatchLease before awaiting watch.ready.promise, unlike checkpointWatch which correctly awaits readiness first. If stopWatch is called while the watch is still initializing, watch.persistent may still be false even though ensureWatch is in-flight, causing verifyPersistentWatchLease to return an error ("Only persistent watches can stop") when it should succeed.
Consider adding the same await watch.ready.promise guard as in checkpointWatch, or document why the asymmetry is intentional.
2. handleStopWatch validates leaseToken even when undefined (watch-handler.ts:228-233)
extractQueryParam returns null which gets coerced to undefined via ?? undefined. Then validateToken('leaseToken', undefined) is called, which returns null (passes validation). This works by accident but the flow is confusing — validateToken silently accepts undefined because it treats it as "not provided", yet stopWatch at the service level requires a lease token for persistent watches. The handler does no semantic validation that leaseToken is actually present when stopping a persistent watch; that burden falls entirely on the service layer.
This is fine as-is (the service layer catches it), but the handler-level validation feels like it's doing work that doesn't accomplish anything for the stop case. Consider documenting the intentional pass-through, or adding a comment that lease enforcement is in the service layer.
3. normalizePatterns called redundantly in getOrCreateWatch (watch-service.ts:260-262)
normalizePatterns is called at the top of getOrCreateWatch, then the results are stored on the ActiveWatch. But createWatchKey (called on line 264) also calls normalizePatterns internally. The patterns are normalized 3 times total for a single getOrCreateWatch call. Minor, but easy to clean up by passing the already-normalized values into createWatchKey.
4. Missing resumeToken in createWatchKey (watch-service.ts:342-354)
The watch key does not include resumeToken, which means two ensureWatch calls with different resume tokens but the same path/filters will resolve to the same ActiveWatch. This is likely intentional (the key identifies the filesystem watch, not the lease holder), but worth a comment since it's a subtle design choice.
5. WatchStopOptions.leaseToken is optional in the type but semantically required for persistent watches
The WatchStopOptions interface (types.ts:823-825) has leaseToken?: string, which makes it easy for callers to forget the token. The error only surfaces at runtime. Consider documenting this requirement in the JSDoc, or adding a PersistentWatchStopOptions with a required leaseToken field.
6. E2E test helpers use wrong API paths (file-watch-workflow.test.ts:250-268)
The createFile, createDir, and deleteFile helpers in the E2E test use paths like /api/file/write, /api/file/mkdir, /api/file/delete — but looking at the test-worker (index.ts), the actual endpoints are /api/write, /api/mkdir, /api/delete (no /file/ prefix). These helpers will 404.
Unless there's middleware in the test worker I'm missing, these E2E tests won't work as written. This needs verification.
Minor / Nits
-
The
WatchState.expiresAtisstring | null— consider making it consistentlynullwhen not persistent, and only setting it to a non-null ISO string when the idle timer is active. Currently it looks correct, just noting it for documentation. -
The changeset description is good — user-focused and explains the use case.
-
PERSISTENT_WATCH_IDLE_TTL_MSat 10 minutes is reasonable. Consider whether this should be configurable (probably not for v1, but worth a TODO).
What looks good
- The
Deferredpattern withreadyStatetracking is clean and avoids double-resolve bugs - Event coalescing with
pendingEventsMap keyed by event identity is a good optimization - The
stopPromisededuplication prevents concurrent stop races - Lease token verification is consistent across checkpoint/stop paths (modulo the race noted above)
- Good test coverage of the checkpoint cursor semantics (match vs stale)
- The
broadcastTerminalEvent/closeSubscriberseparation handles the subscriber lifecycle correctly
|
Review is complete. I posted:
The most actionable issue is the |
Clarify stopWatch token validation, remove redundant key normalization work, and normalize legacy watch responses so clients still see `changed` while older paths return `dirty`.
Background consumers only need to know whether a path changed while disconnected. Replace the lease-based persistent watch API with checkChanges() so callers store one version token and choose whether to skip work, sync incrementally, or rescan.
ghostwriternr
left a comment
There was a problem hiding this comment.
Super clean. Good call to refactor the existing watch layer to support this too.
Summary
sandbox.checkChanges()for apps that disconnect and reconnect later, but still need to know whether files changed in the meantimesandbox.watch()as the live event-stream API for connected consumersunchanged,changed, orresync) plus aversiontoken so callers can cheaply skip work, sync incrementally, or do a full rescanWhy
Some consumers do not stay connected to a watch stream. They just need to reconnect later and ask whether a path changed while they were away.
This change adds that simpler workflow directly instead of exposing the lower-level watch coordination protocol. The API is intentionally an invalidation check, not an event log, and retained state only lasts for the current container lifetime.
Example