
Add disconnected file change checks#493

Merged
ghostwriternr merged 7 commits into main from kate/watch-state-design
Apr 1, 2026


@whoiskatrin
Collaborator

@whoiskatrin whoiskatrin commented Mar 13, 2026

Summary

  • add sandbox.checkChanges() for apps that disconnect and reconnect later, but still need to know whether files changed in the meantime
  • keep sandbox.watch() as the live event-stream API for connected consumers
  • return a simple status (unchanged, changed, or resync) plus a version token so callers can cheaply skip work, sync incrementally, or do a full rescan

Why

Some consumers do not stay connected to a watch stream. They just need to reconnect later and ask whether a path changed while they were away.

This change adds that simpler workflow directly instead of exposing the lower-level watch coordination protocol. The API is intentionally an invalidation check, not an event log, and retained state only lasts for the current container lifetime.

Example

const first = await sandbox.checkChanges("/workspace")

const next = await sandbox.checkChanges("/workspace", {
  since: first.version,
})

if (next.status === "changed") {
  await backup()
}

if (next.status === "resync") {
  await fullRescan()
}
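
A minimal sketch of how a reconnecting consumer might use the three statuses and the version token. The `ChangeChecker` interface, `actionFor`, and `reconcile` are hypothetical names introduced for illustration; only `checkChanges`, `since`, `status`, and `version` come from the description above. Treating a missing stored version as a full scan is also an assumption, not documented behaviour.

```typescript
// Hypothetical shapes: only `status`, `version`, and `since` appear in the PR description.
type ChangeStatus = "unchanged" | "changed" | "resync";

interface ChangeCheckResult {
  status: ChangeStatus;
  version: string;
}

interface ChangeChecker {
  checkChanges(
    path: string,
    options?: { since?: string }
  ): Promise<ChangeCheckResult>;
}

type SyncAction = "skip" | "incremental" | "full";

// Map the invalidation status onto the work the caller should do.
function actionFor(status: ChangeStatus): SyncAction {
  switch (status) {
    case "unchanged":
      return "skip";
    case "changed":
      return "incremental";
    case "resync":
      return "full";
  }
}

// One reconnect cycle: ask what happened since `storedVersion` and return
// the action to take plus the version token to persist for the next cycle.
async function reconcile(
  checker: ChangeChecker,
  path: string,
  storedVersion: string | undefined
): Promise<{ action: SyncAction; version: string }> {
  const result = await checker.checkChanges(
    path,
    storedVersion === undefined ? undefined : { since: storedVersion }
  );
  // Assumption: without a stored version there is no baseline, so scan everything.
  const action: SyncAction =
    storedVersion === undefined ? "full" : actionFor(result.status);
  return { action, version: result.version };
}
```

The caller stores only the returned `version` between wake-ups, which matches the "one version token" workflow described in this PR.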

Keep file watches usable for hibernating Durable Objects by separating
live SSE delivery from retained watch state in the container.

Add owner-scoped acknowledgement and idle expiry so background
consumers can reconcile safely without sharing a global dirty bit.
@changeset-bot

changeset-bot bot commented Mar 13, 2026

🦋 Changeset detected

Latest commit: e736ac1

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
| Name | Type |
| --- | --- |
| @cloudflare/sandbox | Patch |


@whoiskatrin
Collaborator Author

/bonk review this PR

Pass ownerId through Sandbox.ensureWatch so persistent watches keep
their ownership metadata and the reconnect workflow can validate the
same consumer across ack and stop operations.
@pkg-pr-new

pkg-pr-new bot commented Mar 13, 2026


npm i https://pkg.pr.new/cloudflare/sandbox-sdk/@cloudflare/sandbox@493

commit: e736ac1

@github-actions
Contributor

github-actions bot commented Mar 13, 2026

🐳 Docker Images Published

| Variant | Image |
| --- | --- |
| Default | cloudflare/sandbox:0.0.0-pr-493-e736ac1 |
| Python | cloudflare/sandbox:0.0.0-pr-493-e736ac1-python |
| OpenCode | cloudflare/sandbox:0.0.0-pr-493-e736ac1-opencode |
| Musl | cloudflare/sandbox:0.0.0-pr-493-e736ac1-musl |
| Desktop | cloudflare/sandbox:0.0.0-pr-493-e736ac1-desktop |

Usage:

FROM cloudflare/sandbox:0.0.0-pr-493-e736ac1

Version: 0.0.0-pr-493-e736ac1


📦 Standalone Binary

For arbitrary Dockerfiles:

COPY --from=cloudflare/sandbox:0.0.0-pr-493-e736ac1 /container-server/sandbox /sandbox
ENTRYPOINT ["/sandbox"]

Download via GitHub CLI:

gh run download 23848723310 -n sandbox-binary

Extract from Docker:

docker run --rm cloudflare/sandbox:0.0.0-pr-493-e736ac1 cat /container-server/sandbox > sandbox && chmod +x sandbox

@ask-bonk
Contributor

ask-bonk bot commented Mar 13, 2026

@whoiskatrin Bonk workflow was cancelled.

View workflow run · To retry, trigger Bonk again.

Replace the initial ownership-flavoured watch API with a cleaner
checkpoint and lease model for background consumers.

Use `changed`, `checkpointWatch()`, and returned lease tokens for the
public flow, while `resumeToken` keeps `ensureWatch()` retryable
without exposing another consumer's lease.
Contributor

@ask-bonk ask-bonk bot left a comment


Review: Persistent file watch state

The design is well-thought-out. The watch-as-invalidation-signal model (cursor + changed + overflowed) is the right abstraction for DO-based consumers that wake up periodically — much better than trying to replay an event log. The lease/resume token split is clean, and the expiry-on-idle approach avoids leaking watches.

I found several issues worth addressing, ranging from a concurrency bug to some cleanup items.


Issues

1. Race condition in stopWatch lease verification (watch-service.ts:208-236)

stopWatch calls verifyPersistentWatchLease before awaiting watch.ready.promise, unlike checkpointWatch which correctly awaits readiness first. If stopWatch is called while the watch is still initializing, watch.persistent may still be false even though ensureWatch is in-flight, causing verifyPersistentWatchLease to return an error ("Only persistent watches can stop") when it should succeed.

Consider adding the same await watch.ready.promise guard as in checkpointWatch, or document why the asymmetry is intentional.
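
A sketch of the suggested guard, under stated assumptions: `WatchRecord`, `createDeferred`, and `verifyLease` are simplified, hypothetical stand-ins for the service's internals, and only `ready.promise`, `persistent`, and the error message are taken from the review text.

```typescript
// Hypothetical minimal Deferred: a promise plus its externally callable resolver.
interface Deferred<T> {
  promise: Promise<T>;
  resolve(value: T): void;
}

function createDeferred<T>(): Deferred<T> {
  let resolve!: (value: T) => void;
  const promise = new Promise<T>((r) => {
    resolve = r;
  });
  return { promise, resolve };
}

// Hypothetical stand-in for the service's watch record.
interface WatchRecord {
  ready: Deferred<void>;
  persistent: boolean;
  leaseToken?: string;
}

// Returns an error message, or null when the lease is valid.
function verifyLease(watch: WatchRecord, token: string | undefined): string | null {
  if (!watch.persistent) return "Only persistent watches can stop";
  if (watch.leaseToken !== token) return "Lease token does not match";
  return null;
}

// Guarded stop: wait for initialization to finish before reading `persistent`,
// so a stop issued during setup sees the final state of the watch.
async function stopWatch(
  watch: WatchRecord,
  token: string | undefined
): Promise<string | null> {
  await watch.ready.promise;
  return verifyLease(watch, token);
}
```

Calling `verifyLease` directly on a watch that is still initializing reproduces the spurious error; the awaited version defers the check until the watch has settled.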

2. handleStopWatch validates leaseToken even when undefined (watch-handler.ts:228-233)

extractQueryParam returns null, which is coerced to undefined via ?? undefined. validateToken('leaseToken', undefined) is then called and returns null (validation passes). This works by accident, but the flow is confusing: validateToken silently accepts undefined because it treats it as "not provided", yet stopWatch at the service level requires a lease token for persistent watches. The handler never checks that leaseToken is actually present when stopping a persistent watch; that burden falls entirely on the service layer.

This is fine as-is (the service layer catches it), but the handler-level validation feels like it's doing work that doesn't accomplish anything for the stop case. Consider documenting the intentional pass-through, or adding a comment that lease enforcement is in the service layer.

3. normalizePatterns called redundantly in getOrCreateWatch (watch-service.ts:260-262)

normalizePatterns is called at the top of getOrCreateWatch, then the results are stored on the ActiveWatch. But createWatchKey (called on line 264) also calls normalizePatterns internally. The patterns are normalized 3 times total for a single getOrCreateWatch call. Minor, but easy to clean up by passing the already-normalized values into createWatchKey.
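
One way the cleanup could look, as a sketch only. The function bodies here are assumptions (the real `normalizePatterns` and `createWatchKey` are not shown in this conversation), and `describeWatch` is a hypothetical stand-in for the relevant part of `getOrCreateWatch`. The point is that normalization happens once and the key builder receives already-normalized input.

```typescript
// Assumed normalization: trim, drop empties, de-duplicate, sort.
function normalizePatterns(patterns: readonly string[] = []): string[] {
  const cleaned = patterns.map((p) => p.trim()).filter((p) => p.length > 0);
  return Array.from(new Set(cleaned)).sort();
}

// The key builder trusts its inputs and does no normalization of its own.
function createWatchKey(
  path: string,
  include: readonly string[],
  exclude: readonly string[]
): string {
  return JSON.stringify([path, include, exclude]);
}

// Hypothetical stand-in for the part of getOrCreateWatch that prepares a watch.
function describeWatch(
  path: string,
  include?: readonly string[],
  exclude?: readonly string[]
): { key: string; include: string[]; exclude: string[] } {
  // Normalize exactly once, then reuse the result for both the key and the record.
  const normInclude = normalizePatterns(include);
  const normExclude = normalizePatterns(exclude);
  return {
    key: createWatchKey(path, normInclude, normExclude),
    include: normInclude,
    exclude: normExclude,
  };
}
```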

4. Missing resumeToken in createWatchKey (watch-service.ts:342-354)

The watch key does not include resumeToken, which means two ensureWatch calls with different resume tokens but the same path/filters will resolve to the same ActiveWatch. This is likely intentional (the key identifies the filesystem watch, not the lease holder), but worth a comment since it's a subtle design choice.

5. WatchStopOptions.leaseToken is optional in the type but semantically required for persistent watches

The WatchStopOptions interface (types.ts:823-825) has leaseToken?: string, which makes it easy for callers to forget the token. The error only surfaces at runtime. Consider documenting this requirement in the JSDoc, or adding a PersistentWatchStopOptions with a required leaseToken field.
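
A sketch of the stricter type, assuming `WatchStopOptions` contains only the optional `leaseToken` quoted above. `PersistentWatchStopOptions` is the name proposed in the review; `requireLeaseToken` is a hypothetical helper added for illustration.

```typescript
// As quoted in the review: the token is optional at the type level.
interface WatchStopOptions {
  leaseToken?: string;
}

// Proposed narrowing: persistent watches must supply the token.
interface PersistentWatchStopOptions extends WatchStopOptions {
  leaseToken: string;
}

// Hypothetical helper: turn loosely typed options into the strict form,
// failing early instead of deep inside the service layer.
function requireLeaseToken(options: WatchStopOptions): PersistentWatchStopOptions {
  if (options.leaseToken === undefined || options.leaseToken === "") {
    throw new Error("leaseToken is required to stop a persistent watch");
  }
  return { ...options, leaseToken: options.leaseToken };
}
```

With the strict type on the persistent stop path, a forgotten token becomes a compile-time error for typed callers, and the helper covers untyped input.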

6. E2E test helpers use wrong API paths (file-watch-workflow.test.ts:250-268)

The createFile, createDir, and deleteFile helpers in the E2E test use paths like /api/file/write, /api/file/mkdir, /api/file/delete — but looking at the test-worker (index.ts), the actual endpoints are /api/write, /api/mkdir, /api/delete (no /file/ prefix). These helpers will 404.

Unless there's middleware in the test worker I'm missing, these E2E tests won't work as written. This needs verification.


Minor / Nits

  • The WatchState.expiresAt is string | null — consider making it consistently null when not persistent, and only setting it to a non-null ISO string when the idle timer is active. Currently it looks correct, just noting it for documentation.

  • The changeset description is good — user-focused and explains the use case.

  • PERSISTENT_WATCH_IDLE_TTL_MS at 10 minutes is reasonable. Consider whether this should be configurable (probably not for v1, but worth a TODO).


What looks good

  • The Deferred pattern with readyState tracking is clean and avoids double-resolve bugs
  • Event coalescing with pendingEvents Map keyed by event identity is a good optimization
  • The stopPromise deduplication prevents concurrent stop races
  • Lease token verification is consistent across checkpoint/stop paths (modulo the race noted above)
  • Good test coverage of the checkpoint cursor semantics (match vs stale)
  • The broadcastTerminalEvent / closeSubscriber separation handles the subscriber lifecycle correctly
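
For readers unfamiliar with the coalescing pattern praised above, here is a minimal sketch. It assumes an event's identity is its type plus its path; the PR's actual key and event shape are not shown in this conversation, so `FileEvent` and `EventCoalescer` are hypothetical.

```typescript
// Hypothetical event shape for illustration.
interface FileEvent {
  type: "create" | "modify" | "delete";
  path: string;
}

class EventCoalescer {
  // Pending events keyed by identity; a repeat overwrites the earlier entry.
  private pendingEvents = new Map<string, FileEvent>();

  add(event: FileEvent): void {
    this.pendingEvents.set(`${event.type}:${event.path}`, event);
  }

  // Return the coalesced batch in first-seen order and reset the buffer.
  flush(): FileEvent[] {
    const batch = Array.from(this.pendingEvents.values());
    this.pendingEvents.clear();
    return batch;
  }
}
```

A burst of identical `modify` events for one file collapses into a single entry per flush, which keeps subscriber traffic proportional to distinct changes rather than raw filesystem notifications.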

ask-bonk[bot]

This comment was marked as resolved.


@ask-bonk
Contributor

ask-bonk bot commented Mar 13, 2026

Review is complete. I posted:

  1. A detailed top-level review comment covering 6 issues found across the PR, from a race condition in stopWatch to minor cleanup items, plus positive callouts for what's well-done.

  2. Three inline comments on specific lines:

    • watch-service.ts:221 - Race condition where stopWatch doesn't await watch.ready.promise before lease verification (unlike checkpointWatch which does)
    • watch-service.ts:260 - Redundant normalizePatterns calls
    • file-watch-workflow.test.ts:436 - Suggestion for negative E2E test cases

The most actionable issue is the stopWatch race condition -- if a caller gets a lease token from ensureWatch and immediately calls stopWatch before the watch finishes initializing, the lease check will fail because watch.persistent hasn't been set to true yet. The fix is straightforward: add the same await watch.ready.promise guard that checkpointWatch already has.


Clarify stopWatch token validation, remove redundant key
normalization work, and normalize legacy watch responses so clients
still see `changed` while older paths return `dirty`.
@whoiskatrin whoiskatrin marked this pull request as ready for review March 13, 2026 18:19
Contributor

@devin-ai-integration devin-ai-integration bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.


scuffi
scuffi previously approved these changes Mar 16, 2026
Contributor

@scuffi scuffi left a comment


nice! 🚀

Background consumers only need to know whether a path changed
while disconnected. Replace the lease-based persistent watch
API with checkChanges() so callers store one version token and
choose whether to skip work, sync incrementally, or rescan.
@whoiskatrin whoiskatrin changed the title from "Add persistent file watch state" to "Add disconnected file change checks" Mar 17, 2026
Member

@ghostwriternr ghostwriternr left a comment


Super clean. Good call to refactor the existing watch layer to support this too.

@ghostwriternr ghostwriternr enabled auto-merge (squash) April 1, 2026 12:33
@ghostwriternr ghostwriternr merged commit fdd3efa into main Apr 1, 2026
20 checks passed
@ghostwriternr ghostwriternr deleted the kate/watch-state-design branch April 1, 2026 12:45
@sandy-bonk sandy-bonk bot mentioned this pull request Apr 1, 2026