Problem
Writeback drafts are ingested into the durable outbox by the running daemon's sync cycle (pushLocal). A draft written after the daemon's last cycle and just before shutdown (e.g. a one-shot sandbox writing a final fire-and-forget reply right before teardown) is on disk but not yet in the outbox. The teardown cleanup runs --flush-outbox-once (Syncer.FlushOutboxOnce), which is outbox-only and performs no local scan (the "flush-124 cure"), so that draft is never ingested and is silently dropped.
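Interim fix (shipped)
--push-local-once / Syncer.PushLocalAndFlushOnce: one pushLocal pass (scans the on-disk mirror in the fresh cleanup process, ingesting drafts the daemon missed) + outbox flush, skipping pullRemote/digest/websocket.
cloud #2268 makes the mount cleanup shell feature-detect --push-local-once and use it only when pending local writes are detected (find -newer $RELAYFILE_MOUNT_FLUSH_MARKER), keeping --flush-outbox-once as the no-pending-writes fast path.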
This works, but it's a two-mode design with the "is there pending work?" decision pushed up into the cloud cleanup shell. That split exists because pushLocal calls scanLocalFiles (a full local-tree walk), which is exactly the cost --flush-outbox-once exists to avoid (flush-124 timeouts on large mounts). So we can't just always run the drain.
The right fix
Make the teardown drain always correct AND always cheap, so there's a single mode and no cloud-side conditional:
Give the drain bounded local-write detection: scan only files changed since a marker (mtime-since-baseline, or a persisted dirty-set / journal), not the whole tree via scanLocalFiles.
Then a single teardown call ("ingest pending drafts, then flush outbox") is safe to run unconditionally: O(pending) instead of O(tree), so no flush-124 regression even on large mounts.
Outcome
Retire the --flush-outbox-once vs --push-local-once split (or fold both into one always-on drain).
Drop the cloud cleanup-shell find -newer mode-selection (cloud #2268) — the daemon becomes correct on its own.
Fire-and-forget writebacks become reliable for every agent with no per-call timeouts and no caller having to reason about which flush mode to use.
Pointers
internal/mountsync/syncer.go: pushLocal → scanLocalFiles (full walk); FlushOutboxOnce (outbox-only); PushLocalAndFlushOnce (added in #304, "feat(mount): --push-local-once teardown drain (stop dropping last-moment writeback drafts)"; full pushLocal + flush); HandleLocalChange (the event-driven ingest path, which already knows dirty paths in the running daemon; the cleanup, however, runs as a fresh process without that in-memory state, hence the scan).
The watcher/coalescer already tracks per-path changes in the running daemon; a persisted dirty-set (or .relay-tracked changed-since-marker list) would let the fresh cleanup process drain only pending paths.
Motivation
Surfaced while fixing intermittently dropped threaded Slack replies from scheduled scan agents (AgentWorkforce/cloud#2261 threading + #2268 / relayfile #304 drain). The flag-based fix resolves it; this issue tracks the cleaner single-mode design.