fix(session): gate writer on hydration success + roll backups#1160

Open
brennanb2025 wants to merge 4 commits into main from brennanb2025/fix-session-hydration-data-loss

@brennanb2025 brennanb2025 commented Apr 27, 2026

Problem

orca-data.json (the persisted workspace session: tabs, worktree assignments, layouts) can be silently wiped when startup hydration fails:

  • Mode 1 — catch handler resets in-memory state. src/renderer/src/App.tsx:308-340 wrapped hydration in try/catch. On any throw (fetchRepos, fetchAllWorktrees, session.get(), or hydrateWorkspaceSession itself) the catch called hydrateWorkspaceSession({ tabsByWorktree: {}, ... }), zeroing the in-memory tab map. The debounced session writer then fired and overwrote disk with that empty state.
  • Mode 2 — worktree-id filter drops tabs. src/renderer/src/store/slices/terminals.ts filters tabsByWorktree against the currently-known worktrees. A partial fetchAllWorktrees result (slow SSH sync, etc.) drops tabs for unknown worktrees; the next writer tick persists the dropped state.
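Mode 2 can be illustrated with a minimal sketch. The `TabsByWorktree` shape, the `filterTabsToKnownWorktrees` helper, and the worktree ids below are hypothetical stand-ins, not the actual code in terminals.ts; the point is only that filtering against a partial worktree list silently drops entries.

```typescript
// Hypothetical, simplified shape: worktree id -> list of tab ids.
type TabsByWorktree = Record<string, string[]>;

// Keep only the tabs whose worktree id appears in the known set.
function filterTabsToKnownWorktrees(
  tabsByWorktree: TabsByWorktree,
  knownWorktreeIds: ReadonlySet<string>,
): TabsByWorktree {
  const kept: TabsByWorktree = {};
  for (const id of Object.keys(tabsByWorktree)) {
    if (knownWorktreeIds.has(id)) {
      kept[id] = tabsByWorktree[id];
    }
  }
  return kept;
}

// What was persisted on disk before startup.
const persisted: TabsByWorktree = {
  "local/main": ["tab-1"],
  "ssh/feature": ["tab-2", "tab-3"],
};

// A partial fetch result: the SSH worktree has not been reported yet.
const partiallyKnown = new Set<string>(["local/main"]);

// The SSH tabs are gone from the in-memory state after filtering.
const hydrated = filterTabsToKnownWorktrees(persisted, partiallyKnown);
```

If a writer then persists `hydrated`, the tabs for `ssh/feature` are lost on disk as well, which is the failure this mode describes.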

No backups, no logs outside the renderer console — users only found out after they'd already lost work.

Solution

This PR implements fixes A and B from the issue, and explicitly defers fix C (broaden filter) and fix D (fail-loud UI) to follow-ups.

Fix A — gate the session writer on hydration success

  • Add a hydrationSucceeded flag (starts false) to the terminal slice and setHydrationSucceeded(true) action.
  • App.tsx flips it to true only after hydrateWorkspaceSession(session) returns without throwing.
  • Both session writers (debounced window.api.session.set and the beforeunload setSync) now short-circuit via a shared shouldPersistWorkspaceSession(state) helper in workspace-session.ts. The helper requires both workspaceSessionReady && hydrationSucceeded.
  • The catch handler no longer calls hydrateWorkspaceSession with empty defaults — the in-memory state is left untouched, hydrationSucceeded stays false, and the writer stays a no-op for the rest of this process. The UI still mounts because reconnectPersistedTerminals() continues to flip workspaceSessionReady.
  • Error log now includes the error message and context ([startup] Workspace session hydration failed; leaving disk state untouched: <msg>) so user reports are diagnosable.
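The gate described above can be sketched roughly as follows. `shouldPersistWorkspaceSession`, `workspaceSessionReady`, and `hydrationSucceeded` are named in this PR; the `PersistGateState` interface and the `makeGatedWriter` wrapper are hypothetical and exist only to make the sketch self-contained.

```typescript
// Hypothetical minimal view of the store state the gate needs.
interface PersistGateState {
  workspaceSessionReady: boolean;
  hydrationSucceeded: boolean;
}

// Both flags must be true before any writer may touch disk.
function shouldPersistWorkspaceSession(state: PersistGateState): boolean {
  return state.workspaceSessionReady && state.hydrationSucceeded;
}

// Hypothetical wrapper: turns any writer into a no-op while the gate is closed.
// Returns true when the underlying write was actually invoked.
function makeGatedWriter<S extends PersistGateState>(
  write: (state: S) => void,
): (state: S) => boolean {
  return (state: S): boolean => {
    if (!shouldPersistWorkspaceSession(state)) {
      return false;
    }
    write(state);
    return true;
  };
}

// Record writes in memory instead of calling window.api.session.set.
const writes: PersistGateState[] = [];
const persist = makeGatedWriter<PersistGateState>((state) => {
  writes.push(state);
});

// Failed hydration: the UI is ready, but the writer stays closed.
const wroteAfterFailure = persist({ workspaceSessionReady: true, hydrationSucceeded: false });

// Successful hydration: both flags are set, so the write goes through.
const wroteAfterSuccess = persist({ workspaceSessionReady: true, hydrationSucceeded: true });
```

Routing the debounced writer and the beforeunload writer through the same predicate keeps the two paths from drifting apart.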

Fix B — rolling backups for orca-data.json

  • Before each overwrite, rotate .bak.3 → .bak.4, .bak.2 → .bak.3, …, .bak.0 → .bak.1, and orca-data.json → .bak.0. Keeps 5 snapshots.
  • Rotation is gated to at most once per hour (BACKUP_MIN_INTERVAL_MS = 60 * 60 * 1000) so the 300ms debounced tick doesn't replace the ring with near-identical snapshots within minutes.
  • The current file is copied (not renamed) to .bak.0 so orca-data.json never temporarily disappears between rotation and the new write — a crash in that window would otherwise strand load() on defaults.
  • Both async (writeToDiskAsync) and sync (writeToDiskSync / flush()) paths rotate consistently.
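As a rough illustration of the rotation scheme, here is a synchronous sketch using Node's `fs` module. `BACKUP_MIN_INTERVAL_MS` comes from this PR; `BACKUP_COUNT`, `createBackupRotator`, the `<file>.bak.N` naming, and keeping the last-rotation time in memory are assumptions made for the example and may differ from the real persistence code.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

const BACKUP_COUNT = 5; // assumed constant name; the PR keeps 5 snapshots
const BACKUP_MIN_INTERVAL_MS = 60 * 60 * 1000;

// Hypothetical factory: returns a function that rotates backups for one file.
// The returned function reports whether a rotation actually happened.
function createBackupRotator(dataFile: string): (now?: number) => boolean {
  let lastRotationAt: number | null = null; // assumption: tracked in memory only

  return (now: number = Date.now()): boolean => {
    // Nothing to snapshot on first run.
    if (!fs.existsSync(dataFile)) return false;
    // Gate: at most one rotation per interval.
    if (lastRotationAt !== null && now - lastRotationAt < BACKUP_MIN_INTERVAL_MS) {
      return false;
    }
    // Shift oldest first: .bak.3 -> .bak.4, ..., .bak.0 -> .bak.1.
    for (let i = BACKUP_COUNT - 2; i >= 0; i--) {
      const from = `${dataFile}.bak.${i}`;
      if (fs.existsSync(from)) {
        fs.renameSync(from, `${dataFile}.bak.${i + 1}`);
      }
    }
    // Copy (not rename) so the primary file never disappears.
    fs.copyFileSync(dataFile, `${dataFile}.bak.0`);
    lastRotationAt = now;
    return true;
  };
}

// Usage against a throwaway directory, with explicit timestamps.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "orca-backup-sketch-"));
const dataFile = path.join(dir, "orca-data.json");
const rotate = createBackupRotator(dataFile);

const rotatedBeforeFileExists = rotate(0);
fs.writeFileSync(dataFile, "v1");
const rotatedFirst = rotate(0);
fs.writeFileSync(dataFile, "v2");
const rotatedTooSoon = rotate(BACKUP_MIN_INTERVAL_MS - 1);
const rotatedAfterHour = rotate(BACKUP_MIN_INTERVAL_MS);
```

Shifting from the highest index down means each rename lands on a slot that has already been vacated (or on the oldest snapshot, which is meant to be discarded).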

Out of scope (follow-ups)

  • Fix C (broaden worktree-id filter / revisit the partial-SSH guard): intentionally deferred. Narrower scope preferred here.
  • Fix D (fail-loud UI with a retry/discard dialog): separate UX project.

Test plan

  • pnpm exec vitest run src/main/persistence.test.ts — 41 tests, all green. New coverage:
    • does not rotate when orca-data.json does not yet exist
    • snapshots the previous file to .bak.0 before overwriting
    • keeps at most 5 rotating backups (.bak.0 – .bak.4; no .bak.5)
    • does not rotate more than once per hour
  • pnpm exec vitest run src/renderer/src/lib/workspace-session.test.ts — added the shouldPersistWorkspaceSession gate matrix + an integration-style test that simulates the App.tsx subscribe pattern and asserts that the writer is never invoked after a failed hydration and fires once both flags are set.
  • pnpm exec vitest run src/renderer/src/store/slices/terminals-hydration.test.ts — added hydrationSucceeded default-false, toggle, and "hydrateWorkspaceSession does not flip the flag on its own" tests.
  • All other store/lib tests still green (pnpm exec vitest run src/renderer/src/store/ — 280 passed). Three pre-existing test failures (local-pty-provider.test.ts, pty.test.ts, terminal-attribution.test.ts) fail identically on main and are unrelated.
  • pnpm run lint — no new warnings.

Manual verification to run on merge

  1. Corrupt orca-data.json to force a hydration error (e.g. replace workspaceSession.tabsByWorktree with an invalid shape that throws in the renderer), relaunch, confirm on-disk file is not overwritten and .bak.0 appears after the next successful session.
  2. After an hour of normal usage, inspect ~/Library/Application Support/orca/ and confirm up to 5 rolling .bak.N files exist.

Made with Orca 🐋

Protects orca-data.json from two silent-data-loss failure modes:

- Fix A: add a `hydrationSucceeded` flag. The debounced + shutdown
  session writers now short-circuit until hydration completed without
  throwing, so a crash in fetchRepos / fetchAllWorktrees / session.get /
  hydrateWorkspaceSession can no longer serialize an empty in-memory
  store over the user's persisted tabs. The App.tsx catch handler no
  longer calls hydrateWorkspaceSession with empty defaults; it leaves
  the in-memory state untouched and only flips workspaceSessionReady
  (via reconnectPersistedTerminals) so the UI can mount.

- Fix B: rotate 5 rolling backups (.bak.0 – .bak.4) of orca-data.json
  before each overwrite, gated to at most one rotation per hour so the
  debounced tick doesn't churn disk. The current file is copied (not
  renamed) to .bak.0 so the primary file never temporarily disappears.

Defers to follow-ups: broadening the worktree-id filter guard (fix C)
and the fail-loud hydration error UI (fix D).

Co-authored-by: Orca <help@stably.ai>
@github-actions github-actions Bot added the size/m Medium PR (≤600 added lines, ≤25 files) label Apr 27, 2026
brennanb2025 and others added 3 commits April 27, 2026 00:46
Co-authored-by: Orca <help@stably.ai>
Co-authored-by: Orca <help@stably.ai>
Addresses two gaps in the #1158 error path that mirrored the session
writer bug the PR fixes:

- Guard defaults-hydration in App.tsx catch with a uiHydrated flag so a
  late-stage startup failure can't overwrite already-loaded ui.json
  values (sidebar width, sort, filters) with hardcoded defaults.
- Wrap reconnectPersistedTerminals() in the catch with a try/catch and
  force workspaceSessionReady=true via setState on failure, so a crash
  in the recovery step can't leave the user staring at a blank window.
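A minimal sketch of the error path these two guards describe, assuming a hypothetical `StartupDeps` interface and `runStartup` function in place of the real App.tsx wiring (the success path is abbreviated and the dependency names are illustrative only):

```typescript
// Hypothetical dependency bundle standing in for the store and IPC calls.
interface StartupDeps {
  hydrate: () => Promise<void>; // fetch + hydrateWorkspaceSession, abbreviated
  setHydrationSucceeded: (value: boolean) => void;
  isUiHydrated: () => boolean; // the uiHydrated flag
  applyUiDefaults: () => void; // hardcoded fallback UI values
  reconnectPersistedTerminals: () => Promise<void>;
  forceWorkspaceSessionReady: () => void; // setState({ workspaceSessionReady: true })
}

async function runStartup(deps: StartupDeps): Promise<void> {
  try {
    await deps.hydrate();
    deps.setHydrationSucceeded(true);
  } catch (err) {
    const msg = err instanceof Error ? err.message : String(err);
    console.error(
      `[startup] Workspace session hydration failed; leaving disk state untouched: ${msg}`,
    );
    // Guard 1: only fall back to defaults if ui.json was never loaded.
    if (!deps.isUiHydrated()) {
      deps.applyUiDefaults();
    }
    // Guard 2: a failure in the recovery step must still let the UI mount.
    try {
      await deps.reconnectPersistedTerminals();
    } catch {
      deps.forceWorkspaceSessionReady();
    }
  }
}

// Usage: hydration throws late, after the UI settings were already loaded,
// and the recovery step fails as well.
const calls: string[] = [];
const startupDone = runStartup({
  hydrate: async () => {
    throw new Error("fetchAllWorktrees timed out");
  },
  setHydrationSucceeded: () => calls.push("setHydrationSucceeded"),
  isUiHydrated: () => true,
  applyUiDefaults: () => calls.push("applyUiDefaults"),
  reconnectPersistedTerminals: async () => {
    throw new Error("reconnect failed");
  },
  forceWorkspaceSessionReady: () => calls.push("forceWorkspaceSessionReady"),
});
```

In this scenario only `forceWorkspaceSessionReady` is recorded: the loaded UI values are left alone, `hydrationSucceeded` is never set, and the window can still mount.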

Also adds async-path coverage for the 1-hour backup rotation gate —
existing tests only exercised the sync flush path.

Co-authored-by: Orca <help@stably.ai>