Skip to content

Conversation

@haraldschilly
Copy link
Contributor

@haraldschilly haraldschilly commented Jan 16, 2026

ref: #8702

  • open-files stats loop is currently commented out in src/packages/project/conat/open-files.ts line 165 (used for local debugging).

Root cause

  • Persist storage deletes old keyed rows without emitting deletes; clients used to retain those stale seqs in CoreStream raw/messages, so keyed updates (like open-files touches) grew unbounded over time.

Detailed explanation of the change

  • The leak is caused by keyed updates (like open-files “touch” writes) accumulating in CoreStream raw/messages arrays because the server deletes old keyed rows without notifying clients.
  • The fix keeps server behavior unchanged and does client-side, delayed cleanup: when a new keyed update arrives, CoreStream looks up the previous raw seq for that key and locally deletes that seq with processPersistentDelete(..., { noEmit: true }). This removes the stale entry from raw/messages while avoiding extra change events.
  • To keep merge semantics intact after cleanup, CoreStream now records the last value per key (lastValueByKey) so prev is still available during conflict resolution, even if the old seq is deleted from memory.
  • DKV maintains a remote cache for the same reason: if the CoreStream kv entry is evicted, DKV can still supply a prev value to the merge function.

Fix (client-side GC for keyed updates)

  • CoreStream now tracks the previous raw seq for a key and locally deletes that seq after the new keyed set is processed, via processPersistentDelete(..., { noEmit: true }). This keeps the in-memory arrays flat without changing server behavior.
  • CoreStream retains lastValueByKey so prev is still available for merge conflict resolution even after the old seq is removed.
  • DKV keeps a remote cache to supply prev to merge when the CoreStream kv entry has been evicted.

Other cleanup

  • PersistStreamClient now clears reconnect timers on close to avoid lingering timeouts in tests.

Tests

  • core-stream: added regression asserting repeated keyed updates do not grow stream length; expectations updated for GC semantics.
  • dkv-merge/open-files: added waits for async propagation and verified merge prev is preserved.
  • cluster/node-discovery: longer timeouts + explicit teardown to avoid lingering handles; node-forget now asserts the expected count after wait.

In a local test setup we see that open-files stats show keyed-KV arrays staying flat instead of growing unbounded. Example log lines:

2026-01-16T15:13:41.495Z ... open-files stats {
  openDocs: 1,
  stats: { entries: 7, dkv: { rawLength: 30, messagesLength: 30, kvLength: 30 } },
  rssMiB: 233
}
2026-01-16T15:15:26.495Z ... open-files stats {
  openDocs: 1,
  stats: { entries: 7, dkv: { rawLength: 30, messagesLength: 30, kvLength: 30 } },
  rssMiB: 245
}
2026-01-16T15:16:11.496Z ... open-files stats {
  openDocs: 1,
  stats: { entries: 7, dkv: { rawLength: 30, messagesLength: 30, kvLength: 30 } },
  rssMiB: 245
}
  • This confirms the fix we were after: client-side GC keeps the CoreStream arrays flat for keyed updates.

@haraldschilly haraldschilly added PR-work in progress PR-TODO-cocalc2 merge/migrate this PR into CoCalc2 in the future labels Jan 16, 2026
haraldschilly and others added 4 commits January 16, 2026 16:06
When a client is closed during cluster scan, client.options may be
undefined (deleted in Client.close()). Add defensive check to avoid
TypeError when accessing client.options.address.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

PR-TODO-cocalc2 merge/migrate this PR into CoCalc2 in the future PR-work in progress

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants