Hi, wanted to share my view on the DocHandle eviction problem and hear your thoughts. If the suggested approach sounds reasonable, I can start working on it this week or early next week.
The problem:
DocHandles are hard-referenced in `Repo#handleCache` and never destroyed. The longer the program runs, the more "idle" documents there are.
Constraints:
- We definitely don't want to unload a document if client code is working with it (holds a reference).
- We don't want to unload a document another peer is actively replicating to us, so that we don't need to load it back to memory on every new message.
- There might be >10k handles if objects are small and each resides in its own document; >100k is unlikely (?).
Suggested solution:
- Rename `DocHandle` to `InternalDocHandle`, all the private logic is going to stay here.
- Add `DocHandle` that'll hold a hard-ref to `InternalDocHandle`. Public API goes here.
- `InternalDocHandle` holds a weak-ref to `DocHandle` that hard-references it. We need this to avoid unloading documents referenced by client code.
- Maintain a lastChange timestamp on `InternalDocHandle`. We need it to avoid unloading documents which are being replicated.
- `setInterval` to scan all handles, removing those where `DocHandle` weak-ref is undefined and it's been more than a minute (?) since the last document update. Should probably be user-configurable.
- Scan can be incremental, yielding after 10k items not to hold event-loop for too long, we're ok with eventual eviction. We can continue the scan from where we left on the next interval trigger or just schedule a macro-task to continue after other callbacks had a go. I'd probably go with the latter approach.
Alternatives:
- Maintain an LRU cache, moving handles in response to `change` events. When garbage-collecting we can iterate from least to most recently used and stop iteration as soon as the time passed since `lastChange` doesn't exceed the eviction threshold. Not a bad idea, but we don't gain much if user code holds references to a lot of objects which rarely change and sit in the tail. We'll need to go through all of them first. In addition, with our data set size, reference walking can be worse than a linear array scan even if the latter checks more elements.
- Don't introduce `InternalDocHandle`. Let's say we maintain an LRU cache as in alternative #1, and based on the `lastChange` timestamp we move handles from the strong-ref `active` cache to the `inactive: Map<DocumentId, WeakRef<DocHandle>>` cache. On sync message or `find` we can look up in both and return an existing handle which was still in use by user code. On `change` we can bring handles back to `activeCache` if they weren't there. Don't like this solution because:
  - Quite error-prone and unjustifiably complicated: when a handle is moved to `inactiveCache` we need to ensure that all internal references are cleaned up. When we bring it back to the active cache we need to ensure everything is in place, like there's an active docSynchronizer.
  - We'll still need to full-scan and periodically clean `inactiveCache` entries where `deref` resolves to undefined.
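For comparison, the first alternative can be sketched roughly as follows. This is an illustration only, with hypothetical names (`TrackedHandle`, `LruHandleCache`, `touch`, `evictIdle`); it relies on `Map` preserving insertion order and assumes `now` never goes backwards. The returned count shows the drawback mentioned above: idle handles that are still referenced sit at the old end and are visited on every pass.

```typescript
// Hypothetical sketch of the LRU alternative; not the library's actual code.
type DocumentId = string

interface TrackedHandle {
  lastChange: number
  isReferenced(): boolean
}

class LruHandleCache<H extends TrackedHandle> {
  // Map iteration follows insertion order, so the first entry is the
  // least recently changed one.
  private entries = new Map<DocumentId, H>()

  // Call on every `change` event: re-inserting moves the handle to the
  // most recently used end.
  touch(id: DocumentId, handle: H, now: number): void {
    handle.lastChange = now
    this.entries.delete(id)
    this.entries.set(id, handle)
  }

  has(id: DocumentId): boolean {
    return this.entries.has(id)
  }

  // Walk from least to most recently changed and stop at the first handle
  // that is not idle yet. Returns how many handles had to be visited.
  evictIdle(now: number, idleMs: number): number {
    let visited = 0
    for (const [id, handle] of this.entries) {
      if (now - handle.lastChange <= idleMs) break
      visited++
      if (!handle.isReferenced()) this.entries.delete(id)
    }
    return visited
  }
}
```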