Inactive `DocHandle`s are not evicted from `handleCache`

Hi, wanted to share my view on the DocHandle eviction problem and hear your thoughts. If the suggested approach sounds reasonable, I can start working on it this or early next week.

The problem:
DocHandles are hard-referenced in `Repo#handleCache` and never destroyed. The longer the program runs the more "idle" documents there are. 

Constraints:
* We definitely don't want to unload a document if client code is working with it (holds a reference).
* We don't want to unload a document another peer is actively replicating to us, so that we don't need to load it back to memory on every new message.
* There might be >10k handles if objects are small and each reside in their own document, unlikely >100k (?). 

Suggested solution:
1. Rename `DocHandle` to  `InternalDocHandle`, all the private logic is going to stay here.
2. Add `DocHandle` that'll hold a hard-ref to `InternalDocHandle`. Public API goes here.
3. `InternalDocHandle` holds a weak-ref to `DocHandle` that hard-references it. We need this to avoid unloading documents referenced by client code.
4. Maintain a lastChange timestamp on `InternalDocHandle`. We need it to avoid unloading documents which are being replicated.
5. `setInterval` to scan all handles, removing those where `DocHandle` weak-ref is undefined and it's been more than a minute (?) since the last document update. Should probably be user-configurable.
6. Scan can be incremental, yielding after 10k items not to hold event-loop for too long, we're ok with eventual eviction. We can continue the scan from where we left on the next interval trigger or just schedule a macro-task to continue after other callbacks had a go. I'd probably go with the latter approach.

Alternatives:
1. Maintain an LRU-cache, moving handles in response to `change` events, when garbage-collecting we can iterate from least to most recently used and stop iteration as soon as time passed since `lastChange` doesn't exceed eviction threshold. Not a bad idea, but we don't gain much if user code holds references to a lot of objects which rarely change and sit in the tail. We'll need to go through all of them first. In addition with our data set size reference walking can be worse than linear array scan even if the latter checks more elements.
2. Don't introduce `InternalDocHandle`. Let's say we maintain an LRU cache as in alternative#1 and based on `lastChange` timestamp we move handles from strong-ref `active` cache to `inactive: Map<DocumentId, WeakMap<DocHandle>>` cache. On sync message or `find` we can look up in both and return an existing handle which was still in-use by user-code. On `change` we can bring handles back to `activeCache` if they weren't there. Don't like this solution because: 
   * Quite error-prone and unjustifiably complicated: when a handle is moved to `inactiveCache` we need to ensure that all internal references are cleaned-up. When we bring it back to active cache we need to ensure everything is in place, like there's an active docSynchronizer.
   * We'll still need to full-scan and periodically clean `inactiveCache` entries where `deref` resolves to undefined.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Inactive `DocHandle`s are not evicted from `handleCache` #358

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Inactive DocHandles are not evicted from handleCache #358

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Inactive `DocHandle`s are not evicted from `handleCache` #358