Skip to content

Inactive DocHandles are not evicted from handleCache #358

Open
@yarolegovich

Description

Hi, wanted to share my view on the DocHandle eviction problem and hear your thoughts. If the suggested approach sounds reasonable, I can start working on it this or early next week.

The problem:
DocHandles are hard-referenced in Repo#handleCache and never destroyed. The longer the program runs the more "idle" documents there are.

Constraints:

  • We definitely don't want to unload a document if client code is working with it (holds a reference).
  • We don't want to unload a document another peer is actively replicating to us, so that we don't need to load it back to memory on every new message.
  • There might be >10k handles if objects are small and each reside in their own document, unlikely >100k (?).

Suggested solution:

  1. Rename DocHandle to InternalDocHandle, all the private logic is going to stay here.
  2. Add DocHandle that'll hold a hard-ref to InternalDocHandle. Public API goes here.
  3. InternalDocHandle holds a weak-ref to DocHandle that hard-references it. We need this to avoid unloading documents referenced by client code.
  4. Maintain a lastChange timestamp on InternalDocHandle. We need it to avoid unloading documents which are being replicated.
  5. setInterval to scan all handles, removing those where DocHandle weak-ref is undefined and it's been more than a minute (?) since the last document update. Should probably be user-configurable.
  6. Scan can be incremental, yielding after 10k items not to hold event-loop for too long, we're ok with eventual eviction. We can continue the scan from where we left on the next interval trigger or just schedule a macro-task to continue after other callbacks had a go. I'd probably go with the latter approach.

Alternatives:

  1. Maintain an LRU-cache, moving handles in response to change events, when garbage-collecting we can iterate from least to most recently used and stop iteration as soon as time passed since lastChange doesn't exceed eviction threshold. Not a bad idea, but we don't gain much if user code holds references to a lot of objects which rarely change and sit in the tail. We'll need to go through all of them first. In addition with our data set size reference walking can be worse than linear array scan even if the latter checks more elements.
  2. Don't introduce InternalDocHandle. Let's say we maintain an LRU cache as in alternative#1 and based on lastChange timestamp we move handles from strong-ref active cache to inactive: Map<DocumentId, WeakMap<DocHandle>> cache. On sync message or find we can look up in both and return an existing handle which was still in-use by user-code. On change we can bring handles back to activeCache if they weren't there. Don't like this solution because:
    • Quite error-prone and unjustifiably complicated: when a handle is moved to inactiveCache we need to ensure that all internal references are cleaned-up. When we bring it back to active cache we need to ensure everything is in place, like there's an active docSynchronizer.
    • We'll still need to full-scan and periodically clean inactiveCache entries where deref resolves to undefined.

Activity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions