Description
Current design
- Every time a new key is inserted in `Worker.data`, if the managed memory (the output of `sizeof`) exceeds the `target` threshold, keys are spilled from the bottom of the LRU cache until the managed memory goes back below `target`. This is a synchronous process that does not release the event loop. This isn't great, but it is bounded, in the sense that it will never spill more bytes than the size of the key that has just been inserted.
- Every 100ms (`distributed.worker.memory.monitor-interval`), measure the process memory through psutil. If the process memory exceeds the `spill` threshold, start spilling keys until the process memory goes below the `target` threshold (hysteresis cycle). This process re-measures process memory, calls garbage collection, and releases the event loop multiple times, and can potentially take many seconds. (The relevant configuration keys are shown in the sketch after this list.)
The intent of this design is to have a very responsive, cheap, but inaccurate first threshold and a slow-to-notice, expensive, but accurate second one. The design, however, is problematic:

- When unmanaged memory (process minus managed) is very high, e.g. due to a leak, a high heap from the running user functions, or an underestimated output of `sizeof()`. In the extreme case of a memory leak, you're going to reach the `spill` threshold without ever having hit the `target` threshold, and then spill the whole contents of `Worker.data` all at once (see the worked example after this list).
- When unmanaged memory is negative, due to an overestimated output of `sizeof()`. This will cause `target` to start spilling too soon, when there's plenty of memory still available.
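A hypothetical worked example of the leak scenario, using a 16 GB `memory_limit` and the illustrative threshold values from the sketch above (all numbers are assumptions for illustration only):

```python
memory_limit = 16e9          # bytes; hypothetical worker memory_limit
target, spill = 0.60, 0.70   # illustrative threshold fractions

managed = 2e9                # sizeof() estimate of the contents of Worker.data
unmanaged = 10e9             # e.g. a leak or a large heap inside user functions
process = managed + unmanaged

# The target mechanism never fires: managed (2 GB) is well below 9.6 GB...
assert managed < target * memory_limit
# ...but the monitor sees process memory (12 GB) above the 11.2 GB spill
# threshold, and at that point it spills everything in Worker.data at once.
assert process > spill * memory_limit
```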
Proposed design
In zict:
- Add an `offset` property to `zict.LRU`. This property is added to `total_weights` for the purpose of eviction (a sketch follows below).
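A minimal toy sketch of the idea — not zict's actual implementation; the exact semantics of `offset` and the method names here are assumptions based on the description above:

```python
from collections import OrderedDict


class ToyLRU:
    """Toy LRU that evicts oldest items while weight + offset exceeds ``n``."""

    def __init__(self, n, slow, weight=len):
        self.n = n              # eviction threshold, e.g. the target threshold in bytes
        self.slow = slow        # spill destination, e.g. a disk-backed mapping
        self.weight = weight
        self.offset = 0         # externally-set extra weight, e.g. unmanaged memory
        self.d = OrderedDict()
        self.weights = {}
        self.total_weight = 0

    def __setitem__(self, key, value):
        if key in self.d:       # replacing an existing key: drop its old weight first
            self.total_weight -= self.weights.pop(key)
            del self.d[key]
        w = self.weight(value)
        self.d[key] = value
        self.weights[key] = w
        self.total_weight += w
        self.evict_until_below_target()

    def evict_until_below_target(self):
        # The offset shifts the effective fill level: a large amount of unmanaged
        # memory forces managed keys out even if their own weights are small.
        while self.d and self.total_weight + self.offset > self.n:
            key, value = self.d.popitem(last=False)   # least recently used
            self.total_weight -= self.weights.pop(key)
            self.slow[key] = value                    # "spill"
```

Note that `evict_until_below_target()` can also be called explicitly, which is what the "manually trigger spilling in zict" step below relies on.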
In `distributed.worker_memory`:
- Every 100ms, measure process memory and calculate unmanaged memory.
- If process memory is above the `spill` threshold and there is data in `Worker.fast`, garbage collect and re-measure it.
- Update `Worker.data.fast.offset` to the amount of unmanaged memory.
- Manually trigger spilling in zict (a rough sketch of this callback follows the list).
In `distributed.worker_state_machine._transition_to_memory`, `distributed.Worker.execute`, and `distributed.Worker.get_data`: no change, but now the offset is considered every time a key is inserted in `fast`.
Notes
- This could cause zict to synchronously spill many GiBs at once, without ever releasing the event loop. This change should be paired with Asynchronous Disk Access in Workers #4424.
- Leaving the current thresholds unchanged, you'll start spilling a lot earlier; effectively, `target` becomes the new `spill`. I think it's safe to bump both by 0.1, making `spill` the same as `pause` (see the sketch after this list).
- We should rename "spill" to "aggressive_gc" to clarify its new meaning.
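For example, assuming the current defaults of 0.6 / 0.7 / 0.8 for `target` / `spill` / `pause`, the suggested bump would look something like this (illustrative values, not a tested recommendation):

```python
import dask

dask.config.set({
    "distributed.worker.memory.target": 0.70,   # was 0.60
    "distributed.worker.memory.spill": 0.80,    # was 0.70; now equal to pause
    "distributed.worker.memory.pause": 0.80,    # unchanged
})
```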