Summary
Lithops' FunctionExecutor has two memory leak sources that affect VirtualiZarr when using the lithops parallel backend. We've fixed both in #925 (the executor-cleaning branch), but are documenting the upstream issues here for reference.
Issue 1: Unbounded futures list (the main problem)
Every map() call on a FunctionExecutor appends ResponseFuture objects to self.futures. Each future caches its deserialized result in _call_output. Lithops never clears this list internally, so memory grows linearly with the number of operations — even when reusing a single executor.
This is the primary memory leak in typical VirtualiZarr usage (e.g., repeated open_virtual_dataset calls with a long-lived executor).
Fix: Clear _call_output on each future and clear the futures list during shutdown.
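The fix above can be illustrated without lithops installed. This is a minimal sketch, not the code from #925: FakeFuture, FakeExecutor, Payload and release_futures are hypothetical stand-ins that only mimic the futures list and the _call_output cache described above.

```python
import gc
import weakref


class FakeFuture:
    """Hypothetical stand-in for a lithops ResponseFuture."""

    def __init__(self, result):
        self._call_output = result  # cached, deserialized result


class FakeExecutor:
    """Hypothetical stand-in for a FunctionExecutor that accumulates futures."""

    def __init__(self):
        self.futures = []

    def map(self, func, items):
        new = [FakeFuture(func(x)) for x in items]
        self.futures.extend(new)  # never cleared by the executor itself
        return new


def release_futures(executor):
    """Drop each cached result, then empty the executor's futures list."""
    for future in executor.futures:
        future._call_output = None
    executor.futures.clear()


class Payload:
    """Weak-referenceable object standing in for a large result."""


executor = FakeExecutor()
for _ in range(3):
    executor.map(lambda _x: Payload(), range(4))
print(len(executor.futures))  # 12: the list grows with every map() call

probe = weakref.ref(executor.futures[0]._call_output)
release_futures(executor)
gc.collect()
print(len(executor.futures), probe() is None)  # 0 True
```

Clearing _call_output before emptying the list means the cached result is released even if some other code still holds a reference to the future object itself.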
Issue 2: atexit handler prevents garbage collection
In FunctionExecutor.__init__(), lithops registers:
atexit.register(self.clean, clean_cloudobjects=False, clean_fn=True, on_exit=True)
self.clean is a bound method that holds a strong reference back to the executor. The atexit module keeps this reference alive for the entire process lifetime, so the FunctionExecutor can never be garbage collected — even after all user references are deleted.
This is by design in lithops: the atexit handler is a safety net to clean up cloud storage artifacts (cached functions, intermediate data) at process exit. It's harmless in their intended usage pattern (one executor per process), but causes a leak in VirtualiZarr's pattern of creating and discarding executors.
Fix: Call atexit.unregister(self.lithops_client.clean) during shutdown.
MRE for the atexit issue
import atexit
import gc
import weakref


class FakeFunctionExecutor:
    """Mimics lithops.executors.FunctionExecutor."""

    def __init__(self):
        # https://github.com/lithops-cloud/lithops/blob/3.6.3/lithops/executors.py#L111
        atexit.register(self.clean)

    def clean(self):
        pass


# --- Leak ---
executor = FakeFunctionExecutor()
ref = weakref.ref(executor)
del executor
gc.collect()
print(f"Object still alive: {ref() is not None}")  # True: leaked!

# --- Fix ---
executor2 = FakeFunctionExecutor()
ref2 = weakref.ref(executor2)
atexit.unregister(executor2.clean)
del executor2
gc.collect()
print(f"Object still alive: {ref2() is not None}")  # False: freed!
Consideration for cloud executors
The atexit.unregister call removes lithops' safety net for cleaning cloud storage artifacts. For localhost executors this is harmless, but for cloud function executors (AWS Lambda, IBM Cloud Functions), if a process crashes without calling shutdown(), temporary cloud artifacts could be orphaned. A potential improvement would be to replace the strong atexit reference with a weakref-based callback:
import atexit
import weakref


def _weak_atexit_clean(weak_client):
    # Dereference at exit; yields None if the client was already collected.
    client = weak_client()
    if client is not None:
        client.clean(clean_cloudobjects=False, clean_fn=True, on_exit=True)


# In the object that holds self.lithops_client:
atexit.unregister(self.lithops_client.clean)
atexit.register(_weak_atexit_clean, weakref.ref(self.lithops_client))
This allows GC while still running cleanup if the executor is alive at exit.
Upstream
There are no existing lithops issues about either of these problems. The only related issue (lithops-cloud/lithops#1409) was about thread growth, not memory leaks. We may want to file an upstream issue as well.