Lithops FunctionExecutor memory leaks: atexit handler + unbounded futures list #926

@jbusecke

Description

Summary

Lithops' FunctionExecutor has two sources of memory leaks that affect VirtualiZarr when using the lithops parallel backend. We've fixed both in #925 (the executor-cleaning branch), but we're documenting the upstream issues here for reference.

Issue 1: Unbounded futures list (the main problem)

Every map() call on a FunctionExecutor appends ResponseFuture objects to self.futures. Each future caches its deserialized result in _call_output. Lithops never clears this list internally, so memory grows linearly with the number of operations — even when reusing a single executor.

This is the primary memory leak in typical VirtualiZarr usage (e.g., repeated open_virtual_dataset calls with a long-lived executor).

Fix: Clear _call_output on each future and clear the futures list during shutdown.
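A minimal sketch of the leak and the fix, using stand-in classes (`FakeFuture` and `FakeExecutor` are assumptions for illustration, not lithops' real internals; only the `futures` / `_call_output` attribute names come from lithops):

```python
class FakeFuture:
    """Stand-in for lithops' ResponseFuture: caches its deserialized result."""
    def __init__(self, result):
        self._call_output = result  # cached result, never released by lithops


class FakeExecutor:
    """Stand-in for FunctionExecutor with the unbounded futures list."""
    def __init__(self):
        self.futures = []

    def map(self, func, items):
        # Each map() call appends futures that lithops never clears.
        new = [FakeFuture(func(i)) for i in items]
        self.futures.extend(new)
        return new

    def shutdown(self):
        # The fix: drop each cached result, then empty the list.
        for f in self.futures:
            f._call_output = None
        self.futures.clear()


ex = FakeExecutor()
for _ in range(3):
    ex.map(lambda x: x * 2, range(100))
print(len(ex.futures))  # 300 — grows linearly with map() calls
ex.shutdown()
print(len(ex.futures))  # 0 — cached results released
```

With a long-lived executor, `shutdown()` (or an equivalent periodic clear) is what caps memory; without it, every `open_virtual_dataset` call adds its futures, and their cached results, to the list.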

Issue 2: atexit handler prevents garbage collection

In FunctionExecutor.__init__(), lithops registers:

```python
atexit.register(self.clean, clean_cloudobjects=False, clean_fn=True, on_exit=True)
```

self.clean is a bound method that holds a strong reference back to the executor. The atexit module keeps this reference alive for the entire process lifetime, so the FunctionExecutor can never be garbage collected — even after all user references are deleted.

This is by design in lithops: the atexit handler is a safety net to clean up cloud storage artifacts (cached functions, intermediate data) at process exit. It's harmless in their intended usage pattern (one executor per process), but causes a leak in VirtualiZarr's pattern of creating and discarding executors.

Fix: Call atexit.unregister(self.lithops_client.clean) during shutdown.

MRE for the atexit issue

```python
import atexit
import gc
import weakref


class FakeFunctionExecutor:
    """Mimics lithops.executors.FunctionExecutor."""

    def __init__(self):
        # https://github.com/lithops-cloud/lithops/blob/3.6.3/lithops/executors.py#L111
        atexit.register(self.clean)

    def clean(self):
        pass


# --- Leak ---
executor = FakeFunctionExecutor()
ref = weakref.ref(executor)
del executor
gc.collect()
print(f"Object still alive: {ref() is not None}")  # True — leaked!

# --- Fix ---
executor2 = FakeFunctionExecutor()
ref2 = weakref.ref(executor2)
atexit.unregister(executor2.clean)
del executor2
gc.collect()
print(f"Object still alive: {ref2() is not None}")  # False — freed!
```

Consideration for cloud executors

The atexit.unregister call removes lithops' safety net for cleaning cloud storage artifacts. For localhost executors this is harmless, but for cloud function executors (AWS Lambda, IBM Cloud Functions), if a process crashes without calling shutdown(), temporary cloud artifacts could be orphaned. A potential improvement would be to replace the strong atexit reference with a weakref-based callback:

```python
import weakref

def _weak_atexit_clean(weak_client):
    client = weak_client()
    if client is not None:
        client.clean(clean_cloudobjects=False, clean_fn=True, on_exit=True)

atexit.unregister(self.lithops_client.clean)
atexit.register(_weak_atexit_clean, weakref.ref(self.lithops_client))
```

This allows GC while still running cleanup if the executor is alive at exit.
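To check that claim, here is a self-contained demo of the weakref-based registration (`FakeClient` is a stand-in for illustration, not lithops' real client): the object becomes collectable despite the atexit registration, and the callback still performs cleanup whenever the client is alive when it fires.

```python
import atexit
import gc
import weakref


class FakeClient:
    """Stand-in for the lithops client; records whether clean() ran."""
    cleaned = False

    def clean(self, **kwargs):
        FakeClient.cleaned = True


def _weak_atexit_clean(weak_client):
    # Only clean if the client is still alive at exit.
    client = weak_client()
    if client is not None:
        client.clean(clean_cloudobjects=False, clean_fn=True, on_exit=True)


# Registering a weakref instead of a bound method keeps no strong reference:
client = FakeClient()
atexit.register(_weak_atexit_clean, weakref.ref(client))
ref = weakref.ref(client)
del client
gc.collect()
print(f"Collectable despite registration: {ref() is None}")  # True

# If the client *is* alive when the handler fires, cleanup still runs:
client2 = FakeClient()
_weak_atexit_clean(weakref.ref(client2))  # simulate the atexit callback
print(f"Cleanup ran: {FakeClient.cleaned}")  # True
```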

Upstream

There are no existing lithops issues about either of these problems. The only related issue (lithops-cloud/lithops#1409) was about thread growth, not memory leaks. We may want to file an upstream issue as well.
