@frankie567

Problem

While investigating a memory leak in Polar, we found that when an exception is raised inside an async actor, the locals of the failing frames stay referenced until the worker shuts down.

When a coroutine raises an exception:

  1. The asyncio Future captures the exception with its full traceback
  2. The traceback holds references to all frame locals (database connections, HTTP clients, large data structures, etc.)
  3. These references are never cleared, causing memory to accumulate over time
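
The three steps above can be reproduced without dramatiq or Redis. The following is a minimal sketch, assuming CPython's reference-counting behavior; `Payload`, `failing_coroutine`, `capture_exception`, and `run_demo` are hypothetical names introduced only for illustration. It tracks a frame local through a weak reference and checks whether it is still alive before and after the traceback is cleared.

```python
import asyncio
import gc
import weakref


class Payload:
    """Stand-in for a large frame local (hypothetical, for illustration)."""


# Weak references let us observe the payload without keeping it alive.
refs: list = []


async def failing_coroutine() -> None:
    payload = Payload()  # frame local, like a DB connection or big buffer
    refs.append(weakref.ref(payload))
    raise RuntimeError("boom")


async def capture_exception() -> BaseException:
    # The task (an asyncio Future) stores the exception and its traceback.
    task = asyncio.ensure_future(failing_coroutine())
    await asyncio.wait([task])
    return task.exception()


def run_demo() -> tuple:
    exc = asyncio.run(capture_exception())
    gc.collect()
    # The traceback still references the coroutine frame and its locals.
    alive_with_traceback = refs[-1]() is not None
    # Dropping the traceback releases the frame, and with it the payload.
    exc.__traceback__ = None
    gc.collect()
    alive_after_clearing = refs[-1]() is not None
    return alive_with_traceback, alive_after_clearing
```

Calling `run_demo()` returns a pair of booleans: whether the payload was still alive while the exception kept its traceback, and whether it was still alive once the traceback had been cleared.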

Demo script showing the issue:

import os
import time

import dramatiq
import psutil
from dramatiq.brokers.redis import RedisBroker
from dramatiq.middleware.asyncio import AsyncIO

broker = RedisBroker(url="redis://localhost:6379/0")
dramatiq.set_broker(broker)
broker.add_middleware(AsyncIO())


class BigException(Exception):
    def __init__(self, a: bytes) -> None:
        self.a = a
        super().__init__("Big exception")


MEMORY_LOG_FILE = "memory_usage.csv"


def log_memory(label: str = "") -> None:
    process = psutil.Process()
    memory_info = process.memory_info()
    timestamp = time.time()
    memory_mb = memory_info.rss / 1024 / 1024

    file_exists = os.path.exists(MEMORY_LOG_FILE)
    with open(MEMORY_LOG_FILE, "a") as f:
        if not file_exists:
            f.write("timestamp,memory_mb,label\n")
        f.write(f"{timestamp},{memory_mb:.2f},{label}\n")


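@dramatiq.actor(actor_name="oom_task", max_retries=1_000_000, max_backoff=100)
async def oom_task() -> None:
    log_memory("before_alloc")
    # Allocate 128 MiB and keep it in a frame local.
    a = bytes(bytearray(128 * 1024 * 1024))
    log_memory("after_alloc")
    # Fail on every attempt so the worker keeps retrying the message.
    raise BigException(a)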


if __name__ == "__main__":
    oom_task.send()

Fix

Added an after_process_message hook to the AsyncIO middleware that clears exception.__traceback__. This runs after the worker logs the exception (so stack traces are preserved for debugging) but before references can accumulate.
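
As a rough illustration of the idea (not the actual diff in this PR), the hook could look like the sketch below. `ClearTracebackMiddleware` and `simulate_failed_message` are hypothetical names; the class is written as a plain Python class so the snippet runs on its own, whereas in a real worker the logic would live in a `dramatiq.Middleware` subclass (here, the AsyncIO middleware). The hook signature mirrors dramatiq's `after_process_message(broker, message, *, result=None, exception=None)`.

```python
from typing import Any, Optional


class ClearTracebackMiddleware:
    """Hypothetical stand-in for a dramatiq middleware with the same hook."""

    def after_process_message(
        self,
        broker: Any,
        message: Any,
        *,
        result: Any = None,
        exception: Optional[BaseException] = None,
    ) -> None:
        if exception is not None:
            # Drop the traceback so its frames and their locals can be freed.
            exception.__traceback__ = None


def simulate_failed_message() -> tuple:
    # Simulate an actor failure and capture the exception object.
    try:
        raise ValueError("boom")
    except ValueError as caught:
        exc = caught
    had_traceback = exc.__traceback__ is not None
    # Broker and message are unused by this hook, so None is passed here.
    ClearTracebackMiddleware().after_process_message(None, None, exception=exc)
    return had_traceback, exc.__traceback__ is None
```

This sketch only clears the traceback of the exception passed to the hook. Chained exceptions (`__cause__`, `__context__`) carry their own tracebacks and are not touched here.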

frankie567 added a commit to polarsource/polar that referenced this pull request Dec 1, 2025
@LincolnPuzey
Collaborator

LincolnPuzey commented Dec 1, 2025

Hi @frankie567, thanks for the PR and the detailed example.

I am worried that mutating the exception object is a band-aid fix that doesn't solve the root issue. If the future is capturing the exception, is something holding a reference to the future? Or is something somewhere else holding a reference to the exception?

Not that I expect you to know, but this is what we need to investigate.
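
One possible starting point for that investigation is the standard library's `gc.get_referrers`, which lists the GC-tracked objects that refer directly to a given object. The helper below is a minimal sketch; `describe_referrers` is a hypothetical name, and the exact set of referrers reported can vary between Python versions.

```python
import gc


def describe_referrers(obj: object) -> list:
    """Return the sorted type names of objects that directly refer to obj."""
    gc.collect()  # clear out unreachable cycles first to reduce noise
    return sorted({type(referrer).__name__ for referrer in gc.get_referrers(obj)})
```

In a worker, this could be called on the exception (or the future) some time after the message has been processed, to see which kinds of containers still hold it.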

@frankie567
Author

> I am worried that mutating the exception object is a band-aid fix and not solving the root issue. If the future is capturing the exception, is something holding a reference to the future? or is something somewhere else holding a reference to the exception?

I agree, but I couldn't find the reason. The fix, even if naive, has a dramatic (pun intended) positive impact on my small example and on our backend, which is why I didn't go much further 😊 But I'm happy to help in any way I can to discover the root cause.

frankie567 added a commit to polarsource/polar that referenced this pull request Dec 1, 2025
frankie567 closed this Dec 17, 2025