
[Databento] Memory leak in live data path - linear growth at capsule_to_data #3485

@shzhng

Description

Memory grows linearly (~7 MB/min) during market hours while receiving Databento live data, and stabilizes when the data stops. The primary allocation source is capsule_to_data in the Databento adapter, which accumulates ~32-35 MB per 10-minute profiling interval.

Environment

  • NautilusTrader: develop branch
  • Python: 3.13
  • Adapter: Databento CBBO-1S schema (~500 subscribed instruments)
  • Platform: Linux (Fly.io), 2GB RAM

Profiling Data

Three snapshots taken 10 minutes apart during market hours:

| Metric                    | 18:36     | 18:46     | 18:56     |
| ------------------------- | --------- | --------- | --------- |
| Traced memory             | 129.7 MB  | 201.4 MB  | 269.4 MB  |
| capsule_to_data delta     | -         | +35.3 MB  | +32.0 MB  |
| data_engine.py:482 delta  | -         | +1.6 MB   | +1.5 MB   |
| Tuple count               | 14,836    | 1,208,277 | 1,590,831 |

Memory changes consistently show:

+32039.1 KB: nautilus_trader/adapters/databento/data.py:1642
+1460.4 KB: nautilus_trader/live/data_engine.py:482

Line 1642 is data = capsule_to_data(pycapsule) in _handle_msg().

Expected Behavior

Memory should plateau once caches fill. The quote tick cache uses deque(maxlen=tick_capacity), which should bound storage. Instead, memory grows linearly, in proportion to data volume.
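
For reference, a minimal sketch of the bound being relied on (the tick_capacity value below is illustrative, not the production setting): a maxlen deque discards the oldest entries on append, so the number of retained objects plateaus regardless of throughput.

from collections import deque

tick_capacity = 10_000          # illustrative value, not the actual config
cache = deque(maxlen=tick_capacity)

for i in range(1_000_000):
    cache.append(("quote", i))  # stand-in for a QuoteTick

assert len(cache) == tick_capacity  # storage stays bounded regardless of volume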

Data Flow Analysis

  1. The Rust DatabentoFeedHandler receives records and creates LiveMessage::Data(data)
  2. The message is put into an mpsc channel (buffer_size=100,000)
  3. The process_messages async loop receives it and calls data_to_pycapsule(py, data)
  4. The Python callback _handle_msg(pycapsule) is invoked
  5. capsule_to_data(pycapsule) copies the data into a Cython QuoteTick
  6. The QuoteTick goes to the bounded cache and the msgbus
  7. The capsule should then be freed, but something retains it
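
Step 7 is the one in question. A rough way to check it from the Python side is sketched below; traced_handler is a hypothetical diagnostic wrapper, not part of the adapter, and the exact refcount baseline depends on how the Rust caller holds the capsule while invoking the callback.

import sys

def traced_handler(original_handler):
    # Hypothetical wrapper around the adapter's capsule callback.
    def wrapper(pycapsule):
        original_handler(pycapsule)
        refs = sys.getrefcount(pycapsule)  # includes getrefcount's own temporary reference
        # Baseline here is roughly 3 (wrapper arg, getrefcount temp, the Rust caller);
        # anything much larger suggests something downstream retains the capsule.
        if refs > 3:
            print(f"capsule still referenced {refs} times after handling")
    return wrapper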

Potential Issues

1. LiveMessage enum size

#[allow(clippy::large_enum_variant, reason = "TODO: Optimize this (largest variant 1096 vs 80 bytes)")]
pub enum LiveMessage {
    Data(Data),
    Instrument(InstrumentAny),  // 1096 bytes
    ...
}

With a full 100,000-message buffer: 100,000 × 1,096 bytes ≈ 110 MB of Rust-side memory overhead.
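
Note that tracemalloc only accounts for Python-level allocations, so Rust-side memory, such as a backed-up channel buffer or undropped capsule payloads, would show up in process RSS but not in traced memory. A rough Linux-only sketch for separating the two (log_memory is a hypothetical helper, called on a timer while the node runs):

import tracemalloc

def rss_mb() -> float:
    # Read resident set size from /proc (Linux only, matching the Fly.io host).
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024.0  # kB -> MB
    return 0.0

def log_memory() -> None:
    traced, _peak = tracemalloc.get_traced_memory()
    gap = rss_mb() - traced / 1e6
    print(f"traced={traced / 1e6:.1f} MB  rss={rss_mb():.1f} MB  untracked~={gap:.1f} MB")

tracemalloc.start()
log_memory()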

2. Capsule lifecycle unclear

In model/src/python/data/mod.rs:91-96:

pub fn data_to_pycapsule(py: Python, data: Data) -> Py<PyAny> {
    let capsule = PyCapsule::new_with_destructor(py, data, None, |_, _| {})
        .expect("Error creating `PyCapsule`");
    capsule.into_any().unbind()
}

The explicit destructor is a no-op; pyo3 should still drop the boxed Data when Python deallocates the capsule. But is the capsule actually being deallocated promptly?
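
For context, CPython frees an object as soon as its refcount reaches zero, so if nothing retains the capsule after _handle_msg returns, its destructor (and the drop of the contained Data) should run immediately, with no dependence on the cyclic GC. The sketch below illustrates that property with a stand-in class, since PyCapsule itself supports neither weak references nor GC tracking.

import weakref

class FakeCapsule:       # stand-in only; a real PyCapsule cannot be weak-referenced
    pass

freed = []
cap = FakeCapsule()
weakref.finalize(cap, lambda: freed.append(True))

def handle(capsule):
    return None          # pretend to convert; hold no extra reference

handle(cap)
del cap                  # last reference gone, finalizer runs immediately
assert freed == [True]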

3. Async callback reference holding

The process_messages async function runs in tokio, calling Python callbacks. Could the async runtime or pyo3 callback mechanism hold references longer than expected?
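
One concrete way an async callback path can retain payloads is sketched below, using plain asyncio for illustration: anything scheduled onto an event loop keeps strong references to its arguments until the callback actually runs, so a loop that cannot keep up with the message rate retains every pending payload, which would look exactly like growth proportional to data volume. Whether the tokio/pyo3 path here behaves analogously is the open question.

import asyncio

async def main() -> None:
    loop = asyncio.get_running_loop()
    payloads = [bytes(1_000) for _ in range(10_000)]
    for p in payloads:
        loop.call_soon(lambda p=p: None)   # each pending Handle keeps `p` alive
    del payloads
    # Until the loop drains those callbacks, all 10,000 payloads stay referenced.
    await asyncio.sleep(0)                 # yield once so the ready callbacks can run

asyncio.run(main())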

Reproduction Steps

  1. Configure Databento adapter with ~500 instrument subscriptions (CBBO-1S)
  2. Run live node during market hours
  3. Enable tracemalloc profiling (a snapshot-diff sketch follows this list)
  4. Observe linear memory growth at capsule_to_data line
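
A rough sketch of step 3, assuming snapshots are taken from within the running node (the 600-second sleep stands in for letting the node process live data between snapshots):

import time
import tracemalloc

tracemalloc.start(10)            # keep up to 10 frames per allocation

snap_a = tracemalloc.take_snapshot()
time.sleep(600)                  # let live data flow for ~10 minutes
snap_b = tracemalloc.take_snapshot()

# Per-line growth between snapshots; the data.py:1642 entry is the one reported above.
for stat in snap_b.compare_to(snap_a, "lineno")[:10]:
    print(stat)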

Additional Context

  • Tuple count growth slows over time (+1.19M in the first 10 minutes, +382K in the second), suggesting some caching does stabilize
  • Total memory growth remains linear (~7 MB/min) with no plateau
  • Growth stops when market data stops (off-hours)
  • AccountState objects are also accumulating, but not nearly enough to explain the leak (separate issue: missing purge config)
