Description
Memory grows linearly (~7 MB/min) during market hours while receiving Databento live data, and stabilizes when data stops. The primary allocation source is `capsule_to_data` in the Databento adapter, which accumulates ~32-35 MB every 5 minutes.
Environment
- NautilusTrader: develop branch
- Python: 3.13
- Adapter: Databento CBBO-1S schema (~500 subscribed instruments)
- Platform: Linux (Fly.io), 2GB RAM
Profiling Data
Three snapshots taken 10 minutes apart during market hours:
| Metric | 18:36 | 18:46 | 18:56 |
|---|---|---|---|
| Traced memory | 129.7 MB | 201.4 MB | 269.4 MB |
| `capsule_to_data` delta | - | +35.3 MB | +32.0 MB |
| `data_engine.py:482` delta | - | +1.6 MB | +1.5 MB |
| tuple count | 14,836 | 1,208,277 | 1,590,831 |
Memory changes consistently show:

```
+32039.1 KB: nautilus_trader/adapters/databento/data.py:1642
+1460.4 KB: nautilus_trader/live/data_engine.py:482
```
Line 1642 is `data = capsule_to_data(pycapsule)` in `_handle_msg()`.
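For reference, deltas of this shape can be produced with `tracemalloc` snapshot diffs; a minimal sketch (the interval and frame depth are assumptions, not the exact setup used here):

```python
import time
import tracemalloc

tracemalloc.start(25)  # keep 25 stack frames so allocating lines are visible

snapshots = [tracemalloc.take_snapshot()]
for _ in range(2):
    time.sleep(600)  # 10 minutes between snapshots
    snapshots.append(tracemalloc.take_snapshot())

# Diff consecutive snapshots, grouped by allocating source line
for before, after in zip(snapshots, snapshots[1:]):
    for stat in after.compare_to(before, "lineno")[:10]:
        print(stat)
```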
Expected Behavior
Memory should plateau once caches fill. The quote tick cache uses `deque(maxlen=tick_capacity)`, which should bound storage. Instead, memory grows linearly with data volume.
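For illustration, `deque(maxlen=...)` evicts the oldest element on append once full, so the bounded cache alone cannot account for unbounded growth; a minimal demonstration:

```python
from collections import deque

# A bounded deque evicts the oldest entry once it reaches maxlen, so
# steady-state memory for the tick cache should be flat no matter how
# many ticks arrive.
cache = deque(maxlen=10_000)
for i in range(1_000_000):
    cache.append(i)  # each append stands in for a QuoteTick

assert len(cache) == 10_000  # never exceeds tick_capacity
```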
Data Flow Analysis
- Rust `DatabentoFeedHandler` receives records and creates `LiveMessage::Data(data)`
- The message is put into an mpsc channel (buffer_size=100,000)
- The `process_messages` async loop receives it and calls `data_to_pycapsule(py, data)`
- The Python callback `_handle_msg(pycapsule)` is invoked
- `capsule_to_data(pycapsule)` copies the data into a Cython `QuoteTick`
- The `QuoteTick` goes to the bounded cache and the msgbus
- The capsule should then be freed, but something retains it (see the sketch below)
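A minimal sketch of the expected Python-side lifecycle (`capsule_to_data` is the real helper; the stubbed bodies and `publish` are hypothetical stand-ins, not adapter code):

```python
def capsule_to_data(pycapsule):
    """Stand-in for the real Cython converter."""

def publish(data):
    """Hypothetical stand-in for the bounded cache + msgbus publish."""

def _handle_msg(pycapsule):
    data = capsule_to_data(pycapsule)  # copies the payload into a Cython object
    publish(data)
    # On return, the local `pycapsule` reference is dropped; if nothing else
    # retains it, CPython destroys the capsule and the boxed Rust `Data`
    # drops with it.
```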
Potential Issues
1. `LiveMessage` enum size

```rust
#[allow(clippy::large_enum_variant, reason = "TODO: Optimize this (largest variant 1096 vs 80 bytes)")]
pub enum LiveMessage {
    Data(Data),
    Instrument(InstrumentAny), // 1096 bytes
    ...
}
```

With the 100k buffer: 100,000 × 1,096 bytes ≈ 110 MB of Rust memory overhead.
2. Capsule lifecycle unclear
In `model/src/python/data/mod.rs:91-96`:

```rust
pub fn data_to_pycapsule(py: Python, data: Data) -> Py<PyAny> {
    let capsule = PyCapsule::new_with_destructor(py, data, None, |_, _| {})
        .expect("Error creating `PyCapsule`");
    capsule.into_any().unbind()
}
```

The destructor is a no-op. The boxed `Data` should be dropped when Python deallocates the capsule, but is the capsule being deallocated promptly?
3. Async callback reference holding
The `process_messages` async function runs in tokio and invokes Python callbacks. Could the async runtime or the pyo3 callback mechanism hold references longer than expected?
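One way to test this hypothesis is to instrument the handler with a refcount/referrer check (a debugging sketch; the function name is hypothetical and it is not adapter code):

```python
import gc
import sys

def check_capsule_retention(pycapsule):
    # Call at the top of `_handle_msg`. `sys.getrefcount` reports one extra
    # reference for its own argument, so a steady small value is normal; a
    # value that grows over time, or unexpected container types among the
    # referrers, would point at the component retaining capsules.
    print("refcount:", sys.getrefcount(pycapsule))
    print("referrers:", [type(r).__name__ for r in gc.get_referrers(pycapsule)])
```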
Reproduction Steps
- Configure the Databento adapter with ~500 instrument subscriptions (CBBO-1S)
- Run a live node during market hours
- Enable tracemalloc profiling (see the sketch under Profiling Data)
- Observe linear memory growth at the `capsule_to_data` line
Additional Context
- Tuple count growth slows over time (1.19M in the first 10 min, 382K in the second), suggesting some caching stabilizes; a quick way to track this is sketched below
- Memory growth remains linear (~7 MB/min) - no plateau
- Growth stops when market data stops (off-hours)
- `AccountState` objects are also accumulating, but not nearly enough to explain the leak (separate issue: missing purge config)
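For the tuple-count observation, a lightweight tracker (assumed tooling, not part of the original profiling setup):

```python
import gc
from collections import Counter

def top_object_counts(n=10):
    # Count live objects by type; calling this periodically during market
    # hours should reproduce the rising tuple counts shown in the table.
    counts = Counter(type(obj).__name__ for obj in gc.get_objects())
    return counts.most_common(n)

print(top_object_counts())
```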