Replies: 1 comment
A few things to check for a 26k-order / 264k-bar backtest that runs slow on Windows:

1. That is the correct pattern and avoids the O(n²) sorting problem. Good.

2. The most common Windows-specific bottleneck: Python GIL + Rust thread pool. On Windows, Rust's Rayon thread pool (used for data processing) can spawn many threads that contend. Try pinning the thread count:

       import os
       os.environ["RAYON_NUM_THREADS"] = "1"  # or match your physical core count

   Set this before importing nautilus_trader.

3. Use

4. Disable logging during the run:

       import logging
       logging.disable(logging.CRITICAL)

   NautilusTrader's Rust → Python log bridge has overhead at high message volume.

5. Profile with a minimal NoOpStrategy first. If NoOpStrategy on 264k bars takes > 5s, the bottleneck is data loading/sorting, not strategy logic. If it's fast and FullStrategy is slow, the bottleneck is in your

6. ParquetDataCatalog read performance. Make sure the parquet files are not fragmented into thousands of small files. Consolidate to one file per instrument per month for fastest reads.

What does the timing look like with NoOpStrategy vs your full strategy?
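The NoOp-versus-full comparison from item 5 can be sketched with the standard library alone. This is a minimal sketch, not part of NautilusTrader: `time_run` and `profile_run` are hypothetical helper names, and the callable you pass in is assumed to be your own function that builds the engine and runs the backtest.

```python
import cProfile
import pstats
import time
from typing import Callable


def time_run(label: str, run: Callable[[], None]) -> float:
    """Run a callable once and return the wall-clock time in seconds."""
    start = time.perf_counter()
    run()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f} s")
    return elapsed


def profile_run(run: Callable[[], None], top: int = 15) -> pstats.Stats:
    """Profile a callable and print the entries with the highest cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        run()
    finally:
        profiler.disable()
    stats = pstats.Stats(profiler)
    stats.sort_stats("cumulative").print_stats(top)
    return stats


# Stand-in workload so the sketch runs on its own; replace it with a
# function that builds your engine and runs the backtest (an assumption,
# not a NautilusTrader API).
def placeholder_backtest() -> None:
    sum(i * i for i in range(100_000))


noop_seconds = time_run("placeholder", placeholder_backtest)
profile_stats = profile_run(placeholder_backtest)
```

Timing both variants with `time_run` gives the split between engine overhead and strategy overhead. `profile_run` only sees Python-visible calls, so time spent inside compiled code is attributed to the call that entered it.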
Environment
Setup
engine.add_data(bars, sort=False) per instrument, then engine.sort_data() once.
on_bar.
Observed run times (single run each, same dataset)
bar_execution: false, true, true, true, true
Summary:
bar_execution (OHLC expansion) adds ~23 s for 264k bars (~4× more matching iterations).
I am not using StreamingConfig during backtest; results are written once at the end from the engine cache.
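To put the bar_execution figure in per-bar terms, here is a rough back-of-the-envelope calculation. It assumes each bar expands into 4 price points (open, high, low, close), which is my reading of the "~4×" above rather than a confirmed detail of the engine.

```python
# Figures reported above.
BARS = 264_000         # 1-minute bars in the dataset
EXTRA_SECONDS = 23.0   # added run time with bar_execution enabled

# Assumption: each bar expands into 4 price points (open, high, low, close),
# so enabling bar_execution adds 3 matching iterations per bar.
POINTS_PER_BAR = 4

extra_iterations = BARS * (POINTS_PER_BAR - 1)
per_bar_us = EXTRA_SECONDS / BARS * 1e6
per_iteration_us = EXTRA_SECONDS / extra_iterations * 1e6

print(f"extra matching iterations: {extra_iterations:,}")
print(f"overhead per bar: {per_bar_us:.1f} us")
print(f"overhead per extra iteration: {per_iteration_us:.1f} us")
```

Under that assumption the expansion costs roughly 87 µs per bar, or about 29 µs per additional matching iteration.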
Question
I am trying to understand whether this level of slowness is expected. An order every 10 minutes (our StubStrategy with order_every_n_bars=10 on two instruments) doesn't seem extreme for a backtester, but a ~4–6 minute run for 3 months of 1-min bar data with that order count makes iterative strategy development painful. I'd like to know:
I have already applied the documented optimization (deferred sort with sort=False + sort_data()). I am not using the high-level BacktestNode/streaming path; we could try that if it's known to be faster for this kind of workload.
Thanks for any guidance.
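For reference, the deferred-sort pattern I mean looks like the following in miniature. `ToyEngine` is a hypothetical stand-in written only for this illustration. It mirrors the `add_data(..., sort=False)` / `sort_data()` call shape but is not NautilusTrader's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEngine:
    """Hypothetical stand-in for a backtest engine, for illustration only."""

    data: list[tuple[int, str]] = field(default_factory=list)
    sort_calls: int = 0

    def add_data(self, items: list[tuple[int, str]], sort: bool = True) -> None:
        self.data.extend(items)
        if sort:
            self.sort_data()  # re-sorts the whole stream on every call

    def sort_data(self) -> None:
        self.data.sort(key=lambda item: item[0])  # order by timestamp
        self.sort_calls += 1


# Toy bars as (timestamp, instrument) pairs for two instruments.
bars_by_instrument = {
    "AAA": [(ts, "AAA") for ts in range(0, 6, 2)],
    "BBB": [(ts, "BBB") for ts in range(1, 6, 2)],
}

eager = ToyEngine()
for bars in bars_by_instrument.values():
    eager.add_data(bars)  # sorts after each instrument

deferred = ToyEngine()
for bars in bars_by_instrument.values():
    deferred.add_data(bars, sort=False)
deferred.sort_data()  # one sort over the combined stream

print(eager.sort_calls, deferred.sort_calls)
```

Both engines end up with the same time-ordered stream. The deferred version sorts once regardless of how many instruments are added.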