Backtest very slow with bar data and moderate order count (~26k orders, 264k bars) #3634

djdjoko · 2026-02-25T13:24:07Z

Environment

NautilusTrader: 1.223.0
Python: 3.12
OS: Windows 10
API: BacktestEngine low-level API, no streaming

Setup

Data: 2 instruments (BTCUSDT-PERP, ETHUSDT-PERP), 1-minute LAST bars from ParquetDataCatalog. ~132k bars per instrument, ~264k bars total over 3 months.
Loading: We use the recommended pattern: engine.add_data(bars, sort=False) per instrument, then engine.sort_data() once.
Venue: Single venue, NETTING, MARGIN, default fill model. No custom modules.
Strategies used for timing:
- NoOpStrategy: Subscribes to bars for both instruments, empty on_bar.
- StubStrategy: Same subscriptions, computes a simple spread, publishes a custom signal every bar, and submits a market order every 10 bars (~26,400 orders over the run). Log level ERROR for the "no orders" vs "many orders" comparison; we also tried WARNING to measure logging cost.

Observed run times (single run each, same dataset)

Scenario	`bar_execution`	Log level	Engine run time (approx)
NoOpStrategy (2 instruments)	`false`	ERROR	~9 s
NoOpStrategy (2 instruments)	`true`	ERROR	~32 s
StubStrategy, order every 10_000 bars (~26 orders)	`true`	ERROR	~41 s
StubStrategy, order every 10 bars (~26_400 orders)	`true`	ERROR	~267 s
StubStrategy, order every 10 bars (~26_400 orders)	`true`	WARNING	~310 s

Summary:

Enabling bar_execution (OHLC expansion) adds ~23 s for 264k bars (~4× more matching iterations).
Going from ~26 orders to ~26,400 orders adds ~226 s, i.e. ~8.6 ms per order for the full order lifecycle (submit → fill → position/account updates).
Switching the log level from ERROR to WARNING adds ~43 s on the same run (presumably from formatting and writing many messages).
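The deltas above are plain subtraction on the table. A minimal sketch of the arithmetic, in case anyone wants to check it (the dictionary keys are labels made up for this snippet, and the times are the approximate values from the table):

```python
# Approximate engine run times in seconds, copied from the table above.
run_times_s = {
    "noop_no_bar_execution": 9,
    "noop_bar_execution": 32,
    "stub_26_orders": 41,
    "stub_26400_orders": 267,
    "stub_26400_orders_warning": 310,
}

# Cost of enabling bar_execution with no orders at all.
bar_execution_cost_s = (
    run_times_s["noop_bar_execution"] - run_times_s["noop_no_bar_execution"]
)

# Cost attributable to the additional orders (same bar_execution, same log level).
extra_orders = 26_400 - 26
order_cost_s = run_times_s["stub_26400_orders"] - run_times_s["stub_26_orders"]
per_order_ms = order_cost_s / extra_orders * 1000

# Cost of moving the log level from ERROR to WARNING on the same run.
logging_cost_s = (
    run_times_s["stub_26400_orders_warning"] - run_times_s["stub_26400_orders"]
)

print(f"bar_execution: +{bar_execution_cost_s} s")
print(f"orders:        +{order_cost_s} s ({per_order_ms:.1f} ms per order)")
print(f"logging:       +{logging_cost_s} s")
```

These are single-run numbers, so the per-order figure is only a rough estimate.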

I am not using StreamingConfig during the backtest; results are written once at the end from the engine cache.

Question

I am trying to understand whether this level of slowness is expected. An order every 10 minutes (our StubStrategy with order_every_n_bars=10 on two instruments) doesn't seem extreme for a backtester, but a ~4–6 minute run for 3 months of 1-min bar data with that order count makes iterative strategy development painful. I'd like to know:

Is ~8–9 ms per order lifecycle (submit → fill → position/account) in the ballpark of what you see in backtests, or are we missing a recommended configuration (e.g. risk bypass, different fill model, or data type)?
For strategies that need bar data and a non-trivial number of orders (e.g. hundreds to tens of thousands over the run), is there an official or recommended way to get faster backtests (e.g. avoid bar OHLC expansion when we only need bar close for fills, or batch processing)?

I have already applied the documented optimization (deferred sort with sort=False + sort_data()). I am not using the high-level BacktestNode/streaming path; we could try that if it's known to be faster for this kind of workload.

Thanks for any guidance.

Aliipou · 2026-03-24T12:10:44Z

A few things to check for a 26k-order / 264k-bar backtest that runs slowly on Windows:

1. sort=False + single sort_data() — you already have this right

That is the correct pattern and avoids the O(n²) sorting problem. Good.
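To illustrate why this matters, here is a rough, library-free sketch. It does not use NautilusTrader: plain integer lists stand in for timestamped bars, and the function names are made up. It compares re-sorting after every batch with a single deferred sort and counts how many elements each approach hands to the sort:

```python
import random


def load_with_resort(batches):
    """Sort the whole list after every batch (stand-in for sorting on each add)."""
    data, touched = [], 0
    for batch in batches:
        data.extend(batch)
        data.sort()
        touched += len(data)  # every sort pass sees the full list so far
    return data, touched


def load_with_deferred_sort(batches):
    """Append every batch first, then sort exactly once at the end."""
    data = []
    for batch in batches:
        data.extend(batch)
    data.sort()
    return data, len(data)


random.seed(0)
# 20 batches of 1,000 fake timestamps each.
batches = [[random.randrange(1_000_000) for _ in range(1_000)] for _ in range(20)]

resorted, touched_resort = load_with_resort(batches)
deferred, touched_deferred = load_with_deferred_sort(batches)

print("same ordering:", resorted == deferred)
print("elements handed to sort (re-sort each add):", touched_resort)
print("elements handed to sort (deferred):", touched_deferred)
```

Both approaches produce the same ordering. The work done by the re-sorting variant grows with the number of batches, which is why the gap widens as more instruments or data chunks are added.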

2. The most common Windows-specific bottleneck: Python GIL + Rust thread pool

On Windows, Rust's Rayon thread pool (used for data processing) can spawn many threads that contend. Try pinning the thread count:

import os
os.environ["RAYON_NUM_THREADS"] = "1"  # or match your physical core count

Set this before importing nautilus_trader.

3. Use TimeBarAggregator instead of LastBarAggregator if possible

LAST bars require tick-level matching under the hood. If your strategy only needs OHLCV timing, switching to TIME bar type can give a 2-4x speedup.

4. Disable logging during the run

import logging
logging.disable(logging.CRITICAL)

NautilusTrader's Rust → Python log bridge has overhead at high message volume.

5. Profile with a minimal NoOpStrategy first

If NoOpStrategy on 264k bars takes more than 5 s, the bottleneck is data loading/sorting, not strategy logic. If it's fast and your full strategy is slow, the bottleneck is in your on_bar handler.
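A minimal way to do this with the standard library profiler is sketched below. `run_backtest` is a hypothetical stand-in; replace its body with your own engine run. Note that cProfile mainly breaks down Python-level calls, so time spent inside compiled extension code is generally not split out per function:

```python
import cProfile
import io
import pstats


def run_backtest():
    """Hypothetical stand-in; replace the body with your own engine run."""
    return sum(i * i for i in range(200_000))


profiler = cProfile.Profile()
profiler.enable()
result = run_backtest()
profiler.disable()

# Collect the report into a string, sorted by cumulative time, top 15 entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(15)
report = buffer.getvalue()
print(report)
```

Run it once with NoOpStrategy and once with the full strategy, then compare which entries grow.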

6. ParquetDataCatalog read performance

Make sure the Parquet files are not fragmented into thousands of small files. Consolidate to one file per instrument per month for fastest reads.
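A quick way to check for fragmentation is to count the `.parquet` files per directory. This sketch uses only the standard library; the `./catalog` path and the function name are assumptions, so point it at your own catalog root:

```python
from collections import Counter
from pathlib import Path


def count_parquet_files(catalog_root):
    """Return {directory: number of .parquet files} for directories under catalog_root."""
    counts = Counter()
    for path in Path(catalog_root).rglob("*.parquet"):
        counts[str(path.parent)] += 1
    return dict(counts)


catalog_root = Path("./catalog")  # hypothetical location, adjust to your setup
if catalog_root.is_dir():
    per_directory = count_parquet_files(catalog_root)
    # Directories with the most files first.
    for directory, n_files in sorted(per_directory.items(), key=lambda kv: -kv[1]):
        print(f"{n_files:6d}  {directory}")
```

A directory with hundreds or thousands of files for one instrument is a candidate for consolidation.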

What does the timing look like with NoOpStrategy vs your full strategy?
