@wxianxin wxianxin commented Jan 30, 2026

Changed default value of optimize_file_loading to False.

Pull Request

NautilusTrader prioritizes correctness and reliability, please follow existing patterns for validation and testing.

  • I have reviewed the CONTRIBUTING.md and followed the established practices

Summary

In backtesting, if the backtest data is large enough, setting optimize_file_loading to True uses up all the system memory (and crashes the backtest for me). NT tries to load all the market data into memory before it starts streaming.
In my case, I am backtesting with Tardis tick and order book data for a full month.

Currently there is no way to turn optimize_file_loading off via any config/arg.

Type of change

  • Bug fix (non-breaking)
  • New feature (non-breaking)
  • Improvement (non-breaking)
  • Breaking change (impacts existing behavior)
  • Documentation update
  • Maintenance / chore

Release notes

  • I added a concise entry to RELEASES.md that follows the existing conventions (when applicable)

Testing

Ensure new or changed logic is covered by tests.

  • Affected code paths are already covered by the test suite
  • I added/updated tests to cover new or changed logic

Changed default value of optimize_file_loading to False.

CLAassistant commented Jan 30, 2026

CLA assistant check
All committers have signed the CLA.

Fixed OOM error in backtest by changing optimize_file_loading default value.
@faysou
Collaborator

faysou commented Jan 30, 2026

Which BacktestRunConfig do you use?

@wxianxin
Author

from nautilus_trader.config import BacktestRunConfig
Is there a different BacktestRunConfig?

@faysou
Collaborator

faysou commented Jan 30, 2026

I mean BacktestDataConfig

There's no reason to load everything unless that's how you specify it.

@wxianxin
Author

wxianxin commented Jan 30, 2026

> I mean BacktestDataConfig
>
> There's no reason to load everything unless that's how you specify it.

That's why I think it's a bug. I think it's preallocating the memory?

Here is how I use BacktestDataConfig:

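    from nautilus_trader.config import BacktestDataConfig
    from nautilus_trader.config import BacktestEngineConfig
    from nautilus_trader.config import BacktestRunConfig
    from nautilus_trader.config import CacheConfig
    from nautilus_trader.config import LoggingConfig
    from nautilus_trader.model.data import OrderBookDeltas
    from nautilus_trader.model.data import QuoteTick
    from nautilus_trader.model.data import TradeTick

    # catalog, main_inst_id, start, end, strategies, venues_configs, raise_exception,
    # bypass_logging and log_level are defined elsewhere in the script.

    # One data config per data type, all for the same instrument and time range
    main_delta_config = BacktestDataConfig(
        catalog_path=str(catalog.path),
        data_cls=OrderBookDeltas,
        instrument_id=main_inst_id,
        start_time=start,
        end_time=end,
    )
    main_quotes_config = BacktestDataConfig(
        catalog_path=str(catalog.path),
        data_cls=QuoteTick,
        instrument_id=main_inst_id,
        start_time=start,
        end_time=end,
    )
    main_trades_config = BacktestDataConfig(
        catalog_path=str(catalog.path),
        data_cls=TradeTick,
        instrument_id=main_inst_id,
        start_time=start,
        end_time=end,
    )

    data_configs = [main_trades_config, main_delta_config, main_quotes_config]

    config = BacktestRunConfig(
        engine=BacktestEngineConfig(
            strategies=strategies,
            logging=LoggingConfig(
                bypass_logging=bypass_logging,
                log_level=log_level,
                log_level_file=log_level,
            ),
            cache=CacheConfig(
                tick_capacity=100,  # Keep the last 100 ticks per instrument
                bar_capacity=10,  # Keep the last 10 bars per bar type
            ),
        ),
        data=data_configs,
        venues=venues_configs,
        raise_exception=raise_exception,
        chunk_size=5_000_000,  # Chunk size for streaming the backtest data
    )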

@faysou
Collaborator

faysou commented Jan 30, 2026

Only the directories for the data you are interested in are loaded, and DataFusion loads them lazily. If there's a bug, the bug should be fixed instead of disabling the feature. Someone else introduced this loading of directories, and it seemed to work in his tests.
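
To make the lazy-versus-eager distinction concrete, here is a minimal standard-library sketch. It does not use DataFusion or NautilusTrader; `read_chunks`, `peak_rows_lazy`, and `peak_rows_eager` are hypothetical names invented for the illustration, and plain integers stand in for market data rows.

```python
from typing import Iterator, List


def read_chunks(total_rows: int, chunk_size: int) -> Iterator[List[int]]:
    """Yield rows in fixed-size chunks, standing in for a lazy reader."""
    for start in range(0, total_rows, chunk_size):
        yield list(range(start, min(start + chunk_size, total_rows)))


def peak_rows_lazy(total_rows: int, chunk_size: int) -> int:
    """Stream chunk by chunk, so only one chunk is held at a time."""
    peak = 0
    for chunk in read_chunks(total_rows, chunk_size):
        peak = max(peak, len(chunk))
    return peak


def peak_rows_eager(total_rows: int, chunk_size: int) -> int:
    """Materialize every chunk before processing, so all rows are held at once."""
    chunks = list(read_chunks(total_rows, chunk_size))
    return sum(len(chunk) for chunk in chunks)


lazy_peak = peak_rows_lazy(1_000, 100)
eager_peak = peak_rows_eager(1_000, 100)
print(lazy_peak, eager_peak)  # 100 1000
```

If any step between the reader and the consumer behaves like `peak_rows_eager`, peak memory scales with the full dataset rather than with the chunk size, which would be consistent with the OOM reported here.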

@cjdsellers
Member

Hi @wxianxin

Thanks for the report and feedback.

I agree with @faysou that the underlying behavior needs further investigation. The OOM you experienced suggests the loading isn't as lazy as we might expect; something in the pipeline could be materializing the data eagerly when directories are registered.

That said, I think that changing the optimize_file_loading default from True to False is the right call (it's not disabling the feature, just changing the default). The True setting is only beneficial when the catalog is organized with many small files and the user intends to read entire directories upfront (and knows this is occurring under the hood) - potentially a narrower and more advanced use case that shouldn't be the default?
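
The trade-off between registering whole directories and registering individual files can be sketched roughly as follows. This is a hypothetical illustration using only the standard library, not NautilusTrader's code: `CatalogFile`, `paths_to_register`, and the file names are invented here, and the idea that the non-optimized path selects files by time range is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CatalogFile:
    """Hypothetical catalog entry: a file path plus the time range it covers."""

    path: str
    start_ns: int
    end_ns: int


def paths_to_register(
    files: List[CatalogFile],
    directory: str,
    start_ns: int,
    end_ns: int,
    optimize_file_loading: bool,
) -> List[str]:
    """Return the paths a reader would be asked to open."""
    if optimize_file_loading:
        # One registration covers every file in the directory,
        # whether or not it overlaps the requested range.
        return [directory]
    # Otherwise register only the files that overlap [start_ns, end_ns].
    return [f.path for f in files if f.end_ns >= start_ns and f.start_ns <= end_ns]


files = [
    CatalogFile("trades/a.parquet", 0, 9),
    CatalogFile("trades/b.parquet", 10, 19),
    CatalogFile("trades/c.parquet", 20, 29),
]

per_file = paths_to_register(files, "trades", 12, 15, optimize_file_loading=False)
whole_dir = paths_to_register(files, "trades", 12, 15, optimize_file_loading=True)
print(per_file)   # ['trades/b.parquet']
print(whole_dir)  # ['trades']
```

With many small files, a single directory registration saves per-file overhead; with a month of tick data, it hands the reader far more than the requested range unless every later stage stays lazy.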

@xxxsteve

> Only the directories for the data you are interested in are loaded, and DataFusion loads them lazily. If there's a bug, the bug should be fixed instead of disabling the feature. Someone else introduced this loading of directories, and it seemed to work in his tests.

Full respect, as I have limited exposure to Nautilus Trader. The new feature is great. It's just that the default value caused a bug in a case that previously worked: as soon as you backtest on large data, it hits OOM, and that is hard to debug. Changing the default value would be the safe option IMO, and the feature is still kept. In the current setup, although it appears to be a changeable argument, it's essentially a hard-coded setting, and the only way to turn it off is to modify the source code.

Feel free to close this PR if it's safe to ignore this bug.

@cjdsellers
Member

Thanks @xxxsteve. That was also another potential follow-up, to expose this through a config option.

@faysou
Collaborator

faysou commented Jan 31, 2026

I'll do a PR to expose the parameter as a config option.
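
As a rough sketch of what exposing the flag through config could look like, assuming nothing about the eventual PR: `DataConfigSketch` and `loader_kwargs` below are hypothetical names, not NautilusTrader classes or functions. The point is only that the flag lives on the user-facing config with a conservative default and is forwarded explicitly.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DataConfigSketch:
    """Hypothetical stand-in for a data config that carries the flag."""

    catalog_path: str
    optimize_file_loading: bool = False  # Conservative default, as proposed in this PR


def loader_kwargs(config: DataConfigSketch) -> Dict[str, bool]:
    """Forward the flag explicitly instead of relying on a hard-coded default downstream."""
    return {"optimize_file_loading": config.optimize_file_loading}


default_cfg = DataConfigSketch(catalog_path="/data/catalog")
opt_in_cfg = DataConfigSketch(catalog_path="/data/catalog", optimize_file_loading=True)

print(loader_kwargs(default_cfg))  # {'optimize_file_loading': False}
print(loader_kwargs(opt_in_cfg))   # {'optimize_file_loading': True}
```

Users with many small files could then opt in per data config, while large single-instrument backtests keep the safer behavior without touching the source.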

5 participants