A Deep Reinforcement Learning framework for trading & order execution in financial markets. The goal of the project is to speed up the development & research of financial agents by building a modular and scalable codebase. The framework supports the following main features:
- Loading & preprocessing data directly from different APIs
- Training & evaluating deep reinforcement learning agents
- Using specific financial metrics or quickly implementing your own
- Visualizing the performance of the agent with some intuitive graphs
The nice part is that everything is configurable within a config file.
The code is using popular packages like:
- pytorch
- pandas
- stable-baselines3
- gym
- wandb
- mplfinance
The architecture is split into 4 main categories:
- Data
- Environment
- Reinforcement Learning Agents
- Specific Task Layer
The Specific Task Layer is a glue-code module used for training & backtesting.
It can be further extended into the application layer.
Visual representations of the actions taken by the agent & its current status:

- Code tested under `Python 3.8`, `pytorch 1.13.0`, and `cuda 11.6` on `Ubuntu 20.04`.
- Create a conda environment and activate it:
```
conda create --name yacht python=3.8
conda activate yacht
```
- Install torch and cuda separately with conda:
```
conda install pytorch=1.13.0 torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia
```
- Ultimately, install the other requirements with pip:
```
pip install -r requirements.txt
```
- The configuration system is built with google protobuf. If you want to recompile / change the protobuf files, you should install the `protoc` compiler on your system:
```
sudo apt install protobuf-compiler libprotobuf-dev
```
- Run the compilation command from the root folder:
```
protoc -I=. --python_out=. yacht/config/proto/*.proto
```
- Create a file called `.env` at the root directory level. If you want to fully use the market APIs and experiment trackers, you should add the secret keys.
- Look at `.env.default` for the supported env vars.
- Not all env vars are mandatory. For example, the free version of `Yahoo Finance` does not require any credentials.
- Currently, we have support for `Binance` & `Yahoo Finance`.
- You should set the api keys in the `.env` file for full support.

We also support the following market indices:
- S&P 500
- Dow 30
- Nasdaq 100
- Russell 2000
You can set `tickers: ['NASDAQ100']` in the configuration file, and all the tickers from the index will be expanded. You can also set something like `['NASDAQ100', 'S&P500', 'AAPL']` or any combination you like.
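For example, a combined entry could look like the following sketch (where exactly the `tickers` field nests is defined by the proto schema under `yacht/config/proto`):
```
# Indices are expanded to their constituent tickers; 'AAPL' is kept as-is.
tickers: ['NASDAQ100', 'S&P500', 'AAPL']
```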
The data is stored in `h5` files.
```
python main.py download --config-file-name download_4years.config.txt --storage-dir ./storage/download_4_years --market-storage-dir ./storage
```
- The `--market-storage-dir` CLI argument is optional. If you add it, the market data will be placed in a different location than your `--storage-dir`. This is helpful because it can then be accessed by multiple experiments in parallel during training (the `h5` file will be set in read-only mode). Otherwise, while training, only one experiment can access a specific file. `--market-storage-dir` should also be used during training & backtesting.
- You can use the `market_mixins: [...]` field from the config file to preprocess the data before it is stored (see the sketch below).
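As a sketch, a download config might declare mixins like this; the mixin names here are hypothetical, so check the configs under `./yacht/config/configs` for the real ones:
```
# Hypothetical mixin names, applied to the market data before it is stored.
market_mixins: ['FillMissingValuesMixin', 'LogReturnMixin']
```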
All the supported configs can be found at `./yacht/config/configs`. You should only add the config path relative to the root directory.
```
python main.py train --config-file order_execution/all/single_asset_all_universal_silent.config.txt --storage-dir ./storage/yacht --market-storage-dir ./storage
```
```
python main.py backtest --config-file order_execution/all/single_asset_all_universal_silent.config.txt --storage-dir ./storage/yacht --market-storage-dir ./storage
```
You can download the pretrained weights from here.
```
cd /root/directory
mkdir storage
# place the downloaded weights in ./storage
```
Suppose you downloaded and placed the pretrained weights and data correctly as shown above. In that case, you can run the following command to resume the agent:
```
python main.py train --config-file order_execution/all/single_asset_all_universal_silent.config.txt --storage-dir ./storage/yacht --resume-from best-train --market-storage-dir ./storage
```
For the parameter `--resume-from`, we support the following options:
- Absolute path to the checkpoint.
- `latest-train` = resume the agent from the latest checkpoint saved during training.
- `best-train` = resume the agent from the best checkpoint saved during training.
NOTE: For the `best-train` option, you can choose the metric on which the agent was monitored. You can do that with the `meta.metrics_to_load_best_on` parameter from the configuration file. For example, `metrics_to_load_best_on: ['PA', 'GLR']` will load two agents: the one monitored on the `PA` metric & the one that performed best on `GLR`.
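In the configuration file, that looks like the following snippet (the `meta` nesting follows from the dotted `meta.metrics_to_load_best_on` path above):
```
meta: {
    metrics_to_load_best_on: ['PA', 'GLR']
}
```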
- For now, we support `wandb` for experiment tracking and logging.
- Just add the api key in the `.env` file. Also, in the configuration file you should add:
```
meta: {
    experiment_tracker: 'wandb'
}
```
If you want to add a specific `project_entity`, add:
```
meta: {
    project_entity: 'your_project_entity'
}
```
NOTE: Be aware that this name is unique among all wandb users. For example, I use `project_entity=yacht`. If you try to use it, it will throw an unauthorized error because you do not have access to my entity.
Here is an example of how it looks:

- If you don't want to log a specific experiment on the experiment tracker, just remove the config field or replace it with the empty string `''`, as shown below.
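For example, this disables tracking for the experiment:
```
meta: {
    experiment_tracker: ''
}
```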
- We support hyperparameter optimization with Weights & Biases sweeps.
- Make sure Weights & Biases works as a simple experiment tracker (see above) before using this.
- You can use any other config from `tools/tuning/configs` or generate your own.
```
wandb sweep tools/tuning/configs/single_asset_order_execution_crypto.yaml
```
```
wandb agent id-given-by-generated-sweep
```
For further reading go to: