A reinforcement learning framework for optimizing cryptocurrency portfolios using Recurrent Proximal Policy Optimization (RPPO). This project implements a deep RL agent that learns to allocate capital across multiple cryptocurrencies using historical market data.
This thesis project explores the application of recurrent reinforcement learning to cryptocurrency portfolio optimization. The agent learns optimal trading strategies by interacting with a simulated trading environment, using LSTM networks to capture temporal dependencies in market data.
- Multi-Asset Portfolio Management: Simultaneously trades across 9 cryptocurrencies (BTC, ETH, ADA, LTC, VET, XRP, AVAX, DOT, SOL)
- Recurrent Architecture: Uses LSTM layers to maintain memory of market dynamics
- Multi-Timeframe Features: Processes market data from multiple time intervals
- Custom Feature Extractors: Implements DAIN (Deep Adaptive Input Normalization) and CNN-based feature extraction
- Realistic Trading Simulation: Includes transaction fees, portfolio constraints, and realistic market conditions
- Comprehensive Evaluation: Compares performance against buy-and-hold baselines using Sharpe and Sortino ratios
Install the project dependencies with:

```
pip install -r requirements.txt
```

Key dependencies:
- `torch` - PyTorch for neural network operations
- `stable-baselines3` - RL algorithms implementation
- `sb3_contrib` - Additional RL algorithms, including RecurrentPPO
- `pandas` - Data manipulation
- `numpy` - Numerical computations
- `matplotlib` & `seaborn` - Visualization
- `pyarrow` - Feather file format support
- `tensorboard` - Training monitoring
The project uses Binance historical data in Feather format. To download data:
- Update `binance_dump.py` with your desired date range and trading pairs
- Run the script to download the data:

```
python binance_dump.py
```

Data will be saved in the `binance_data/all/` directory.
Expected data structure:
- Files: `{COIN}{FIAT}.feather` (e.g., `btcusdt.feather`)
- Columns: `date`, `open`, `high`, `low`, `close`, `volume`
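As a quick sanity check, a downloaded file can be loaded with pandas and inspected for the expected columns. The path below is an example; adjust it to your download location.

```python
# Sanity-check a downloaded Feather file (example path).
import pandas as pd

EXPECTED_COLUMNS = {"date", "open", "high", "low", "close", "volume"}

df = pd.read_feather("binance_data/all/btcusdt.feather")
missing = EXPECTED_COLUMNS - set(df.columns)
if missing:
    raise ValueError(f"Missing columns: {missing}")

print(df.dtypes)
print(df.tail())
```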
Edit `config.json` to customize training parameters:

```json
{
"agent": "RecurrentPPO",
"checkpoint_timesteps": 100000,
"total_timesteps": 10000000,
"coins": ["BTC", "ETH", "ADA", "LTC", "VET", "XRP", "AVAX", "DOT", "SOL"],
"intervals": ["4H"],
"train_start": "1-1-2021",
"train_end": "1-8-2022",
"env_kwargs": {
"capital": 1000,
"episode_length": 1024,
"fee": 0.00022
},
"agent_kwargs": {
"learning_rate": 0.0003,
"batch_size": 512,
"n_epochs": 7,
"clip_range": 0.1
}
}
```
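How `train.py` and `config.py` consume this file is project-specific; purely as an illustration, the configuration can be read with the standard library, assuming the key names shown above:

```python
# Minimal sketch of reading config.json; key names follow the example above.
import json

with open("config.json") as f:
    config = json.load(f)

coins = config["coins"]                # e.g. ["BTC", "ETH", ...]
env_kwargs = config["env_kwargs"]      # capital, episode_length, fee
agent_kwargs = config["agent_kwargs"]  # learning_rate, batch_size, ...
total_timesteps = config["total_timesteps"]
```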
Train a new model:

```
python train.py
```

The training script will:
- Load and preprocess cryptocurrency data
- Create a new model with unique ID
- Train the agent in episodes
- Save checkpoints periodically
- Log metrics to TensorBoard
To continue training from a checkpoint, set `"continue": true` in `config.json`.
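A condensed sketch of this training flow with sb3_contrib is shown below. The class name `TradingEnv`, its constructor arguments, and the save paths are placeholders standing in for the project's actual code in `envs.py` and `train.py`.

```python
# Condensed training sketch; names marked as guesses may differ from the project code.
from sb3_contrib import RecurrentPPO
from stable_baselines3.common.callbacks import CheckpointCallback

from envs import TradingEnv  # hypothetical import; the actual class name may differ

env = TradingEnv(capital=1000, episode_length=1024, fee=0.00022)  # mirrors env_kwargs

checkpoint_callback = CheckpointCallback(
    save_freq=100_000,               # matches checkpoint_timesteps
    save_path="models/example_id/",  # example path
    name_prefix="rppo",
)

model = RecurrentPPO(
    "MultiInputLstmPolicy",
    env,
    learning_rate=3e-4,
    batch_size=512,
    n_epochs=7,
    clip_range=0.1,
    tensorboard_log="tb_logs",
    verbose=1,
)
model.learn(total_timesteps=10_000_000, callback=checkpoint_callback)
model.save("models/example_id/final")
```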
Test a trained model:
```
python test_model.py
```

This will:
- Load a trained model
- Evaluate on test data
- Generate performance visualizations
- Compare against buy-and-hold baseline
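Because the policy is recurrent, evaluation has to carry the LSTM state between `predict` calls. A minimal sketch, again using a placeholder environment and an example model path:

```python
# Minimal recurrent-policy evaluation sketch (paths and info keys are guesses).
import numpy as np
from sb3_contrib import RecurrentPPO

from envs import TradingEnv  # hypothetical import; the actual class name may differ

env = TradingEnv(capital=1000, episode_length=1024, fee=0.00022)
model = RecurrentPPO.load("models/example_id/final")  # example path

obs = env.reset()
lstm_states = None
episode_start = np.ones((1,), dtype=bool)
done = False
portfolio_values = []

while not done:
    action, lstm_states = model.predict(
        obs, state=lstm_states, episode_start=episode_start, deterministic=True
    )
    obs, reward, done, info = env.step(action)
    episode_start = np.array([done])
    portfolio_values.append(info.get("portfolio_value"))  # info key name is a guess
```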
Analyze training results:
```
python visualize.py
```

Generates visualizations for:
- Portfolio allocation correlation
- Price correlation
- Trade rate over episodes
- Sharpe and Sortino ratios
- Model vs buy-and-hold performance
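The plotting code itself lives in `visualize.py`; for reference, the Sharpe and Sortino ratios it reports can be computed from a series of portfolio values roughly as follows (the annualization factor of 6 * 365 assumes 4-hour bars and a zero risk-free rate).

```python
# Sharpe and Sortino ratios from per-step returns (sketch; annualization assumes 4H bars).
import numpy as np

PERIODS_PER_YEAR = 6 * 365  # six 4-hour bars per day

def sharpe_ratio(returns: np.ndarray) -> float:
    return np.sqrt(PERIODS_PER_YEAR) * returns.mean() / returns.std()

def sortino_ratio(returns: np.ndarray) -> float:
    downside = returns[returns < 0]
    downside_dev = np.sqrt(np.mean(np.square(downside))) if downside.size else np.nan
    return np.sqrt(PERIODS_PER_YEAR) * returns.mean() / downside_dev

portfolio_values = np.array([1000.0, 1010.0, 995.0, 1020.0])  # example series
returns = np.diff(portfolio_values) / portfolio_values[:-1]
print(sharpe_ratio(returns), sortino_ratio(returns))
```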
```
.
├── train.py                 # Main training script
├── test_model.py            # Model evaluation
├── visualize.py             # Performance visualization
├── config.py                # Configuration utilities
├── config.json              # Training configuration
├── envs.py                  # Trading environment (Gym)
├── feature_extractors.py    # Custom neural network architectures
├── custom_layers.py         # DAIN normalization layers
├── loading_utils.py         # Data loading utilities
├── preprocessing_utils.py   # Feature engineering
├── callbacks.py             # TensorBoard callbacks
├── binance_dump.py          # Data download script
├── create_dataset.py        # Dataset creation utilities
└── requirements.txt         # Python dependencies
```
The agent observes:
- Multi-timeframe features: Price percentage changes, volume, volatility for each interval
- Portfolio allocation: Current distribution of capital across assets
- Reward: Previous step's reward signal
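With a `MultiInputLstmPolicy`, this observation is naturally expressed as a Gym `Dict` space. The sketch below illustrates the idea for 9 coins and a single interval; the key names and shapes are illustrative, not the project's exact layout.

```python
# Illustrative observation space; key names and shapes are assumptions.
import numpy as np
from gym import spaces

N_ASSETS = 9      # traded coins
N_INTERVALS = 1   # e.g. ["4H"]
N_FEATURES = 3    # e.g. price change, volume, volatility per interval

observation_space = spaces.Dict({
    "market": spaces.Box(-np.inf, np.inf, shape=(N_INTERVALS, N_ASSETS, N_FEATURES), dtype=np.float32),
    "allocation": spaces.Box(0.0, 1.0, shape=(N_ASSETS + 1,), dtype=np.float32),  # + USDT
    "prev_reward": spaces.Box(-np.inf, np.inf, shape=(1,), dtype=np.float32),
})
```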
Continuous actions:
- Portfolio allocation (softmax): Distribution of capital across cryptocurrencies + USDT
- Trade threshold: Binary decision on whether to execute trades (threshold >= 0.5)
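One plausible way to interpret such an action vector inside the environment is sketched below: the first `N_ASSETS + 1` entries are softmaxed into portfolio weights (including USDT) and the last entry gates trade execution. The exact layout and value ranges in `envs.py` may differ.

```python
# Illustrative action handling; the exact layout in envs.py may differ.
import numpy as np
from gym import spaces

N_ASSETS = 9
action_space = spaces.Box(low=-1.0, high=1.0, shape=(N_ASSETS + 2,), dtype=np.float32)

def interpret_action(action: np.ndarray):
    logits, trade_signal = action[:-1], action[-1]
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()             # softmax -> target weights (coins + USDT)
    execute_trade = trade_signal >= 0.5  # binary trade decision
    return weights, execute_trade

weights, trade = interpret_action(action_space.sample())
```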
Reward is calculated as the percentage change in portfolio value normalized by initial capital:
reward = (portfolio_value_t - portfolio_value_{t-1}) / initial_capital
This encourages the agent to maximize returns while penalizing losses.
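In code, the per-step reward is a direct transcription of the formula above (the default of 1000 mirrors the `capital` setting in `config.json`):

```python
# Direct transcription of the reward formula.
def compute_reward(portfolio_value: float, prev_portfolio_value: float,
                   initial_capital: float = 1000.0) -> float:
    return (portfolio_value - prev_portfolio_value) / initial_capital
```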
- AdaptiveNormalizationExtractor: Uses DAIN layers for adaptive normalization
- DainLstmExtractor: Combines DAIN normalization with LSTM processing
- CNNFeaturesExtractor: Convolutional neural network for feature extraction
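Custom extractors in stable-baselines3 subclass `BaseFeaturesExtractor`. The sketch below shows the general pattern in the spirit of `CNNFeaturesExtractor`; the layer sizes, observation keys, and tensor layout are assumptions rather than the project's exact architecture (and the DAIN layers from `custom_layers.py` are not reproduced here).

```python
# Illustrative CNN feature extractor for a Dict observation space.
import torch.nn as nn
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor

class SimpleCnnExtractor(BaseFeaturesExtractor):
    def __init__(self, observation_space, features_dim: int = 128):
        super().__init__(observation_space, features_dim)
        n_intervals, n_assets, n_features = observation_space["market"].shape
        self.cnn = nn.Sequential(
            nn.Conv1d(n_intervals * n_features, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Flatten(),
        )
        self.linear = nn.Linear(32 * n_assets, features_dim)

    def forward(self, observations):
        market = observations["market"]                       # (batch, intervals, assets, features)
        b, i, a, f = market.shape
        x = market.permute(0, 1, 3, 2).reshape(b, i * f, a)   # channels = intervals * features
        return self.linear(self.cnn(x))                       # (batch, features_dim)
```

An extractor of this kind would be selected via `policy_kwargs=dict(features_extractor_class=SimpleCnnExtractor)`.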
- Base: MultiInputLstmPolicy from sb3_contrib (RecurrentPPO)
- Architecture: Configurable MLP layers (default: 3 layers of 64 units)
- LSTM: 2 layers with 64 hidden units
- Activation: ReLU or Tanh (configurable)
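These settings map onto the policy keyword arguments of `RecurrentPPO`. A sketch with the defaults listed above (the environment construction is again a placeholder):

```python
# Sketch of the policy configuration listed above.
import torch.nn as nn
from sb3_contrib import RecurrentPPO

from envs import TradingEnv  # hypothetical import; the actual class name may differ

policy_kwargs = dict(
    net_arch=[64, 64, 64],   # 3 MLP layers of 64 units
    lstm_hidden_size=64,     # 64 hidden units
    n_lstm_layers=2,         # 2 LSTM layers
    activation_fn=nn.ReLU,   # or nn.Tanh
)

env = TradingEnv(capital=1000, episode_length=1024, fee=0.00022)
model = RecurrentPPO("MultiInputLstmPolicy", env, policy_kwargs=policy_kwargs, verbose=1)
```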
Training metrics are logged to TensorBoard:
```
tensorboard --logdir tb_logs
```

Monitored metrics include:
- Episode returns
- Trade rate
- Portfolio value changes
- Portfolio allocations
- Loss functions
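Environment-specific metrics like trade rate and portfolio value are typically pushed to TensorBoard from a custom callback; a minimal sketch in the style of `callbacks.py` (the metric names and `info` keys are assumptions):

```python
# Minimal custom TensorBoard logging callback; metric and info key names are guesses.
from stable_baselines3.common.callbacks import BaseCallback

class TradeMetricsCallback(BaseCallback):
    def _on_step(self) -> bool:
        for info in self.locals.get("infos", []):
            if "portfolio_value" in info:
                self.logger.record("env/portfolio_value", info["portfolio_value"])
            if "trade_executed" in info:
                self.logger.record("env/trade_rate", float(info["trade_executed"]))
        return True
```

A callback like this is passed to `model.learn(callback=[...])` alongside the checkpoint callback.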
Model checkpoints and logs are saved in:
- `models/{agent_id}/` - Trained model checkpoints
- `model_logs/{agent_id}/` - Episode logs and trade records
- `logs/{agent_id}.config.json` - Configuration snapshot
- `tb_logs/` - TensorBoard logs
If you use this code in your research, please cite:
Cryptocurrency Portfolio Optimization with Recurrent Proximal Policy Optimization
[Your Name]
[Your Institution]
[Year]
[Specify your license here]
- Built on stable-baselines3
- Uses DAIN (Deep Adaptive Input Normalization) for feature normalization
- Data from Binance cryptocurrency exchange