A machine learning project for predicting extreme events in Apple stock prices using Random Forest and Temporal CNN models.
- Python 3.10 or higher
- Poetry (Python package manager)
- Install Poetry (if not already installed)
- Clone the repository
git clone https://github.com/mchaniotakis/apple_stock_ml
cd apple_stock_ml
- Install dependencies using Poetry
poetry lock
poetry install
src/
├── data_processing.py # Data loading and preprocessing functions
├── random_forest.py # Random Forest model implementation
├── temporal_cnn.py # Temporal CNN model implementation
├── model_evaluation.py # Model evaluation and metrics calculation
└── improvement.py # Model improvements and enhancements
- You don't have to download any data yourself; all models use data_processing to import the requested data.
- For every model you can specify the --ticker stock_name option to use that stock for training. Every stock is cached so that you don't have to download it twice.
- You can download multiple stocks using SP500 for the S&P 500 stocks, or, if you would like the top X stocks, use a ticker such as SP500TOP10, which downloads the top 10 stocks and uses them for training.
- Make sure to change --start-date and --end-date if you need to use different dates.
- Use --threshold to specify the threshold for the extreme event point, currently 2.0.
- --model-output specifies where the model is saved after it is trained, currently models/
- Once the models are trained, they will all be located at models/.. At that point, feel free to run the evaluation script to compare their performance. Please take a look at evaluation_results
- Run with python apple_stock_ml/src/model_validation.py --ticker AAPL --rf_idx_to_keep [-1], which makes sure that the RF model uses only the daily return feature.
- The performance metrics for both models are stored in the evaluation_results directory, including:
  - Confusion Matrix
  - Accuracy
  - Precision, Recall, F1-Score
  - ROC curves
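To make the --threshold option more concrete, here is a minimal sketch of how an extreme event label could be derived from closing prices. It assumes the threshold is a percentage applied to the absolute daily return; the README does not confirm either detail, and the function names are hypothetical rather than taken from data_processing.py.

```python
def daily_returns(closes):
    """Percentage change between consecutive closing prices."""
    return [
        (curr - prev) / prev * 100.0
        for prev, curr in zip(closes, closes[1:])
    ]


def label_extreme_events(closes, threshold=2.0):
    """Return 1 where |daily return| exceeds `threshold` percent, else 0.

    Assumption: the threshold is in percent and both large gains and
    large losses count as extreme events.
    """
    return [1 if abs(r) > threshold else 0 for r in daily_returns(closes)]


# Illustrative prices only, not real AAPL data.
closes = [100.0, 101.0, 104.0, 103.5, 100.0]
labels = label_extreme_events(closes, threshold=2.0)
print(labels)
```

With these made-up prices, the second and fourth returns (about +2.97% and -3.38%) cross the 2.0 threshold, so only those days are flagged.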
Run the Random Forest model with default settings:
python apple_stock_ml/src/random_forest.py --ticker AAPL
Optional flags:
--use-smote: Enable SMOTE for handling class imbalance
-o: Enable optimization mode for hyperparameter tuning
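For readers unfamiliar with SMOTE, the sketch below illustrates the core idea behind --use-smote: new minority-class samples are created by interpolating between an existing minority sample and one of its nearest minority neighbours. This is a from-scratch illustration with a hypothetical function name, not the code used in this repository, which may well rely on a library implementation instead.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create `n_new` synthetic points by interpolating between a minority
    sample and one of its `k` nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority-class feature vectors (illustrative values only).
minority = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [3.0, 3.0]]
new_points = smote_like_oversample(minority, n_new=6)
print(len(new_points))
```

Because every synthetic point lies on a segment between two real minority samples, oversampling adds variety without simply duplicating rows. It should only be applied to the training split, never to validation or test data.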
- Run the Temporal CNN model with the default settings:
python apple_stock_ml/src/temporal_cnn.py --ticker AAPL --learning-rate 0.1 --epochs 300 --patience 15 --batch-size 256
- Optional flags:
--pca: Enable PCA dimensionality reduction
--sequence-length: Set custom sequence length (default: 10)
--augs: Enable data augmentation
--use-smote: Enable SMOTE for handling class imbalance
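The --sequence-length flag controls how many consecutive days the Temporal CNN sees per sample. A minimal sketch of the windowing step is shown below; it assumes each window is paired with the label of the day that follows it, which is one common convention and not necessarily what temporal_cnn.py does. The function name is hypothetical.

```python
def make_sequences(features, labels, sequence_length=10):
    """Pair each window of `sequence_length` rows with the label of the
    step immediately after the window (an assumed convention)."""
    windows, targets = [], []
    for end in range(sequence_length, len(features)):
        windows.append(features[end - sequence_length:end])
        targets.append(labels[end])
    return windows, targets


# Twelve days with a single feature per day (illustrative values only).
features = [[float(i)] for i in range(12)]
labels = [0] * 11 + [1]
X, y = make_sequences(features, labels, sequence_length=10)
print(len(X), len(X[0]), y)
```

Twelve rows with a sequence length of 10 yield two windows, so a longer sequence length reduces the number of training samples available from the same date range.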
- Run the improved model with the default settings:
python apple_stock_ml/src/improvement.py --ticker AAPL --sequence-length 10 --learning-rate 0.01 --batch-size 256 --patience 15
- Optional flags:
--pca: Enable PCA dimensionality reduction
--sequence-length: Set custom sequence length (default: 10)
--augs: Enable data augmentation
--use-smote: Enable SMOTE for handling class imbalance
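Both neural network commands above pass --patience 15. As a rough illustration of what a patience setting usually means, the sketch below stops training once the validation loss has failed to improve for a given number of consecutive epochs. The class name and the min_delta parameter are assumptions for illustration, not the repository's API.

```python
class EarlyStopping:
    """Signal a stop after `patience` epochs without validation improvement."""

    def __init__(self, patience=15, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Illustrative validation losses; a small patience keeps the example short.
stopper = EarlyStopping(patience=3)
val_losses = [0.9, 0.8, 0.85, 0.82, 0.81, 0.7]
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(loss):
        stopped_at = epoch
        break
print(stopped_at, stopper.best_loss)
```

Here training stops at epoch index 4, before the later improvement to 0.7 is ever seen, which is the trade-off a small patience value makes.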
1. Run the model validation (we use only the daily return feature for RF):
python apple_stock_ml/src/model_validation.py --ticker AAPL --rf_idx_to_keep [-1]
The validation script will:
- Load the trained models (from models/...pt)
- Generate performance metrics
- Create confusion matrices
- Compare model performances
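The metrics listed above all follow from the binary confusion matrix. The sketch below computes them in plain Python so the definitions are explicit; it is an illustration with hypothetical function names, not the code in the validation script, which may use a library for this.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, where 1 = extreme event."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive class."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    total = tn + fp + fn + tp
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative labels only: 4 extreme days out of 10.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Because extreme events are rare, accuracy alone can look good for a model that never predicts one, so precision, recall and F1 on the positive class are the more informative numbers when comparing the models.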
- Detailed analysis and results can be found in report.pdf
- Model architecture details and hyperparameters are documented in the respective source files
- Performance comparisons and improvement strategies are detailed in the report
- All models will save their outputs in the appropriate directories under the project folder
- Results and metrics will be stored in the evaluation_results directory
- Models can be run with different stock tickers by changing the --ticker parameter
- Use the --help flag with any script to see all available options
- For more details about the implementation and metrics, refer to the project documentation and the original assignment requirements.