Dkaattae/Monthly-Stock-Return-Prediction

Monthly-Stock-Return-Prediction

Uses a machine learning model to predict each stock's monthly return at the beginning of the month.

Project Overview

flowchart

Description

Problem:
Stock return prediction with machine learning models is widely used.
Researchers usually run their models in a Python notebook inside a conda environment.
This works fine until the data source grows and the notebook swells to 10 MB.

How MLOps helps:
This is where MLOps comes in: set up a model training pipeline, register models clearly,
monitor prediction drift properly, and deploy the model to the cloud.
Researchers can then work without a large, messy Python notebook,
while still being free to use notebooks when they want to.

Data Sources:
YFinance for stock prices, company sectors, index prices, and the VIX
FRED for Treasury yields

Model Features:

Sector: Company Sector

Month Index: the number of months since the first month in the training dataset. For example, if the first month of the training set is 2023/07/01, the month_index of 2023/08/01 is 1.

Index Average: average of the daily S&P 500 index over the past year

Alpha: the intercept from regressing the stock price on SPX, using the past year of daily prices

Beta: the slope from regressing the stock price on SPX, using the past year of daily prices

Historical Volatility: standard deviation of the past year of daily prices. Implied volatility would be a better choice, but it is harder to obtain.

End Of Month 10 Year Treasury

Monthly Average of 10 year Treasury

Spread: End Of Month 10 Year Treasury - End Of Month 2 Year Treasury

VIX Average: average of the daily VIX over the past month
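
A few of the features above can be sketched in plain Python. This is a minimal illustration, not the repository's code: the function names are hypothetical, and it follows the wording above literally (regressing price levels on SPX and taking the standard deviation of prices). The actual transformations in model_training may differ.

```python
from datetime import date
from statistics import mean, stdev


def month_index(first_month: date, month: date) -> int:
    """Whole months elapsed since the first month of the training set."""
    return (month.year - first_month.year) * 12 + (month.month - first_month.month)


def alpha_beta(stock: list[float], spx: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of stock = alpha + beta * spx."""
    mean_spx, mean_stock = mean(spx), mean(stock)
    # Sum of squared deviations of SPX, and cross deviations with the stock.
    sxx = sum((x - mean_spx) ** 2 for x in spx)
    sxy = sum((x - mean_spx) * (y - mean_stock) for x, y in zip(spx, stock))
    beta = sxy / sxx
    alpha = mean_stock - beta * mean_spx
    return alpha, beta


def historical_volatility(prices: list[float]) -> float:
    """Sample standard deviation of the daily prices."""
    return stdev(prices)


def treasury_spread(ten_year_eom: float, two_year_eom: float) -> float:
    """End-of-month 10 year yield minus end-of-month 2 year yield."""
    return ten_year_eom - two_year_eom


# Matches the example above: 2023/08/01 is one month after 2023/07/01.
print(month_index(date(2023, 7, 1), date(2023, 8, 1)))  # 1
```

In practice each function would be fed the trailing one-year (or one-month) window of daily values ending just before the prediction month, so that no information from the predicted month leaks into the features.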

Technology Stack

Python
MLflow
AWS EC2, RDS, S3
Prefect
Evidently
Streamlit

Modules

Model Training

See the model_training folder.
It trains machine learning models and registers them in the MLflow registry.
The pipeline downloads data, transforms it, prepares it, runs hyperopt training, and registers the resulting model.
Prefect orchestrates all tasks.

Model Prediction

See the model_prediction folder.
The predict function fetches models from the MLflow server if it is alive; otherwise it fetches them from the S3 artifact store.
It loads the models and artifacts, then uses the input data to return a prediction.
The predict function is wrapped in a Flask app, which is then packaged into a Docker image.

Monitoring

See the monitoring folder.
It spins up a Postgres database, Adminer, Grafana, and a data loader.
The data loader is a Python script wrapped in Docker that loads data into the Postgres database.
Grafana provides a dashboard for viewing data quality and prediction drift.

Testing

See the model_prediction/tests folder.
model_prediction/run.sh performs the following steps:
1, export environment variables
2, build the Flask app image if it is not already built
3, spin up LocalStack to mock S3
4, build the test container
5, configure LocalStack
6, run the test container for unit tests
7, run the Flask app container
8, run the test container for the integration test
9, clean up

Deploy

I deployed the Flask app container to Azure.

curl -X POST http://predict-app.eastus.azurecontainer.io:8080/predict \
 -H "Content-Type: application/json" \
 -d @json_records.json

json_records.json is an example feature file containing hundreds of records.
Feel free to copy just one of them and pass it to the Flask app.
If you have jq installed, change the last line to --data "$(jq '.[0]' json_records.json)" to send only the first record.
Adjust the index number to pick a different record.
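
The same request can be made from Python using only the standard library. The sketch below mirrors the curl call with a single record selected by index, like the jq variant. The helper names `build_request` and `predict_one` are hypothetical, and parsing the response as JSON is an assumption about what the Flask app returns.

```python
import json
import urllib.request

PREDICT_URL = "http://predict-app.eastus.azurecontainer.io:8080/predict"


def build_request(
    records_json: str, index: int = 0, url: str = PREDICT_URL
) -> urllib.request.Request:
    """Build a POST request carrying one record from a JSON array of records."""
    records = json.loads(records_json)
    body = json.dumps(records[index]).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict_one(path: str = "json_records.json", index: int = 0):
    """Send one record to the deployed app and return the decoded response."""
    with open(path, encoding="utf-8") as fh:
        request = build_request(fh.read(), index)
    with urllib.request.urlopen(request) as response:
        # Assumption: the app answers with a JSON body.
        return json.loads(response.read())
```

Calling `predict_one(index=3)` would send the fourth record in json_records.json; this requires the deployed endpoint to be reachable.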

Simulation

Streamlit serves the P&L chart.

Ports Overview

| Service    | Port | Environment | Notes                          |
|------------|------|-------------|--------------------------------|
| MLflow     | 5000 | AWS EC2     | ML experiment tracking server  |
| Prefect    | 4200 | Local       | Workflow orchestration UI      |
| Flask App  | 8080 | Azure       | Production deployment          |
| Flask App  | 9696 | Local       | Local development              |
| Grafana    | 3000 | Local       | Monitoring dashboard           |
| PostgreSQL | 5432 | Local       | Grafana datasource             |
| Adminer    | 8080 | Local       | Database management UI         |
| Streamlit  | 8501 | Local       | Interactive app                |

Next Steps

1, deploy Prefect and add a cron job to update the data files, including backfill.parquet
2, add CI/CD using GitHub Actions
3, use Terraform for IaC
4, add more charts to the simulation dashboard
5, use a longer history to train the model
6, make the data pipeline more robust
(solution in repo;
all data is taken down after the free trial)
7, send a notification and retrain the model if drift is detected
