Using an ML model to predict monthly stock returns at the beginning of each month.
Problem:
Stock return prediction with machine learning models is widely practiced.
Usually, researchers run models in a Python notebook inside a conda environment.
This works fine until the data source grows and the notebook balloons to 10 MB.
How MLOps helps:
That is when MLOps comes in: set up a model training pipeline, register models clearly,
monitor prediction drift, and deploy the model in the cloud.
That way, researchers work more easily without a messy, oversized Python notebook,
while still being free to experiment in notebooks.
Data Sources:
YFinance for stock prices, company sectors, index prices, and the VIX
FRED for Treasury yields
Model Features:
Sector: company sector
Month Index: the number of months since the first month of the training dataset; e.g., if the first month of the training set is 2023-07-01, the month_index of 2023-08-01 is 1
Index Average: average of the past year's daily S&P 500 index levels
Alpha: the intercept when the stock price is regressed on SPX, from the past year's daily prices
Beta: the slope when the stock price is regressed on SPX, from the past year's daily prices
Historical Volatility: standard deviation of the past year's daily prices (implied volatility would be a better choice, but it is harder to get)
End-of-Month 10-Year Treasury
Monthly Average of the 10-Year Treasury
Spread: end-of-month 10-year Treasury minus end-of-month 2-year Treasury
VIX Average: average of the past month's daily VIX
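Alpha and beta above come from an ordinary-least-squares regression of the stock's daily series on SPX's. A minimal sketch of the computation with NumPy (the function names are illustrative, not the repo's actual code):

```python
import numpy as np

def alpha_beta(stock_series, index_series):
    """OLS fit of the stock's daily series on the index's.

    Returns (alpha, beta): the intercept and slope of the regression.
    """
    # Design matrix: a column of ones (for alpha) plus the index series.
    X = np.column_stack([np.ones(len(index_series)), index_series])
    coef, *_ = np.linalg.lstsq(X, stock_series, rcond=None)
    return coef[0], coef[1]

def historical_volatility(daily_prices):
    """Sample standard deviation of daily log returns over the window."""
    log_ret = np.diff(np.log(daily_prices))
    return log_ret.std(ddof=1)
```

The same helpers work whether the regression is run on prices or on returns; only the input series changes.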
Tech Stack:
Python
MLFlow
AWS EC2, RDS, S3
Prefect
Evidently
Streamlit
See the model_training folder.
It trains machine learning models and registers them in the MLflow registry.
The pipeline includes downloading, transforming, and preparing data, Hyperopt-tuned training, and registration.
Prefect orchestrates all tasks.
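The overall shape of the pipeline can be sketched as plain functions; in the repo, each step would be a Prefect `@task` and the whole sequence a `@flow`. All names and payloads below are placeholders, not the repo's actual code:

```python
# Illustrative pipeline shape only; in the repo these steps are
# Prefect-decorated and do real work (yfinance, FRED, Hyperopt, MLflow).

def download_data():
    # Real pipeline: fetch prices from yfinance and yields from FRED.
    return {"prices": [100.0, 101.0, 102.5]}

def transform_data(raw):
    # Compute simple returns from raw prices.
    p = raw["prices"]
    return [(b - a) / a for a, b in zip(p, p[1:])]

def prepare_data(returns):
    # Pair each return with the next one as the prediction target.
    return list(zip(returns, returns[1:]))

def train_and_register(dataset):
    # Real pipeline: Hyperopt search, then register the best model
    # in the MLflow registry. Here, "training" just averages targets.
    return sum(target for _, target in dataset) / len(dataset)

def training_pipeline():
    raw = download_data()
    returns = transform_data(raw)
    dataset = prepare_data(returns)
    return train_and_register(dataset)
```

Keeping each step a separate function is what lets Prefect retry, schedule, and visualize them independently.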
see model_prediction folder
The predict function fetches models from the MLflow server if it is alive, otherwise from the S3 artifact store.
It loads the models and artifacts and returns predictions for the input data.
The predict function is then wrapped in a Flask app, which is packaged into a Docker image.
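The fallback strategy boils down to "try the registry, fall back to S3". A minimal sketch with the two loaders injected as callables (the names are illustrative, not the repo's functions):

```python
def load_model(load_from_registry, load_from_s3):
    """Try the MLflow registry first; on any failure, fall back to
    the S3 artifact store.

    In the real app, load_from_registry would call the MLflow client
    and load_from_s3 would read the artifact via boto3.
    """
    try:
        return load_from_registry()
    except Exception:
        # MLflow server unreachable (or lookup failed): use S3 copy.
        return load_from_s3()
```

Injecting the loaders keeps the fallback logic trivially unit-testable without a live MLflow server.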
see monitoring folder
It spins up a Postgres database, Adminer, Grafana, and a data loader.
The data loader is a Python script wrapped in Docker that loads data into the Postgres database.
Grafana provides a dashboard for monitoring data quality and prediction drift.
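In the repo, Evidently computes the drift metrics that Grafana displays. As a crude stand-in for illustration only, a mean-shift check on the prediction distribution might look like:

```python
import statistics

def prediction_drift(reference, current, threshold=2.0):
    """Flag drift when the current mean moves more than `threshold`
    reference standard deviations from the reference mean.

    A simplified stand-in for Evidently's drift tests, not the
    project's actual metric.
    """
    ref_mean = statistics.fmean(reference)
    ref_std = statistics.stdev(reference)
    z = abs(statistics.fmean(current) - ref_mean) / ref_std
    return z > threshold
```

Evidently's real tests (e.g. distribution-distance checks) are more robust, but the write-to-Postgres, read-in-Grafana flow is the same.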
see model_prediction/tests folder
model_prediction/run.sh does the following:
1. export environment variables
2. build the Flask app image if not already built
3. spin up LocalStack to mock S3
4. build the test container
5. configure LocalStack
6. run the test container for unit tests
7. run the Flask app container
8. run the test container for the integration test
9. clean up
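The sequence above could be sketched roughly as follows; image names, ports, and paths are illustrative assumptions (the actual run.sh in the repo may differ), and the script requires Docker and the AWS CLI:

```shell
#!/usr/bin/env bash
# Sketch of the run.sh sequence; names are placeholders.
set -euo pipefail

export S3_ENDPOINT_URL=http://localhost:4566          # 1. env vars

docker build -t predict-app .                         # 2. Flask app image
docker run -d --name localstack -p 4566:4566 \
    localstack/localstack                             # 3. mock S3
docker build -t predict-tests -f tests/Dockerfile .   # 4. test image
aws --endpoint-url "$S3_ENDPOINT_URL" \
    s3 mb s3://artifacts                              # 5. configure LocalStack
docker run --rm predict-tests pytest tests/unit       # 6. unit tests
docker run -d --name predict-app -p 9696:9696 \
    predict-app                                       # 7. run the app
docker run --rm --network host \
    predict-tests pytest tests/integration            # 8. integration test
docker rm -f predict-app localstack                   # 9. clean up
```

`set -euo pipefail` makes the script abort on the first failing step, so a failed unit test never reaches deployment.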
I deployed the Flask app container to Azure:
curl -X POST http://predict-app.eastus.azurecontainer.io:8080/predict \
-H "Content-Type: application/json" \
-d @json_records.json
json_records.json is an example feature payload; it contains hundreds of records.
Feel free to copy just one record and pass it to the Flask app.
If you have jq installed, change the last line to
--data "$(jq '.[0]' json_records.json)"
and adjust the index as you like.
Use Streamlit to serve the P&L chart.
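The P&L chart boils down to compounding monthly returns into a value series. A minimal sketch of the computation (in the app the result would be handed to Streamlit's `st.line_chart`; the function name is illustrative):

```python
def cumulative_pnl(monthly_returns, initial=1.0):
    """Compound monthly returns into a portfolio-value series,
    starting from `initial`.

    In the Streamlit app, this series is what gets plotted.
    """
    values = [initial]
    for r in monthly_returns:
        values.append(values[-1] * (1.0 + r))
    return values
```

For example, returns of +10% then -5% turn 1.0 into 1.1 and then 1.045.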
| Service | Port | Environment | Notes |
|---|---|---|---|
| MLflow | 5000 | AWS EC2 | ML experiment tracking server |
| Prefect | 4200 | Local | Workflow orchestration UI |
| Flask App | 8080 | Azure | Production deployment |
| Flask App | 9696 | Local | Local development |
| Grafana | 3000 | Local | Monitoring dashboard |
| PostgreSQL | 5432 | Local | Grafana datasource |
| Adminer | 8080 | Local | Database management UI |
| Streamlit | 8501 | Local | Interactive app |
Future Work:
1. Deploy Prefect and add a cron job to update the data files (update backfill.parquet).
2. Add CI/CD using GitHub Actions.
3. Use Terraform as IaC.
4. Add more charts to the simulation dashboard.
5. Use a longer history to train the model.
6. Make the data pipeline more robust
(a solution is in the repo; all the data was taken down after the free trial).
7. Send a notification and retrain the model if drift is detected.
