Using an ML model to predict monthly stock returns at the beginning of each month.
Problem:
Stock return prediction with machine learning models is widely practiced.
Usually, researchers run models in a Python notebook inside a conda environment.
This works fine until the data source grows and the notebook balloons to 10 MB.
How MLOps helps:
That is when MLOps comes in: set up a model training pipeline, register models clearly,
monitor prediction drift, and deploy the model in the cloud.
That way, researchers work more easily without a messy, oversized Python notebook,
while still being free to experiment in notebooks.
Data Sources:
YFinance for stock prices, company sectors, index prices, and the VIX
FRED for Treasury yields
Model Features:
Sector: company sector
Month Index: the number of months since the first month of the training dataset; e.g., if the first month of the training set is 2023-07-01, the month_index of 2023-08-01 is 1
Index Average: average of the past year's daily S&P 500 index levels
Alpha: the intercept when the stock price is regressed on SPX, from the past year's daily prices
Beta: the slope when the stock price is regressed on SPX, from the past year's daily prices
Historical Volatility: standard deviation of the past year's daily prices (implied volatility would be a better choice, but it is harder to get)
End-of-Month 10-Year Treasury
Monthly Average of the 10-Year Treasury
Spread: end-of-month 10-year Treasury minus end-of-month 2-year Treasury
VIX Average: average of the past month's daily VIX
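Alpha and beta above come from an ordinary-least-squares regression of the stock's daily series on SPX's. A minimal sketch of the computation with NumPy (the function names are illustrative, not the repo's actual code):

```python
import numpy as np

def alpha_beta(stock_series, index_series):
    """OLS fit of the stock's daily series on the index's.

    Returns (alpha, beta): the intercept and slope of the regression.
    """
    # Design matrix: a column of ones (for alpha) plus the index series.
    X = np.column_stack([np.ones(len(index_series)), index_series])
    coef, *_ = np.linalg.lstsq(X, stock_series, rcond=None)
    return coef[0], coef[1]

def historical_volatility(daily_prices):
    """Sample standard deviation of daily log returns over the window."""
    log_ret = np.diff(np.log(daily_prices))
    return log_ret.std(ddof=1)
```

The same helpers work whether the regression is run on prices or on returns; only the input series changes.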
Tech Stack:
Python
MLFlow
AWS EC2, RDS, S3
Prefect
Evidently
Streamlit
See the model_training folder.
It trains machine learning models and registers them in the MLflow registry.
The pipeline includes downloading, transforming, and preparing data, Hyperopt-tuned training, and registration.
Prefect orchestrates all tasks.
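The overall shape of the pipeline can be sketched as plain functions; in the repo, each step would be a Prefect `@task` and the whole sequence a `@flow`. All names and payloads below are placeholders, not the repo's actual code:

```python
# Illustrative pipeline shape only; in the repo these steps are
# Prefect-decorated and do real work (yfinance, FRED, Hyperopt, MLflow).

def download_data():
    # Real pipeline: fetch prices from yfinance and yields from FRED.
    return {"prices": [100.0, 101.0, 102.5]}

def transform_data(raw):
    # Compute simple returns from raw prices.
    p = raw["prices"]
    return [(b - a) / a for a, b in zip(p, p[1:])]

def prepare_data(returns):
    # Pair each return with the next one as the prediction target.
    return list(zip(returns, returns[1:]))

def train_and_register(dataset):
    # Real pipeline: Hyperopt search, then register the best model
    # in the MLflow registry. Here, "training" just averages targets.
    return sum(target for _, target in dataset) / len(dataset)

def training_pipeline():
    raw = download_data()
    returns = transform_data(raw)
    dataset = prepare_data(returns)
    return train_and_register(dataset)
```

Keeping each step a separate function is what lets Prefect retry, schedule, and visualize them independently.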
see model_prediction folder
The predict function fetches models from the MLflow server if it is alive, otherwise from the S3 artifact store.
It loads the models and artifacts and returns predictions for the input data.
The predict function is then wrapped in a Flask app, which is packaged into a Docker image.
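The fallback strategy boils down to "try the registry, fall back to S3". A minimal sketch with the two loaders injected as callables (the names are illustrative, not the repo's functions):

```python
def load_model(load_from_registry, load_from_s3):
    """Try the MLflow registry first; on any failure, fall back to
    the S3 artifact store.

    In the real app, load_from_registry would call the MLflow client
    and load_from_s3 would read the artifact via boto3.
    """
    try:
        return load_from_registry()
    except Exception:
        # MLflow server unreachable (or lookup failed): use S3 copy.
        return load_from_s3()
```

Injecting the loaders keeps the fallback logic trivially unit-testable without a live MLflow server.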
see monitoring folder
It spins up a Postgres database, Adminer, Grafana, and a data loader.
The data loader is a Python script wrapped in Docker that loads data into the Postgres database.
Grafana provides a dashboard for monitoring data quality and prediction drift.
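In the repo, Evidently computes the drift metrics that Grafana displays. As a crude stand-in for illustration only, a mean-shift check on the prediction distribution might look like:

```python
import statistics

def prediction_drift(reference, current, threshold=2.0):
    """Flag drift when the current mean moves more than `threshold`
    reference standard deviations from the reference mean.

    A simplified stand-in for Evidently's drift tests, not the
    project's actual metric.
    """
    ref_mean = statistics.fmean(reference)
    ref_std = statistics.stdev(reference)
    z = abs(statistics.fmean(current) - ref_mean) / ref_std
    return z > threshold
```

Evidently's real tests (e.g. distribution-distance checks) are more robust, but the write-to-Postgres, read-in-Grafana flow is the same.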
see model_prediction/tests folder
model_prediction/run.sh does the following:
1. export environment variables
2. build the Flask app image if not already built
3. spin up LocalStack to mock S3
4. build the test container
5. configure LocalStack
6. run the test container for unit tests
7. run the Flask app container
8. run the test container for the integration test
9. clean up
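The sequence above could be sketched roughly as follows; image names, ports, and paths are illustrative assumptions (the actual run.sh in the repo may differ), and the script requires Docker and the AWS CLI:

```shell
#!/usr/bin/env bash
# Sketch of the run.sh sequence; names are placeholders.
set -euo pipefail

export S3_ENDPOINT_URL=http://localhost:4566          # 1. env vars

docker build -t predict-app .                         # 2. Flask app image
docker run -d --name localstack -p 4566:4566 \
    localstack/localstack                             # 3. mock S3
docker build -t predict-tests -f tests/Dockerfile .   # 4. test image
aws --endpoint-url "$S3_ENDPOINT_URL" \
    s3 mb s3://artifacts                              # 5. configure LocalStack
docker run --rm predict-tests pytest tests/unit       # 6. unit tests
docker run -d --name predict-app -p 9696:9696 \
    predict-app                                       # 7. run the app
docker run --rm --network host \
    predict-tests pytest tests/integration            # 8. integration test
docker rm -f predict-app localstack                   # 9. clean up
```

`set -euo pipefail` makes the script abort on the first failing step, so a failed unit test never reaches deployment.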
I deployed the Flask app container to Azure:
curl -X POST http://predict-app.eastus.azurecontainer.io:8080/predict \
-H "Content-Type: application/json" \
-d @json_records.json
json_records.json is an example feature payload; it contains hundreds of records.
Feel free to copy just one record and pass it to the Flask app.
If you have jq installed, change the last line to
--data "$(jq '.[0]' json_records.json)"
and adjust the index as you like.
Use Streamlit to serve the P&L chart.
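The P&L chart boils down to compounding monthly returns into a value series. A minimal sketch of the computation (in the app the result would be handed to Streamlit's `st.line_chart`; the function name is illustrative):

```python
def cumulative_pnl(monthly_returns, initial=1.0):
    """Compound monthly returns into a portfolio-value series,
    starting from `initial`.

    In the Streamlit app, this series is what gets plotted.
    """
    values = [initial]
    for r in monthly_returns:
        values.append(values[-1] * (1.0 + r))
    return values
```

For example, returns of +10% then -5% turn 1.0 into 1.1 and then 1.045.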
| Service | Port | Environment | Notes |
|---|---|---|---|
| MLflow | 5000 | AWS EC2 | ML experiment tracking server |
| Prefect | 4200 | Local | Workflow orchestration UI |
| Flask App | 8080 | Azure | Production deployment |
| Flask App | 9696 | Local | Local development |
| Grafana | 3000 | Local | Monitoring dashboard |
| PostgreSQL | 5432 | Local | Grafana datasource |
| Adminer | 8080 | Local | Database management UI |
| Streamlit | 8501 | Local | Interactive app |
Future Work:
1. Deploy Prefect and add a cron job to update the data files (update backfill.parquet).
2. Add CI/CD using GitHub Actions.
3. Use Terraform as IaC.
4. Add more charts to the simulation dashboard.
5. Use a longer history to train the model.
6. Make the data pipeline more robust
(a solution is in the repo; all the data was taken down after the free trial).
7. Send a notification and retrain the model if drift is detected.
