This is the main repository of remla24-team10. This README.md file contains the architecture, installation instructions, and comments for each assignment.
- app is the application that communicates with model-service and depends on lib-version
- lib-version is a version-aware library that can be asked for the version of the library
- model-service is a wrapper service for the released ML model and is dependent on lib-ml
- lib-ml contains the pre-processing logic for data that is used for training or queries
- model-training contains the ML training pipeline and is dependent on lib-ml
Overview of useful files in this repository:
- `docker-compose.yaml`: contains the configuration for Docker Compose
- `operation-manifests.yaml`: contains the configuration for Kubernetes
- `Vagrantfile`: contains the configuration for Vagrant
A readable PDF version of the report can be found here
The project can be run using either Docker Compose or Kubernetes (Minikube). Vagrant currently creates VMs with some basic Ansible playbooks, but it is not fully functional yet.
Note: Private network connection is recommended (public networks like TUDelft might not work)
The VMs can be set up by running:
vagrant up
To access the machines use the following commands:
vagrant ssh controller
vagrant ssh worker1
vagrant ssh worker2
Run docker-compose:
docker compose up
The front-end application should now be available at localhost:5000.
This project requires Docker, Minikube, and Istio to be installed.
To run the project using Minikube, run the following commands:
minikube start
istioctl install
kubectl apply -f [istio install location]/samples/addons/prometheus.yaml
kubectl apply -f operation-manifests.yaml
minikube tunnel
The project should now be available at localhost (no port) through the ingress. Please wait a bit before making a request to the server: the server downloads the model on deployment, which takes a few minutes.
The project supports dashboards for various metrics utilising Prometheus; for this to work, the project first has to be run using Minikube. Additionally, the Prometheus stack should be installed through Helm:
helm repo add myprom https://prometheus-community.github.io/helm-charts
helm install myprom myprom/kube-prometheus-stack
After reapplying operation-manifests.yaml, the Prometheus dashboard can be run using:
istioctl dashboard prometheus
The custom metrics which we collect include:
num_requests - Reflects the number of times a page has been served.
average_probability - Reflects the average response value of the model.
average_phishing - Reflects the ratio of phishing among all requests.
model_accuracy - Reports the accuracy of the model over all labelled requests.
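As an illustration of the logic behind these metrics, the running averages could be maintained with simple counters like the sketch below. This is a hypothetical helper, not the actual app code (the app exposes these values through Prometheus gauges and a counter), and the 0.5 phishing threshold is an assumption:

```python
class PredictionMetrics:
    """Sketch of the custom metrics: num_requests, average_probability,
    average_phishing. Hypothetical helper, not the real app code."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold      # assumed cut-off for "phishing"
        self.num_requests = 0           # times a page has been served
        self._probability_sum = 0.0     # sum of model responses
        self._phishing_count = 0        # responses classified as phishing

    def record(self, probability):
        """Record one model response (a phishing probability in [0, 1])."""
        self.num_requests += 1
        self._probability_sum += probability
        if probability >= self.threshold:
            self._phishing_count += 1

    @property
    def average_probability(self):
        """Average response value of the model over all requests."""
        return self._probability_sum / self.num_requests if self.num_requests else 0.0

    @property
    def average_phishing(self):
        """Ratio of phishing classifications among all requests."""
        return self._phishing_count / self.num_requests if self.num_requests else 0.0
```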
Grafana can also be used for further visualisation of the metrics; Prometheus should be active before running Grafana. Run:
minikube service myprom-grafana --url
Afterwards login to the dashboard using the default credentials:
Username: admin
Password: prom-operator
The dashboard can now be imported by navigating to dashboards and importing the grafana.json file provided in the repository.
Pull request: janvandermeulen/REMLA-group10#1 and janvandermeulen/REMLA-group10#2
Contributors: Shayan Ramezani and Jan van der Meulen
Reviewers: Jan van der Meulen, Shayan Ramezani, Michael Chan, and Remi Lejeune.
We chose Poetry to handle all the packages. Instructions to set up the project are added in the README. The codebase was written such that DVC can do a step-by-step reproduction.
Pull request: janvandermeulen/REMLA-group10#4
Contributor: Remi Lejeune
Reviewer: Michael Chan
We uploaded the data to a remote Google Drive cloud bucket using DVC. The data is now versioned and can be accessed by all team members. Furthermore, we created a reproduction pipeline. We have encountered some issues with dvc pull and it may not pull from the cloud; in that case, run dvc repro to reproduce the files.
Pull request: janvandermeulen/REMLA-group10#4
See description of previous task.
Pull request: janvandermeulen/REMLA-group10#3
Contributor: Michael Chan
Reviewers: Jan van der Meulen, Remi Lejeune and Shayan Ramezani
We used pylint and bandit to audit the code quality. The README provides instructions on how to run these tools. We fixed all the errors that the tools showed. Explanation for some of the configuration settings for both pylint and bandit:
A regex was created to accept names with a single capital letter between "_" as those are common names for matrix variables in data science, example of accepted names by regex: X_train, raw_X_train and X. TODO warnings have been suppressed temporarily. As this is still the first version there are still many things that could be improved that have been tagged as TODO for now, this should not affect the code quality. The number of arguments and local variables allowed has been increased as it is common in data science to separate data such as train and test in separate local variables, this results in relatively more variables used. Bandit warning B106 about potential hardcoded access tokens has been suppressed as it falsely triggers on the usage of the word token which is prevalent in data science projects and has nothing to do with password/auth tokens.
Pull requests: remla24-team10/app#1 and remla24-team10/app#2
Contributor: Jan van der Meulen
Reviewer: Remi Lejeune
We used Flask to create a simple web app which imports the version library and prompts the model-service for the result of a prediction.
Pull requests: remla24-team10/lib-version#1
Contributor: Jan van der Meulen
Reviewer: Shayan Ramezani
This is an automatically versioned library that can be asked for its own version. It updates the version number by automatically pulling the value from its own git tag.
Pull requests: remla24-team10/model-service#1
Contributor: Michael Chan
Reviewer: Remi Lejeune
We used Flask to serve the model; the model itself is stored on Google Drive and is downloaded at runtime.
Pull requests: remla24-team10/lib-ml#7 and remla24-team10/lib-ml#9
Contributors: Shayan Ramezani, Michael Chan
Reviewers: Michael Chan, Shayan Ramezani
This library provides several functions related to the processing of data. It is published on PyPI and has a workflow set up with GitHub Actions.
Pull requests: remla24-team10/model-training#2
Contributor: Shayan Ramezani
Reviewer: Jan van der Meulen
Model training trains the model and stores all related files on Google Drive via DVC. It was refactored in A2.
Pull request: #1
Contributor: Remi Lejeune
Reviewer: Jan van der Meulen
A Docker Compose file was created, which allows the app to be run easily. It creates two Docker containers that communicate with each other. A few other features were implemented, namely: volume mapping, port mapping, and an environment variable.
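A compose file along these lines would match that description. This is only a sketch: the image names, ports, volume path, and environment variable name are illustrative assumptions, not the project's actual values (see `docker-compose.yaml` in the repository for those):

```yaml
services:
  app:
    image: remla24-team10/app:latest          # front-end (illustrative tag)
    ports:
      - "5000:5000"                           # port mapping: host 5000 -> container 5000
    environment:
      - MODEL_SERVICE_URL=http://model-service:8080   # environment variable (name assumed)
    depends_on:
      - model-service
  model-service:
    image: remla24-team10/model-service:latest
    volumes:
      - model-cache:/models                   # volume mapping for the downloaded model
volumes:
  model-cache:
```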
Pull request: #3
Contributor: Jan van der Meulen
Reviewer: Shayan Ramezani
- We used Vagrant to create multiple virtual machines that run the app. After running `vagrant up`, these can be accessed with the commands `vagrant ssh controller1`, `vagrant ssh worker1`, and `vagrant ssh worker2` respectively.
- A non-trivial Ansible script was created to install all the necessary software on the virtual machines. This can be found in `ansible/playbook-controller.yml` and `ansible/playbook-worker.yml`.
- Each VM has a private network and can communicate directly with all other VMs. This can be tested by first ssh-ing into any of the machines and then pinging another machine, e.g. `vagrant ssh controller1` followed by `ping 192.168.57.11` to ping worker1.
- The Vagrantfile uses a loop and template arithmetic to create the VMs, as seen in the definition of the workers, which can easily be scaled up to spawn as many workers as necessary. The Python file generate_inventory.py generates the inventory for Ansible based on the number of workers defined in the Vagrantfile, and some simple looping in the Vagrantfile creates the workers.
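The inventory generation described above could look roughly like the sketch below. This is a hypothetical reconstruction, not the actual generate_inventory.py; the group names and the 192.168.57.x addressing (worker1 at .11, matching the ping example) are assumptions:

```python
def generate_inventory(num_workers, controller_ip="192.168.57.10", base_ip=10):
    """Build an Ansible INI-style inventory for one controller and N workers.
    Sketch only; the real generate_inventory.py may use different names/IPs."""
    lines = [
        "[controller]",
        f"controller1 ansible_host={controller_ip}",
        "",
        "[workers]",
    ]
    # worker i gets 192.168.57.(base_ip + i), so worker1 -> 192.168.57.11
    for i in range(1, num_workers + 1):
        lines.append(f"worker{i} ansible_host=192.168.57.{base_ip + i}")
    return "\n".join(lines)
```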
Pull request: #4
Contributor: Shayan Ramezani
Reviewer: Jan van der Meulen
- Currently still a work in progress; there are still some problems creating the Ansible playbooks to set up Minikube.
Pull request: #5
Contributor: Remi Lejeune
Reviewer: Michael Chan
- The app can now be run using Minikube and Kubernetes. For both the front end and the back end, `operation-manifests.yaml` contains a deployment, a service, and an ingress. Minikube exposes the app through an ingress, which has to be tunneled to.
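For illustration, the service and ingress portion of such a manifest might look like the fragment below. The names, ports, and paths here are assumptions for the sketch, not the actual contents of operation-manifests.yaml:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: app            # service name assumed
spec:
  selector:
    app: app
  ports:
    - port: 80
      targetPort: 5000 # Flask port from the Docker Compose setup
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress    # ingress name assumed
spec:
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app
                port:
                  number: 80
```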
Pull request: remla24-team10/app#3 & #6
Contributor: Michael Chan
Reviewer: Remi Lejeune
- A Prometheus ServiceMonitor was used to collect metrics, which include two gauges and a counter. A Grafana JSON file was included in the repository which can be imported into Grafana; it contains a dashboard with three panels.
Pull request: remla24-team10/model-training#4 & remla24-team10/model-training#6
Contributor: Jan van der Meulen & Michael Chan
The Pytest library is set up and can be found in the model-training repository under the tests folder. Some tests, such as the non-determinism tests, have to be run manually after dvc pull and are not integrated into CI.
Pull request: remla24-team10/model-training#5 & remla24-team10/model-training#3
Contributors: Remi Lejeune & Shayan Ramezani
The pipeline currently runs fast tests only, automatically. We can now see the test results on Codecov, as well as in the Actions tab and on the README.
Pull request: #9 & #4
Contributors: Michael Chan & Shayan Ramezani
The project now utilises the Istio service mesh for requests and is compatible with Ansible. Dashboards are not yet visible on the host when using the Vagrant VMs + Ansible.
Pull request: #13 & remla24-team10/app#4
Contributors: Remi Lejeune & Jan van der Meulen
Two app versions have been created and are served on a 50/50 basis via Istio. This can be set to 90/10 later, but 50/50 makes it easier to test; additionally, the Prometheus update interval is set to 1s for the same reason.
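In Istio, a weighted split like this is typically expressed as weighted routes in a VirtualService. The fragment below is a sketch of that idea; the host, gateway, and subset names are assumptions, not the project's actual manifest:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: app            # name assumed
spec:
  hosts:
    - "*"
  gateways:
    - app-gateway      # gateway name assumed
  http:
    - route:
        - destination:
            host: app
            subset: v1
          weight: 50   # set to 90 for the eventual 90/10 split
        - destination:
            host: app
            subset: v2
          weight: 50   # set to 10
```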
Pull request: #12
Contributor: Michael Chan
Rate limits were implemented. The rate limit is set to 20 requests per minute for each page version.