Recommender System with Continuous Retraining on Amazon EKS with NVIDIA Merlin, HugeCTR, NVIDIA Triton Inference Server, and Kubeflow Pipelines

This project is a deep-learning-based recommender system with continuous retraining. The recommendation model predicts click-through rates (CTR) and is automatically retrained when its performance degrades. The setup uses Amazon Elastic Kubernetes Service (EKS), NVIDIA Triton Inference Server, NVIDIA Merlin (NVTabular, HugeCTR), and Kubeflow Pipelines.

Amazon Elastic Kubernetes Service (EKS) is a fully managed Kubernetes service that can scale nodes to meet changing workload demands.
NVIDIA Triton Inference Server is open source inference serving software that deploys AI/ML models from frameworks including HugeCTR, PyTorch, TensorFlow, and TensorRT.
NVIDIA Merlin is an open source framework for building recommender systems at scale.
Kubeflow Pipelines (KFP) is an open source platform for writing machine learning workflows natively in Python and deploying them on Kubernetes-based systems.
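To make the "retrains when performance degrades" idea concrete, here is a minimal sketch of a degradation check that a monitoring step could run before kicking off a retraining pipeline. This README does not state which metric or threshold the project uses, so everything here is an assumption: the function name, the `RetrainDecision` type, and the choice of a higher-is-better score (for example AUC) averaged over recent evaluations are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RetrainDecision:
    """Result of one monitoring check (hypothetical type, not from the project)."""
    mean_metric: float
    should_retrain: bool


def check_for_degradation(recent_scores: Sequence[float], threshold: float) -> RetrainDecision:
    """Flag retraining when the mean of recent evaluation scores drops below a threshold.

    Assumes a higher-is-better metric; the metric and threshold are illustrative only.
    """
    if not recent_scores:
        raise ValueError("recent_scores must not be empty")
    mean_metric = sum(recent_scores) / len(recent_scores)
    return RetrainDecision(mean_metric=mean_metric, should_retrain=mean_metric < threshold)
```

A caller would pass in the latest evaluation scores and, when `should_retrain` is true, submit a new pipeline run. Averaging over several evaluations rather than reacting to a single one is a design choice that avoids retraining on a one-off noisy measurement.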

About the Project

You can learn more about this project from this blog post: https://mustaphaunubi.medium.com/building-a-recommender-system-with-continuous-retraining-on-amazon-eks-with-nvidia-merlin-hugectr-5b734c71bbc5

Deployment Instructions

There are two variants of this project, which differ in the product used to autoscale the cluster nodes.

Autoscaling with Karpenter (self-managed) and Horizontal Pod Autoscaler

Triton pods are scaled by the Kubernetes HPA using custom metrics, and the cluster nodes are managed and scaled by Karpenter. The code is in the Merlin-MLOps-on-AWS-with-Karpenter directory; see SETUP_INSTRUCTIONS for how to set up the infrastructure and deploy the recommender system.

Autoscaling with Cluster Autoscaler and Horizontal Pod Autoscaler (HPA)

Triton pods are scaled by the Kubernetes HPA using custom metrics, and the cluster nodes are scaled by Cluster Autoscaler. The code is in the Merlin-MLOps-on-AWS-with-with-Cluster_Autoscaler directory; see SETUP_INSTRUCTIONS for how to set up the infrastructure and deploy the recommender system.
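Both variants rely on the Horizontal Pod Autoscaler to size the Triton deployment. As a rough illustration of how the HPA turns a metric reading into a replica count, the sketch below implements the scaling rule from the Kubernetes documentation: desired replicas = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance (0.1 by default) and clamped to the configured bounds. The function name, the default bounds, and the example metric values are assumptions for illustration; the custom metric this project actually exposes is defined in its setup instructions, not here.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Approximate the HPA replica calculation for a single metric.

    Illustrative only: the real controller also handles pod readiness,
    missing metrics, and stabilization windows.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Within tolerance of the target, the HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the minReplicas/maxReplicas bounds set on the HPA object.
    return max(min_replicas, min(max_replicas, desired))
```

For example, two Triton pods observing a metric at twice its target would be scaled to four. When the new pods cannot be scheduled on existing nodes, the node autoscaler (Karpenter or Cluster Autoscaler, depending on the variant) is what adds capacity.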

Acknowledgements

This work was partly inspired by Merlin MLOps with Kubeflow Pipelines on Google Kubernetes Engine. As a result, some of the ideas from that project are replicated in this implementation, updated to use Amazon EKS and other AWS services such as SQS, EFS, and S3. The autoscaling was also replaced with the two approaches described above.
