MustaphaU/Merlin-RecSys-MLOps-on-AWS

Recommender System with Continuous Retraining on Amazon EKS with NVIDIA Merlin, HugeCTR, NVIDIA Triton Inference Server, and Kubeflow Pipelines

Architecture Diagram

This project is a deep-learning-based recommender system with continuous retraining. The recommendation model predicts click-through rate (CTR) and is automatically retrained when its performance degrades. The setup uses Amazon Elastic Kubernetes Service (EKS), NVIDIA Triton Inference Server, NVIDIA Merlin (NVTabular, HugeCTR), and Kubeflow Pipelines.

  • Amazon Elastic Kubernetes Service (EKS) is a fully managed Kubernetes service that can scale nodes to meet changing workload demands.

  • NVIDIA Triton Inference Server is open-source inference-serving software for deploying AI/ML models from frameworks including HugeCTR, PyTorch, TensorFlow, and TensorRT.

  • NVIDIA Merlin is an open source framework for building recommender systems at scale.

  • Kubeflow Pipelines (KFP) is an open source platform for writing machine learning workflows natively in Python and deploying them on Kubernetes-based systems.
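The "retrain when performance degrades" behaviour described above can be pictured as a simple threshold check on recent evaluation scores. The following is a minimal sketch, not the project's actual monitoring logic: the function name `should_retrain`, the use of a mean over a window of scores, and the threshold value are all assumptions made for illustration.

```python
from statistics import mean
from typing import Sequence


def should_retrain(
    recent_scores: Sequence[float],
    threshold: float,
    min_samples: int = 3,
) -> bool:
    """Return True when the average of recent scores falls below threshold.

    recent_scores: hypothetical per-batch quality scores (e.g. AUC or accuracy).
    threshold:     hypothetical minimum acceptable average score.
    min_samples:   do not decide until this many scores have been collected.
    """
    if len(recent_scores) < min_samples:
        # Too little evidence to judge; avoid retraining on noise.
        return False
    return mean(recent_scores) < threshold


if __name__ == "__main__":
    # Made-up scores for illustration only.
    print(should_retrain([0.70, 0.68, 0.66], threshold=0.72))
    print(should_retrain([0.80, 0.81, 0.79], threshold=0.72))
```

In a pipeline-based setup, a positive result from a check like this would be the signal that starts a new training run; how that signal is delivered is specific to the deployment.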

About the Project

You can learn more about this project from this blog post: https://mustaphaunubi.medium.com/building-a-recommender-system-with-continuous-retraining-on-amazon-eks-with-nvidia-merlin-hugectr-5b734c71bbc5

Deployment Instructions

There are two variants of this project, which differ in how the cluster nodes are autoscaled.

  1. Autoscaling with Karpenter (self-managed) and Horizontal Pod Autoscaler (HPA)

    Triton pods are scaled by the Kubernetes HPA using custom metrics, and the cluster nodes are managed and scaled by Karpenter. The code is in the Merlin-MLOps-on-AWS-with-Karpenter directory; see SETUP_INSTRUCTIONS for how to set up the infrastructure and deploy the recommender system.

    (Diagram: Autoscaling with Karpenter and k8s HPA)

  2. Autoscaling with Cluster Autoscaler and Horizontal Pod Autoscaler (HPA)

    Triton pods are scaled by the Kubernetes HPA using custom metrics, and the cluster nodes are scaled by Cluster Autoscaler. The code is in the Merlin-MLOps-on-AWS-with-with-Cluster_Autoscaler directory; see SETUP_INSTRUCTIONS for how to set up the infrastructure and deploy the recommender system.

    (Diagram: Autoscaling with Cluster Autoscaler and k8s HPA)
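Both variants rely on the Kubernetes HPA to decide how many Triton pods to run. As a rough illustration of that decision, the sketch below reproduces the replica formula documented for the HPA, `ceil(currentReplicas * currentMetric / targetMetric)`, with the default 10% tolerance band. The function name, the replica bounds, and the metric values are assumptions for illustration; the custom metric this project actually uses is defined in its setup instructions, not here.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    tolerance: float = 0.1,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Approximate the HPA replica calculation for a single metric.

    min_replicas and max_replicas are hypothetical bounds; in a real cluster
    they come from the HorizontalPodAutoscaler spec.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")

    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: the HPA leaves the replica count alone.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)

    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Made-up values: 2 pods, metric at 150 against a target of 100.
    print(desired_replicas(2, current_metric=150.0, target_metric=100.0))
```

When the HPA asks for more pods than the existing nodes can hold, the pending pods are what prompt Karpenter or Cluster Autoscaler to add nodes.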

Acknowledgements

This work was partly inspired by Merlin MLOps with Kubeflow Pipelines on Google Kubernetes Engine. Some of the ideas in that project are therefore replicated in this implementation, adapted to use Amazon EKS and other AWS services such as SQS, EFS, and S3. The autoscaling was also replaced with the two approaches described above.

Resources

  1. Continuously Improving Recommender Systems for Competitive Advantage Using NVIDIA Merlin and MLOps by Shashank Verma, Abhishek Sawarkar, Vinh Nguyen, and Davide Onofrio
  2. Merlin - MLOps on GKE