Recommender System with Continuous Retraining on Amazon EKS with NVIDIA Merlin, HugeCTR, NVIDIA Triton Inference Server, and Kubeflow Pipelines
This project is a deep-learning-based recommender system with continuous retraining. The recommendation model predicts click-through rates (CTR) and is automatically retrained when its performance degrades. The setup uses Amazon Elastic Kubernetes Service (EKS), NVIDIA Triton Inference Server, NVIDIA Merlin (NVTabular, HugeCTR), and Kubeflow Pipelines.
- Amazon Elastic Kubernetes Service (EKS) is a fully managed Kubernetes service that can scale nodes to meet changing workload demands.
- NVIDIA Triton Inference Server is open-source inference serving software for deploying AI/ML models from frameworks including HugeCTR, PyTorch, TensorFlow, and TensorRT.
- NVIDIA Merlin is an open-source framework for building recommender systems at scale.
- Kubeflow Pipelines (KFP) is an open-source platform for writing machine learning workflows natively in Python and deploying them on Kubernetes-based systems.
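The retraining trigger mentioned in the overview ("automatically retrains when performance degrades") can be pictured with a minimal Python sketch. It assumes, hypothetically, that performance is measured as the ROC AUC of the model's CTR predictions against observed clicks, and that retraining fires when AUC drops below a fixed threshold. The function names `roc_auc` and `should_retrain` and the 0.75 threshold are illustrative only; they are not taken from this project.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive (click) is scored
    above a random negative (no click); ties count as half a win.
    This pairwise version is O(P*N) and only meant for illustration."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def should_retrain(labels: Sequence[int], scores: Sequence[float],
                   auc_threshold: float = 0.75) -> bool:
    """Hypothetical trigger: retrain when AUC on recent traffic falls below the threshold."""
    return roc_auc(labels, scores) < auc_threshold
```

For example, `should_retrain([1, 0], [0.2, 0.8])` returns `True` because the only click was scored below the only non-click. In a real pipeline the labels and scores would come from logged predictions and click feedback, and the check would start a Kubeflow Pipelines run rather than return a boolean.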
You can learn more about this project from this blog post: https://mustaphaunubi.medium.com/building-a-recommender-system-with-continuous-retraining-on-amazon-eks-with-nvidia-merlin-hugectr-5b734c71bbc5
There are two variants of this project, differing in how they autoscale:
- Triton pods are scaled by the Kubernetes Horizontal Pod Autoscaler (HPA) using custom metrics, and cluster nodes are provisioned and scaled by Karpenter. The code is in the Merlin-MLOps-on-AWS-with-Karpenter directory; see SETUP_INSTRUCTIONS for how to set up the infrastructure and deploy the recommender system.
- Triton pods are scaled by the Kubernetes HPA using custom metrics, and cluster nodes are scaled by Cluster Autoscaler. The code is in the Merlin-MLOps-on-AWS-with-with-Cluster_Autoscaler directory; see SETUP_INSTRUCTIONS for how to set up the infrastructure and deploy the recommender system.
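In both variants the HPA decides how many Triton pods to run. The Python sketch below shows the core replica calculation documented for the Kubernetes HPA, `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, including the default 0.1 tolerance within which no scaling happens. It is a simplification: the real controller also applies stabilization windows and scaling policies. Which custom metric drives Triton scaling in this project (for example, an inference queue-time metric) is an assumption here, and the `desired_replicas` function is hypothetical.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Simplified HPA replica calculation for a single custom metric."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Metric is close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        # Scale proportionally to how far the metric is from its target, rounding up.
        desired = math.ceil(current_replicas * ratio)
    # Respect the minReplicas/maxReplicas bounds of the HPA object.
    return max(min_replicas, min(max_replicas, desired))
```

For instance, with 2 Triton pods and a per-pod metric at 150 against a target of 100, the sketch asks for 3 pods. When those pods no longer fit on existing nodes, Karpenter or Cluster Autoscaler adds nodes, which is where the two variants differ.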

This work was partly inspired by Merlin MLOps with Kubeflow Pipelines on Google Kubernetes Engine. Some ideas from that project are therefore reused here, adapted to Amazon EKS and other AWS services such as SQS, EFS, and S3. The autoscaling was also replaced with the two approaches described above.
