
Commit 6b012dc

Address review feedback
- Add T4R to README.
- Wordsmithing in 01 notebook for building a RecSys.
- Flake8_nb fixes in 01 notebook.
- Update example README to include summary of the SageMaker notebook.
- Update the README for Building a RecSys.
- Add the merlin-tensorflow:nightly container and URL for NGC to find the release tags.
- Update the instructions for starting the container.
- Update the 02 notebook for GS MovieLens to include links to NVT doc and repo. Add a Next Steps heading with links to the Operators and classes used in the notebook.
- Review feedback from Benedikt.
1 parent add485a commit 6b012dc

File tree

5 files changed: +103 additions, -239 deletions

README.md

Lines changed: 11 additions & 0 deletions

@@ -71,6 +71,17 @@ models to highly-advanced deep learning models. With Merlin Models, you can:
 - Assemble connectable building blocks for common RecSys architectures so that
   you can create of new models quickly and easily.
 
+**[Transformers4Rec](https://github.com/NVIDIA-Merlin/Transformers4Rec)**
+[![PyPI version shields.io](https://img.shields.io/pypi/v/Transformers4Rec.svg)](https://pypi.org/project/Transformers4Rec/)<br>
+The Transformers4Rec library provides sequential and session-based recommendation.
+The library provides modular building blocks that are compatible with standard PyTorch modules.
+You can use the building blocks to design custom architectures such as multiple towers, multiple heads and tasks, and losses.
+With Transformers4Rec, you can:
+
+- Build sequential and session-based recommenders from any sequential tabular data.
+- Take advantage of the integration with NVTabular for seamless data preprocessing and feature engineering.
+- Perform next-item prediction as well as classic binary classification or regression tasks.
+
 **[Merlin Systems](https://github.com/NVIDIA-Merlin/systems)**
 [![PyPI version shields.io](https://img.shields.io/pypi/v/merlin-systems.svg)](https://pypi.org/project/merlin-systems/)<br>
 Merlin Systems provides tools for combining recommendation models with other
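The Transformers4Rec description added above centers on next-item prediction from sequential tabular data. As a rough illustration of that task only (this is not the Transformers4Rec API; the function names and toy session data are invented for this sketch), a first-order transition model over sessions looks like this:

```python
from collections import Counter, defaultdict

def train_next_item(sessions):
    """Count item-to-item transitions across user sessions."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for prev, nxt in zip(session, session[1:]):
            transitions[prev][nxt] += 1
    return transitions

def predict_next(transitions, item, k=3):
    """Return the k most frequent successors of `item`."""
    return [i for i, _ in transitions[item].most_common(k)]

# Hypothetical sessions: ordered item interactions per visit.
sessions = [
    ["shoes", "socks", "laces"],
    ["shoes", "socks", "insoles"],
    ["shirt", "tie"],
]
model = train_next_item(sessions)
print(predict_next(model, "shoes"))  # → ['socks']
```

Transformers4Rec replaces these hand-rolled counts with Transformer architectures over the same kind of session sequences, which is what lets it also handle side features and classification or regression targets.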

examples/Building-and-deploying-multi-stage-RecSys/01-Building-Recommender-Systems-with-Merlin.ipynb

Lines changed: 6 additions & 5 deletions

@@ -50,7 +50,7 @@
    "id": "405280b0-3d48-43b6-ab95-d29be7a43e9e",
    "metadata": {},
    "source": [
-    "The figure below represents a four-stage recommender systems. This is more complex process than only training a single model and deploying it, and it is much more realistic and closer to what's happening in the real-world recommender production systems."
+    "The figure below represents a four-stage recommender systems. This is a more complex process than only training a single model and deploying it, and it is much more realistic and closer to what's happening in the real-world recommender production systems."
    ]
   },
   {
@@ -115,7 +115,7 @@
    "source": [
     "**Compatibility:**\n",
     "\n",
-    "These notebooks are developed and tested using our latest `merlin-tensorflow:22.XX` container on [NVIDIA's docker registry](https://catalog.ngc.nvidia.com/containers?filters=&orderBy=dateModifiedDESC&query=merlin)."
+    "This notebook is developed and tested using the latest `merlin-tensorflow` container from the NVIDIA NGC catalog. To find the tag for the most recently-released container, refer to the [Merlin TensorFlow](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow) page."
    ]
   },
   {
@@ -152,7 +152,8 @@
    "source": [
     "import os\n",
     "import nvtabular as nvt\n",
-    "from nvtabular.ops import *\n",
+    "from nvtabular.ops import Rename, Filter, Dropna, LambdaOp, Categorify, \\\n",
+    "    TagAsUserFeatures, TagAsUserID, TagAsItemFeatures, TagAsItemID, AddMetadata\n",
     "\n",
     "from merlin.schema.tags import Tags\n",
     "\n",
@@ -265,7 +266,7 @@
    "id": "1e7bfb5c-88ed-4cf9-8a17-98c0284adb36",
    "metadata": {},
    "source": [
-    "In the NVTabular workflow below, notice that we apply `Dropna()` op at the end. The reason we do that is to remove rows with missing values in the final dataframe after preceding transformations. Although, the synthetic dataset that we generate above and use in this notebook does not have null entries, you might have null entries in your `user_id` and `item_id` columns in your own custom dataset. Therefore while applying `Dropna()` we will not be registering null `user_id_raw` and `item_id_raw` values in the feature store, and will be avoiding potential issues that can occur because of any null entires."
+    "In the following NVTabular workflow, notice that we apply the `Dropna()` Operator at the end. We add the Operator to remove rows with missing values in the final DataFrame after the preceding transformations. Although, the synthetic dataset that we generate and use in this notebook does not have null entries, you might have null entries in your `user_id` and `item_id` columns in your own custom dataset. Therefore, while applying `Dropna()` we will not be registering null `user_id_raw` and `item_id_raw` values in the feature store, and will be avoiding potential issues that can occur because of any null entries."
    ]
   },
   {
@@ -303,7 +304,7 @@
     "\n",
     "targets = [\"click\"] >> AddMetadata(tags=[Tags.BINARY_CLASSIFICATION, \"target\"])\n",
     "\n",
-    "outputs = user_id + item_id + item_features + user_features + user_id_raw + item_id_raw + targets\n",
+    "outputs = user_id + item_id + item_features + user_features + user_id_raw + item_id_raw + targets\n",
     "\n",
     "# add dropna op to filter rows with nulls\n",
     "outputs = outputs >> Dropna()"
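The rewritten notebook text above explains why `Dropna()` sits at the end of the NVTabular workflow: any row still carrying a null after the earlier transformations is removed before features reach the feature store. A minimal pandas sketch of the same effect (toy data invented here; the notebook itself uses an NVTabular workflow, not raw pandas):

```python
import pandas as pd

# Toy interactions frame with nulls in the ID columns, as might occur
# in a custom dataset (the notebook's synthetic data has none).
df = pd.DataFrame({
    "user_id": [1, 2, None, 4],
    "item_id": [10, None, 30, 40],
    "click": [1, 0, 1, 0],
})

# Dropping any row that still contains a null mirrors what the
# trailing Dropna() op does at the end of the workflow.
clean = df.dropna()
print(len(clean))  # → 2
```

This is why null `user_id_raw` and `item_id_raw` values never get registered in the feature store: the offending rows are gone before export.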
Lines changed: 39 additions & 19 deletions

@@ -1,45 +1,65 @@
 # Deploying a Multi-Stage Recommender System
 
-We created two Jupyter notebooks that demonstrate two different stages of a Recommender Systems.
-The goal of the notebooks is to show how to deploy a multi-stage Recommender System and serve recommendations with Triton Inference Server.
+We created two Jupyter notebooks that demonstrate two different stages of recommender systems.
+The notebooks show how to deploy a multi-stage recommender system and serve recommendations with Triton Inference Server.
 The notebooks demonstrate how to use the NVTabular, Merlin Models, and Merlin Systems libraries for feature engineering, training, and then inference.
 
 The two example notebooks are structured as follows:
 
-- [Building the Recommender System](01-Building-Recommender-Systems-with-Merlin.ipynb):
+- [Building the Recommender System](01-Building-Recommender-Systems-with-Merlin.ipynb):
   - Execute the preprocessing and feature engineering pipeline (ETL) with NVTabular on the GPU/CPU.
   - Train a ranking and retrieval model with TensorFlow based on the ETL output.
   - Export the saved models, user and item features, and item embeddings.
 
-- [Deploying the Recommender System with Triton](02-Deploying-multi-stage-RecSys-with-Merlin-Systems.ipynb):
+- [Deploying the Recommender System with Triton](02-Deploying-multi-stage-RecSys-with-Merlin-Systems.ipynb):
   - Set up a Feast feature store for feature storing and a Faiss index for similarity search.
   - Build a multi-stage recommender system ensemble pipeline with Merlin Systems operators.
   - Perform inference with the Triton Inference Server using the Merlin Systems library.
 
 ## Running the Example Notebooks
 
-Merlin docker containers are available on http://ngc.nvidia.com/catalog/containers/ with pre-installed versions. For `Building-and-deploying-multi-stage-RecSys` example notebooks we used `merlin-tensorflow-inference` container that has NVTabular with TensorFlow and Triton Inference support.
+Containers with the Merlin libraries are available from the NVIDIA NGC catalog.
+To run the sample notebooks, use the `merlin-tensorflow` container.
 
-To run the example notebooks using Docker containers, do the following:
+You can pull and run the `nvcr.io/nvidia/merlin/merlin-tensorflow:nightly` container.
 
-1. Once you pull the inference container, launch it by running the following command:
-   ```
-   docker run -it --gpus all -p 8000:8000 -p 8001:8001 -p 8002:8002 -p 8888:8888 -v <path to your data>:/workspace/data/ --ipc=host <docker container> /bin/bash
-   ```
-   The container will open a shell when the run command execution is completed. You can remove the `--gpus all` flag to run the example on CPU.
+> In production, instead of using the `nightly` tag, specify a release tag.
+> You can find the release tags and more information on the [Merlin TensorFlow](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/merlin/containers/merlin-tensorflow) container page.
 
-1. You will have to start JupyterLab on the Docker container. First, install jupyter-lab with the following command if it is missing:
-   ```
-   pip install jupyterlab
+To run the example notebooks using a container, do the following:
+
+1. After you pull the container, launch it by running the following command:
+
+   ```shell
+   docker run -it --gpus all -p 8000:8000 -p 8001:8001 -p 8002:8002 -p 8888:8888 \
+     -v <path to your data>:/workspace/data/ --ipc=host \
+     nvcr.io/nvidia/merlin/merlin-tensorflow:nightly /bin/bash
   ```
-
-   For more information, see [Installation Guide](https://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html).
 
-2. Start the jupyter-lab server by running the following command:
+   You can remove the `--gpus all` flag to run the example on CPU.
+
+   The container opens a shell when the run command execution is complete.
+   Your shell prompt should look similar to the following example:
+
+   ```text
+   root@2efa5b50b909:
   ```
+
+1. Start JupyterLab by running the following command:
+
+   ```shell
   jupyter-lab --allow-root --ip='0.0.0.0' --NotebookApp.token='<password>'
   ```
 
-3. Open any browser to access the jupyter-lab server using `localhost:8888`.
+   View the messages in your terminal to identify the URL for JupyterLab.
+   The messages in your terminal should show lines like the following example:
+
+   ```text
+   Or copy and paste one of these URLs:
+      http://2efa5b50b909:8888/lab?token=9b537d1fda9e4e9cadc673ba2a472e247deee69a6229ff8d
+   or http://127.0.0.1:8888/lab?token=9b537d1fda9e4e9cadc673ba2a472e247deee69a6229ff8d
+   ```
+
+1. Open a browser and use the `127.0.0.1` URL provided in the messages from JupyterLab.
 
-4. Once in the server, navigate to the ```/Merlin/examples/Building-and-deploying-multi-stage-RecSys/``` directory and execute the example notebooks.
+1. After you log in to JupyterLab, navigate to the ```/Merlin/examples/Building-and-deploying-multi-stage-RecSys/``` directory and execute the example notebooks.

examples/README.md

Lines changed: 16 additions & 1 deletion

@@ -1,6 +1,8 @@
 # NVIDIA Merlin Example Notebooks
 
-We have a collection of Jupyter example notebooks that are based on different datasets to provide end-to-end examples for NVIDIA Merlin.
+We have a collection of Jupyter example notebooks that show how to build an end-to-end recommender system with NVIDIA Merlin.
+The notebooks use different datasets to demonstrate different feature engineering workflows that might help you to adapt your data for a recommender system.
+
 These example notebooks demonstrate how to use NVTabular with TensorFlow, PyTorch, [HugeCTR](https://github.com/NVIDIA-Merlin/HugeCTR) and [Merlin Models](https://github.com/NVIDIA-Merlin/models).
 Each example provides additional details about the end-to-end workflow, such as includes ETL, training, and inference.
 
@@ -39,6 +41,19 @@ These notebooks demonstrate how to scale NVTabular as well as the following:
 - Train recommender system models with HugeCTR using multiple GPUs.
 - Inference with the Triton Inference Server and Merlin Models for TensorFlow or HugeCTR.
 
+### [Training and Serving with Merlin on AWS SageMaker](./sagemaker-tensorflow/)
+
+The notebook and scripts demonstrate how to use Merlin components like NVTabular, Merlin Models, and Merlin Systems
+with Triton Inference Server to build and deploy a sample end-to-end recommender system in AWS SageMaker.
+
+- Use the Amazon SageMaker Python SDK to interact with the SageMaker environment.
+- Create a sample NVTabular workflow to prepare data for binary classification.
+- Train a DLRMModel with Merlin Models for click and conversion prediction.
+- Create a Merlin Systems ensemble for use with Triton Inference Server.
+- Build a container and store it in AWS ECR that is based on Merlin and includes the training script.
+- Use the Python SDK to run the container and train the model.
+- Use the boto3 library locally to make inference requests to Triton Inference Server in the container that is running in the SageMaker environment.
+
 ## Running the Example Notebooks
 
 You can run the examples with Docker containers.
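The SageMaker summary above ends with making inference requests from boto3 to Triton running inside the SageMaker endpoint. A minimal sketch of that last step, assuming a Triton-style JSON request body; the input tensor name, shape, datatype, and endpoint name are all hypothetical and depend on the exported Merlin Systems ensemble:

```python
import json

def triton_payload(user_ids):
    """Build a Triton-style JSON inference request body.

    "user_id" is a hypothetical input tensor name; the real name and
    datatype come from the deployed ensemble's model configuration.
    """
    return json.dumps({
        "inputs": [{
            "name": "user_id",
            "shape": [len(user_ids), 1],
            "datatype": "INT32",
            "data": [[u] for u in user_ids],
        }]
    })

body = triton_payload([7, 42])

# Sending the request requires AWS credentials and a live endpoint,
# so the call is shown but not executed here:
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# response = runtime.invoke_endpoint(
#     EndpointName="merlin-ensemble",   # hypothetical endpoint name
#     ContentType="application/json",
#     Body=body,
# )
```

The response body from `invoke_endpoint` would then carry the ensemble's output tensors in the same JSON structure.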
