diff --git a/.DS_Store b/.DS_Store deleted file mode 100644 index f27dd7b..0000000 Binary files a/.DS_Store and /dev/null differ diff --git a/.gitignore b/.gitignore index a35f1db..2cd2914 100644 --- a/.gitignore +++ b/.gitignore @@ -3,3 +3,4 @@ data/ container/ *.npy .ipynb_checkpoints +.DS_Store diff --git a/bring-custom-container.ipynb b/bring-custom-container.ipynb index 3b196c9..74ad617 100644 --- a/bring-custom-container.ipynb +++ b/bring-custom-container.ipynb @@ -12,13 +12,40 @@ "metadata": {}, "source": [ "## Overview\n", - "Here, we’ll show how to package a simple Python example which showcases the decision tree algorithm from the widely used scikit-learn machine learning package. The example is purposefully fairly trivial since the point is to show the surrounding structure that you’ll want to add to your own code so you can train and host it in Amazon SageMaker.\n", "\n", - "The ideas shown here will work in any language or environment. You’ll need to choose the right tools for your environment to serve HTTP requests for inference, but good HTTP environments are available in every language these days.\n", + "### Background\n", + "Here, we'll show how to bring your own Docker container that packages your environment and code. We showcase the [decision tree](http://scikit-learn.org/stable/modules/tree.html) algorithm from the widely used [scikit-learn](http://scikit-learn.org/stable/) machine learning package. The example is purposefully fairly trivial since the point is to show the surrounding structure that you'll want to add to your own container so you can bring it to Amazon SageMaker for training and hosting.\n", "\n", - "In this example, we use a single image to support training and hosting. This is easy because it means that we only need to manage one image and we can set it up to do everything. Sometimes you’ll want separate images for training and hosting because they have different requirements. Just separate the parts discussed below into separate Dockerfiles and build two images. Choosing whether to have a single image or two images is really a matter of which is more convenient for you to develop and manage.\n", "\n", - "If you’re only using Amazon SageMaker for training or hosting, but not both, there is no need to build the unused functionality into your container.\n", + "### High-level overview\n", + "\n", + "The following diagram shows how you typically train and deploy a model with Amazon SageMaker:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [
\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The area labeled SageMaker highlights the two components of SageMaker: model training and model deployment. The area labeled [EC2 container registry](https://aws.amazon.com/ecr/) is where we store, manage, and deploy our Docker container images. The training data and model artifacts are stored in S3 bucket. \n", + "\n", + "In this lab, we use a single image to support both model training and hosting for simplicity. Sometimes you’ll want separate images for training and hosting because they have different requirements. \n", + "\n", + "The high-level steps include:\n", + "1. **Building the container** - We walk through the different components of the containers and inspect the docker file. Then we build and push the container to ECR. \n", + "2. **Setup & Upload Data** - Once our container is built and registered. We ready sagemaker and upload the data to S3. \n", + "3. **Model Training** - Create a training job using SageMaker Python SDK. It will pull data from S3 and use the container we built. \n", + "4. **Model Deployment** - Once training is complete, deploy our model to a HTTP endpoint using SageMaker Python SDK. \n", + "5. **Run Inferences** - Run predictions to test our model.\n", + "6. **Cleanup**\n", "\n" ] }, @@ -27,17 +54,20 @@ "metadata": {}, "source": [ "## Building the container\n", - "Docker provides a simple way to package arbitrary code into an image that is totally self-contained. Once you have an image, you can use Docker to run a container based on that image. Running a container is just like running a program on the machine except that the container creates a fully self-contained environment for the program to run. Containers are isolated from each other and from the host environment, so the way you set up your program is the way it runs, no matter where you run it.\n", + "[Docker](https://aws.amazon.com/docker/#:~:text=Docker%20is%20a%20software%20platform,test%2C%20and%20deploy%20applications%20quickly.&text=Running%20Docker%20on%20AWS%20provides,distributed%20applications%20at%20any%20scale.) packages software into standardized units called [containers](https://aws.amazon.com/containers/) that have everything the software needs to run including libraries, system tools, code, and runtime. Using Docker, you can quickly deploy and scale applications into any environment and know your code will run.\n", + "\n", "\n", - "Amazon SageMaker uses Docker to allow users to train and deploy arbitrary algorithms." + "Amazon SageMaker uses Docker to allow users to train and deploy arbitrary algorithms. More details on [how to use docker containers with sagemaker](https://docs.aws.amazon.com/sagemaker/latest/dg/docker-containers.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### Parts of the container\n", - "In the container directory are all the components you need to package the sample algorithm for Amazon SageMager:" + "### Walkthrough of the container directory\n", + "You can find the source code of the sample container we are using in [this GitHub repository](https://github.com/aws/amazon-sagemaker-examples/tree/main/advanced_functionality/scikit_bring_your_own). 
\n", + "\n", + "The container directory contains all the components you need to package for SageMaker:" ] }, { @@ -48,6 +78,7 @@ ".\n", "|-- Dockerfile\n", "|-- build_and_push.sh\n", + "|-- local_test\n", "`-- decision_trees\n", " |-- nginx.conf\n", " |-- predictor.py\n", @@ -63,32 +94,34 @@ "source": [ "Let’s discuss each of these in turn:\n", "\n", - "- Dockerfile describes how to build your Docker container image. More details below:\n", - "- build_and_push.sh is a script that uses the Dockerfile to build your container images and then pushes it to ECR. We’ll invoke the commands directly later in this notebook, but you can just copy and run the script for your own algorithms.\n", - "- decision_trees is the directory which contains the files that will be installed in the container.\n", - "- local_test is a directory that shows how to test your new container on any computer that can run Docker, including an Amazon SageMaker notebook instance. Using this method, you can quickly iterate using small datasets to eliminate any structural bugs before you use the container with Amazon SageMaker. We’ll walk through local testing later in this notebook.\n", + "- `Dockerfile` describes how to build your Docker container image. More details below.\n", + "- `build_and_push.sh` is a script that uses the Dockerfile to build your container images and then pushes it to ECR. We’ll invoke the commands directly later in this notebook, but you can just copy and run the script for your own algorithms.\n", + "- `local_test` is a directory that shows how to test your new container on any computer that can run Docker, including an Amazon SageMaker notebook instance. Using this method, you can quickly iterate using small datasets to eliminate any structural bugs before you use the container with Amazon SageMaker. Testing is not the focus of this lab, but feel free to checkout the example at your own time. \n", + "- `decision_trees` is the directory which contains the files that will be installed in the container.\n", "\n", - "In this simple application, we only install five files in the container.\n", + "In this simple application, we only install five files in the container. These five show the standard structure of our Python containers, although you are free to choose a different toolset or programming language and therefore could have a different layout.\n", "\n", "The files that we’ll put in the container are:\n", "\n", - "- nginx.conf is the configuration file for the nginx front-end. Generally, you should be able to take this file as-is.\n", - "- predictor.py is the program that actually implements the Flask web server and the decision tree predictions for this app. You’ll want to customize the actual prediction parts to your application. Since this algorithm is simple, we do all the processing here in this file, but you may choose to have separate files for implementing your custom logic.\n", - "- serve is the program started when the container is started for hosting. It simply launches the gunicorn server which runs multiple instances of the Flask app defined in predictor.py. You should be able to take this file as-is.\n", - "- train is the program that is invoked when the container is run for training. You will modify this program to implement your training algorithm.\n", - "- wsgi.py is a small wrapper used to invoke the Flask app. You should be able to take this file as-is.\n", + "- `nginx.conf` is the configuration file for the nginx front-end. 
{ "cell_type": "markdown", "metadata": {}, "source": [ - "### The Dockerfile\n", - "The Dockerfile describes the image that we want to build. You can think of it as describing the complete operating system installation of the system that you want to run. A Docker container running is quite a bit lighter than a full operating system, however, because it takes advantage of Linux on the host machine for the basic operations.\n", + "### Install packages\n", + "Please choose the `Python 3 (Data Science)` kernel to proceed.\n", "\n", - "For the Python science stack, we will start from a standard Ubuntu installation and run the normal tools to install the things needed by scikit-learn. Finally, we add the code that implements our specific algorithm to the container and set up the right environment to run under." + "We will first install the prerequisite packages:\n", + "- [**aiobotocore**](https://aiobotocore.readthedocs.io/en/latest/): adds async support for AWS services with [botocore](https://github.com/boto/botocore).\n", + "- [**sagemaker-studio-image-build**](https://pypi.org/project/sagemaker-studio-image-build/): CLI for building Docker images in SageMaker Studio using [AWS CodeBuild](https://aws.amazon.com/codebuild/)." ] }, @@ -98,7 +131,17 @@ "outputs": [], "source": [ "# cell 00\n", - "!pip install --upgrade aiobotocore" + "!pip install -q --upgrade aiobotocore\n", + "!pip install -q sagemaker-studio-image-build" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We will then unzip and copy over the files we need:\n", + "- `scikit_bring_your_own/container` → `lab03_container`\n", + "- `scikit_bring_your_own/data` → `lab03_data` " + ] + }, @@ -108,19 +151,27 @@ "outputs": [], "source": [ "# cell 01\n", - "\n", - "!unzip scikit_bring_your_own.zip\n", + "!unzip -q scikit_bring_your_own.zip\n", "!mv scikit_bring_your_own/data/ ./lab03_data/\n", "!mv scikit_bring_your_own/container/ ./lab03_container/\n", - "!rm -rf scikit_bring_your_own\n", - "!cat lab03_container/Dockerfile" + "!rm -rf scikit_bring_your_own" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Building and registering the container" + "### The Dockerfile\n", + "The `Dockerfile` describes the image that we want to build. You can think of it as describing the complete operating system installation of the system that you want to run. A Docker container running is quite a bit lighter than a full operating system, however, because it takes advantage of Linux on the host machine for the basic operations.\n", + "\n", + "For the Python science stack, we will start from a standard Ubuntu installation and run the normal tools to install the things needed by `scikit-learn`. Finally, we add the code that implements our specific algorithm to the container and set up the right environment to run under." ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's take a look at what's inside our `Dockerfile`:" + ] + }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ - "# cell 02\n", - "!pip install sagemaker-studio-image-build" + "!pygmentize lab03_container/Dockerfile" ] },
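+ { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Running the cell above prints the exact file. Paraphrased, the `Dockerfile` for this example does roughly the following (an illustrative sketch, not the verbatim file): start from an Ubuntu base image, install Python and the scientific stack, copy the `decision_trees` programs into `/opt/program`, and put that directory on the `PATH` so SageMaker can find `train` and `serve`:\n", + "\n", + "```dockerfile\n", + "# Illustrative sketch -- run the cell above for the real file\n", + "FROM ubuntu:16.04\n", + "\n", + "# Python plus nginx for the serving front-end\n", + "RUN apt-get -y update && apt-get install -y --no-install-recommends \\\n", + "    python python-pip nginx ca-certificates\n", + "\n", + "# scikit-learn and the Flask/gunicorn serving stack\n", + "RUN pip install numpy scipy scikit-learn pandas flask gevent gunicorn\n", + "\n", + "ENV PYTHONUNBUFFERED=TRUE\n", + "ENV PATH=\"/opt/program:${PATH}\"\n", + "\n", + "# The five files discussed above\n", + "COPY decision_trees /opt/program\n", + "WORKDIR /opt/program\n", + "```" + ] + },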
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Building and registering the container" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "> *In the next cell, if you run into IAM permission issue related to CodeBuild, make sure that you follow the steps outlined in the immersion day lab instructions*" + "> **NOTE** *In the next cell, if you run into an IAM permission issue related to CodeBuild, make sure that you follow the Prerequisites steps outlined in the [immersion day lab instructions](https://catalog.us-east-1.prod.workshops.aws/v2/workshops/63069e26-921c-4ce1-9cc7-dd882ff62575/en-US/lab3/option2#prerequisites)*" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%sh\n", - "# cell 03\n", - "\n", "\n", + "# cell 02\n", "\n", "cd lab03_container\n", "\n", "chmod +x decision_trees/train\n", "chmod +x decision_trees/serve\n", "\n", "sm-docker build . --repository sagemaker-decision-trees:latest" ] }, @@ -162,8 +218,16 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Using the container\n", - "Here we specify a bucket to use and the role that will be used for working with SageMaker." + "## Setup & Upload Data" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Setup the Environment \n", + "Here we specify a bucket to use and the role that will be used for working with SageMaker.\n", + "\n" + ] + }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ - "# cell 04\n", + "# cell 03\n", "\n", - "# S3 prefix\n", - "prefix = 'DEMO-scikit-byo-iris'\n", + "S3_prefix = 'DEMO-scikit-byo-iris'\n", "\n", "# Define IAM role\n", "import boto3\n", @@ -202,7 +265,7 @@ "metadata": {}, "outputs": [], "source": [ - "# cell 05\n", + "# cell 04\n", "\n", "import sagemaker as sage\n", "from time import gmtime, strftime\n", @@ -214,9 +277,16 @@ "cell_type": "markdown", "metadata": {}, "source": [ + "### Upload data to S3 Bucket" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ - "When training large models with huge amounts of data, you’ll typically use big data tools, like Amazon Athena, AWS Glue, or Amazon EMR, to create your data in S3. For the purposes of this example, we’re using some the classic Iris dataset, which we have included.\n", + "When training large models with huge amounts of data, you’ll typically use big data tools, like Amazon Athena, AWS Glue, or Amazon EMR, to create your data in S3. For the purposes of this example, we’re using some of the [classic Iris dataset](https://en.wikipedia.org/wiki/Iris_flower_data_set) in the `lab03_data` directory. \n", "\n", - "We can use use the tools provided by the SageMaker Python SDK to upload the data to a default bucket." + "We can use the tools provided by the [SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/) to upload the data to a default bucket."
] }, @@ -225,27 +295,42 @@ "metadata": {}, "outputs": [], "source": [ - "# cell 06\n", + "# cell 05\n", "\n", "WORK_DIRECTORY = 'lab03_data'\n", "\n", - "data_location = sess.upload_data(WORK_DIRECTORY, key_prefix=prefix)" + "data_location = sess.upload_data(WORK_DIRECTORY, \n", + " key_prefix=S3_prefix)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "In order to use SageMaker to fit our algorithm, we’ll create an Estimator that defines how to use the container to train. This includes the configuration we need to invoke SageMaker training:\n", - "\n", - "- The container name. This is constructed as in the shell commands above.\n", - "- The role. As defined above.\n", - "- The instance count which is the number of machines to use for training.\n", - "- The instance type which is the type of machine to use for training.\n", - "- The output path determines where the model artifact will be written.\n", - "- The session is the SageMaker session object that we defined above.\n", + "## Model Training" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In order to use SageMaker to fit our algorithm, we create an [`estimator`](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html) that defines how to use the container to train. This includes the configuration we need to invoke SageMaker training:\n", "\n", - "Then we use `fit()` on the estimator to train against the data that we uploaded above." + "- `image_uri (str)` - The [Amazon Elastic Container Registry](https://aws.amazon.com/ecr/) path where the Docker image is registered. This is constructed in *cell 06* below from your account ID and Region; the image itself was built and pushed to ECR in *cell 02*.\n", + "- `role (str)` - SageMaker IAM role as obtained above in *cell 03*.\n", + "- `instance_count (int)` - number of machines to use for training.\n", + "- `instance_type (str)` - the type of machine to use for training.\n", + "- `output_path (str)` - where the model artifact will be written.\n", + "- `sagemaker_session (sagemaker.session.Session)` - the SageMaker session object that we defined in *cell 04*.\n", + "\n" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Then we use the `estimator.fit()` method to train against the data that we uploaded.\n", + "The method calls the Amazon SageMaker `CreateTrainingJob` API to start model training. It uses the configuration you provided when creating the `estimator`, together with the specified input training data, to send the `CreateTrainingJob` request to Amazon SageMaker." ] }, @@ -254,16 +339,19 @@ "metadata": {}, "outputs": [], "source": [ - "# cell 07\n", + "# cell 06\n", "\n", "account = sess.boto_session.client('sts').get_caller_identity()['Account']\n", "region = sess.boto_session.region_name\n", - "image = '{}.dkr.ecr.{}.amazonaws.com/sagemaker-decision-trees:latest'.format(account, region)\n", + "image_uri = '{}.dkr.ecr.{}.amazonaws.com/sagemaker-decision-trees:latest'.format(account, region)\n", + "\n", + "tree = sage.estimator.Estimator(image_uri,\n", + " role, \n", + " instance_count=1, \n", + " instance_type='ml.c4.2xlarge',\n", + " output_path=\"s3://{}/output\".format(sess.default_bucket()),\n", + " sagemaker_session=sess)\n", "\n", - "tree = sage.estimator.Estimator(image,\n", - " role, instance_count=1, instance_type='ml.c4.2xlarge',\n", - " output_path=\"s3://{}/output\".format(sess.default_bucket()),\n", - " sagemaker_session=sess)\n", "file_location = data_location + '/iris.csv'\n", "tree.fit(file_location)" ] },
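+ { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`fit()` blocks until the job finishes and streams its logs into the notebook. If you also want to inspect the job programmatically, here is a minimal sketch using the `DescribeTrainingJob` API via `boto3` (the job name is recorded on the estimator by `fit()`):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Optional: inspect the training job that fit() just ran\n", + "sm_client = sess.boto_session.client('sagemaker')\n", + "job_name = tree.latest_training_job.name  # set by fit()\n", + "desc = sm_client.describe_training_job(TrainingJobName=job_name)\n", + "print(desc['TrainingJobStatus'])\n", + "print(desc['ModelArtifacts']['S3ModelArtifacts'])" + ] + },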
{ "cell_type": "markdown", "metadata": {}, "source": [ - "## Hosting your model\n", + "## Model Deployment\n", "You can use a trained model to get real time predictions using an HTTP endpoint. Follow these steps to walk you through the process.\n", "\n", - "Deploying the model to SageMaker hosting just requires a deploy call on the fitted model. This call takes an instance count, instance type, and optionally serializer and deserializer functions. These are used when the resulting predictor is created on the endpoint." + "After the model training successfully completes, you can call the [`estimator.deploy()` method](https://sagemaker.readthedocs.io/en/stable/estimators.html#sagemaker.estimator.Estimator.deploy). The `deploy()` method creates a deployable model, configures the SageMaker hosting services endpoint, and launches the endpoint to host the model. \n", + "\n", + "The method uses the following configurations:\n", + "- `initial_instance_count (int)` – The number of instances to deploy the model.\n", + "- `instance_type (str)` – The type of instances that you want to operate your deployed model.\n", + "- `serializer` – Serializes input data of various formats (a NumPy array, list, file, or buffer) – to a CSV-formatted string in this example. \n" ] }, @@ -284,23 +377,27 @@ "metadata": {}, "outputs": [], "source": [ - "# cell 08\n", + "# cell 07\n", + "\n", "from sagemaker.serializers import CSVSerializer\n", - "predictor = tree.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge', serializer=CSVSerializer())" + "predictor = tree.deploy(initial_instance_count=1, \n", + " instance_type='ml.m4.xlarge', \n", + " serializer=CSVSerializer())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Preparing test data to run inferences" + "## Run Inferences\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "In order to do some predictions, we’ll extract some of the data we used for training and do predictions against it. This is, of course, bad statistical practice, but a good way to see how the mechanism works." + "### Preparing test data\n", + "In order to do some predictions, we’ll extract some of the data we used for training and do predictions against it. This is, of course, bad statistical practice, but an easy way to see how the mechanism works." ] }, @@ -309,7 +406,7 @@ "metadata": {}, "outputs": [], "source": [ - "# cell 09\n", + "# cell 08\n", "\n", "shape=pd.read_csv(file_location, header=None)\n", "shape.sample(3)" ] }, @@ -321,7 +418,7 @@ "metadata": {}, "outputs": [], "source": [ - "# cell 10\n", + "# cell 09\n", "\n", "# drop the label column in the training set\n", "shape.drop(shape.columns[[0]],axis=1,inplace=True)\n", "shape.sample(3)" ] }, @@ -334,7 +431,7 @@ "metadata": {}, "outputs": [], "source": [ - "# cell 11\n", + "# cell 10\n", "\n", "import itertools\n", "\n", @@ -349,9 +446,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Run predictions\n", + "### Predictions\n", "\n", - "Prediction is as easy as calling predict with the predictor we got back from deploy and the data we want to do predictions with. The serializers take care of doing the data conversions for us." + "Prediction is as easy as calling `predict` with the `predictor` we got back from `deploy` and the data we want to do predictions with. The serializers take care of doing the data conversions for us." ] }, @@ -360,7 +457,7 @@ "metadata": {}, "outputs": [], "source": [ - "# cell 12\n", + "# cell 11\n", "\n", "print(predictor.predict(test_data.values).decode('utf-8'))" ] },
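+ { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `predictor` object is a convenience wrapper around the endpoint. From outside the SageMaker Python SDK, you would call the same endpoint through the `InvokeEndpoint` runtime API; here is a minimal sketch using `boto3`, reusing the endpoint and test data from above:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Optional: the same prediction through the low-level runtime API\n", + "runtime = sess.boto_session.client('sagemaker-runtime')\n", + "response = runtime.invoke_endpoint(EndpointName=predictor.endpoint_name,\n", + "                                   ContentType='text/csv',\n", + "                                   Body=test_data.to_csv(header=False, index=False))\n", + "print(response['Body'].read().decode('utf-8'))" + ] + },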
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Cleanup\n", - "After completing the lab, use these steps to [delete the endpoint](https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-cleanup.html) or run the following code\n" + "After completing the lab, use these steps to [delete the endpoint through the AWS Console](https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-cleanup.html) or simply run the following code:\n" ] }, @@ -379,27 +476,41 @@ "metadata": {}, "outputs": [], "source": [ - "# cell 13\n", + "# cell 12\n", "sess.delete_endpoint(predictor.endpoint_name)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Remove the container artifacts and data we downloaded." + ] + }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ - "# cell 14\n", + "# cell 13\n", "!rm -rf lab03_container lab03_data" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } ], "metadata": { "instance_type": "ml.t3.medium", "kernelspec": { - "display_name": "Python 3 (Data Science)", + "display_name": "Python 3", "language": "python", - "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:eu-west-1:470317259841:image/datascience-1.0" + "name": "python3" }, "language_info": { "codemirror_mode": { "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.7.10" + "version": "3.8.3" } }, "nbformat": 4, diff --git a/bring-custom-script.ipynb b/bring-custom-script.ipynb index e37b04f..6c50ce8 100644 --- a/bring-custom-script.ipynb +++ b/bring-custom-script.ipynb @@ -14,7 +14,9 @@ "## TensorFlow script mode training and serving\n", "Script mode is a training script format for TensorFlow that lets you execute any TensorFlow training script in SageMaker with minimal modification. The [SageMaker Python SDK](https://github.com/aws/sagemaker-python-sdk) handles transferring your script to a SageMaker training instance. On the training instance, SageMaker's native TensorFlow support sets up training-related environment variables and executes your training script.
In this tutorial, we use the SageMaker Python SDK to launch a training job and deploy the trained model.\n", "\n", - "Script mode supports training with a Python script, a Python module, or a shell script. In this example, we use a Python script to train a classification model on the [MNIST dataset](http://yann.lecun.com/exdb/mnist/). In this example, we will show how easily you can train a SageMaker using TensorFlow 1.x and TensorFlow 2.0 scripts with SageMaker Python SDK. In addition, this notebook demonstrates how to perform real time inference with the [SageMaker TensorFlow Serving container](https://github.com/aws/sagemaker-tensorflow-serving-container). The TensorFlow Serving container is the default inference method for script mode. For full documentation on the TensorFlow Serving container, please visit [here](https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/deploying_tensorflow_serving.rst)." + "Script mode supports training with a Python script, a Python module, or a shell script. In this example, we use a Python script to train a classification model on the [MNIST dataset](http://yann.lecun.com/exdb/mnist/) and show how easily you can train a model in SageMaker using TensorFlow 1.x and TensorFlow 2.0 scripts with the SageMaker Python SDK. In addition, this notebook demonstrates how to perform real time inference with the [SageMaker TensorFlow Serving container](https://github.com/aws/sagemaker-tensorflow-serving-container). The TensorFlow Serving container is the default inference method for script mode. For full documentation on the TensorFlow Serving container, please visit [here](https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/deploying_tensorflow_serving.rst).\n", + "\n", + "Since the TensorFlow package is not called in this notebook, please select the `Python 3 (Data Science)` kernel to proceed." ] }, @@ -100,7 +102,11 @@ "\n", "`py_version` is set to `'py3'` to indicate that we are using script mode since legacy mode supports only Python 2. Though Python 2 will be deprecated soon, you can use script mode with Python 2 by setting py_version to `py2` and `script_mode` to True.\n", "\n", - "`distribution` is used to configure the distributed training setup. It's required only if you are doing distributed training either across a cluster of instances or across multiple GPUs. Here we are using parameter servers as the distributed training schema. SageMaker training jobs run on homogeneous clusters. To make parameter server more performant in the SageMaker setup, we run a parameter server on every instance in the cluster, so there is no need to specify the number of parameter servers to launch. Script mode also supports distributed training with [Horovod](https://github.com/horovod/horovod). You can find the full documentation on how to configure distributions [here](https://github.com/aws/sagemaker-python-sdk/tree/master/src/sagemaker/tensorflow#distributed-training)." + "`distribution` is used to configure the distributed training setup. It's required only if you are doing distributed training either across a cluster of instances or across multiple GPUs. Here we are using parameter servers as the distributed training schema. SageMaker training jobs run on homogeneous clusters. To make parameter server more performant in the SageMaker setup, we run a parameter server on every instance in the cluster, so there is no need to specify the number of parameter servers to launch. Script mode also supports distributed training with [Horovod](https://github.com/horovod/horovod). You can find the full documentation on how to configure distributions [here](https://github.com/aws/sagemaker-python-sdk/tree/master/src/sagemaker/tensorflow#distributed-training).\n", + "\n", + "`instance_type` specifies the type of EC2 instance used for training. You should right-size your training instance based on the size of your data, your algorithm, and your task. Here we choose [G4dn](https://aws.amazon.com/ec2/instance-types/g4/) instances, which feature NVIDIA T4 GPUs and custom Intel Cascade Lake CPUs, and are optimized for machine learning inference and small-scale training. [Read more](https://aws.amazon.com/sagemaker/pricing/) on available instance types and pricing. \n", + "\n", + "`use_spot_instances` (optional): For further cost optimization, you can leverage [managed Amazon EC2 Spot instances](https://docs.aws.amazon.com/sagemaker/latest/dg/model-managed-spot-training.html) by setting this parameter to `True`. Managed spot training can reduce the cost of training models by up to 90% compared to On-Demand instances. SageMaker manages the Spot interruptions on your behalf. You can specify which training jobs use spot instances and a stopping condition that specifies how long Amazon SageMaker waits for a job to run using Amazon EC2 Spot instances. Full documentation [here](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-python-sdk/managed_spot_training_tensorflow_estimator/managed_spot_training_tensorflow_estimator.html).\n",
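+ "\n", + "For example, a spot-enabled variant of the first estimator below might look like the following (an illustrative sketch: `mnist_estimator_spot` is a hypothetical name, the `max_run`/`max_wait` values are placeholders, and `max_wait` must be at least as large as `max_run`):\n", + "\n", + "```python\n", + "mnist_estimator_spot = TensorFlow(entry_point='mnist.py',\n", + "                                  role=role,\n", + "                                  instance_count=2,\n", + "                                  instance_type='ml.g4dn.xlarge',\n", + "                                  framework_version='1.15.2',\n", + "                                  py_version='py3',\n", + "                                  use_spot_instances=True,\n", + "                                  max_run=3600,    # max seconds of actual training\n", + "                                  max_wait=7200)   # max total seconds, incl. waiting for Spot capacity\n", + "```"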
] }, @@ -116,7 +122,7 @@ "mnist_estimator = TensorFlow(entry_point='mnist.py',\n", " role=role,\n", " instance_count=2,\n", - " instance_type='ml.p3.2xlarge',\n", + " instance_type='ml.g4dn.xlarge',\n", " framework_version='1.15.2',\n", " py_version='py3',\n", " distribution={'parameter_server': {'enabled': True}})" ] }, @@ -139,7 +145,7 @@ "mnist_estimator2 = TensorFlow(entry_point='mnist-2.py',\n", " role=role,\n", " instance_count=2,\n", - " instance_type='ml.p3.2xlarge',\n", + " instance_type='ml.g4dn.xlarge',\n", " framework_version='2.1.0',\n", " py_version='py3',\n", " distribution={'parameter_server': {'enabled': True}})" ] }, @@ -332,9 +338,9 @@ "metadata": { "instance_type": "ml.m5.large", "kernelspec": { - "display_name": "Python 3 (TensorFlow 2.1 Python 3.6 CPU Optimized)", + "display_name": "Python 3 (ipykernel)", "language": "python", - "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-west-2:236514542706:image/tensorflow-2.1-cpu-py36" + "name": "python3" }, "language_info": { "codemirror_mode": { @@ -346,7 +352,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.13" + "version": "3.10.1" } }, "nbformat": 4, diff --git a/scikit_bring_your_own.zip b/scikit_bring_your_own.zip index a4336f0..3012fef 100644 Binary files a/scikit_bring_your_own.zip and b/scikit_bring_your_own.zip differ