* State the flow order explicitly (use the template as-is with the diabetes sample -> bootstrap with your own project code)
* Fix the “multistage pipeline structure” explanation
* Clarify the R approach: it covers only model training; there is no evaluation/registration step
* Fix the confusion between AzureResourceConnection and WORKSPACE_SVC_CONNECTION
* Explain the use of a Docker image in the pipeline
* Link to the bring-your-own-code (Bryan’s) article
* Fix broken links (e.g. diabetes_regression-ci-build-train.yml)
* Provide an ML Service connection screenshot
* Explain the explicit "diabetes" names (the repo contains a sample “diabetes regression” project, so names throughout contain "diabetes")
* Clarify the folder structure (common folders (e.g. .pipelines, ml_service) vs. project folders (e.g. diabetes_regression))
To use this existing project structure and scripts for your new ML project, you can quickly get started from the existing repository: bootstrap it to create a template that works for your ML project. Bootstrapping prepares a similar directory structure for your project, which includes renaming files and folders, deleting and cleaning up some directories, and fixing imports and absolute paths based on your project name. This enables reusing various resources, such as pre-built pipelines and scripts, for your new project.
To bootstrap from the existing MLOpsPython repository, clone this repository, make sure Python is installed locally, and run the bootstrap.py script as shown below.
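A typical invocation looks like `python bootstrap.py --d [dirpath] --n [projectname]` (the option names here are illustrative, not confirmed from the script itself; run `python bootstrap.py --help` to check the exact arguments).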
Here `[dirpath]` is the absolute path to the root of the directory where the MLOpsPython repo is cloned, and `[projectname]` is the name of your ML project.
The script renames folders, files, and file contents from the base project name `diabetes` to your project name. However, you might still need to manually rename variables defined in a variable group and their values.
[This article](https://docs.microsoft.com/azure/machine-learning/tutorial-convert-ml-experiment-to-production#use-your-own-model-with-mlopspython-code-template) will also help you use this code template for your own ML project.
docs/code_description.md
High level directory structure for this repository:
```
├── README.md <- The top-level README for developers using this project.
```
The repository provides a template with a folder structure suitable for maintaining multiple ML projects. There are common folders, such as ***.pipelines***, ***environment_setup*** and ***ml_service***, and folders containing the code base for each ML project. This repository contains a single sample ML project in the ***diabetes_regression*** folder. This folder will be automatically renamed to your project name if you follow the [bootstrap procedure](../bootstrap/README.md).
### Environment Setup
- `environment_setup/install_requirements.sh` : This script prepares a local conda environment, i.e. installs the Azure ML SDK and the packages specified in the environment definitions.

- `.pipelines/azdo-base-pipeline.yml` : a pipeline template used by the ci-build-train and pr-build-train pipelines. It contains steps performing linting, data and unit testing (see the sketch after this list for how a pipeline references this template).
- `.pipelines/diabetes_regression-ci-build-train.yml` : a pipeline triggered when the code is merged into **master**. It performs linting, data integrity testing, unit testing, building and publishing an ML pipeline.
- `.pipelines/azdo-pr-build-train.yml` : a pipeline triggered when a **pull request** to the **master** branch is created. It performs linting, data integrity testing and unit testing only.
- `.pipelines/diabetes_regression-ci-image.yml` : a pipeline building a scoring image for the diabetes regression model.
- `.pipelines/diabetes_regression-template-get-model-version.yml` : a pipeline template used by the `.pipelines/diabetes_regression-ci-build-train.yml` pipeline. It finds out whether a new model was registered and retrieves the version of the new model.
- `.pipelines/azdo-abtest-pipeline.yml` : a pipeline demonstrating the [Canary deployment strategy](./docs/canary_ab_deployment.md).
- `.pipelines/azdo-helm-*.yml` : pipeline templates used by the `.pipelines/azdo-abtest-pipeline.yml` pipeline.
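As a rough sketch of how such a shared template is consumed (treat this as illustrative only; the real pipelines in this repository may pass additional parameters), a pipeline in the same `.pipelines` folder can include it as a step template:

```yaml
# Illustrative only: reusing the shared template from another pipeline
# located in the same .pipelines folder.
steps:
- template: azdo-base-pipeline.yml   # runs linting, data and unit tests
```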
### ML Services
- `diabetes_regression/conda_dependencies.yml` : Conda environment definition for the environment used for both training and scoring (the Docker image in which train.py and score.py are run).
- `diabetes_regression/ci_dependencies.yml` : Conda environment definition for the CI environment.
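For orientation, a Conda environment definition of this kind typically looks like the following; the package names and versions shown here are illustrative, not the repository's actual pinned dependencies:

```yaml
# Illustrative Conda environment definition; the real files in this
# repository pin their own package set and versions.
name: diabetes_regression_training_env
channels:
  - conda-forge
dependencies:
  - python=3.7
  - pip
  - pip:
    - azureml-defaults
    - scikit-learn
    - pandas
```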
### Training Step
- `diabetes_regression/training/train.py` : a training step of an ML training pipeline.
- `diabetes_regression/training/R/r_train.r` : trains a model with R based on a sample dataset (weight_data.csv).
- `diabetes_regression/training/R/train_with_r.py` : a Python wrapper (ML Pipeline Step) invoking the R training script on ML Compute.
- `diabetes_regression/training/R/train_with_r_on_databricks.py` : a Python wrapper (ML Pipeline Step) invoking the R training script on Databricks Compute.
- `diabetes_regression/training/R/weight_data.csv` : a sample dataset used by the R script (r_train.r) to train a model.
- `diabetes_regression/training/R/test_train.py` : a unit test for the training script(s).
### Evaluation Step
- `diabetes_regression/evaluate/evaluate_model.py` : an evaluation step of an ML training pipeline which registers a new trained model if evaluation shows the new model is more performant than the previous one.
### Registering Step
- `diabetes_regression/evaluate/register_model.py` : registers a new trained model if evaluation shows the new model is more performant than the previous one.
### Scoring
- `diabetes_regression/scoring/score.py` : a scoring script which is packaged into a Docker image together with the model when deploying to the QA/Prod environments.
- `diabetes_regression/scoring/inference_config.yml`, `deployment_config_aci.yml`, `deployment_config_aks.yml` : configuration files for the [AML Model Deploy](https://marketplace.visualstudio.com/items?itemName=ms-air-aiagility.private-vss-services-azureml&ssr=false#overview) pipeline task for ACI and AKS deployment targets.
- `diabetes_regression/scoring/scoreA.py`, `diabetes_regression/scoring/scoreB.py` : simplified scoring files for the [Canary deployment sample](./docs/canary_ab_deployment.md).
docs/getting_started.md
If you already have an Azure DevOps organization, create a new project. Then either:

* Fork this repository if you intend to contribute back to the repository, or
* Use this [code template](https://github.com/microsoft/MLOpsPython/generate), which copies the entire code base to your own GitHub location with the git commit history restarted. This can be used for learning and following the guide.
This repository contains a template and demonstrates how to apply it to a sample ML project, ***diabetes_regression***, which creates a linear regression model to predict diabetes.
If you want to adopt this template for your project and use it with your own machine learning code, it is recommended to go through this guide as it is first, to make sure everything works in your environment. After the sample is working, follow the [bootstrap instructions](../bootstrap/README.md) to convert the ***diabetes_regression*** sample into the starting point for your project.
The **RESOURCE_GROUP** parameter is used as the name for the resource group that will hold the Azure resources for the solution. If providing an existing AML Workspace, set this value to the corresponding resource group name.
The **AZURE_RM_SVC_CONNECTION** parameter is used by the [Azure DevOps pipeline](../environment_setup/iac-create-environment.yml) that creates the Azure ML workspace and associated resources through Azure Resource Manager. The pipeline requires an **Azure Resource Manager** service connection. When creating the connection, leave the **`Resource Group`** field empty.
**Note:** Creating the ARM service connection scope requires 'Owner' or 'User Access Administrator' permissions on the subscription. You must also have sufficient permissions to register an application with your Azure AD tenant, or receive the ID and secret of a service principal from your Azure AD Administrator. That principal must have 'Contributor' permissions on the subscription.
The **WORKSPACE_SVC_CONNECTION** parameter is used to reference a service connection for the Azure ML workspace. You will create this after provisioning the workspace (we recommend using the IaC pipeline as described below), and installing the Azure ML extension in your Azure DevOps project.
Optionally, a **DATASET_NAME** parameter can be used to reference a training dataset that you have registered in your Azure ML workspace (more details below).
Create a service connection to your ML workspace via the [Azure DevOps Azure ML task instructions](https://marketplace.visualstudio.com/items?itemName=ms-air-aiagility.vss-services-azureml) to be able to execute the Azure ML training pipeline. The connection name specified here needs to be used for the value of the `WORKSPACE_SVC_CONNECTION` set in the variable group above.
**Note:** Creating a service connection with Azure Machine Learning workspace scope requires 'Owner' or 'User Access Administrator' permissions on the workspace. You must also have sufficient permissions to register an application with your Azure AD tenant, or receive the ID and secret of a service principal from your Azure AD Administrator.
You can now set up the pipeline necessary for deploying your ML model to production. The pipeline has a sequence of stages (a simplified sketch of this multistage layout follows the list):

1. **Model Code Continuous Integration:** triggered on a code change to the master branch on GitHub; performs linting and unit testing, and publishes a training pipeline.
1. **Train Model:** invokes the Azure ML service to trigger the published training pipeline to train, evaluate, and register a model.
1. **Release Deployment:** deploys a model to ACI, AKS and Azure App Service environments.
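As a rough sketch of what such a multistage Azure DevOps YAML pipeline looks like (the stage, job and step names below are illustrative, not the exact names used in the repository's pipeline definitions):

```yaml
# Simplified multistage pipeline skeleton; names and steps are illustrative.
stages:
- stage: Model_CI
  displayName: 'Model CI'
  jobs:
  - job: Build_and_Publish
    steps:
    - script: echo "lint, unit test, publish the ML training pipeline"
- stage: Train_Model
  displayName: 'Train model'
  dependsOn: Model_CI
  jobs:
  - job: Invoke_Training
    steps:
    - script: echo "invoke the published Azure ML training pipeline"
- stage: Deploy
  displayName: 'Release deployment'
  dependsOn: Train_Model
  jobs:
  - job: Deploy_Model
    steps:
    - script: echo "deploy the registered model to ACI/AKS/App Service"
```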
### Set up the Pipeline
In your [Azure DevOps](https://dev.azure.com) project, create and run a new build pipeline referring to the [diabetes_regression-ci-build-train.yml](./.pipelines/azdo-ci-build-train.yml) pipeline definition in your forked repository:
165
163
166
164

@@ -175,6 +173,7 @@ and check out the published training pipeline in the **mlops-AML-WS** workspace
175
173
176
174
Great, you now have the build pipeline set up, which automatically triggers every time there is a change in the master branch.
* The first stage of the pipeline, **Model CI**, performs linting, unit testing, and builds and publishes an **ML Training Pipeline** in an **ML Workspace**.
**Note:** The build pipeline also supports building and publishing ML training pipelines that train a model with R on Azure ML Compute (you will also need to uncomment, i.e. include, the R dependencies in the Conda environment definition), or with R on Databricks (you will need to manually create a Databricks cluster and attach it to the ML Workspace as a compute; the DB_CLUSTER_ID and DATABRICKS_COMPUTE_NAME variables should be specified). The example ML pipelines using R have a single step to train a model. They don't demonstrate how to evaluate and register a model; the evaluation and registration techniques are shown only in the Python implementation.
* The second stage of the pipeline, **Train model**, triggers the run of the ML Training Pipeline. The training pipeline will train, evaluate, and register a new model. The actual computation is performed in an [Azure Machine Learning Compute cluster](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute). In Azure DevOps, this stage runs an agentless job that waits for the completion of the Azure ML job, allowing the pipeline to wait for training completion for hours or even days without using agent resources.
**Note:** If the model evaluation determines that the new model does not perform better than the previous one then the new model will not be registered and the pipeline will be cancelled.
* The third stage of the pipeline, **Deploy to ACI**, deploys the model to the QA environment in [Azure Container Instances](https://azure.microsoft.com/en-us/services/container-instances/). It then runs a *smoke test* to validate the deployment, i.e. sends a sample query to the scoring web service and verifies that it returns a response in the expected format.
The pipeline uses a Docker container on the Azure Pipelines agents to accomplish the pipeline steps. The container image ***mcr.microsoft.com/mlops/python:latest*** is built with this [Dockerfile](./environment_setup/Dockerfile) and has all the dependencies installed for the purposes of this repository. This image is an example of using a custom Docker image that provides a pre-baked environment, guaranteed to be the same on any build agent, VM or local machine. In your project you will want to build your own Docker image that contains only the dependencies and tools required for your use case. Such an image will most likely be smaller and therefore faster to pull, and it will be maintained entirely by your team.
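For illustration, running pipeline jobs inside such a container is a matter of pointing the job (or the whole pipeline) at the image. The snippet below is a minimal sketch of an Azure Pipelines container job, not the repository's actual pipeline definition:

```yaml
# Minimal sketch of an Azure Pipelines container job; the step shown is illustrative.
pool:
  vmImage: 'ubuntu-latest'
container: mcr.microsoft.com/mlops/python:latest
steps:
- script: python -m pytest
  displayName: 'Run tests inside the pre-baked container'
```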
Wait until the pipeline finishes and verify that there is a new model in the **ML Workspace**:
# Next steps
* You may wish to follow the [bootstrap instructions](../bootstrap/README.md) to create a starting point for your project use case.
* Use the [Convert ML experimental code to production code](https://docs.microsoft.com/azure/machine-learning/tutorial-convert-ml-experiment-to-production#use-your-own-model-with-mlopspython-code-template) tutorial, which explains how to bring your own machine learning code on top of this template.
* The provided pipeline definition YAML file is a sample starting point, which you should tailor to your processes and environment.
* You should edit the pipeline definition to remove unused stages. For example, if you are deploying to ACI and AKS, you should delete the unused `Deploy_Webapp` stage.
* You may wish to enable [manual approvals](https://docs.microsoft.com/en-us/azure/devops/pipelines/process/approvals) before the deployment stages.