Commit 171a27f

Getting started refactoring (#197)
* State explicitly the flow order (Template as it is with Diabetes -> Bootstrap with your project code)
* Fix “multistage pipeline structure” explanation
* Clarify the R approach. It has only the model training. There is no evaluation/registration
* Fix the confusion with AzureResourceConnection and WORKSPACE_SVC_CONNECTION
* Explain using of a Docker image in the pipeline
* Link to bring-your-own-code (Bryan’s) article
* Fix broken links (e.g. diabetes_regression-ci-build-train.yml)
* Provide ML Service connection screenshot
* Explain explicit "diabetes" names. (e.g. The repo contains a sample “diabetes regression” project so here and there all names contain "diabetes")
* Clarify the folder structure (Common folders (e.g. .pipelines, ml_service) vs Project folders (e.g. diabetes_regression))
1 parent b97140c commit 171a27f

6 files changed: +54 -32 lines changed


bootstrap/README.md

Lines changed: 3 additions & 1 deletion
@@ -1,11 +1,13 @@
11
# Bootstrap from MLOpsPython repository
22

3-
To use this existing project structure and scripts for your new ML project, you can quickly get started from the existing repository, bootstrap and create a template that works for your ML project. Bootstraping will prepare a similar directory structure for your project which includes renaming files and folders, deleting and cleaning up some directories and fixing imports and absolute path based on your project name. This will enable reusing various resources like pre-built pipelines and scripts for your new project.
3+
To use this existing project structure and scripts for your new ML project, you can quickly get started from the existing repository, bootstrap and create a template that works for your ML project. Bootstrapping will prepare a similar directory structure for your project which includes renaming files and folders, deleting and cleaning up some directories and fixing imports and absolute path based on your project name. This will enable reusing various resources like pre-built pipelines and scripts for your new project.
44

55
To bootstrap from the existing MLOpsPython repository clone this repository, ensure Python is installed locally, and run bootstrap.py script as below
66

77
`python bootstrap.py --d [dirpath] --n [projectname]`
88

99
Where `[dirpath]` is the absolute path to the root of your directory where MLOps repo is cloned and `[projectname]` is the name of your ML project.
1010

11+
The script renames folders, files and files' content from the base project name `diabetes` to your project name. However, you might need to manually rename variables defined in a variable group and their values.
12+
1113
[This article](https://docs.microsoft.com/azure/machine-learning/tutorial-convert-ml-experiment-to-production#use-your-own-model-with-mlopspython-code-template) will also assist to use this code template for your own ML project.
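
The renaming that the bootstrap README describes (folders, files and file contents moving from the base name `diabetes` to your project name) can be sketched roughly as below. This is an illustrative sketch only, not the repository's actual `bootstrap.py`; the function name `rename_project` and the project name `churn` are hypothetical.

```python
import tempfile
from pathlib import Path


def rename_project(root: Path, old: str, new: str) -> None:
    """Replace `old` with `new` in file contents, file names and folder names."""
    # Process the deepest paths first so that renaming a folder never
    # invalidates the paths of entries that are still waiting to be handled.
    paths = sorted(root.rglob("*"), key=lambda p: len(p.parts), reverse=True)
    for path in paths:
        if path.is_file():
            try:
                text = path.read_text(encoding="utf-8")
            except UnicodeDecodeError:
                text = None  # binary file: leave the content untouched
            if text is not None and old in text:
                path.write_text(text.replace(old, new), encoding="utf-8")
        if old in path.name:
            path.rename(path.with_name(path.name.replace(old, new)))


if __name__ == "__main__":
    # Hypothetical miniature project tree, created only for this demo.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        pkg = root / "diabetes_regression" / "training"
        pkg.mkdir(parents=True)
        (pkg / "train.py").write_text(
            "from diabetes_regression.util import helper\n", encoding="utf-8"
        )
        rename_project(root, "diabetes", "churn")
        for p in sorted(root.rglob("*")):
            print(p.relative_to(root).as_posix())
```

As the README notes, a script of this kind cannot reach variables stored in an Azure DevOps variable group, so values such as `ACI_DEPLOYMENT_NAME` may still need renaming by hand.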

docs/code_description.md

Lines changed: 21 additions & 4 deletions
@@ -29,6 +29,8 @@ High level directory structure for this repository:
2929
├── README.md <- The top-level README for developers using this project.
3030
```
3131

32+
The repository provides a template with folders structure suitable for maintaining multiple ML projects. There are common folders such as ***.pipelines***, ***environment_setup***, ***ml_service*** and folders containing the code base for each ML project. This repository contains a single sample ML project in the ***diabetes_regression*** folder. This folder is going to be automatically renamed to your project name if you follow the [bootstrap procedure](../bootstrap/README.md).
33+
3234
### Environment Setup
3335

3436
- `environment_setup/install_requirements.sh` : This script prepares a local conda environment i.e. install the Azure ML SDK and the packages specified in environment definitions.
@@ -44,6 +46,12 @@ High level directory structure for this repository:
4446
- `.pipelines/azdo-base-pipeline.yml` : a pipeline template used by ci-build-train pipeline and pr-build-train pipelines. It contains steps performing linting, data and unit testing.
4547
- `.pipelines/diabetes_regression-ci-build-train.yml` : a pipeline triggered when the code is merged into **master**. It performs linting, data integrity testing, unit testing, building and publishing an ML pipeline.
4648
- `.pipelines/azdo-pr-build-train.yml` : a pipeline triggered when a **pull request** to the **master** branch is created. It performs linting, data integrity testing and unit testing only.
49+
- `.pipelines/diabetes_regression-ci-image.yml` : a pipeline building a scoring image for the diabetes regression model.
50+
- `.pipelines/diabetes_regression-template-get-model-version.yml` : a pipeline template used by the `.pipelines/diabetes_regression-ci-build-train.yml` pipeline. It finds out if a new model was registered and retrieves a version of the new model.
51+
- `.pipelines/azdo-abtest-pipeline.yml` : a pipeline demonstrating [Canary deployment strategy](./docs/canary_ab_deployment.md).
52+
- `.pipelines/azdo-helm-*.yml` : pipeline templates used by the `.pipelines/azdo-abtest-pipeline.yml` pipeline.
53+
54+
4755

4856
### ML Services
4957

@@ -60,17 +68,26 @@ High level directory structure for this repository:
6068
- `diabetes_regression/conda_dependencies.yml` : Conda environment definition for the environment used for both training and scoring (Docker image in which train.py and score.py are run).
6169
- `diabetes_regression/ci_dependencies.yml` : Conda environment definition for the CI environment.
6270

63-
### Code
71+
### Training Step
6472

6573
- `diabetes_regression/training/train.py` : a training step of an ML training pipeline.
66-
- `diabetes_regression/evaluate/evaluate_model.py` : an evaluating step of an ML training pipeline which registers a new trained model if evaluation shows the new model is more performant than the previous one.
67-
- `diabetes_regression/evaluate/register_model.py` : (LEGACY) registers a new trained model if evaluation shows the new model is more performant than the previous one.
6874
- `diabetes_regression/training/R/r_train.r` : training a model with R basing on a sample dataset (weight_data.csv).
6975
- `diabetes_regression/training/R/train_with_r.py` : a python wrapper (ML Pipeline Step) invoking R training script on ML Compute
7076
- `diabetes_regression/training/R/train_with_r_on_databricks.py` : a python wrapper (ML Pipeline Step) invoking R training script on Databricks Compute
7177
- `diabetes_regression/training/R/weight_data.csv` : a sample dataset used by R script (r_train.r) to train a model
78+
- `diabetes_regression/training/R/test_train.py` : a unit test for the training script(s)
79+
80+
### Evaluation Step
81+
82+
- `diabetes_regression/evaluate/evaluate_model.py` : an evaluating step of an ML training pipeline which registers a new trained model if evaluation shows the new model is more performant than the previous one.
83+
84+
### Registering Step
85+
86+
- `diabetes_regression/evaluate/register_model.py` : registers a new trained model if evaluation shows the new model is more performant than the previous one.
7287

7388
### Scoring
7489

7590
- `diabetes_regression/scoring/score.py` : a scoring script which is about to be packed into a Docker Image along with a model while being deployed to QA/Prod environment.
76-
- `diabetes_regression/scoring/inference_config.yml`, deployment_config_aci.yml, deployment_config_aks.yml : configuration files for the [AML Model Deploy](https://marketplace.visualstudio.com/items?itemName=ms-air-aiagility.private-vss-services-azureml&ssr=false#overview) pipeline task for ACI and AKS deployment targets.
91+
- `diabetes_regression/scoring/inference_config.yml`, `deployment_config_aci.yml`, `deployment_config_aks.yml` : configuration files for the [AML Model Deploy](https://marketplace.visualstudio.com/items?itemName=ms-air-aiagility.private-vss-services-azureml&ssr=false#overview) pipeline task for ACI and AKS deployment targets.
92+
- `diabetes_regression/scoring/scoreA.py`, `diabetes_regression/scoring/scoreB.py` : simplified scoring files for the [Canary deployment sample](./docs/canary_ab_deployment.md).
93+
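
The evaluation and registering steps listed above both hinge on one decision: register the new model only if it is more performant than the previous one. A minimal sketch of that decision follows, assuming mean squared error is the comparison metric and that a first run with no registered model always registers. Both assumptions, and the function name `should_register`, are illustrative and not taken from `evaluate_model.py` or `register_model.py`.

```python
from typing import Optional


def should_register(new_mse: float, previous_mse: Optional[float]) -> bool:
    """Return True when the newly trained model should be registered."""
    if previous_mse is None:
        # Assumption: with no model registered yet there is nothing to beat.
        return True
    # Lower mean squared error is better; a tie keeps the existing model.
    return new_mse < previous_mse


if __name__ == "__main__":
    print(should_register(0.42, None))
    print(should_register(0.42, 0.50))
    print(should_register(0.50, 0.42))
```

Treating a tie as "do not register" avoids creating a new model version that brings no measurable improvement.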

docs/getting_started.md

Lines changed: 28 additions & 25 deletions
@@ -13,27 +13,10 @@ If you already have an Azure DevOps organization, create a [new project](https:/
1313
* Fork this repository if there is a desire to contribute back to the repository else
1414
* Use this [code template](https://github.com/microsoft/MLOpsPython/generate) which copies the entire code base to your own GitHub location with the git commit history restarted. This can be used for learning and following the guide.
1515

16-
If the desire is to use this project for your machine learning code, follow the [bootstrap instructions](../bootstrap/README.md) after the code template is complete.
16+
This repository contains a template and demonstrates how to apply it to a sample ML project ***diabetes_regression*** that creates a linear regression model to predict diabetes.
1717

18-
## Create an ARM Service Connection to deploy resources
18+
If the desire is to adopt this template for your project and to use it with your machine learning code, it is recommended to go through this guide as it is first. This ensures everything is working on your environment. After the sample is working, follow the [bootstrap instructions](../bootstrap/README.md) to convert the ***diabetes_regression*** sample into your project starting point.
1919

20-
This repository includes a YAML pipeline definition file for an Azure DevOps pipeline that will create the Azure ML workspace and associated resources through Azure Resource Manager.
21-
22-
The pipeline requires an **Azure Resource Manager**
23-
[service connection](https://docs.microsoft.com/en-us/azure/devops/pipelines/library/service-endpoints?view=azure-devops&tabs=yaml#create-a-service-connection).
24-
Given this service connection, you will be able to run the IaC pipeline
25-
and have the required permissions to generate resources.
26-
27-
![create service connection](./images/create-rm-service-connection.png)
28-
29-
Use **``AzureResourceConnection``** as the connection name, since it is used
30-
in the IaC pipeline definition. Leave the **``Resource Group``** field empty.
31-
32-
**Note:** Creating the ARM service connection scope requires 'Owner' or 'User Access Administrator' permissions on the subscription.
33-
You must also have sufficient permissions to register an application with
34-
your Azure AD tenant, or receive the ID and secret of a service principal
35-
from your Azure AD Administrator. That principal must have 'Contributor'
36-
permissions on the subscription.
3720

3821
## Create a Variable Group for your Pipeline
3922

@@ -58,6 +41,7 @@ The variable group should contain the following required variables:
5841
| LOCATION | centralus |
5942
| RESOURCE_GROUP | mlops-RG |
6043
| WORKSPACE_NAME | mlops-AML-WS |
44+
| AZURE_RM_SVC_CONNECTION | azure-resource-connection|
6145
| WORKSPACE_SVC_CONNECTION | aml-workspace-connection |
6246
| ACI_DEPLOYMENT_NAME | diabetes-aci |
6347

@@ -75,6 +59,19 @@ the BASE_NAME value should not exceed 10 characters and it should contain number
7559

7660
The **RESOURCE_GROUP** parameter is used as the name for the resource group that will hold the Azure resources for the solution. If providing an existing AML Workspace, set this value to the corresponding resource group name.
7761

62+
The **AZURE_RM_SVC_CONNECTION** parameter is used by the [Azure DevOps pipeline](../environment_setup/iac-create-environment.yml) that creates the Azure ML workspace and associated resources through Azure Resource Manager. The pipeline requires an **Azure Resource Manager**
63+
[service connection](https://docs.microsoft.com/en-us/azure/devops/pipelines/library/service-endpoints?view=azure-devops&tabs=yaml#create-a-service-connection).
64+
65+
![create service connection](./images/create-rm-service-connection.png)
66+
67+
Leave the **``Resource Group``** field empty.
68+
69+
**Note:** Creating the ARM service connection scope requires 'Owner' or 'User Access Administrator' permissions on the subscription.
70+
You must also have sufficient permissions to register an application with
71+
your Azure AD tenant, or receive the ID and secret of a service principal
72+
from your Azure AD Administrator. That principal must have 'Contributor'
73+
permissions on the subscription.
74+
7875
The **WORKSPACE_SVC_CONNECTION** parameter is used to reference a service connection for the Azure ML workspace. You will create this after provisioning the workspace (we recommend using the IaC pipeline as described below), and installing the Azure ML extension in your Azure DevOps project.
7976

8077
Optionally, a **DATASET_NAME** parameter can be used to reference a training dataset that you have registered in your Azure ML workspace (more details below).
@@ -139,6 +136,8 @@ so that you can set up a service connection to your AML workspace.
139136

140137
Create a service connection to your ML workspace via the [Azure DevOps Azure ML task instructions](https://marketplace.visualstudio.com/items?itemName=ms-air-aiagility.vss-services-azureml) to be able to execute the Azure ML training pipeline. The connection name specified here needs to be used for the value of the `WORKSPACE_SVC_CONNECTION` set in the variable group above.
141138

139+
![created resources](./images/ml-ws-svc-connection.png)
140+
142141
**Note:** Creating service connection with Azure Machine Learning workspace scope requires 'Owner' or 'User Access Administrator' permissions on the Workspace.
143142
You must also have sufficient permissions to register an application with
144143
your Azure AD tenant, or receive the ID and secret of a service principal
@@ -152,15 +151,14 @@ you can set up the pipeline necessary for deploying your ML model
152151
to production. The pipeline has a sequence of stages for:
153152

154153
1. **Model Code Continuous Integration:** triggered on code change to master branch on GitHub,
155-
performs linting, unit testing, publishes a training pipeline, and runs the published training pipeline to train, evaluate, and register a model.
156-
1. **Train Model**: invokes the Azure ML service to trigger model training.
157-
1. **Release Deployment:** deploys a model to QA (ACI) and Prod (AKS)
158-
environments, or alternatively to Azure App Service.
154+
performs linting, unit testing and publishes a training pipeline.
155+
1. **Train Model**: invokes the Azure ML service to trigger the published training pipeline to train, evaluate, and register a model.
156+
1. **Release Deployment:** deploys a model to ACI, AKS and Azure App Service environments.
159157

160158
### Set up the Pipeline
161159

162160
In your [Azure DevOps](https://dev.azure.com) project create and run a new build
163-
pipeline referring to the [diabetes_regression-ci-build-train.yml](../.pipelines/azdo-ci-build-train.yml)
161+
pipeline referring to the [diabetes_regression-ci-build-train.yml](./.pipelines/azdo-ci-build-train.yml)
164162
pipeline definition in your forked repository:
165163

166164
![configure ci build pipeline](./images/ci-build-pipeline-configure.png)
@@ -175,6 +173,7 @@ and check out the published training pipeline in the **mlops-AML-WS** workspace
175173

176174
Great, you now have the build pipeline set up which automatically triggers every time there's a change in the master branch.
177175

176+
178177
* The first stage of the pipeline, **Model CI**, performs linting, unit testing, build and publishes an **ML Training Pipeline** in an **ML Workspace**.
179178

180179
**Note:** The build pipeline also supports building and publishing ML
@@ -188,14 +187,16 @@ with R on Azure ML Compute. You will also need to uncomment (i.e. include) the
188187
to train a model with R on Databricks. You will need
189188
to manually create a Databricks cluster and attach it to the ML Workspace as a
190189
compute (Values DB_CLUSTER_ID and DATABRICKS_COMPUTE_NAME variables should be
191-
specified).
190+
specified). Example ML pipelines using R have a single step to train a model. They don't demonstrate how to evaluate and register a model. The evaluation and registering techniques are shown only in the Python implementation.
192191

193192
* The second stage of the pipeline, **Train model**, triggers the run of the ML Training Pipeline. The training pipeline will train, evaluate, and register a new model. The actual computation is performed in an [Azure Machine Learning Compute cluster](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets#amlcompute). In Azure DevOps, this stage runs an agentless job that waits for the completion of the Azure ML job, allowing the pipeline to wait for training completion for hours or even days without using agent resources.
194193

195194
**Note:** If the model evaluation determines that the new model does not perform better than the previous one then the new model will not be registered and the pipeline will be cancelled.
196195

197196
* The third stage of the pipeline, **Deploy to ACI**, deploys the model to the QA environment in [Azure Container Instances](https://azure.microsoft.com/en-us/services/container-instances/). It then runs a *smoke test* to validate the deployment, i.e. sends a sample query to the scoring web service and verifies that it returns a response in the expected format.
198197

198+
The pipeline uses a Docker container on the Azure Pipelines agents to accomplish the pipeline steps. The container image ***mcr.microsoft.com/mlops/python:latest*** is built with this [Dockerfile](./environment_setup/Dockerfile) and has all the dependencies needed for this repository installed. This image serves as an example of using a custom Docker image that provides a pre-baked environment. This environment is guaranteed to be the same on any build agent, VM or local machine. In your project you will want to build your own Docker image that contains only the dependencies and tools required for your use case. That image will most likely be smaller and therefore faster, and it will be fully maintained by your team.
199+
199200
Wait until the pipeline finishes and verify that there is a new model in the **ML Workspace**:
200201

201202
![trained model](./images/trained-model.png)
@@ -253,6 +254,8 @@ Make sure your webapp has the credentials to pull the image from the Azure Conta
253254

254255
# Next steps
255256

257+
* You may wish to follow the [bootstrap instructions](../bootstrap/README.md) to create a starting point for your project use case.
258+
* Use the [Convert ML experimental code to production code](https://docs.microsoft.com/azure/machine-learning/tutorial-convert-ml-experiment-to-production#use-your-own-model-with-mlopspython-code-template) tutorial which explains how to bring your machine learning code on top of this template.
256259
* The provided pipeline definition YAML file is a sample starting point, which you should tailor to your processes and environment.
257260
* You should edit the pipeline definition to remove unused stages. For example, if you are deploying to ACI and AKS, you should delete the unused `Deploy_Webapp` stage.
258261
* You may wish to enable [manual approvals](https://docs.microsoft.com/en-us/azure/devops/pipelines/process/approvals) before the deployment stages.
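
The *smoke test* mentioned for the **Deploy to ACI** stage verifies that the scoring web service returns a response in the expected format. The guide does not spell that format out, so the sketch below assumes a JSON object with a `result` list holding one numeric prediction per input row. The shape, the function name `is_valid_scoring_response` and the sample bodies are assumptions for illustration only.

```python
import json


def is_valid_scoring_response(body: str, expected_rows: int) -> bool:
    """Check an (assumed) scoring response of the form {"result": [numbers]}."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False  # the service did not return JSON at all
    if not isinstance(payload, dict):
        return False
    result = payload.get("result")
    if not isinstance(result, list) or len(result) != expected_rows:
        return False
    # bool is a subclass of int in Python, so exclude it explicitly.
    return all(
        isinstance(value, (int, float)) and not isinstance(value, bool)
        for value in result
    )


if __name__ == "__main__":
    print(is_valid_scoring_response('{"result": [152.3, 98.7]}', expected_rows=2))
    print(is_valid_scoring_response('{"error": "model not loaded"}', expected_rows=2))
```

A check like this only validates the shape of the response; it says nothing about whether the predictions themselves are accurate.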

docs/images/ml-ws-svc-connection.png

60.1 KB

environment_setup/iac-create-environment.yml

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ variables:
2323
steps:
2424
- task: AzureResourceGroupDeployment@2
2525
inputs:
26-
azureSubscription: 'AzureResourceConnection'
26+
azureSubscription: '$(AZURE_RM_SVC_CONNECTION)'
2727
action: 'Create Or Update Resource Group'
2828
resourceGroupName: '$(RESOURCE_GROUP)'
2929
location: $(LOCATION)

environment_setup/iac-remove-environment.yml

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ variables:
1111
steps:
1212
- task: AzureResourceGroupDeployment@2
1313
inputs:
14-
azureSubscription: 'AzureResourceConnection'
14+
azureSubscription: '$(AZURE_RM_SVC_CONNECTION)'
1515
action: 'DeleteRG'
1616
resourceGroupName: '$(RESOURCE_GROUP)'
1717
location: $(LOCATION)
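
Because both IaC pipelines now read the connection name from `$(AZURE_RM_SVC_CONNECTION)` instead of a hard-coded value, the variable group must define that variable before the pipelines run. One hedged way to catch a missing variable early is to scan a pipeline definition for `$(NAME)` macro references and compare them with the names you have defined. The helper `missing_variables` and the sample inputs below are hypothetical and not part of the repository.

```python
import re

# Matches Azure Pipelines macro references such as $(RESOURCE_GROUP).
MACRO = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_.]*)\)")


def missing_variables(pipeline_yaml: str, defined: set) -> list:
    """Return the referenced variable names that are absent from `defined`."""
    referenced = set(MACRO.findall(pipeline_yaml))
    return sorted(referenced - defined)


if __name__ == "__main__":
    # Fragment mirroring the task inputs shown in the diff above.
    snippet = """
    inputs:
      azureSubscription: '$(AZURE_RM_SVC_CONNECTION)'
      action: 'DeleteRG'
      resourceGroupName: '$(RESOURCE_GROUP)'
      location: $(LOCATION)
    """
    # A variable group that predates this commit would lack the new name.
    old_group = {"RESOURCE_GROUP", "LOCATION", "WORKSPACE_NAME"}
    print(missing_variables(snippet, old_group))
```

Note that this simple scan would also flag predefined pipeline variables that are never placed in a variable group, so its output should be treated as a list of candidates to review, not as definitive errors.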
