Today, we'll be discussing and developing an Azure DevOps pipeline wrapped around an Azure MLOps pipeline to proactively scale a Kubernetes cluster based on Machine Learning predictions.
At the end of the workshop, your Azure account will contain a working instance of this configuration for reference and experimentation.
We have included three Azure DevOps "challenges" over the course of the workshop. We will update this documentation with solutions to these challenges as we go.
NOTE: This workshop is going to involve provisioning and configuring Azure resources such as ML Pipelines, Kubernetes Clusters, Azure Active Directory App Registrations, and Azure DevOps projects. If you already have a corporate Azure account, there's a good chance that you do not have permission to take these actions. If that's the case, we recommend that you sign up for a fresh Azure Free Account.
- Create a new Azure Free Account (if necessary)
- Navigate to https://azure.microsoft.com/en-us/free/
- Select the "Start free" button
- Create a new Azure DevOps Organization (if necessary) (docs)
- Navigate to https://azure.microsoft.com/en-us/pricing/details/devops/azure-devops-services/
- Select the Basic Plan column's "Start free >" button
- Create a new Azure DevOps Project (docs)
Azure DevOps Organization Screen -> + New project (top right)
- Clone the source repo into your project
- Create a new Service Principal in Azure Active Directory (docs)
Portal -> Azure Active Directory -> App registrations -> + New Registration
- Name: `atmsdevdayapp`
- Supported account types: Accounts in this organizational directory only
- Redirect URI: leave empty
- "Register" and record the following:
- Application (client) ID
- Directory (tenant) ID
- Create a Client Secret
- Capture Configuration Data in an Azure DevOps Variable Group
Azure DevOps Project -> Sidebar -> Pipelines -> Library -> Variable Groups -> "+ Variable group"
- Variable group name: `devopsforai-aml-vg`
- Add the following variables:

| Variable Name | Suggested Value |
| --- | --- |
| AML_COMPUTE_CLUSTER_NAME | `train-cluster` |
| BASE_NAME | `<your-initials>devday` |
| EXPERIMENT_NAME | `mlopspython` |
| LOCATION | `eastus` |
| MODEL_NAME | `sklearn_regression_model.pkl` |
| SOURCES_DIR_TRAIN | `python_scripts` |
| SP_APP_ID | `<Application (client) ID from above>` |
| SP_APP_SECRET | `<Client Secret Value from above>` |
| SUBSCRIPTION_ID | `<Azure Subscription ID>` |
| TENANT_ID | `<Directory (tenant) ID from above>` |
| TRAIN_SCRIPT_PATH | `train.py` |
| TRAINING_PIPELINE_NAME | `training-pipeline` |

- Create an Azure Resource Manager Service Connection (docs)
- Create a Build to Provision Azure Resources
Azure DevOps Project Sidebar -> Pipelines -> Builds -> New pipeline
- Where is your code?: `Azure Repos Git`
- Select your imported repo
- Configure your pipeline: `Existing Azure Pipelines YAML file`
- Branch: `master`
- Path: `/build_pipeline_scripts/iac-create-environment.yml`
- Suggested Name: `Provision Azure Environment`
- Run the Build
Newly Created Build Pipeline -> Run
(This will take a few minutes!)
- Verify Azure Resource Creation
Congratulations! You have provisioned most of the Azure resources needed for this workshop.
- Add your Service Principal as a Contributor on the ML Workspace (docs)
- Create a Blob Container and Load Log Data
Azure Portal -> Storage Accounts -> <Your Storage Account> -> Blob Service -> Containers -> + Container
- Name: `modeldata`
- Public access level: `Private (no anonymous access)`
- Select the newly created `modeldata` container
- Download `log_data.pkl`
- Upload (top left)
- File: the `log_data.pkl` that you just downloaded
- Add Service Principal to Storage Account
Azure Portal -> Storage Accounts -> <Your Storage Account> -> Access control (IAM) -> Add Role Assignment
- Role: `Contributor`
- Assign access to: `Azure AD user, group, or service principal`
- Select: `<Your Service Principal name>`
- Capture Blob Storage Variable Group Entries
Azure DevOps Project -> Sidebar -> Pipelines -> Library -> Variable Groups -> devopsforai-aml-vg
- Add the following variables:

| Variable Name | Suggested Value |
| --- | --- |
| STORAGE_ACCT_NAME | `<Your Storage Account Name>` |
| STORAGE_ACCT_KEY | `<Azure Portal -> Storage Accounts -> <Your Storage Account> -> Settings -> Access Keys>` |
| STORAGE_BLOB_NAME | `modeldata` |

- Create an Azure DevOps Build to Create the ML Pipeline
Azure DevOps Project Sidebar -> Pipelines -> Builds -> New pipeline
- Where is your code?: `Azure Repos Git`
- Select your imported repo
- Configure your pipeline: `Existing Azure Pipelines YAML file`
- Branch: `master`
- Path: `/build_pipeline_scripts/model-build.yml`
- Suggested Name: `Build ML Pipeline`
- Run the Build
Congratulations! You have created an Azure ML Pipeline. We will run the pipeline to train the models in the next section.
- Create a New Azure DevOps Release
Azure DevOps Project Sidebar -> Pipelines -> Releases -> New pipeline
- Start With: `Empty Job`
- Stage Name: `Run Train Scripts`
- Update name to `Train ML Pipeline`
- Add an Artifact
- Link the Variable Group
Train ML Pipeline Release -> Variables Tab -> Variable groups
- Link Variable Group: `devopsforai-aml-vg`
- Update to Ubuntu Agent
- Navigate into Tasks for the Release
- Select the "Agent Job"
- Update Agent Specification: `ubuntu-16.04`
- Add Command Line Task
- Click the "+" (Add task to an Agent Job) button in the `Agent Job` item
- Add a `Command line` task
- Select the new `Command Line Script` task and set the following:
- Display Name: `Run Train Models Script`
- Script:

```bash
docker run -v $(System.DefaultWorkingDirectory)/_model-build/mlops-pipelines/python_scripts/:/script \
  -w=/script -e MODEL_NAME=$MODEL_NAME -e EXPERIMENT_NAME=$EXPERIMENT_NAME \
  -e TENANT_ID=$TENANT_ID -e SP_APP_ID=$SP_APP_ID -e SP_APP_SECRET=$SP_APP_SECRET \
  -e SUBSCRIPTION_ID=$SUBSCRIPTION_ID -e RELEASE_RELEASEID=$RELEASE_RELEASEID \
  -e BUILD_BUILDID=$BUILD_BUILDID -e BASE_NAME=$BASE_NAME \
  -e STORAGE_ACCT_NAME=$STORAGE_ACCT_NAME -e STORAGE_ACCT_KEY=$STORAGE_ACCT_KEY -e STORAGE_BLOB_NAME=$STORAGE_BLOB_NAME \
  mcr.microsoft.com/mlops/python:latest python run_train_pipeline.py
```

- Run the Release
Congratulations! You've trained your models. We will create an AKS Cluster in the next section.
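
The Command Line task's script above forwards a fixed set of variables into the container with `-e`. If the release fails with authentication or storage errors, a missing variable is one plausible cause. The following is a minimal sketch, not part of the workshop repo, that checks for unset variables and assembles an equivalent `docker run` argument list. The helper names `missing_variables` and `build_docker_command` are hypothetical, and the sketch assumes the variables are visible as environment variables under the same names used in the script.

```python
import os

# Names forwarded with `-e` by the training release's `docker run` script.
TRAIN_ENV_VARS = [
    "MODEL_NAME", "EXPERIMENT_NAME", "TENANT_ID", "SP_APP_ID", "SP_APP_SECRET",
    "SUBSCRIPTION_ID", "RELEASE_RELEASEID", "BUILD_BUILDID", "BASE_NAME",
    "STORAGE_ACCT_NAME", "STORAGE_ACCT_KEY", "STORAGE_BLOB_NAME",
]


def missing_variables(env, required=TRAIN_ENV_VARS):
    """Return the required names that are unset or empty in `env` (dict-like)."""
    return [name for name in required if not env.get(name)]


def build_docker_command(env, scripts_dir, image, script, required=TRAIN_ENV_VARS):
    """Build a `docker run` argument list (hypothetical helper).

    Raises ValueError when any required variable is missing, so the
    problem surfaces before the container starts.
    """
    missing = missing_variables(env, required)
    if missing:
        raise ValueError("missing variables: " + ", ".join(missing))
    cmd = ["docker", "run", "-v", f"{scripts_dir}:/script", "-w=/script"]
    for name in required:
        cmd += ["-e", f"{name}={env[name]}"]
    cmd += [image, "python", script]
    return cmd


if __name__ == "__main__":
    # Report which variables are missing from the current environment.
    missing = missing_variables(os.environ)
    if missing:
        print("Missing variables: " + ", ".join(missing))
    else:
        print("All training variables are set.")
```

Running the check as the first step of the Agent Job would report a misconfigured Variable Group before any Azure calls are made.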
- Navigate to the Azure Cloud Shell at https://shell.azure.com
- Use the following script to create an AKS Cluster (docs)

```bash
az login # Not required in Azure Cloud Shell

# If you've already run this script, you'll need to remove cached service principal info in Azure
# rm .azure/aksServicePrincipal.json

az group create --name atDevDayWorkshopRG --location eastus

az provider register --namespace Microsoft.Network
az provider register --namespace Microsoft.Compute
az provider register --namespace Microsoft.Storage

az ad sp create-for-rbac --skip-assignment

# `aks create` will take a while!
# substitute values from `create-for-rbac` above!
az aks create --resource-group atDevDayWorkshopRG \
    --name atDevDayCluster \
    --service-principal <appId from create-for-rbac> \
    --client-secret <password from create-for-rbac> \
    --node-count 1 \
    --vm-set-type VirtualMachineScaleSets \
    --enable-cluster-autoscaler \
    --generate-ssh-keys \
    --node-vm-size Standard_D2_v3 \
    --min-count 1 \
    --max-count 2

# disable auto-scaling so we can proactively scale!
az aks update --resource-group atDevDayWorkshopRG --name atDevDayCluster --disable-cluster-autoscaler
```

- If you get Service Principal errors, review the following article: Service principals with Azure Kubernetes Service (AKS)
- Verify that AKS Cluster has been created correctly
Congratulations! You have created an AKS Cluster. We will use the ML models to proactively scale the cluster in the next section.
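
The AKS script above asks you to copy the `appId` and `password` values printed by `az ad sp create-for-rbac` into the `az aks create` call by hand. As a rough sketch, assuming you have saved that command's JSON output as a string, the substitution could be done as follows. The helper name `aks_sp_flags` is hypothetical and the sample values are made up.

```python
import json


def aks_sp_flags(create_for_rbac_json):
    """Map `az ad sp create-for-rbac` JSON output to `az aks create` flags.

    Hypothetical helper: reads the `appId` and `password` fields that the
    script's placeholders refer to.
    """
    sp = json.loads(create_for_rbac_json)
    return ["--service-principal", sp["appId"], "--client-secret", sp["password"]]


if __name__ == "__main__":
    # Made-up sample output; real values come from your own subscription.
    sample = json.dumps({
        "appId": "00000000-0000-0000-0000-000000000000",
        "displayName": "example-sp",
        "password": "example-secret",
        "tenant": "11111111-1111-1111-1111-111111111111",
    })
    print(" ".join(aks_sp_flags(sample)))
```

Treat the `password` value as a secret: avoid echoing it in shared terminals or committing it to the repo.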
- Capture AKS Variable Group Entries
Azure DevOps Project -> Sidebar -> Pipelines -> Library -> Variable Groups -> devopsforai-aml-vg
- Add the following variables:

| Variable Name | Suggested Value |
| --- | --- |
| AKS_NAME | `atDevDayCluster` |
| AKS_RG | `atDevDayWorkshopRG` |

- Add the Service Principal to the AKS Cluster
Azure Portal -> Kubernetes services -> atDevDayCluster -> Access Control (IAM) -> Add role assignment
- Role: `Contributor`
- Assign access to: `Azure AD user, group, or service principal`
- Select: `<Your Service Principal name>`
- Create a new Azure DevOps Release to run the ML Scaler
Azure DevOps Project Sidebar -> Pipelines -> Releases -> Train ML Pipeline -> ... in top right -> Clone
- Name: `Run ML Scaler`
- Update the `Command Line Script` task to look like the following:
- Display Name: `Run ML Scaler Script`
- Script:

```bash
docker run -v $(System.DefaultWorkingDirectory)/_model-build/mlops-pipelines/python_scripts/:/script \
  -w=/script -e SP_APP_ID=$SP_APP_ID -e SP_APP_SECRET=$SP_APP_SECRET -e TENANT_ID=$TENANT_ID \
  -e AKS_RG=$AKS_RG -e STORAGE_ACCT_NAME=$STORAGE_ACCT_NAME -e STORAGE_ACCT_KEY=$STORAGE_ACCT_KEY \
  -e STORAGE_BLOB_NAME=$STORAGE_BLOB_NAME -e CONTANER_NAME=$CONTANER_NAME -e AKS_NAME=$AKS_NAME \
  markschabacker/at_ml_dev_day:latest python AksResourceController.py
```

- The docker image definition used in the script is available in the `docker/container` folder in this repo.
- Save and run the `Run ML Scaler` release
- This should take a while (5+ minutes)!
- Verify Scaling
Congratulations! You've used Machine Learning to proactively scale an Azure Kubernetes Service cluster!
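
One way to verify scaling is to compare the cluster's node count before and after the `Run ML Scaler` release. The sketch below is an illustration, not part of the workshop repo. It assumes you capture the JSON from `az aks show --resource-group atDevDayWorkshopRG --name atDevDayCluster --output json` twice, and that the output contains an `agentPoolProfiles` list whose entries carry `name` and `count` fields. The helper names and the sample pool name are hypothetical.

```python
import json


def node_counts(aks_show_json):
    """Return {pool name: node count} from `az aks show` JSON output."""
    cluster = json.loads(aks_show_json)
    pools = cluster.get("agentPoolProfiles") or []
    return {pool["name"]: pool["count"] for pool in pools}


def changed_pools(before, after):
    """Return {pool name: (old count, new count)} for pools whose count changed."""
    return {
        name: (before.get(name), count)
        for name, count in after.items()
        if before.get(name) != count
    }


if __name__ == "__main__":
    # Made-up snapshots standing in for two real `az aks show` outputs.
    before = node_counts(json.dumps(
        {"agentPoolProfiles": [{"name": "nodepool1", "count": 1}]}))
    after = node_counts(json.dumps(
        {"agentPoolProfiles": [{"name": "nodepool1", "count": 2}]}))
    print(changed_pools(before, after))
```

An empty result means no pool changed size between the two snapshots, which may simply reflect that the model predicted no change in load.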