**docs/custom_model.md** (29 additions, 0 deletions)
This document provides steps to follow when using this repository as a template:

1. [Optional] Update the evaluation code
1. Customize the build agent environment
1. [If appropriate] Replace the score code
1. [If appropriate] Configure batch scoring data

## Follow the Getting Started guide
To bootstrap from the existing MLOpsPython repository:

* `[dirpath]` is the absolute path to the root of the directory where MLOpsPython is cloned
* `[projectname]` is the name of your ML project

# Configure Custom Training

## Configure training data

The training ML pipeline uses a [sample diabetes dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html) as training data.
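As a quick sanity check before swapping in your own data, the same sample dataset can be loaded directly with scikit-learn to inspect its shape (this snippet is illustrative and not part of the template's pipeline code):

```python
# Load the sample diabetes dataset the training pipeline uses,
# just to inspect its shape before replacing it with your own data.
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)
print(X.shape)  # (442, 10): 442 samples, 10 numeric features
print(y.shape)  # (442,): one regression target per sample
```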
The DevOps pipeline definitions in the MLOpsPython template run several steps in a Docker container:

* Create a new Docker image containing your dependencies. See [docs/custom_container.md](custom_container.md). Recommended if you have a larger number of dependencies, or if the overhead of installing additional dependencies on each run is too high.
* Remove the container references from the pipeline definition files and run the pipelines on self-hosted agents with dependencies pre-installed.
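For illustration, adding a dependency to the project's conda environment file might look like the fragment below (the package names and versions are placeholders, not requirements of the template):

```yaml
# [project name]/conda_dependencies.yml (fragment; names/versions illustrative)
dependencies:
  - python=3.7
  - pip:
      - azureml-sdk
      - scikit-learn
```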
# Configure Custom Scoring

## Replace score code

For the model to provide real-time inference capabilities, the score code needs to be replaced. The MLOpsPython template uses the score code to deploy the model to do real-time scoring on ACI, AKS, or Web Apps.
If you want to keep scoring:

1. Update or replace `[project name]/scoring/score.py`
1. Add any dependencies required by scoring to `[project name]/conda_dependencies.yml`
1. Modify the test cases in the `ml_service/util/smoke_test_scoring_service.py` script to match the schema of the training features in your data
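A minimal sketch of the `init()`/`run()` contract that `score.py` implements is shown below. The model here is a stub so the sketch stays self-contained; a real `score.py` would load your registered model inside `init()` (for example via azureml-core's `Model.get_model_path()`):

```python
import json

model = None  # populated once by init()


def init():
    # In a real score.py, load your registered model here (e.g. with
    # azureml-core's Model.get_model_path()). A stub keeps this runnable.
    global model
    model = lambda rows: [sum(row) for row in rows]  # placeholder "model"


def run(raw_data):
    # The scoring service calls run() with a JSON payload per request.
    try:
        data = json.loads(raw_data)["data"]
        return {"result": model(data)}
    except Exception as e:
        return {"error": str(e)}


init()
print(run('{"data": [[1, 2, 3], [4, 5, 6]]}'))  # {'result': [6, 15]}
```

The smoke test script mentioned above sends payloads of exactly this shape, which is why its test cases must match your feature schema.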
# Configure Custom Batch Scoring

## Configure input and output data

The batch scoring pipeline is configured to use the default datastore for input and output, and uses sample data for scoring.

To configure your own input and output datastores, specify an Azure Blob Storage account and set up input and output containers.

Configure the variables below in your variable group.

**Note: the datastore storage resource, the input/output containers, and the scoring data are not created automatically. Make sure that you have manually provisioned these resources and placed your scoring data in your input container with the proper name.**

| Variable Name | Suggested Value | Short description |
| --- | --- | --- |
| SCORING_DATASTORE_ACCESS_KEY | | [Azure Storage Account Key](https://docs.microsoft.com/en-us/rest/api/storageservices/authorize-requests-to-azure-storage). You may want to consider linking this variable to Azure KeyVault to avoid storing the access key in plain text. |
| SCORING_DATASTORE_INPUT_CONTAINER | | The name of the container for input data. Defaults to `input` if not set. |
| SCORING_DATASTORE_OUTPUT_CONTAINER | | The name of the container for output data. Defaults to `output` if not set. |
| SCORING_DATASTORE_INPUT_FILENAME | | The filename of the input data in your container. Defaults to `diabetes_scoring_input.csv` if not set. |
| SCORING_DATASET_NAME | | The AzureML Dataset name to use. Defaults to `diabetes_scoring_ds` if not set (optional). |
| SCORING_DATASTORE_OUTPUT_FILENAME | | The filename to use for the output data. The pipeline will create this file. Defaults to `diabetes_scoring_output.csv` if not set (optional). |
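The "defaults to X if not set" behavior in the table can be sketched as plain variable resolution. The helper below is illustrative only, not code from the template:

```python
import os


def get_scoring_var(name, default):
    # Illustrative helper: read a pipeline variable from the environment,
    # falling back to the documented default when unset or blank.
    value = os.environ.get(name, "").strip()
    return value or default


input_container = get_scoring_var("SCORING_DATASTORE_INPUT_CONTAINER", "input")
input_filename = get_scoring_var("SCORING_DATASTORE_INPUT_FILENAME",
                                 "diabetes_scoring_input.csv")
print(input_container, input_filename)
```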
**docs/getting_started.md** (4 additions, 10 deletions)
The variable group should contain the following required variables. **Azure reso…**

| Variable Name | Suggested Value | Short description |
| --- | --- | --- |
| WORKSPACE_NAME | mlops-AML-WS | Azure ML Workspace name |
| AZURE_RM_SVC_CONNECTION | azure-resource-connection | [Azure Resource Manager Service Connection](#create-an-azure-devops-service-connection-for-the-azure-resource-manager) name |
| WORKSPACE_SVC_CONNECTION | aml-workspace-connection | [Azure ML Workspace Service Connection](#create-an-azure-devops-azure-ml-workspace-service-connection) name |
| ACI_DEPLOYMENT_NAME | mlops-aci | [Azure Container Instances](https://azure.microsoft.com/en-us/services/container-instances/) name |

Make sure you select the **Allow access to all pipelines** checkbox in the variable group configuration.
More variables are available for further tweaking, but the above variables are a…

**ACI_DEPLOYMENT_NAME** is used for naming the scoring service during deployment to [Azure Container Instances](https://azure.microsoft.com/en-us/services/container-instances/).

## Provisioning resources using Azure Pipelines
97
92
@@ -295,11 +290,10 @@ The pipeline stages are summarized below:
295
290
- If run locally without the model version, the batch scoring pipeline will use the model's latest version.
296
291
- Trigger the *ML Batch Scoring Pipeline* and waits for it to complete.
297
292
- This is an **agentless** job. The CI pipeline can wait for ML pipeline completion for hours or even days without using agent resources.
298
-
- Use the scoring input data supplied via the SCORING_DATASTORE_INPUT_* configuration variables.
293
+
- Use the scoring input data supplied via the SCORING_DATASTORE_INPUT_* configuration variables, or uses the default datastore and sample data.
299
294
- Once scoring is completed, the scores are made available in the same blob storage at the locations specified via the SCORING_DATASTORE_OUTPUT_* configuration variables.
300
295
301
-
**Note** In the event a scoring data store is not yet configured, you can still try out batch scoring by supplying a scoring input data file within the data folder. Do make sure to set the SCORING_DATASTORE_INPUT_FILENAME variable to the name of the file. This approach will cause the score output to be written to the ML workspace's default datastore.
302
-
296
+
To configure your own custom scoring data, see [Configure Custom Batch Scoring](custom_model.md#Configure-Custom-Batch-Scoring).
0 commit comments