
Commit 7488530 · authored Sep 23, 2020

Fix Batch Scoring docs (#333)

* docs
* more fixes

Parent: bf34623

File tree: 3 files changed, +34 −11 lines

data/README.md

Lines changed: 1 addition & 1 deletion

@@ -1,3 +1,3 @@
 This folder is used for example data, and it is not meant to be used for storing training data.

-Follow steps to [Configure Training Data]('docs/custom_model.md#configure-training-data.md') to use your own data for training.
+Follow steps to [Configure Training Data](../docs/custom_model.md#Configure-Custom-Training) to use your own data for training.

docs/custom_model.md

Lines changed: 29 additions & 0 deletions
@@ -10,6 +10,7 @@ This document provides steps to follow when using this repository as a template
 1. [Optional] Update the evaluation code
 1. Customize the build agent environment
 1. [If appropriate] Replace the score code
+1. [If appropriate] Configure batch scoring data

 ## Follow the Getting Started guide

@@ -35,6 +36,8 @@ To bootstrap from the existing MLOpsPython repository:
 * `[dirpath]` is the absolute path to the root of the directory where MLOpsPython is cloned
 * `[projectname]` is the name of your ML project

+# Configure Custom Training
+
 ## Configure training data

 The training ML pipeline uses a [sample diabetes dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html) as training data.
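Before substituting your own data, the sample dataset can be inspected locally with scikit-learn. This snippet is only an illustration of the default dataset's shape; it is not part of the template's code.

```python
from sklearn.datasets import load_diabetes

# Load the same sample dataset the training pipeline uses by default.
X, y = load_diabetes(return_X_y=True)
print(X.shape, y.shape)  # 442 samples, 10 features
```

Your replacement training data should be wired in per the Configure Training Data steps rather than loaded this way.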
@@ -83,6 +86,8 @@ The DevOps pipeline definitions in the MLOpsPython template run several steps in
 * Create a new Docker image containing your dependencies. See [docs/custom_container.md](custom_container.md). Recommended if you have a large number of dependencies, or if the overhead of installing additional dependencies on each run is too high.
 * Remove the container references from the pipeline definition files and run the pipelines on self-hosted agents with dependencies pre-installed.

+# Configure Custom Scoring
+
 ## Replace score code

 For the model to provide real-time inference capabilities, the score code needs to be replaced. The MLOpsPython template uses the score code to deploy the model to do real-time scoring on ACI, AKS, or Web Apps.
@@ -92,3 +97,27 @@ If you want to keep scoring:
 1. Update or replace `[project name]/scoring/score.py`
 1. Add any dependencies required by scoring to `[project name]/conda_dependencies.yml`
 1. Modify the test cases in the `ml_service/util/smoke_test_scoring_service.py` script to match the schema of the training features in your data
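A replacement `score.py` typically follows the Azure ML scoring-script convention of an `init()` function (run once at startup) and a `run()` function (run per request). The sketch below is illustrative only: the identity-style model stub and the `{"data": [[...]]}` input schema are assumptions, not the template's actual code.

```python
import json
import numpy as np

model = None

def init():
    """Called once when the scoring service starts.
    In a real score.py this would locate and deserialize the registered model."""
    global model
    # Hypothetical stand-in model: sums the features of each row.
    model = lambda x: x.sum(axis=1)

def run(raw_data):
    """Called for each scoring request; the input schema must match your
    training features, and the smoke test cases must match it too."""
    try:
        data = np.array(json.loads(raw_data)["data"])
        return model(data).tolist()
    except Exception as e:
        # Returning the error keeps the service responsive on bad input.
        return {"error": str(e)}
```

Whatever schema you choose here is the schema to mirror in `smoke_test_scoring_service.py`.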
+
+# Configure Custom Batch Scoring
+
+## Configure input and output data
+
+The batch scoring pipeline is configured to use the default datastore for input and output, and it will use sample data for scoring.
+
+To configure your own input and output datastores, specify an Azure Blob Storage account and set up input and output containers.
+
+Configure the variables below in your variable group.
+
+**Note: The datastore storage resource, the input/output containers, and the scoring data are not created automatically. Make sure that you have manually provisioned these resources and placed your scoring data in your input container with the proper name.**
+
+| Variable Name | Suggested Value | Short description |
+| ------------- | --------------- | ----------------- |
+| SCORING_DATASTORE_STORAGE_NAME | | [Azure Blob Storage Account](https://docs.microsoft.com/en-us/azure/storage/blobs/) name. |
+| SCORING_DATASTORE_ACCESS_KEY | | [Azure Storage Account Key](https://docs.microsoft.com/en-us/rest/api/storageservices/authorize-requests-to-azure-storage). Consider linking this variable to Azure Key Vault to avoid storing the access key in plain text. |
+| SCORING_DATASTORE_INPUT_CONTAINER | | The name of the container for input data. Defaults to `input` if not set. |
+| SCORING_DATASTORE_OUTPUT_CONTAINER | | The name of the container for output data. Defaults to `output` if not set. |
+| SCORING_DATASTORE_INPUT_FILENAME | | The filename of the input data in your container. Defaults to `diabetes_scoring_input.csv` if not set. |
+| SCORING_DATASET_NAME | | The AzureML Dataset name to use. Defaults to `diabetes_scoring_ds` if not set (optional). |
+| SCORING_DATASTORE_OUTPUT_FILENAME | | The filename to use for the output data. The pipeline will create this file. Defaults to `diabetes_scoring_output.csv` if not set (optional). |

docs/getting_started.md

Lines changed: 4 additions & 10 deletions
@@ -64,9 +64,8 @@ The variable group should contain the following required variables. **Azure reso
 | WORKSPACE_NAME | mlops-AML-WS | Azure ML Workspace name |
 | AZURE_RM_SVC_CONNECTION | azure-resource-connection | [Azure Resource Manager Service Connection](#create-an-azure-devops-service-connection-for-the-azure-resource-manager) name |
 | WORKSPACE_SVC_CONNECTION | aml-workspace-connection | [Azure ML Workspace Service Connection](#create-an-azure-devops-azure-ml-workspace-service-connection) name |
-| ACI_DEPLOYMENT_NAME | mlops-aci | [Azure Container Instances](https://azure.microsoft.com/en-us/services/container-instances/) name |
-| SCORING_DATASTORE_STORAGE_NAME | [your project name]scoredata | [Azure Blob Storage Account](https://docs.microsoft.com/en-us/azure/storage/blobs/) name (optional) |
-| SCORING_DATASTORE_ACCESS_KEY | | [Azure Storage Account Key](https://docs.microsoft.com/en-us/rest/api/storageservices/authorize-requests-to-azure-storage) (optional) |
+| ACI_DEPLOYMENT_NAME | mlops-aci | [Azure Container Instances](https://azure.microsoft.com/en-us/services/container-instances/) name |

 Make sure you select the **Allow access to all pipelines** checkbox in the variable group configuration.

@@ -88,10 +87,6 @@ More variables are available for further tweaking, but the above variables are a

 **ACI_DEPLOYMENT_NAME** is used for naming the scoring service during deployment to [Azure Container Instances](https://azure.microsoft.com/en-us/services/container-instances/).

-**SCORING_DATASTORE_STORAGE_NAME** is the name of an Azure Blob Storage account that will contain both the data used as input to batch scoring and the batch scoring outputs. This variable is optional and only needed if you intend to use the batch scoring facility. Note that since this resource is optional, the resource provisioning pipelines mentioned below do not create it automatically, and manual creation is required before use.
-
-**SCORING_DATASTORE_ACCESS_KEY** is the access key for the scoring data storage account mentioned above. Consider linking this variable to Azure Key Vault to avoid storing the access key in plain text. This variable is optional and only needed if you intend to use the batch scoring facility.

 ## Provisioning resources using Azure Pipelines

@@ -295,11 +290,10 @@ The pipeline stages are summarized below:
 - If run locally without the model version, the batch scoring pipeline will use the model's latest version.
 - Trigger the *ML Batch Scoring Pipeline* and wait for it to complete.
 - This is an **agentless** job. The CI pipeline can wait for ML pipeline completion for hours or even days without using agent resources.
-- Use the scoring input data supplied via the SCORING_DATASTORE_INPUT_* configuration variables.
+- Use the scoring input data supplied via the SCORING_DATASTORE_INPUT_* configuration variables, or use the default datastore and sample data.
 - Once scoring is completed, the scores are made available in the same blob storage at the locations specified via the SCORING_DATASTORE_OUTPUT_* configuration variables.

-**Note** In the event a scoring data store is not yet configured, you can still try out batch scoring by supplying a scoring input data file within the data folder. Make sure to set the SCORING_DATASTORE_INPUT_FILENAME variable to the name of the file. This approach will cause the score output to be written to the ML workspace's default datastore.
-
+To configure your own custom scoring data, see [Configure Custom Batch Scoring](custom_model.md#Configure-Custom-Batch-Scoring).

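For orientation, the output location implied by the SCORING_DATASTORE_OUTPUT_* variables can be composed as a standard Azure blob URL. The helper below is a hypothetical illustration using the documented default values; only the variable names and defaults come from these docs.

```python
def scoring_output_url(storage_name: str,
                       output_container: str = "output",
                       output_filename: str = "diabetes_scoring_output.csv") -> str:
    """Compose the blob URL where the batch scoring output would land,
    using the standard Azure Blob Storage URL scheme."""
    return (f"https://{storage_name}.blob.core.windows.net/"
            f"{output_container}/{output_filename}")
```

For example, `scoring_output_url("myprojscoredata")` points at `output/diabetes_scoring_output.csv` in the `myprojscoredata` account.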
## Further Exploration

0 commit comments
