This document walks you through creating and testing a few-shot classifier prompt using Prompt flow.
NOTE: These steps are not needed for the demo to work. The demo repository already includes the few-shot classifier in the code. Use this document to update the classifier, build a new one, or run a batch test against it.
- Prerequisite: an AML workspace has been created and Prompt flow is enabled on it.
There are two categories of work:
- Set up the dataset used for batch testing the classifier. A sample test file is included in the data/ folder (its format is illustrated below).
- Set up a Prompt flow that performs few-shot classification using an LLM.
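For reference, the batch test file is a tab-separated file pairing an utterance with its expected class label. A minimal illustration follows; the label column is named "classification" (that name is used in the field mapping later in this document), while the utterance column name ("question") is an assumption, so check the sample file in data/ for the actual header:

```
question	classification
How many Surface Laptop 5 units did we sell last month?	1
What display resolution does the Surface Pro 9 have?	2
What's a good pizza place nearby?	3
Can you give me more details on that?	4
```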
- Upload the test file used for bulk testing the flow to an Azure blob store. You can create a new container in the storage account that was created as part of the AML workspace and upload the test file to it. In this document, a new container named "test_data" was created and the test file was uploaded to its "classifier" folder. (If you prefer scripting these steps over the portal, see the sketch after this list.)
- Using the AML workspace, create a new data asset of type "File (uri_file)". Name it "demo_data_set"
- For Data Source, select "From Azure Storage"
- For Storage Type, select "Azure Blob Storage" and click on "Create new datastore"
- Provide a Datastore name, select the Subscription ID of the storage account where the test file was uploaded, then select the Storage account and the blob container. For Authentication type, select "Account key" and paste the account key of the Azure blob store where the test file was uploaded.
- Select the newly created datastore and click Next
- For Storage Path, select the path where the test file was uploaded. The demo file is in the "classifier" folder of the "test_data" container. Select the test file, "classification_test_no_history.tsv", and click Next
- Once all validations pass, click Create.
- Once creation succeeds, click on the Explore tab to preview the file's contents and confirm the rows loaded correctly.
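If you prefer to script the upload, datastore creation, and data asset registration instead of clicking through the portal, a minimal sketch using the azure-storage-blob and azure-ai-ml packages is below. All resource names and keys are placeholders, and the datastore name ("test_data_store") is an assumption; replace them with your own values:

```python
# pip install azure-ai-ml azure-identity azure-storage-blob
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import AccountKeyConfiguration, AzureBlobDatastore, Data
from azure.storage.blob import BlobServiceClient

# Placeholders: replace with your own values.
SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
WORKSPACE = "<aml-workspace>"
STORAGE_ACCOUNT = "<storage-account>"
ACCOUNT_KEY = "<storage-account-key>"

# 1. Upload the test file to the "classifier" folder of a "test_data" container.
blob_service = BlobServiceClient(
    account_url=f"https://{STORAGE_ACCOUNT}.blob.core.windows.net",
    credential=ACCOUNT_KEY,
)
container = blob_service.get_container_client("test_data")
if not container.exists():
    container.create_container()
with open("data/classification_test_no_history.tsv", "rb") as f:
    container.upload_blob(
        "classifier/classification_test_no_history.tsv", f, overwrite=True
    )

# 2. Register a datastore pointing at that container.
ml_client = MLClient(
    DefaultAzureCredential(), SUBSCRIPTION_ID, RESOURCE_GROUP, WORKSPACE
)
datastore = AzureBlobDatastore(
    name="test_data_store",  # assumed name; pick your own
    account_name=STORAGE_ACCOUNT,
    container_name="test_data",
    credentials=AccountKeyConfiguration(account_key=ACCOUNT_KEY),
)
ml_client.datastores.create_or_update(datastore)

# 3. Register the test file as a uri_file data asset named "demo_data_set".
data_asset = Data(
    name="demo_data_set",
    type=AssetTypes.URI_FILE,
    path="azureml://datastores/test_data_store/paths/classifier/classification_test_no_history.tsv",
)
ml_client.data.create_or_update(data_asset)
```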
- In your AML workspace, click on Prompt flow, click Create, and create a Standard flow
- Delete the "hello_prompt" and "echo_my_prompt" steps, as we will add an LLM step to do the classification
- Click on LLM to insert an LLM-based step in the flow.
- Rename the step to "Classification" and configure the connection to point to an Azure OpenAI resource that already has a gpt-4 model deployed.
- In the Prompt section, copy and paste the prompt below. More details on how to do prompt engineering can be found here. (A sketch for sanity-checking the prompt outside Prompt flow follows the prompt block.)
```
system:
You are an intent classifier for Microsoft Surface product Sales and Marketing teams. The user will input a statement. You will focus on the main intent of the user statement, and you respond with only one of four values: '1', '2', '3' or '4'. You will not try to respond to the user's question; you will just classify the user statement based on the below classification rule:
If user statement is about past sales, prices, stores or stock of products/devices/laptops, you respond with 1
If user statement is on specifications of products/devices/laptops or marketing them, you respond with 2
If user statement is chit chat or about non-Microsoft products, you respond with 3
If user statement is asking for more details about a previous question, you respond with 4
Examples:
User: How much stock of this are we currently carrying?
Assistant: 1
User: Give me its specifications
Assistant: 2
User: How many MacBook Air do we have in stock?
Assistant: 3
User: Tell me more about it
Assistant: 4
User: Which Surface device is good for students' use?
Assistant: 1
User: What can you help me with?
Assistant: 3
User: Hello
Assistant: 3
user: {{question}}
```
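Before wiring the prompt into the flow, you can sanity-check it directly against your Azure OpenAI deployment. Below is a minimal sketch using the openai Python package (v1+); the endpoint, key, API version, and deployment name are placeholders:

```python
# pip install openai
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    api_key="<your-api-key>",                                   # placeholder
    api_version="2024-02-01",
)

# Paste the full system prompt from above here.
SYSTEM_PROMPT = """You are an intent classifier for Microsoft Surface product Sales and Marketing teams. ..."""

def classify(question: str) -> str:
    """Return the predicted class ('1'-'4') for a single user utterance."""
    response = client.chat.completions.create(
        model="<gpt-4-deployment-name>",  # placeholder: your deployment name
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
        temperature=0,  # deterministic output for classification
        max_tokens=1,   # the classifier should emit a single digit
    )
    return response.choices[0].message.content.strip()

print(classify("How much stock of this are we currently carrying?"))  # expected: 1
```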
- Update the value of the output: from the dropdown, select ${Classification.output}
- Scroll down, click on the "Validate and parse input" button, and map the user question field to the Inputs.Text field
- Click on the "Run" button to run the classifier on the Inputs.Text field. By default this will be "Hello World!", but you can edit it and run again. You should see the output of the classifier in the Outputs section.
- Now you can perform a Bulk Test. Click on the "Bulk test" button
- Select the data asset created above, and click Next
- For evaluation method, select "Classification Accuracy Evaluation".
- Map the groundtruth and prediction fields: groundtruth to "data.classification" and prediction to "output.output_prompt"
- Click Next to review the mappings and then Submit the job.
- Once the job completes, review the Metrics and details to see the performance of the classifier. NOTE: For now we are just calculating an accuracy score for the entire batch test. This could be improved by selecting a more balanced set of utterances across the different classes and performing more detailed analysis (see the sketch after this list), such as:
- Per class precision/recall/F1 scores
- Confusion Matrix
- Aggregated weighted/macro/micro scores
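The portal's Classification Accuracy Evaluation reports overall accuracy; for the per-class and aggregated metrics above, you can download the batch test outputs and compute them locally. Below is a minimal sketch with pandas and scikit-learn. The file name and column names ("classification" for ground truth, "output_prompt" for the prediction) are assumptions based on the mapping above, so adjust them to match your exported results:

```python
# pip install pandas scikit-learn
import pandas as pd
from sklearn.metrics import classification_report, confusion_matrix

# Assumption: batch test outputs exported to a CSV with "classification"
# (ground truth) and "output_prompt" (prediction) columns.
df = pd.read_csv("bulk_test_results.csv", dtype=str)
y_true = df["classification"]
y_pred = df["output_prompt"]

labels = ["1", "2", "3", "4"]

# Per-class precision/recall/F1 plus macro and weighted aggregates.
print(classification_report(y_true, y_pred, labels=labels, zero_division=0))

# Confusion matrix: rows are true classes, columns are predicted classes.
print(pd.DataFrame(
    confusion_matrix(y_true, y_pred, labels=labels),
    index=[f"true_{l}" for l in labels],
    columns=[f"pred_{l}" for l in labels],
))
```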