Commit d312012

author
Luis Cabrera
committed
Adding workshop
1 parent 9a59abe commit d312012

80 files changed

+3753
-0
lines changed

workshops/LICENSE

+21
@@ -0,0 +1,21 @@
1+
MIT License
2+
3+
Copyright (c) 2019 cynotebo
4+
5+
Permission is hereby granted, free of charge, to any person obtaining a copy
6+
of this software and associated documentation files (the "Software"), to deal
7+
in the Software without restriction, including without limitation the rights
8+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9+
copies of the Software, and to permit persons to whom the Software is
10+
furnished to do so, subject to the following conditions:
11+
12+
The above copyright notice and this permission notice shall be included in all
13+
copies or substantial portions of the Software.
14+
15+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21+
SOFTWARE.

workshops/Module 0.md

+38
@@ -0,0 +1,38 @@
1+
# Pre-Requisites for Knowledge Mining Workshop
2+
3+
Please make sure you fulfill the following pre-requisites before starting the workshop.
4+
5+
1. Have your own Azure account
6+
1. Be familiar with the [Azure Portal](https://portal.azure.com)
7+
1. Make sure you can create Azure resources in your subscription (including paid resources).
8+
9+
*Note: if your organization's policy prohibits you from creating resources in the subscription, you can use a [free subscription](https://signup.azure.com) for the purposes of this lab.*
10+
11+
4. Create a resource group for this workshop where you will add each of the resources you will create in the next steps.
12+
4. **Create** an [Azure Storage Account](https://docs.microsoft.com/en-us/azure/storage/common/storage-quickstart-create-account?tabs=azure-portal).
13+
Select Performance: *Standard* tier, not Premium
14+
Select Account kind: *StorageV2 (general purpose v2)*
15+
4. **Install** [Azure Storage Explorer](https://azure.microsoft.com/en-us/features/storage-explorer/)
16+
4. **Copy** the storage container that holds clinical trials from a read only location to your Storage Account.
17+
1. **Open** Azure Storage Explorer and select *Managed Accounts*, *Add an Account...*, *Use a shared access signature (SAS) URI*. The *Display name* will autofill. Select *Next*
18+
1. **Add** the following *URI*, then select *Next*, then select *Connect*
19+
```
20+
https://kmworkshop.blob.core.windows.net/clinical-trials-small?st=2019-09-13T22%3A58%3A18Z&se=2020-09-14T22%3A58%3A00Z&sp=rl&sv=2018-03-28&sr=c&sig=M7MPfuxZvVvBkf0Jgg%2BvKWyB49RFYlGNhQ4%2F1nIJ9DU%3D
21+
```
22+
3. **Select** *Toggle Explorer* to view the Explorer. Right click on the *clinical-trials-small* Blob Container that you just connected to and select *Copy Blob Container*
23+
![](images/copyblobcontainer.png)
24+
3. **Find** your Storage Account in the Explorer. Right click on its *Blob Containers* and select *Paste Blob Container*.
25+
![](images/pasteblobcontainer.png)
26+
1. Confirm that the container copied successfully by checking the Activities at the bottom of the Azure Storage Explorer.
27+
4. **Create** an [Azure Search](https://docs.microsoft.com/en-us/azure/search/search-create-service-portal) resource. (A Free Tier should be sufficient for this workshop).
28+
[Learn more](https://docs.microsoft.com/en-us/azure/search/search-sku-tier)
29+
30+
4. **Create** a [Cognitive Services resource](https://docs.microsoft.com/en-us/azure/cognitive-services/cognitive-services-apis-create-account?tabs=multiservice%2Cwindows).
31+
32+
*Note - You need to create the Cognitive Services resource in the same region as your Azure Search resource.*
33+
34+
10. **Install** [Visual Studio 2019](https://visualstudio.microsoft.com/). Make sure you can create ASP.NET websites with it.
35+
11. **Install** [Postman](https://www.getpostman.com/)
36+
11. **Install** [Power BI Desktop](https://powerbi.microsoft.com/en-us/desktop/).
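
The SAS URI used in the container copy step above encodes its own scope and lifetime in the query string. The following is a minimal Python sketch, not part of the workshop itself, that unpacks those parameters with the standard library. The helper names `describe_sas` and `is_expired` are illustrative assumptions. In a SAS token, `sp=rl` grants read and list permissions, `sr=c` scopes the token to a container, and `st`/`se` are the start and expiry times, so this particular URI stops working after its `se` timestamp (2020-09-14).

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit

# The SAS URI exactly as given in the copy step of this module.
SAS_URI = (
    "https://kmworkshop.blob.core.windows.net/clinical-trials-small"
    "?st=2019-09-13T22%3A58%3A18Z&se=2020-09-14T22%3A58%3A00Z"
    "&sp=rl&sv=2018-03-28&sr=c"
    "&sig=M7MPfuxZvVvBkf0Jgg%2BvKWyB49RFYlGNhQ4%2F1nIJ9DU%3D"
)


def _parse_time(value):
    """Parse the UTC timestamp format used by the st/se parameters."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def describe_sas(uri):
    """Hypothetical helper: split a container SAS URI into readable parts."""
    parts = urlsplit(uri)
    # parse_qs percent-decodes values and returns a list per key.
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {
        "account": parts.netloc.split(".")[0],
        "container": parts.path.lstrip("/"),
        "starts": _parse_time(query["st"]),
        "expires": _parse_time(query["se"]),
        "permissions": query["sp"],  # "rl" = read + list
        "resource": query["sr"],     # "c" = container
    }


def is_expired(info, now=None):
    """Return True when the token's expiry time has passed."""
    now = now or datetime.now(timezone.utc)
    return now >= info["expires"]


if __name__ == "__main__":
    info = describe_sas(SAS_URI)
    print(info["container"], info["permissions"], info["expires"].isoformat())
    print("expired:", is_expired(info))
```

If Storage Explorer refuses the connection, checking the expiry this way is a quick first diagnostic before looking at network or permission issues.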
37+
38+
### Next: [Module 1: Using Azure Portal to Create Your Index - No Code Required](Module&#32;1.md)

workshops/Module 1.md

+186
@@ -0,0 +1,186 @@
1+
2+
# Module 1: Using Azure Portal to Create Your Index - No Code Required
3+
4+
This module uses the Azure Portal to create your first Azure Search index without writing any code. Following these steps you will: ingest a set of files (clinical-trials); extract both structured and unstructured text from those files; index their content and learn how to query your new index. Finally, we'll use the Azure Portal to project enriched data into a Knowledge Store (new Preview capability), which we'll explore in greater detail in Module 6.
5+
6+
The instructions that follow assume that you have completed all of the pre-requisites found in the [ReadMe](./README.md) to this lab and have provisioned all of the necessary resources to your Azure subscription. If you have not already completed these steps, you will need to do so prior to moving forward.
7+
8+
## Using the Portal Import Data Flow:
9+
10+
11+
1. Navigate to your search service, and click the **Import Data** button. This will launch the Import Data Wizard which will guide you through the steps necessary to ingest your data, enrich it and create a search indexer.
12+
13+
![](images/importdata.png)
14+
15+
1. As part of the Import Data Wizard, in the **Connect to your data** tab, you can enter the information necessary to connect to your data source.
16+
17+
+ In the drop down for **Data Source**, choose *Azure Blob Storage*.
18+
19+
+ **Name** your data source *clinical-trials-small*.
20+
21+
+ Set **Data to Extract** to *Content and Metadata*
22+
23+
+ For the **Connection String** click *Choose an existing connection* and select your storage account. Select the *clinical-trials-small* container.
24+
25+
+ If you're not able to find the storage account you want to use by selecting *Choose an existing connection* you can always manually add the connection string. To get your connection string, view your storage account in the Azure Portal, select *Access keys* and copy the *Connection String*. Paste this as the *Connection string*. Then add the *Container name* which will be *clinical-trials-small*.
26+
27+
Your screen should now look similar to this:
28+
29+
![](images/chooseconnection.png)
30+
31+
+ Now click **Next** to apply cognitive skills to your data.
32+
33+
## Skillset
34+
35+
In Azure Search, we call extraction and enrichment steps cognitive skills, which are combined into a skillset referenced during indexing. In this exercise, you will be learning how to use the [built-in skills](https://docs.microsoft.com/en-us/azure/search/cognitive-search-predefined-skills) through the Azure Portal. In a later module, we will show you how to attach these skills programmatically and how to build your own [custom skills](https://docs.microsoft.com/en-us/azure/search/cognitive-search-custom-skill-interface).
36+
37+
In the next three steps, you will be working through the three drop-down arrows presented:
38+
39+
![](images/attachenrich.png)
40+
41+
42+
### Attach the Cognitive Services
43+
44+
This is the resource you created earlier as part of your initial lab setup. It is used to power your pre-built AI models.
45+
46+
![](images/skillset.png)
47+
48+
### Add enrichments
49+
50+
Name your skillset: *clinical-trials-small*
51+
52+
+ Make sure to select the **OCR enrichment** to extract **merged_content** field.
53+
54+
+ Now we can apply an enrichment to the merged_content field to extract the locations.
55+
Change the name of the generated field to **locations** (with a lowercase 'l'). For consistency's sake, keep this exact name, since the queries later in this module refer to the field as locations.
56+
57+
+ Leave all of the other enrichment boxes blank at this time as we will add in additional skills later in the lab.
58+
59+
![](images/enrichments.png)
60+
61+
62+
### Save enrichments to a knowledge store (Preview)
63+
As you recall from the introductory session, the knowledge store is a new capability that entered Public Preview in May. The Knowledge Store lets you use your data in scenarios that do not lend themselves naturally to search. Once your data has been loaded into the Knowledge Store, you can do things like kick off robotic process automation (RPA), run analytics, or visualize the data in tools like Power BI.
64+
65+
Projections are your mechanism for structuring data in a knowledge store. For example, through projections, you can choose whether output is saved as a single blob or a collection of related tables. An easy way to view knowledge store contents is through the built-in Storage Explorer for Azure storage.
66+
67+
The knowledge store supports two types of projections:
68+
69+
+ Tables: For data that is best represented as rows and columns, table projections allow you to define a schematized shape or projection in Table storage.
70+
71+
+ Objects: When you need a JSON representation of your data and enrichments, object projections are saved as blobs.
72+
73+
For this case, we are going to use Azure table projections.
74+
75+
![](images/addks.png)
76+
77+
We're going to go ahead and create the Knowledge Store now through the Azure Portal and will come back to the visualizations in a later module.
78+
79+
1. Click *Choose an existing connection* and select your storage account.
80+
2. Click on **+ Container** to create a new container called *clinical-trials-small-ks*.
81+
3. **Select** the container created in the above step.
82+
4. Under **Azure table projections**, make sure *Documents* and *Entities* have been selected.
83+
5. Click **Next: Customize the target index**.
84+
85+
86+
## Index Definition
87+
In this step, you are designing your Azure Search index. This is an important and powerful part of the index build process as you select the types of Analyzer(s) you want to use and make determinations on features such as which fields and data will be retrievable, filterable, sortable, and searchable.
88+
89+
1. Give your index a name like *clinical-trials-small*
90+
91+
2. Leave **Key** as the default option
92+
93+
3. Under **Suggester name** enter *sg* and set **Search mode** to *analyzingInfixMatching*
94+
95+
4. In the index definition fields:
96+
+ Make sure all the fields are **retrievable**.
97+
+ Make sure that the locations field is **retrievable / facetable / filterable / searchable**.
98+
+ Set **English-Microsoft** as the *Analyzer* for all searchable fields since the content is in English.
99+
+ Select **Suggester** for trials, metadata_author, metadata_title and locations
100+
+ You can make layoutText not searchable/retrievable since we won’t need it for this workshop.
101+
102+
![](images/indexdef.png)
103+
104+
5. Click on **Next: Create an indexer**.
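
The attribute choices made in the index definition step above end up as a JSON index definition on the service. The following Python sketch shows roughly what the relevant part of that definition could look like. It is an assumption-laden subset, not the definition the wizard actually generates: only three fields are shown, the key field is assumed to be `metadata_storage_path`, and the function name `build_index_definition` is illustrative.

```python
import json


def build_index_definition(index_name="clinical-trials-small"):
    """Hypothetical sketch of a subset of the index the portal wizard creates."""
    fields = [
        # Assumed key field; the wizard's default key may differ in your index.
        {"name": "metadata_storage_path", "type": "Edm.String",
         "key": True, "retrievable": True},
        {"name": "metadata_title", "type": "Edm.String",
         "searchable": True, "retrievable": True,
         "analyzer": "en.microsoft"},
        # locations is retrievable / facetable / filterable / searchable.
        {"name": "locations", "type": "Collection(Edm.String)",
         "searchable": True, "filterable": True, "facetable": True,
         "retrievable": True, "analyzer": "en.microsoft"},
    ]
    # The suggester named "sg" powers the autocomplete query later on.
    suggesters = [{
        "name": "sg",
        "searchMode": "analyzingInfixMatching",
        "sourceFields": ["metadata_title", "locations"],
    }]
    return {"name": index_name, "fields": fields, "suggesters": suggesters}


if __name__ == "__main__":
    print(json.dumps(build_index_definition(), indent=2))
```

Writing the definition out this way makes it easier to see why `locations` can later be used with `facet=` and `$filter=`: those query features only work on fields marked facetable and filterable.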
105+
106+
## Indexer Definition
107+
108+
1. Name the indexer *clinical-trials-small* .
109+
2. Set the **Schedule** to Once
110+
3. Click on the **Advanced options** drop down and note that the index key is Base-64 encoded by default.
111+
112+
![](images/indexer.png)
113+
114+
4. Click on **Submit**. Wait 2 or 3 minutes for the indexing to complete, then check the status of your indexer in the portal.
115+
116+
![](images/chkstatus.png)
117+
118+
119+
![](images/chkstatus2.png)
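
Because the index key is Base-64 encoded by default, document keys returned by queries will not look like blob paths. The sketch below shows one way to convert between the two in Python. It assumes the indexer used its default URL-token style encoding (URL-safe Base-64 with the `=` padding replaced by a single trailing digit giving the padding count); if your indexer is configured differently, the decoding will differ. The function names and the example path are illustrative only.

```python
import base64


def url_token_encode(text):
    """Encode text as URL-safe Base-64 with a trailing padding-count digit."""
    raw = base64.urlsafe_b64encode(text.encode("utf-8")).decode("ascii")
    stripped = raw.rstrip("=")
    # The number of removed '=' characters (0, 1 or 2) is appended as a digit.
    return stripped + str(len(raw) - len(stripped))


def url_token_decode(token):
    """Reverse url_token_encode: restore padding, then Base-64 decode."""
    padding = int(token[-1])
    body = token[:-1]
    return base64.urlsafe_b64decode(body + "=" * padding).decode("utf-8")


if __name__ == "__main__":
    # Hypothetical blob path, used only to demonstrate the round trip.
    path = "https://mystorageaccount.blob.core.windows.net/clinical-trials-small/trial.pdf"
    key = url_token_encode(path)
    print(key)
    print(url_token_decode(key))
```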
120+
121+
## Searching the Content
122+
Now that the content has been indexed, we can use the portal to test some search queries. Open the **Search explorer**, enter a search query such as "MPS" to find all documents that refer to the disease MPS, and press "Search". Try adjusting the query with different phrases and terms to get an idea of the content.
123+
124+
![](images/srchexplore.png)
125+
126+
Let's try a few additional queries:
127+
128+
Search for references to "Gaucher's" disease and do hit highlighting of the content where there is a reference to it:
129+
```
130+
gauchers&highlight=content
131+
```
132+
Notice as you scroll through the results that the English-Microsoft Analyzer picked up variations of this term, such as "Gaucher" and "Gaucher's", and that the matches are highlighted using the default `<em>` and `</em>` tags.
133+
134+
Add the parameter `&$count=true` to see that 8 documents refer to "Gaucher's" disease:
135+
```
136+
gauchers&highlight=content&$count=true
137+
```
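
When the same query is issued outside the portal, the count and highlights arrive as properties of the JSON response: `@odata.count` at the top level and `@search.highlights` on each document. The following Python sketch pulls both out. The `sample_response` is a made-up, heavily trimmed example that only illustrates the response shape; it is not real output from the workshop index, and `summarize_results` is a hypothetical helper name.

```python
import re

# Matches the default highlight tags wrapped around each hit.
HIGHLIGHT_PATTERN = re.compile(r"<em>(.*?)</em>")


def summarize_results(response):
    """Return (document count, sorted distinct highlighted terms)."""
    count = response.get("@odata.count")
    terms = set()
    for document in response.get("value", []):
        highlights = document.get("@search.highlights", {})
        for fragments in highlights.values():
            for fragment in fragments:
                terms.update(
                    match.lower() for match in HIGHLIGHT_PATTERN.findall(fragment)
                )
    return count, sorted(terms)


# Illustrative response shape only; the text fragments are invented.
sample_response = {
    "@odata.count": 2,
    "value": [
        {"@search.score": 1.0,
         "@search.highlights": {"content": ["patients with <em>Gaucher</em> disease"]}},
        {"@search.score": 0.5,
         "@search.highlights": {"content": ["<em>Gaucher's</em> disease type 1"]}},
    ],
}

if __name__ == "__main__":
    print(summarize_results(sample_response))
```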
138+
139+
### Searching the Content using Postman
140+
141+
The search explorer is useful for performing queries like this; however, most developers want to use external tools to start working against the service. For that reason, open Postman to perform the rest of the search queries below. To set up the queries, set the Headers as:
142+
* api-key: [Enter Admin API Key from Azure Search portal]
143+
* Content-Type: application/json
144+
145+
You can retrieve the API key by pulling up your search service in the Azure Portal, selecting Keys, then copying one of the available admin keys.
146+
147+
When we configured the indexing of the content, we asked for locations to be extracted from the content. Let's take a look at this by searching for Morquio disease and limiting the results to return only the metadata_title and locations fields. Remember to replace {name of your service} with the name of your search service.
148+
```
149+
GET https://{name of your service}.search.windows.net/indexes/clinical-trials-small/docs?api-version=2019-05-06&search=morquio&$select=metadata_title,locations
150+
```
151+
152+
Here's what the request will look like in Postman:
153+
154+
![](images/querywithselect.PNG)
155+
156+
Notice how the *locations* field is a Collection (or array of strings) that includes all the Location names extracted from the content.
157+
158+
Let's try to group all of the *locations* by using Faceting.
159+
```
160+
GET https://{name of your service}.search.windows.net/indexes/clinical-trials-small/docs?api-version=2019-05-06&search=morquio&$select=metadata_title,locations&facet=locations
161+
```
162+
We can see that the search results now include a list of the top locations and how often each is found in documents that talk about Morquio.
163+
164+
Next, let's filter the results to documents that refer to Morquio and have a location of "Emory University":
165+
```
166+
GET https://{name of your service}.search.windows.net/indexes/clinical-trials-small/docs?api-version=2019-05-06&search=morquio&$select=metadata_title,locations&$filter=locations/any(location: location eq 'Emory University')
167+
```
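
The three GET requests above differ only in their query parameters, so they can be generated from one function. The following Python sketch builds the same URLs with the standard library and percent-encodes the OData filter, which contains spaces and quotes. The names `build_search_url` and `build_headers`, and the service name `myservice`, are assumptions made for illustration. No request is sent.

```python
from urllib.parse import quote, urlencode

API_VERSION = "2019-05-06"


def build_headers(api_key):
    """Headers used for every request in this module."""
    return {"api-key": api_key, "Content-Type": "application/json"}


def build_search_url(service, index, search, select=None, facet=None, odata_filter=None):
    """Hypothetical helper that assembles a docs query URL."""
    params = [("api-version", API_VERSION), ("search", search)]
    if select:
        params.append(("$select", ",".join(select)))
    if facet:
        params.append(("facet", facet))
    if odata_filter:
        params.append(("$filter", odata_filter))
    # Keep '$' and ',' readable; everything else unsafe is percent-encoded.
    query = urlencode(params, quote_via=quote, safe="$,")
    return f"https://{service}.search.windows.net/indexes/{index}/docs?{query}"


if __name__ == "__main__":
    fields = ["metadata_title", "locations"]
    index = "clinical-trials-small"
    print(build_search_url("myservice", index, "morquio", select=fields))
    print(build_search_url("myservice", index, "morquio", select=fields, facet="locations"))
    print(build_search_url(
        "myservice", index, "morquio", select=fields,
        odata_filter="locations/any(location: location eq 'Emory University')",
    ))
```

Postman encodes the filter for you when you paste the raw URL, which is why the examples above can be written with literal spaces. Code that builds the URL itself has to do that encoding explicitly.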
168+
169+
As a final query, we will use the autocomplete capability to suggest terms that match what a user types. You have likely seen this in search boxes where users start typing and the system quickly suggests potential matches. Notice how this request is a POST as opposed to a GET.
170+
171+
```
172+
POST https://{name of your service}.search.windows.net/indexes/clinical-trials-small/docs/autocomplete?api-version=2019-05-06
173+
```
174+
Set the body of the request as "raw" and include:
175+
```json
176+
{
177+
"fuzzy": true,
178+
"search": "hea",
179+
"suggesterName": "sg",
180+
"autocompleteMode": "twoTerms"
181+
}
182+
```
183+
Notice how this request uses a suggesterName called "sg". You will recall that when you configured the index, you selected some columns to be used to power these autocomplete requests.
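
The autocomplete call can also be prepared in code. The Python sketch below constructs the POST request with the same URL, headers, and JSON body as above, using only the standard library. It builds the request object without sending it. The function name `build_autocomplete_request` and the placeholder service name and key are assumptions for illustration.

```python
import json
from urllib.request import Request

API_VERSION = "2019-05-06"


def build_autocomplete_request(service, index, api_key, text, suggester="sg"):
    """Hypothetical helper that prepares (but does not send) the POST."""
    url = (
        f"https://{service}.search.windows.net/indexes/{index}"
        f"/docs/autocomplete?api-version={API_VERSION}"
    )
    body = {
        "fuzzy": True,                 # tolerate small typos in the input
        "search": text,                # the partial text typed by the user
        "suggesterName": suggester,    # must match the suggester in the index
        "autocompleteMode": "twoTerms",
    }
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_autocomplete_request(
        "myservice", "clinical-trials-small", "<admin-key>", "hea"
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Passing the prepared object to `urllib.request.urlopen` would send it. That step is left out here so the sketch runs without a live search service.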
184+
185+
186+
### Next: [Module 2: Visualizing the Results with a Demo FrontEnd](Module&#32;2.md)

workshops/Module 2.md

+67
@@ -0,0 +1,67 @@
1+
# Module 2: Visualizing the Results with a Demo FrontEnd
2+
Now that you've built your Search index, we'll take a moment to build and deploy a simple web-page that will allow you to visualize your initial search results. We'll come back to this view throughout the lab and take special note of how our search results change as we add additional features and capabilities.
3+
4+
To get started with this exercise, we will be using and getting familiar with the [Knowledge Mining solution accelerator](https://github.com/Azure-Samples/azure-search-knowledge-mining) to create our front-end experience. This accelerator was published to provide developers with all of the steps and tools required to build a working minimally viable knowledge mining solution. Take a few moments to note that it contains modules to: deploy the required Azure resources; build custom skills; and present the results in a simple, but elegant front-end. At the end of this lab, your results will look similar to this:
5+
6+
![](images/results.png)
7+
8+
## 1. Clone the repository
9+
```
10+
git clone https://github.com/Azure-Samples/azure-search-knowledge-mining.git
11+
```
12+
13+
## 2. Start the project
14+
15+
Open **CognitiveSearch.UI.csproj** (02-Web UI Template\CognitiveSearch.UI) in Visual Studio
16+
17+
## 3. Update appsettings.json
18+
19+
Update the following fields in the *appsettings.json* file to connect the web app to your storage account, search index, and app insights account:
20+
21+
```json
22+
"SearchServiceName": "Your Search Service Name",
23+
"SearchApiKey": "Your Search Service key",
24+
"SearchIndexName": "clinical-trials-small",
25+
"InstrumentationKey": "",
26+
"StorageAccountName": "Your storage Account Name",
27+
"StorageAccountKey": "Your Storage Account Key",
28+
"StorageContainerAddress": "Your Storage Container Address",
29+
"KeyField": "metadata_storage_path",
30+
"IsPathBase64Encoded": true,
31+
"GraphFacet": "diseases"
32+
```
33+
34+
### Notes
35+
1. **SearchServiceName** should be set to the name of the search service (e.g. "myservice").
36+
1. **SearchApiKey** should be set to an API key for the search service (e.g. "B8365AC95521089B7E3FA4CC98435").
37+
1. **SearchIndexName** should be set to the name of the index (e.g. "clinical-trials-small").
38+
1. **StorageAccountName** should be set to the name of the storage account (e.g. "mystorageaccount").
39+
1. **StorageContainerAddress** should be in the following format: *"https://*storageaccountname*.blob.core.windows.net/*containername*"*
40+
1. **InstrumentationKey** is an optional field. The instrumentation key connects the web app to Application Insights in order to populate the Power BI reports.
41+
1. **KeyField** should be set to the field specified as the document key in the index (e.g. "metadata_storage_path").
42+
1. Sometimes metadata_storage_path is the key, and it gets base64 encoded. In that case set **IsPathBase64Encoded** to true.
43+
1. The **GraphFacet** is used for generating the relationship graph. Set it to the name of the facet that you would like to use (e.g. "diseases"), or leave it blank if you won't use the node graph.
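
A quick check of these values before running the web app can save a debugging round trip. The following Python sketch composes the storage container address in the format given in the notes above and reports which required settings are still empty. It is an illustrative helper, not part of the solution accelerator. Treating every field except `InstrumentationKey` and `GraphFacet` as required is an assumption drawn from the notes.

```python
# Assumed to be required; InstrumentationKey and GraphFacet are optional per the notes.
REQUIRED_SETTINGS = (
    "SearchServiceName",
    "SearchApiKey",
    "SearchIndexName",
    "StorageAccountName",
    "StorageAccountKey",
    "StorageContainerAddress",
    "KeyField",
)


def container_address(account_name, container_name):
    """Build the StorageContainerAddress value in the documented format."""
    return f"https://{account_name}.blob.core.windows.net/{container_name}"


def missing_settings(settings):
    """Return the required setting names that are absent or empty."""
    return [name for name in REQUIRED_SETTINGS if not settings.get(name)]


if __name__ == "__main__":
    # Placeholder values only; substitute your own resource names and keys.
    settings = {
        "SearchServiceName": "myservice",
        "SearchApiKey": "",
        "SearchIndexName": "clinical-trials-small",
        "StorageAccountName": "mystorageaccount",
        "StorageAccountKey": "",
        "StorageContainerAddress": container_address(
            "mystorageaccount", "clinical-trials-small"
        ),
        "KeyField": "metadata_storage_path",
        "IsPathBase64Encoded": True,
        "GraphFacet": "",
    }
    print(missing_settings(settings))
```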
44+
45+
46+
###
47+
*Important:*
48+
This tutorial is optimized for letting you see results quickly and investigate the code, but please note that entering your credentials into code or configuration files is not a good practice to follow. We recommend you use a service like [Azure Key Vault](https://docs.microsoft.com/en-us/azure/key-vault/key-vault-overview) to store them.
49+
50+
## 4. Set the **Startup Project**
51+
52+
![](images/setstart.png)
53+
54+
## 5. Run the project and see the results
55+
56+
![](images/results.png)
57+
58+
## 6. Inspect the code
59+
60+
Much of the UI is rendered dynamically by JavaScript. Some important files to know when making changes to the UI are:
61+
62+
1. **wwwroot/js/results.js** - contains the code used to render search results on the UI
63+
64+
2. **wwwroot/js/details.js** - contains the code for rendering the detail view once a result is selected
65+
66+
### Next: [Module 3: Introduction to Azure Functions and Custom Skills](Module&#32;3.md)
67+
