You can run the data ingestion locally in VS Code to contribute, adjust, test, or debug.
Before running the data ingestion function locally, make sure the cloud resources have been provisioned as described in the Enterprise RAG repository.
Once the cloud resources (such as Azure OpenAI and Azure Key Vault) are provisioned, follow these steps:
- Clone this repository.
- Install Azure Functions Core Tools.
- Ensure that your VS Code has the following extensions installed:
- Refer to the Roles to Assign section to grant the roles and policies needed to run the function locally.
- Open VS Code in the directory where you cloned the repository.
- When opening it for the first time, create a virtual environment and select Python 3.10 or 3.11 as its interpreter. Follow the examples illustrated in the images below.
- Copy or rename the file local.settings.json.template to local.settings.json and update it with your environment information.
- Before running the function locally, start the Azurite storage emulator. You can do this by double-clicking [Azurite Blob Service], located in the bottom right corner of the status bar.
- ✅ Done! Now press F5 (Start Debugging) to run the data ingestion function at http://localhost:7071/api/document-chunking.
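The command-line parts of the steps above (checking the Python version and creating local.settings.json) can be sketched as two Bash helpers. This is a minimal sketch, not part of the repository: is_supported_python and init_local_settings are hypothetical names, and the example assumes you run it from the repository root with an interpreter named python3.11 on your PATH.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helpers, not from the repository).

# Succeeds only for Python 3.10.x or 3.11.x version strings, the versions
# the setup steps ask for.
is_supported_python() {
  case "$1" in
    3.10 | 3.10.* | 3.11 | 3.11.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Copies local.settings.json.template to local.settings.json in the given
# directory (default: current directory). Never overwrites an existing
# local.settings.json, so values you already filled in are preserved.
init_local_settings() {
  local dir="${1:-.}"
  if [ -f "$dir/local.settings.json" ]; then
    echo "local.settings.json already exists, leaving it untouched"
    return 0
  fi
  cp "$dir/local.settings.json.template" "$dir/local.settings.json"
}

# Example usage from the repository root (interpreter name is an assumption):
#   ver="$(python3.11 -c 'import platform; print(platform.python_version())')"
#   is_supported_python "$ver" && python3.11 -m venv .venv
#   init_local_settings .
```

Making init_local_settings a no-op when the file already exists means you can re-run the setup without losing your environment values.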
Note
You can download this Postman Collection to test your data ingestion endpoint.
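If you script your tests instead of using Postman, it helps to wait until the local Functions host accepts connections before sending requests. The wait_for_host function below is a hypothetical sketch: it assumes any HTTP response (even an error status for a GET on a POST-only route) means the host is up, and the request.json file in the usage comment is a placeholder whose format you would take from the Postman collection.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): poll a URL until the local Functions host responds.
# Usage: wait_for_host <url> [attempts]
wait_for_host() {
  local url="$1" attempts="${2:-30}" i
  for ((i = 1; i <= attempts; i++)); do
    # Without --fail, curl exits 0 for any HTTP response and non-zero on
    # connection errors; --max-time bounds each attempt to 2 seconds.
    if curl -s -o /dev/null --max-time 2 "$url"; then
      return 0
    fi
    # Pause between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep 1
    fi
  done
  return 1
}

# Example usage (request.json is a hypothetical file; copy its body from the Postman collection):
#   wait_for_host "http://localhost:7071/api/document-chunking" 60 &&
#     curl -sS -X POST -H "Content-Type: application/json" \
#       --data @request.json "http://localhost:7071/api/document-chunking"
```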
Roles to Assign
Since the solution now uses managed identities, the function runs under your own user identity when you run it locally, so you need to assign the following roles to your user. You can make each role assignment with the Azure CLI scripts provided below or through the Azure Portal.
- Azure OpenAI Resource 'Cognitive Services OpenAI User' Role
For Linux users:
subscriptionId='your-subscription-id'
resourceGroupName='your-resource-group-name'
openAIAccountName='your-azure-openai-service-name'
principalId='your-user-object-id-in-microsoft-entra-id'
az role assignment create \
  --role "Cognitive Services OpenAI User" \
  --assignee "$principalId" \
  --scope "/subscriptions/$subscriptionId/resourceGroups/$resourceGroupName/providers/Microsoft.CognitiveServices/accounts/$openAIAccountName"
For Windows users:
$subscriptionId="your-subscription-id"
$resourceGroupName="your-resource-group-name"
$openAIAccountName="your-azure-openai-service-name"
$principalId="your-user-object-id-in-microsoft-entra-id"
az role assignment create `
  --role "Cognitive Services OpenAI User" `
  --assignee $principalId `
  --scope "/subscriptions/$subscriptionId/resourceGroups/$resourceGroupName/providers/Microsoft.CognitiveServices/accounts/$openAIAccountName"
- Azure AI Search 'Search Service Contributor' and 'Search Index Data Contributor' Roles
For Linux users:
subscriptionId='your-subscription-id'
resourceGroupName='your-resource-group-name'
aiSearchResource='your-ai-search-resource-name'
principalId='your-user-object-id-in-microsoft-entra-id'
az role assignment create \
  --role "Search Service Contributor" \
  --assignee "$principalId" \
  --scope "/subscriptions/$subscriptionId/resourceGroups/$resourceGroupName/providers/Microsoft.Search/searchServices/$aiSearchResource"
az role assignment create \
  --role "Search Index Data Contributor" \
  --assignee "$principalId" \
  --scope "/subscriptions/$subscriptionId/resourceGroups/$resourceGroupName/providers/Microsoft.Search/searchServices/$aiSearchResource"
For Windows users:
$subscriptionId="your-subscription-id"
$resourceGroupName="your-resource-group-name"
$aiSearchResource="your-ai-search-resource-name"
$principalId="your-user-object-id-in-microsoft-entra-id"
az role assignment create `
  --role "Search Service Contributor" `
  --assignee $principalId `
  --scope "/subscriptions/$subscriptionId/resourceGroups/$resourceGroupName/providers/Microsoft.Search/searchServices/$aiSearchResource"
az role assignment create `
  --role "Search Index Data Contributor" `
  --assignee $principalId `
  --scope "/subscriptions/$subscriptionId/resourceGroups/$resourceGroupName/providers/Microsoft.Search/searchServices/$aiSearchResource"
- Storage Blob Data Contributor
To read the content of the blob storage, assign the Storage Blob Data Contributor role to the identity used to run the program locally.
For Linux users:
subscriptionId='your-subscription-id'
resourceGroupName='your-resource-group-name'
storageAccountName='your-storage-account-name'
principalId='your-user-object-id-in-microsoft-entra-id'
az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee "$principalId" \
  --scope "/subscriptions/$subscriptionId/resourceGroups/$resourceGroupName/providers/Microsoft.Storage/storageAccounts/$storageAccountName"
For Windows users:
$subscriptionId="your-subscription-id"
$resourceGroupName="your-resource-group-name"
$storageAccountName="your-storage-account-name"
$principalId="your-user-object-id-in-microsoft-entra-id"
az role assignment create `
  --role "Storage Blob Data Contributor" `
  --assignee $principalId `
  --scope "/subscriptions/$subscriptionId/resourceGroups/$resourceGroupName/providers/Microsoft.Storage/storageAccounts/$storageAccountName"
- Key Vault 'Secret Management' Permissions
Assign the permissions needed to manage secrets within your Key Vault. These scripts grant the get, list, and set secret permissions through a Key Vault access policy, which works only for vaults that use the access policy permission model. If your vault uses Azure RBAC instead, assign a Key Vault secrets role (for example, Key Vault Secrets Officer) with az role assignment create.
For Linux users:
subscriptionId='your-subscription-id'
resourceGroupName='your-resource-group-name'
keyVaultName='your-key-vault-name'
principalId='your-user-object-id-in-microsoft-entra-id'
az keyvault set-policy \
  --name "$keyVaultName" \
  --object-id "$principalId" \
  --secret-permissions get list set
For Windows users:
$subscriptionId="your-subscription-id"
$resourceGroupName="your-resource-group-name"
$keyVaultName="your-key-vault-name"
$principalId="your-user-object-id-in-microsoft-entra-id"
az keyvault set-policy `
  --name $keyVaultName `
  --object-id $principalId `
  --secret-permissions get list set
- Cognitive Services User
Assign the Cognitive Services User role to allow access to Cognitive Services resources.
For Linux users:
subscriptionId='your-subscription-id'
resourceGroupName='your-resource-group-name'
cognitiveServicesName='your-cognitive-services-account-name'
principalId='your-user-object-id-in-microsoft-entra-id'
az role assignment create \
  --role "Cognitive Services User" \
  --assignee "$principalId" \
  --scope "/subscriptions/$subscriptionId/resourceGroups/$resourceGroupName/providers/Microsoft.CognitiveServices/accounts/$cognitiveServicesName"
For Windows users:
$subscriptionId="your-subscription-id"
$resourceGroupName="your-resource-group-name"
$cognitiveServicesName="your-cognitive-services-account-name"
$principalId="your-user-object-id-in-microsoft-entra-id"
az role assignment create `
  --role "Cognitive Services User" `
  --assignee $principalId `
  --scope "/subscriptions/$subscriptionId/resourceGroups/$resourceGroupName/providers/Microsoft.CognitiveServices/accounts/$cognitiveServicesName"
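Every role assignment above follows the same pattern (a built-in role name plus a resource scope), so the whole list can be applied in one Bash run. This is a sketch, not a script from the repository: it assumes all resources sit in a single resource group, reuses the variable names from the scripts above (falling back to their placeholders), and adds a hypothetical DRY_RUN=1 switch that prints each command instead of running it. For Key Vault it checks the vault's permission model first; using Key Vault Secrets Officer as the RBAC counterpart of get/list/set is an assumption you should review.

```bash
#!/usr/bin/env bash
# Sketch (not from the repository): apply every role assignment in this section.
# Export your real values before sourcing; otherwise the placeholders are used.
subscriptionId="${subscriptionId:-your-subscription-id}"
resourceGroupName="${resourceGroupName:-your-resource-group-name}"
openAIAccountName="${openAIAccountName:-your-azure-openai-service-name}"
aiSearchResource="${aiSearchResource:-your-ai-search-resource-name}"
storageAccountName="${storageAccountName:-your-storage-account-name}"
keyVaultName="${keyVaultName:-your-key-vault-name}"
cognitiveServicesName="${cognitiveServicesName:-your-cognitive-services-account-name}"
principalId="${principalId:-your-user-object-id-in-microsoft-entra-id}"

# Assumption: all resources share this resource group.
providers="/subscriptions/$subscriptionId/resourceGroups/$resourceGroupName/providers"

# Hypothetical DRY_RUN switch: with DRY_RUN=1 the command is printed, not executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Each entry is "<built-in role name>|<resource scope>", mirroring the scripts above.
assignments=(
  "Cognitive Services OpenAI User|$providers/Microsoft.CognitiveServices/accounts/$openAIAccountName"
  "Search Service Contributor|$providers/Microsoft.Search/searchServices/$aiSearchResource"
  "Search Index Data Contributor|$providers/Microsoft.Search/searchServices/$aiSearchResource"
  "Storage Blob Data Contributor|$providers/Microsoft.Storage/storageAccounts/$storageAccountName"
  "Cognitive Services User|$providers/Microsoft.CognitiveServices/accounts/$cognitiveServicesName"
)

# Creates every assignment; keeps going after a failure and returns 1 if any failed.
assign_roles() {
  local entry status=0
  for entry in "${assignments[@]}"; do
    # "${entry%%|*}" is the role name, "${entry#*|}" is the scope.
    run az role assignment create \
      --role "${entry%%|*}" \
      --assignee "$principalId" \
      --scope "${entry#*|}" || status=1
  done
  return "$status"
}

# Access policies only work on vaults using the access policy model, so check first.
grant_key_vault_secrets() {
  local rbac
  # KV_RBAC (hypothetical override) skips the lookup; otherwise ask Azure.
  rbac="${KV_RBAC:-$(az keyvault show --name "$keyVaultName" --query properties.enableRbacAuthorization -o tsv)}"
  case "$rbac" in
    true | True)
      # Assumption: Key Vault Secrets Officer covers get/list/set on secrets.
      run az role assignment create \
        --role "Key Vault Secrets Officer" \
        --assignee "$principalId" \
        --scope "$providers/Microsoft.KeyVault/vaults/$keyVaultName"
      ;;
    *)
      run az keyvault set-policy \
        --name "$keyVaultName" \
        --object-id "$principalId" \
        --secret-permissions get list set
      ;;
  esac
}

# Example usage: review with DRY_RUN=1 first, then run for real.
#   DRY_RUN=1 assign_roles
#   assign_roles && grant_key_vault_secrets
```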
Note
Replace the placeholder values (e.g., 'your-subscription-id', 'your-resource-group-name', etc.) with your actual Azure resource details and your user's Object ID from Microsoft Entra ID.
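Instead of copying IDs by hand, you can look up your Object ID and current subscription with the Azure CLI and stop if any placeholder is still present. A minimal sketch, assuming you are signed in with az login and use a recent (Microsoft Graph based) Azure CLI, where the signed-in user's Object ID is returned in the id field; lookup_ids and check_no_placeholders are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helpers): resolve real IDs and block leftover placeholders.

# Requires `az login`. Sets principalId and subscriptionId from the Azure CLI.
lookup_ids() {
  # Object ID of the signed-in user in Microsoft Entra ID.
  principalId="$(az ad signed-in-user show --query id -o tsv)" || return 1
  # ID of the subscription currently selected in the Azure CLI.
  subscriptionId="$(az account show --query id -o tsv)" || return 1
}

# Takes variable *names*; prints each one that is empty or still holds a
# "your-..." placeholder (to stderr) and returns 1 if any were found.
check_no_placeholders() {
  local name status=0
  for name in "$@"; do
    case "${!name}" in
      "" | your-*)
        echo "placeholder or empty value: $name" >&2
        status=1
        ;;
    esac
  done
  return "$status"
}

# Example usage (set resourceGroupName and the resource names by hand first):
#   lookup_ids || exit 1
#   check_no_placeholders subscriptionId resourceGroupName principalId || exit 1
```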
Running these scripts gives your user the permissions needed to interact with the required Azure services when you run the data ingestion function locally with VS Code.
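To confirm the assignments took effect (new role assignments can take several minutes to become effective), you can compare the roles listed for your user with the ones this section requires. The missing_roles helper is a hypothetical sketch that reads role names on stdin; it does not check the Key Vault access policy, because an access policy is not a role assignment.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): report which required roles are missing.
# Feed it role names, one per line, for example:
#   az role assignment list --assignee "$principalId" --all \
#     --query "[].roleDefinitionName" -o tsv | missing_roles
required_roles=(
  "Cognitive Services OpenAI User"
  "Search Service Contributor"
  "Search Index Data Contributor"
  "Storage Blob Data Contributor"
  "Cognitive Services User"
)

# Prints each missing role and returns 1 if at least one is missing.
missing_roles() {
  local assigned role status=0
  assigned="$(cat)"
  for role in "${required_roles[@]}"; do
    # -x (whole line) and -F (fixed string), so "Cognitive Services User"
    # is not satisfied by "Cognitive Services OpenAI User".
    if ! printf '%s\n' "$assigned" | grep -qxF -- "$role"; then
      echo "$role"
      status=1
    fi
  done
  return "$status"
}
```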