The transcription data prep script downloads YouTube video transcripts and prepares them for use with the Semantic Search with OpenAI Embeddings and Functions sample.
The transcription data prep script has been tested on the latest releases of Windows 11, macOS Ventura and Ubuntu 22.04 (and above).
Important
We recommend updating the Azure CLI to the latest version to ensure compatibility with OpenAI. See Documentation
- Create resource group
Note
For these instructions, we use a resource group named "semantic-video-search" in East US. You can change the name of the resource group, but if you change the location of the resources, check the model availability table.
az group create --name semantic-video-search --location eastus
- Create an Azure OpenAI Service resource.
az cognitiveservices account create --name semantic-video-openai --resource-group semantic-video-search \
--location eastus --kind OpenAI --sku s0
- Get the endpoint and keys for use in this application.
az cognitiveservices account show --name semantic-video-openai \
--resource-group semantic-video-search | jq -r .properties.endpoint
az cognitiveservices account keys list --name semantic-video-openai \
--resource-group semantic-video-search | jq -r .key1
- Deploy the following models:
  - text-embedding-ada-002 version 2 or higher, named text-embedding-ada-002
  - gpt-35-turbo version 0613 or higher, named gpt-35-turbo
az cognitiveservices account deployment create \
--name semantic-video-openai \
--resource-group semantic-video-search \
--deployment-name text-embedding-ada-002 \
--model-name text-embedding-ada-002 \
--model-version "2" \
--model-format OpenAI \
--scale-settings-scale-type "Standard"
az cognitiveservices account deployment create \
--name semantic-video-openai \
--resource-group semantic-video-search \
--deployment-name gpt-35-turbo \
--model-name gpt-35-turbo \
--model-version "0613" \
--model-format OpenAI \
--sku-capacity 100 \
--sku-name "Standard"
- Python 3.9 or higher
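A quick way to confirm that the active interpreter meets the Python 3.9 requirement is sketched below. The helper name `meets_minimum` is hypothetical and is not part of the sample; it only compares the major and minor version numbers.

```python
import sys

# Minimum version taken from the requirement above.
MINIMUM_VERSION = (3, 9)


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM_VERSION):
    """Return True when the (major, minor) version is at least the minimum.

    Hypothetical helper for illustration; not part of the sample.
    """
    return tuple(version_info[:2]) >= tuple(minimum)


# Report on the interpreter that is running this snippet.
if meets_minimum():
    print("Python version OK")
else:
    print("Python 3.9 or higher is required")
```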
The following environment variables are required to run the YouTube transcription data prep scripts.
We recommend adding the variables to your user environment variables.
Windows Start > Edit the system environment variables > Environment Variables > User variables for [USER] > New.
AZURE_OPENAI_API_KEY = <your Azure OpenAI Service API key>
AZURE_OPENAI_ENDPOINT = <your Azure OpenAI Service endpoint>
AZURE_OPENAI_MODEL_DEPLOYMENT_NAME = <your Azure OpenAI Service model deployment name>
GOOGLE_DEVELOPER_API_KEY = <your Google developer API key>
We recommend adding the following exports to your ~/.bashrc or ~/.zshrc file.
export AZURE_OPENAI_API_KEY=<your Azure OpenAI Service API key>
export AZURE_OPENAI_ENDPOINT=<your Azure OpenAI Service endpoint>
export AZURE_OPENAI_MODEL_DEPLOYMENT_NAME=<your Azure OpenAI Service model deployment name>
export GOOGLE_DEVELOPER_API_KEY=<your Google developer API key>
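Before running the data prep scripts, it can help to confirm that all four variables are visible to Python. The following is a minimal sketch: the variable names come from the list above, while the helper `missing_variables` is hypothetical and is not part of the sample.

```python
import os

# Variable names as listed in this README.
REQUIRED_VARIABLES = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_MODEL_DEPLOYMENT_NAME",
    "GOOGLE_DEVELOPER_API_KEY",
)


def missing_variables(environ=None):
    """Return the required variables that are unset or blank.

    Hypothetical helper for illustration; pass a dict to check
    something other than the current process environment.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARIABLES if not env.get(name, "").strip()]


# Report on the current process environment.
missing = missing_variables()
print("Missing:", ", ".join(missing) if missing else "none")
```

If any names are reported as missing, open a new terminal after editing the environment so the changes are picked up.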
- Install the git client if it is not already installed.
- From a terminal window, clone the sample to your preferred repo folder.
git clone https://github.com/gloveboxes/semanic-search-openai-embeddings-functions.git
- Navigate to the data_prep folder.
cd semanic-search-openai-embeddings-functions/src/data_prep
- Create a Python virtual environment.
For Windows:
python -m venv .venv
For macOS and Linux:
python3 -m venv .venv
- Activate the Python virtual environment.
For Windows:
.venv\Scripts\activate
For macOS and Linux:
source .venv/bin/activate
- Install the required libraries.
For Windows:
pip install -r requirements.txt
For macOS and Linux:
pip3 install -r requirements.txt
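The virtual environment and library steps above can also be scripted with Python's standard `venv` module, which avoids the platform-specific activation command by calling the environment's own interpreter directly. This is a sketch, not part of the sample: the function names `venv_python` and `prepare_environment` are hypothetical, and it assumes `requirements.txt` sits in the project folder.

```python
import os
import subprocess
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Return the interpreter path inside a virtual environment."""
    if os.name == "nt":  # Windows layout
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"  # macOS and Linux layout


def prepare_environment(project_dir: Path, install: bool = True) -> Path:
    """Create .venv in project_dir and optionally install requirements.txt.

    Hypothetical helper for illustration. With install=False the
    environment is created without pip and nothing is installed.
    """
    env_dir = project_dir / ".venv"
    venv.create(env_dir, with_pip=install)
    requirements = project_dir / "requirements.txt"
    if install and requirements.exists():
        # Equivalent to "pip install -r requirements.txt" inside the environment.
        subprocess.run(
            [str(venv_python(env_dir)), "-m", "pip", "install", "-r", str(requirements)],
            check=True,
        )
    return env_dir
```

Calling `prepare_environment(Path("."))` from the data_prep folder would create `.venv` there and install the listed libraries. You still need to activate the environment in your terminal before running the scripts interactively.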
Run the YouTube transcription data prep script.
For Windows:
.\transcripts_prepare.ps1
For macOS and Linux:
./transcripts_prepare.sh
Disclaimer:
This document was translated using the AI translation service Co-op Translator. While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.