Transcription data prep

The transcription data prep scripts download YouTube video transcripts and prepare them for use with the Semantic Search with OpenAI Embeddings and Functions sample.

The transcription data prep scripts have been tested on the latest releases of Windows 11, macOS Ventura and Ubuntu 22.04 (and above).

Create required Azure OpenAI Service resources

Important

We suggest updating the Azure CLI to the latest version to ensure compatibility with OpenAI. See Documentation.

  1. Create a resource group

Note

These instructions use a resource group named "semantic-video-search" in East US. You can change the name of the resource group, but if you change the location of the resources, check the model availability table.

az group create --name semantic-video-search --location eastus
  2. Create an Azure OpenAI Service resource.
az cognitiveservices account create --name semantic-video-openai --resource-group semantic-video-search \
    --location eastus --kind OpenAI --sku s0
  3. Get the endpoint and keys for use in this application.
az cognitiveservices account show --name semantic-video-openai \
   --resource-group semantic-video-search | jq -r .properties.endpoint
az cognitiveservices account keys list --name semantic-video-openai \
   --resource-group semantic-video-search | jq -r .key1
  4. Deploy the following models:
    • text-embedding-ada-002 version 2 or greater, named text-embedding-ada-002
    • gpt-35-turbo version 0613 or greater, named gpt-35-turbo
az cognitiveservices account deployment create \
    --name semantic-video-openai \
    --resource-group  semantic-video-search \
    --deployment-name text-embedding-ada-002 \
    --model-name text-embedding-ada-002 \
    --model-version "2"  \
    --model-format OpenAI \
    --scale-settings-scale-type "Standard"
az cognitiveservices account deployment create \
    --name semantic-video-openai \
    --resource-group  semantic-video-search \
    --deployment-name gpt-35-turbo \
    --model-name gpt-35-turbo \
    --model-version "0613"  \
    --model-format OpenAI \
    --sku-capacity 100 \
    --sku-name "Standard"
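
If jq is not installed (for example on a stock Windows machine), the endpoint and key can be read from the CLI's JSON output with a few lines of Python instead. The sketch below assumes the JSON shapes implied by the jq filters in step 3 (`.properties.endpoint` and `.key1`). The helper names and the sample payloads are hypothetical and only illustrate the idea.

```python
import json


def extract_endpoint(account_json: str) -> str:
    """Return properties.endpoint from `az cognitiveservices account show` output."""
    return json.loads(account_json)["properties"]["endpoint"]


def extract_key(keys_json: str) -> str:
    """Return key1 from `az cognitiveservices account keys list` output."""
    return json.loads(keys_json)["key1"]


# Hypothetical sample payloads, trimmed to the fields the jq filters read.
sample_account = '{"properties": {"endpoint": "https://example.openai.azure.com/"}}'
sample_keys = '{"key1": "placeholder-key-1", "key2": "placeholder-key-2"}'

print(extract_endpoint(sample_account))
print(extract_key(sample_keys))
```

In practice the JSON text would come from capturing the output of the two `az` commands, for example with `subprocess.run(..., capture_output=True, text=True)`.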

Required software

Environment variables

The following environment variables are required to run the YouTube transcription data prep scripts.

On Windows

We recommend adding the variables to your user environment variables. Windows Start > Edit the system environment variables > Environment Variables > User variables for [USER] > New.

AZURE_OPENAI_API_KEY = <your Azure OpenAI Service API key>
AZURE_OPENAI_ENDPOINT = <your Azure OpenAI Service endpoint>
AZURE_OPENAI_MODEL_DEPLOYMENT_NAME = <your Azure OpenAI Service model deployment name>
GOOGLE_DEVELOPER_API_KEY = <your Google developer API key>

On Linux and macOS

We recommend adding the following exports to your ~/.bashrc or ~/.zshrc file.

export AZURE_OPENAI_API_KEY="<your Azure OpenAI Service API key>"
export AZURE_OPENAI_ENDPOINT="<your Azure OpenAI Service endpoint>"
export AZURE_OPENAI_MODEL_DEPLOYMENT_NAME="<your Azure OpenAI Service model deployment name>"
export GOOGLE_DEVELOPER_API_KEY="<your Google developer API key>"
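
Before running the scripts, it can help to confirm that all four variables are visible to Python. The following is a minimal sketch, not part of the sample itself; the `REQUIRED_VARS` tuple and the `missing_env_vars` helper are hypothetical names, and the variable names are the ones listed above.

```python
import os

# Variable names taken from the lists above.
REQUIRED_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_MODEL_DEPLOYMENT_NAME",
    "GOOGLE_DEVELOPER_API_KEY",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Remember to open a new terminal (or re-source your shell profile) after adding the variables, otherwise the check still sees the old environment.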

Install the required Python libraries

  1. Install the git client if it's not already installed.

  2. From a Terminal window, clone the sample to your preferred repo folder.

    git clone https://github.com/gloveboxes/semanic-search-openai-embeddings-functions.git
  3. Navigate to the data_prep folder.

    cd semanic-search-openai-embeddings-functions/src/data_prep
  4. Create a Python virtual environment.

    On Windows:

    python -m venv .venv

    On macOS and Linux:

    python3 -m venv .venv
  5. Activate the Python virtual environment.

    On Windows:

    .venv\Scripts\activate

    On macOS and Linux:

    source .venv/bin/activate
  6. Install the required libraries.

    On Windows:

    pip install -r requirements.txt

    On macOS and Linux:

    pip3 install -r requirements.txt
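
A common slip in the steps above is installing the libraries before the virtual environment is activated, which puts them in the system Python instead. The sketch below shows one way to check; it relies on the standard behaviour that `sys.prefix` differs from `sys.base_prefix` inside an environment created by `venv`. The `in_virtualenv` helper is a hypothetical name.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```

If the first line prints `False`, repeat the activation step before installing from requirements.txt.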

Run the YouTube transcription data prep scripts

On Windows

.\transcripts_prepare.ps1

On macOS and Linux

./transcripts_prepare.sh
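
If you drive the data prep from a single cross-platform entry point, the choice between the two scripts can be made in Python. This is a rough sketch under stated assumptions: the `prep_command` helper is hypothetical, it assumes Windows PowerShell is available on PATH as `powershell`, and it assumes the current directory is the data_prep folder.

```python
import platform


def prep_command(system=None):
    """Return the argument list for the prep script on the given OS name."""
    system = platform.system() if system is None else system
    if system == "Windows":
        # Assumption: Windows PowerShell is on PATH as "powershell".
        return ["powershell", "-File", "transcripts_prepare.ps1"]
    # macOS ("Darwin") and Linux both use the shell script.
    return ["./transcripts_prepare.sh"]


print(prep_command())
```

The returned list can be passed to `subprocess.run(prep_command(), check=True)` from inside the data_prep folder.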

Disclaimer:
This document has been translated using the AI translation service Co-op Translator. While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.