In the search applications lesson, we briefly learned how to integrate your own data into large language models (LLMs). This lesson goes deeper into the concept of grounding an LLM application in your data, the mechanics of the process, and the methods for storing data, including both embeddings and text.
Video coming soon.
This lesson covers the following:
- An introduction to RAG: what it is and why it is used in AI (artificial intelligence).
- Understanding what vector databases are and creating one for our application.
- A practical example of how to integrate RAG into an application.
After completing this lesson, you will be able to:
- Explain the significance of RAG in data retrieval and processing.
- Set up a RAG application and ground an LLM in your data.
- Integrate RAG and vector databases effectively in LLM applications.
In this lesson, we want to add our own notes to the education startup so that the chatbot can draw on more information about different subjects. With these notes, learners can study better and understand different topics, which makes revising for exams easier. To create our scenario, we will use:
- Azure OpenAI: the LLM we will use to create the chatbot.
- AI for beginners' lesson on Neural Networks: the data we will ground our LLM in.
- Azure AI Search and Azure Cosmos DB: the vector database used to store our data and create a search index.
Users will be able to create practice quizzes from their notes, generate revision flashcards, and summarize the notes into concise overviews. To get started, let's look at what RAG is and how it works.
An LLM-powered chatbot processes user prompts to generate responses. It is designed to be interactive and engages with users on a wide range of topics. However, its responses are limited to the context provided and its underlying training data. For instance, GPT-4's knowledge cutoff is September 2021, so it has no knowledge of events that occurred after that date. In addition, the data used to train LLMs excludes confidential information such as personal notes or a company's product manual.
Suppose you want to deploy a chatbot that creates quizzes from your notes. The chatbot needs a connection to a knowledge base, and this is where RAG helps. RAG works as follows:
- Knowledge base: Before retrieval, the documents must be ingested and preprocessed. Typically, large documents are split into smaller chunks, converted to text embeddings, and stored in a database.
- User query: The user asks a question.
- Retrieval: When the user asks a question, the retriever uses the embedding model to find relevant information in the knowledge base, which provides additional context to incorporate into the prompt.
- Augmented generation: The LLM generates its response using the retrieved data, so the answer is based not only on pre-trained data but also on relevant information from the added context. The LLM then returns an answer to the user's question.
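The four steps above can be sketched end to end with a toy example. This is a minimal illustration, not the lesson's implementation: the knowledge base, the `embed` function (a bag-of-words count standing in for a real embedding model), and the helper names `retrieve` and `build_prompt` are all assumptions made for the sketch.

```python
from collections import Counter
import math
import re

# Hypothetical knowledge base: in practice these would be chunks of your notes.
KNOWLEDGE_BASE = [
    "A perceptron is a binary classifier that computes a weighted sum of its inputs.",
    "Gradient descent updates weights in the direction that reduces the loss.",
    "A vector database stores embeddings and supports similarity search.",
]

def embed(text):
    """Stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, k=1):
    """Retrieval: return the k knowledge-base chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(KNOWLEDGE_BASE, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]

def build_prompt(question, k=1):
    """Augmentation: place the retrieved context in front of the user's question."""
    context = "\n".join(retrieve(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# The resulting prompt is what would be sent to the LLM for generation.
prompt = build_prompt("What is a perceptron?")
```

The string in `prompt` now contains the perceptron chunk followed by the question, which is the "augmented" input the LLM would answer from.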
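The RAG architecture is implemented with transformers consisting of two parts: an encoder and a decoder. For example, when a user asks a question, the input text is "encoded" into vectors that capture the meaning of the words. These vectors are matched against the document index, and the decoder generates new text based on the user query and the retrieved documents. The model uses both the encoder and the decoder to generate the output.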
The paper that proposed RAG, Retrieval-Augmented Generation for Knowledge-Intensive NLP (natural language processing) Tasks, describes two approaches to implementing it:
- RAG-Sequence uses the same retrieved documents to predict the best complete answer to the user's query.
- RAG-Token can draw on a different retrieved document for each token it generates when answering the user's query.
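The difference between the two approaches is easiest to see in the formulas from the paper. In the notation below, $x$ is the user query, $y = (y_1, \dots, y_N)$ is the generated answer, $z$ is a retrieved document, $p_\eta$ is the retriever, and $p_\theta$ is the generator; both models sum over the top-$k$ retrieved documents, but at different points.

```latex
% RAG-Sequence: one document is used for the whole answer,
% and the sum over documents is taken once, outside the product over tokens.
p_{\text{RAG-Sequence}}(y \mid x) \approx
  \sum_{z \in \text{top-}k(p_\eta(\cdot \mid x))}
  p_\eta(z \mid x) \prod_{i=1}^{N} p_\theta(y_i \mid x, z, y_{1:i-1})

% RAG-Token: the sum over documents is taken for every token,
% so each token can be supported by a different document.
p_{\text{RAG-Token}}(y \mid x) \approx
  \prod_{i=1}^{N}
  \sum_{z \in \text{top-}k(p_\eta(\cdot \mid x))}
  p_\eta(z \mid x)\, p_\theta(y_i \mid x, z, y_{1:i-1})
```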
RAG offers several benefits:
- Information richness: it keeps text responses up to date. Access to an internal knowledge base therefore improves performance on domain-specific tasks.
- It reduces fabrication by using verifiable data in the knowledge base to provide context for user queries.
- It is cost-effective, because it is more economical than fine-tuning an LLM.
Our application is based on our personal data: the Neural Networks lesson from the AI for Beginners curriculum.
A vector database, unlike a traditional database, is a specialized database designed to store, manage, and search embedding vectors. It stores numerical representations of documents. Breaking data down into numerical embeddings makes it easier for an AI system to understand and process the data.
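We store embeddings in a vector database because LLMs limit the number of tokens they accept as input. Since you cannot pass an entire document collection to an LLM, the documents are broken into chunks; when a user asks a question, the chunks whose embeddings are closest to the question are returned together with the prompt. Chunking also reduces cost, because fewer tokens pass through the LLM.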
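To make the idea concrete, here is a minimal in-memory sketch of what a vector database does: it stores each chunk alongside its embedding and returns the chunks nearest to a query vector. The class name `TinyVectorStore`, its methods, and the three-dimensional example vectors are hypothetical and chosen for illustration; a production vector database adds persistence, approximate indexing, and filtering.

```python
import math

class TinyVectorStore:
    """Illustrative in-memory store of (embedding, text) pairs."""

    def __init__(self):
        self._items = []  # list of (vector, text) tuples

    def add(self, vector, text):
        """Store a chunk of text together with its embedding."""
        self._items.append((list(vector), text))

    def search(self, query, k=2):
        """Return the k stored texts closest to the query by Euclidean distance."""
        scored = [(math.dist(query, vector), text) for vector, text in self._items]
        scored.sort(key=lambda pair: pair[0])
        return [text for _, text in scored[:k]]

store = TinyVectorStore()
store.add([0.9, 0.1, 0.0], "Perceptrons are linear classifiers.")
store.add([0.1, 0.9, 0.0], "Backpropagation computes gradients.")
store.add([0.0, 0.1, 0.9], "Photosynthesis happens in chloroplasts.")

# Only the k nearest chunks would be passed to the LLM, which keeps the prompt short.
results = store.search([0.8, 0.2, 0.0], k=2)
```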
Popular vector databases include Azure Cosmos DB, Clarifai, Pinecone, ChromaDB, ScaNN, Qdrant, and DeepLake. You can create an Azure Cosmos DB account with the Azure CLI using the following commands:

    az login
    az group create -n <resource-group-name> -l <location>
    az cosmosdb create -n <cosmos-db-name> -g <resource-group-name>
    az cosmosdb keys list -n <cosmos-db-name> -g <resource-group-name>

Before storing the data, we need to convert it to vector embeddings. If you are working with large documents or long texts, you can chunk them based on the queries you expect. Chunking can be done at the sentence level or the paragraph level. Because a chunk derives its meaning from the words around it, you can add extra context to a chunk, for example by adding the document title or including some text before or after the chunk. You can chunk the data as follows:
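    def split_text(text, max_length, min_length):
        """Split text into word-aligned chunks of at most max_length characters."""
        words = text.split()
        chunks = []
        current_chunk = []

        for word in words:
            # Close the current chunk when adding the next word would exceed max_length
            if current_chunk and len(' '.join(current_chunk + [word])) > max_length:
                chunks.append(' '.join(current_chunk))
                current_chunk = []
            current_chunk.append(word)

        if current_chunk:
            last_chunk = ' '.join(current_chunk)
            if chunks and len(last_chunk) < min_length:
                # The last chunk didn't reach the minimum length: fold it into the previous one
                chunks[-1] += ' ' + last_chunk
            else:
                chunks.append(last_chunk)

        return chunks

Once the text is chunked, we can embed it using different embedding models. Models you can use include word2vec, ada-002 by OpenAI, Azure Computer Vision, and many more. The choice of model depends on the languages you are using, the type of content being encoded (text, images, or audio), the size of the input the model can encode, and the length of the embedding output.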
OpenAI's `text-embedding-ada-002` model, for example, embeds a piece of text as a long list of floating-point numbers.
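A helper that produces such embeddings might look like the following sketch. It assumes the `openai` Python package (version 1.x), whose client exposes `embeddings.create(model=..., input=...)`. The function name `create_embeddings` matches the helper that the chatbot code later in this lesson calls, but the body shown here is an assumption, not the lesson's definition. The client can be passed in as a parameter so the function can be exercised without network access.

```python
def create_embeddings(text, client=None, model="text-embedding-ada-002"):
    """Return the embedding vector for `text` as a list of floats.

    `client` is assumed to be an `openai.OpenAI()` or `openai.AzureOpenAI()`
    instance; with Azure OpenAI, `model` is the name of your deployment.
    """
    if client is None:
        # Imported lazily so this sketch loads even without the package installed.
        import openai
        client = openai.OpenAI()  # reads the API key from the environment

    response = client.embeddings.create(model=model, input=text)
    return response.data[0].embedding
```

With a real client, `create_embeddings("cat")` returns a list of 1,536 floats for `text-embedding-ada-002`.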

When a user asks a question, the retriever transforms it into a vector using the query encoder, then searches the document search index for vectors related to the input. Once this is done, both the input and the retrieved documents are converted back to text and passed to the LLM.
Retrieval happens when the system tries to quickly find the documents in the index that satisfy the search criteria. The retriever's goal is to fetch documents that provide context and ground the LLM in your data.
There are several ways to perform search within the database:
- Keyword search: used for text searches.
- Semantic search: uses the semantic meaning of words.
- Vector search: converts documents from text to vector representations using embedding models. Retrieval is done by querying for the documents whose vector representations are closest to the user's question.
- Hybrid: a combination of keyword search and vector search.
A challenge with retrieval arises when the database contains nothing similar to the query. The system will then return the best information it can find. To handle this, you can use tactics such as setting a maximum distance for relevance or using hybrid search, which combines keyword and vector search. In this lesson we use hybrid search, a combination of vector and keyword search. We store our data in a dataframe with columns containing the chunks and their embeddings.
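Both tactics can be sketched in a few lines. The first helper drops matches beyond a maximum distance so that irrelevant chunks are not passed to the LLM; the second merges a keyword ranking and a vector ranking with reciprocal rank fusion (RRF), one common way to combine rankings in hybrid search. The function names, the threshold, the constant `k=60`, and the sample rankings are assumptions for illustration rather than the lesson's code.

```python
def filter_by_distance(matches, max_distance):
    """Keep only (doc_id, distance) pairs within max_distance."""
    return [(doc, dist) for doc, dist in matches if dist <= max_distance]

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list, best first.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical vector-search results as (doc_id, distance), closest first.
vector_matches = [("doc_a", 0.21), ("doc_b", 0.35), ("doc_c", 0.92)]
relevant = filter_by_distance(vector_matches, max_distance=0.5)

# Hypothetical keyword-search ranking for the same query.
keyword_ranking = ["doc_b", "doc_d", "doc_a"]
vector_ranking = [doc for doc, _ in relevant]
fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
```

Documents that rank well in both lists (`doc_b` and `doc_a` here) rise above a document that appears in only one (`doc_d`), while `doc_c` is dropped by the distance threshold before fusion.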
The retriever searches the knowledge base for embeddings that are close together, the nearest neighbours, because these represent similar texts. When a user submits a query, it is first embedded and then matched against similar embeddings. The most common measure of how similar two vectors are is cosine similarity, which is based on the angle between them.
Alternative similarity measures include Euclidean distance, which is the straight-line distance between the vector endpoints, and the dot product, which is the sum of the products of the corresponding elements of two vectors.
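The three measures can be written directly from their definitions. The sketch below uses plain Python lists and made-up example vectors; for real embeddings you would typically use a numerical library instead.

```python
import math

def dot_product(a, b):
    """Sum of the products of corresponding elements."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: dot product divided by both lengths."""
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))

def euclidean_distance(a, b):
    """Straight-line distance between the endpoints of a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

u = [1.0, 2.0, 2.0]
v = [2.0, 4.0, 4.0]   # same direction as u, twice as long
w = [2.0, -1.0, 0.0]  # perpendicular to u
```

Here `u` and `v` point the same way, so their cosine similarity is 1 even though their Euclidean distance is 3. Cosine similarity ignores vector length, while Euclidean distance and the dot product do not.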
Before performing retrieval, we need to build a search index for the knowledge base. An index stores the embeddings and can quickly retrieve the most similar chunks, even in a large database. We can create an index locally as follows:

    from sklearn.neighbors import NearestNeighbors

    # flattened_df is the dataframe of chunks and embeddings described above
    embeddings = flattened_df['embeddings'].to_list()

    # Create the search index
    nbrs = NearestNeighbors(n_neighbors=5, algorithm='ball_tree').fit(embeddings)

    # To query the index, you can use the kneighbors method
    distances, indices = nbrs.kneighbors(embeddings)

Once you have queried the database, you may need to sort the results starting from the most relevant. A re-ranking model uses machine learning to improve the relevance of search results by ordering them from most to least relevant. With Azure AI Search, re-ranking is done automatically by a semantic re-ranker. An example of how re-ranking works using nearest neighbours:
    # Find the most similar documents (kneighbors returns them closest first)
    distances, indices = nbrs.kneighbors([query_vector])

    # Print the three most similar documents with their distances
    for distance, index in zip(distances[0][:3], indices[0][:3]):
        if index < len(flattened_df):
            print(flattened_df['chunks'].iloc[index])
            print(flattened_df['path'].iloc[index])
            print(distance)
        else:
            print(f"Index {index} not found in DataFrame")

The last step is to add our LLM into the mix so that we get responses grounded in our data. We can implement it as follows:
    import openai

    user_input = "what is a perceptron?"

    def chatbot(user_input):
        # Convert the question to a query vector
        # (create_embeddings is assumed to be defined elsewhere and to return one embedding)
        query_vector = create_embeddings(user_input)

        # Find the most similar documents
        distances, indices = nbrs.kneighbors([query_vector])

        # Add the retrieved chunks to the query to provide context
        history = []
        for index in indices[0]:
            history.append(flattened_df['chunks'].iloc[index])

        # Combine the retrieved context and the user input
        history.append(user_input)

        # Create a message object; join the whole history so the context reaches the model
        messages = [
            {"role": "system", "content": "You are an AI assistant that helps with AI questions."},
            {"role": "user", "content": "\n\n".join(history)}
        ]

        # Use chat completion to generate a response
        response = openai.chat.completions.create(
            model="gpt-4",
            temperature=0.7,
            max_tokens=800,
            messages=messages
        )

        return response.choices[0].message

    chatbot(user_input)

When evaluating the application, consider:
- Quality of the responses: check that they sound natural, fluent, and human-like.
- Groundedness: evaluate whether the response came from the supplied documents.
- Relevance: evaluate whether the response matches and relates to the question asked.
- Fluency: whether the response makes sense grammatically.
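Of these, groundedness lends itself to a rough automated check. The sketch below is a crude lexical proxy, not an established metric: it reports the fraction of the response's distinct words that also appear in the retrieved documents. The function name and the example strings are hypothetical, and in practice groundedness is usually judged by human reviewers or by an LLM-based evaluator.

```python
import re

def _words(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def lexical_groundedness(response, documents):
    """Fraction of distinct response words that occur in the supplied documents."""
    response_words = _words(response)
    if not response_words:
        return 0.0
    document_words = set()
    for doc in documents:
        document_words |= _words(doc)
    return len(response_words & document_words) / len(response_words)

docs = ["A perceptron is a linear binary classifier."]
grounded = lexical_groundedness("A perceptron is a binary classifier.", docs)
ungrounded = lexical_groundedness("Bananas are yellow.", docs)
```

A score near 1 suggests the response reuses the retrieved material, while a score near 0 suggests it came from elsewhere. Word overlap cannot detect a response that rearranges the same words into a false statement, which is why this is only a first filter.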
颿°åŒã³åºããã¢ããªãæ¹åã§ããããŸããŸãªãŠãŒã¹ã±ãŒã¹ããããŸãïŒ
-
質åå¿çïŒäŒç€Ÿã®ããŒã¿ããã£ããã«æ ¹æ ã¥ããåŸæ¥å¡ã質åã§ããããã«ããã
-
ã¬ã³ã¡ã³ããŒã·ã§ã³ã·ã¹ãã ïŒæ ç»ãã¬ã¹ãã©ã³ãªã©ãæãé¡äŒŒããå€ããããã³ã°ããã·ã¹ãã ãäœæããã
-
ãã£ããããããµãŒãã¹ïŒãã£ããå±¥æŽãä¿åãããŠãŒã¶ãŒããŒã¿ã«åºã¥ããŠäŒè©±ãããŒãœãã©ã€ãºããã
-
ãã¯ã¿ãŒåã蟌ã¿ã«åºã¥ãç»åæ€çŽ¢ãç»åèªèãç°åžžæ€åºã«åœ¹ç«ã¡ãŸãã
We have covered the fundamental areas of RAG, from adding our data to the application to the user query and the output. To simplify building RAG applications, you can use frameworks such as Semantic Kernel, LangChain, or AutoGen.
To continue learning about retrieval-augmented generation (RAG), you can:
- Build a front end for the application using the framework of your choice.
- Recreate the application using a framework, either LangChain or Semantic Kernel.
Congratulations on completing the lesson 👏.
After completing this lesson, check out our Generative AI Learning collection to continue leveling up your generative AI knowledge!
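Disclaimer: This document has been translated using the AI translation service Co-op Translator. While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.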


