In the search applications lesson, we briefly learned how to integrate your own data into large language models (LLMs). This lesson goes deeper into the concept of grounding your data in an LLM application, how the process works, and how to store data, including both embeddings and text.
This lesson covers the following:
- An introduction to RAG: what it is and why it is used in AI (artificial intelligence)
- What vector databases are, and how to create one for our application
- A practical example of integrating RAG into an application
After completing this lesson, you will be able to:
- Explain the significance of RAG in data retrieval and processing
- Set up a RAG application and ground your data in an LLM
- Integrate RAG and vector databases effectively in LLM applications
In this lesson, we want to add our own notes to the education startup so that the chatbot can get more information on each subject. With these notes, learners can study and understand each topic better, which makes revising for exams easier. To build the scenario, we will use:
- Azure OpenAI: the LLM used to create the chatbot
- The AI for Beginners lesson on neural networks: the data we ground the LLM on
- Azure AI Search and Azure Cosmos DB: the vector database used to store the data and create a search index
Users will be able to create practice quizzes and revision flashcards from their notes, and summarize the notes into concise overviews. To get started, let's look at what RAG is and how it works.
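An LLM-powered chatbot processes user prompts to generate responses. It is interactive and can engage with users on a wide range of topics. However, its responses are limited to the context it is given and to its foundational training data. For example, GPT-4's knowledge cutoff is September 2021, so it has no knowledge of events after that date. In addition, the data used to train LLMs does not include confidential information such as personal notes or a company's manuals.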
Suppose you want to deploy a chatbot that creates quizzes from your notes. It will need a connection to a knowledge base, and this is where RAG helps. RAG works as follows:
- Knowledge base: before retrieval, documents are ingested. Typically, large documents are split into smaller chunks, converted to text embeddings, and stored in a database.
- User query: the user asks a question.
- Retrieval: when the query comes in, the embedding model is used to retrieve relevant information from the knowledge base, and that information is incorporated into the prompt.
- Augmented generation: the LLM enhances its response using the retrieved data, so the response draws on relevant information from the added context as well as on the pre-training data. The LLM then returns an answer to the user's question.
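The steps above can be sketched end to end with a toy example. This is a minimal illustration under stated assumptions, not the lesson's implementation: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, the three sample chunks are made up, and the final prompt is only built, not sent to an LLM.

```python
import math
import re
from collections import Counter

def embed(text):
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Knowledge base: chunks are embedded and stored ahead of time.
chunks = [
    "A perceptron is a binary classifier with a single layer of weights.",
    "Gradient descent updates weights to reduce the loss.",
    "Azure Cosmos DB can store vector embeddings.",
]
knowledge_base = [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(query, k=1):
    """Retrieval: embed the user query and return the k most similar chunks."""
    query_vector = embed(query)
    ranked = sorted(knowledge_base, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(query, k=1):
    """Augmentation: add the retrieved context to the prompt sent to the LLM."""
    context = "\n".join(retrieve(query, k))
    return f"Context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("what is a perceptron?")
print(prompt)
```

In a real application the count vectors would be replaced by model-generated embeddings and the prompt would be passed to the LLM for the generation step.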
The RAG architecture is implemented using transformers, which consist of two parts: an encoder and a decoder. For example, when a user asks a question, the input text is encoded into vectors that capture its meaning. Those vectors are then decoded against the document index, and new text is generated based on the user query. The LLM uses an encoder-decoder model to generate the output.
According to the paper that proposed it, "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks", there are two approaches to implementing RAG:
- RAG-Sequence: the same retrieved documents are used to predict the complete answer to the user query
- RAG-Token: the retrieved documents are used to generate the next token, so a different document can be drawn on for each token of the answer
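RAG offers several benefits:
- Information richness: text responses are based on up-to-date information, which improves performance on domain-specific tasks.
- Reduced fabrication: providing context from verifiable data cuts down on made-up information.
- Cost efficiency: it is more economical than fine-tuning an LLM.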
Our application is based on our personal data, namely the neural network lesson from the AI for Beginners curriculum.
Unlike a traditional database, a vector database is specialized for storing, managing, and searching embedding vectors. It stores numerical representations of documents. Breaking data down into numerical embeddings makes it easier for an AI system to understand and process.
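LLMs limit the number of tokens they accept as input, so you cannot pass all of your data at once. Instead, the data is split into chunks, and a mechanism is needed to return the chunks whose embeddings are most relevant to the user's question along with the prompt. Chunking also saves cost by reducing the number of tokens passed to the LLM.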
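One way to picture the token limit: given chunks already ranked by relevance, keep adding them to the prompt until a budget is reached. The sketch below is illustrative only. It assumes a crude whitespace word count as a proxy for tokens (real tokenizers count differently), and `select_chunks` and `max_tokens` are hypothetical names.

```python
def select_chunks(ranked_chunks, max_tokens):
    """Keep the highest-ranked chunks that fit within a token budget.

    Assumption: one whitespace-separated word is counted as one token,
    which only approximates a real tokenizer.
    """
    selected, used = [], 0
    for chunk in ranked_chunks:
        cost = len(chunk.split())
        if used + cost > max_tokens:
            break  # stop at the first chunk that no longer fits
        selected.append(chunk)
        used += cost
    return selected

ranked = [
    "a perceptron is a linear classifier",
    "weights are learned from data",
    "unrelated text",
]
print(select_chunks(ranked, max_tokens=11))
```

With a budget of 11 words, the first two chunks fit and the third is left out.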
Popular vector databases include Azure Cosmos DB, Clarifai, Pinecone, ChromaDB, ScaNN, Qdrant, and DeepLake. You can create an Azure Cosmos DB account with the Azure CLI using the following commands:
az login
az group create -n <resource-group-name> -l <location>
az cosmosdb create -n <cosmos-db-name> -g <resource-group-name>
az cosmosdb keys list -n <cosmos-db-name> -g <resource-group-name>
Before storing the data, convert the text to vector embeddings. Long documents or large texts can be split into chunks based on the queries you expect. Chunking can be done at the sentence or paragraph level. Because a chunk derives its meaning from the words around it, you can also add context to a chunk, such as the document title or some of the text before or after it. For example, you can chunk the data as follows:
def split_text(text, max_length, min_length):
    words = text.split()
    chunks = []
    current_chunk = []
    for word in words:
        current_chunk.append(word)
        # Close the chunk once its length falls between min_length and max_length
        if min_length < len(' '.join(current_chunk)) < max_length:
            chunks.append(' '.join(current_chunk))
            current_chunk = []
    # Add the last chunk even if it has not reached the minimum length
    if current_chunk:
        chunks.append(' '.join(current_chunk))
    return chunks
Once chunked, the text can be embedded using different embedding models. Models you can use include word2vec, OpenAI's ada-002, and Azure Computer Vision. Which one to choose depends on the language, the type of content (text, images, or audio), the size of the input the model can encode, and the length of the embedding output.
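The idea of adding context to a chunk, such as the document title or some neighbouring text, could look like the following. This is a sketch under assumptions: `chunk_with_context`, `title`, `chunk_size`, and `overlap` are illustrative names and not part of the lesson's code, and chunk size is measured in words for simplicity.

```python
def chunk_with_context(words, title, chunk_size, overlap):
    """Split a list of words into chunks that repeat `overlap` words from the
    previous chunk and are prefixed with the document title."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        chunks.append(f"{title}: {' '.join(window)}")
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks

words = "one two three four five six seven".split()
context_chunks = chunk_with_context(words, title="Numbers", chunk_size=4, overlap=1)
for chunk in context_chunks:
    print(chunk)
```

Each chunk carries the title, and the word "four" appears in both chunks, so neither chunk loses the text right at its boundary.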
[Figure: an example of text embedded using OpenAI's text-embedding-ada-002 model]

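When a user asks a question, the retriever converts it into a vector using the query encoder and searches the document search index for related vectors. The input vector and the matching document vectors are then converted back to text and passed to the LLM.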
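Retrieval is the step in which the system quickly finds the documents in the index that satisfy the search criteria. The goal of the retriever is to get documents that provide context and ground the LLM in your data.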
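There are several ways to search:
- Keyword search: used for text searches
- Vector search: converts documents from text to embeddings, which enables semantic search based on the meaning of words, and retrieves the documents whose vectors are closest to the user query
- Hybrid: a combination of keyword search and vector search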
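When the database contains no answer similar to the query, the system returns the best information it can find. You can handle this by setting a maximum distance for relevance, or by using hybrid search, which combines keyword and vector search. This lesson uses hybrid search and stores the data in a dataframe with columns for the chunks and the embeddings.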
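Both tactics can be illustrated together in a small sketch. It is not the lesson's implementation: the two-dimensional vectors stand in for real embeddings, and `alpha` (the weight given to the vector score) and `min_score` (a relevance cutoff, playing the role of a maximum distance) are illustrative values chosen for this example.

```python
import math
import re

def keyword_score(query, text):
    """Fraction of query words that appear in the text."""
    query_words = set(re.findall(r"\w+", query.lower()))
    text_words = set(re.findall(r"\w+", text.lower()))
    return len(query_words & text_words) / len(query_words) if query_words else 0.0

def cosine_similarity(a, b):
    """Cosine of the angle between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_search(query, query_vector, documents, alpha=0.5, min_score=0.3):
    """Rank documents by a weighted mix of vector and keyword scores.

    Results scoring below `min_score` are dropped instead of being returned
    as a poor "best available" match.
    """
    scored = []
    for text, vector in documents:
        score = (alpha * cosine_similarity(query_vector, vector)
                 + (1 - alpha) * keyword_score(query, text))
        if score >= min_score:
            scored.append((round(score, 3), text))
    return sorted(scored, reverse=True)

# Toy documents paired with made-up 2-D "embeddings".
documents = [
    ("A perceptron is a simple neural network.", [0.9, 0.1]),
    ("Recipes for baking bread.", [0.0, 1.0]),
]
results = hybrid_search("what is a perceptron", [1.0, 0.0], documents)
print(results)
```

Only the perceptron document clears the cutoff here. The unrelated document is filtered out instead of being returned as a weak match.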
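The retriever looks for embeddings that are similar, that is, vectors that are nearest neighbours. The user's query is embedded and then matched against similar vectors. A common similarity measure is cosine similarity, which is based on the angle between two vectors.
Other measures can be used as well, such as Euclidean distance (the straight-line distance between the endpoints of the vectors) and the dot product (the sum of the products of corresponding elements).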
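The three measures can be written out in a few lines of plain Python. This is a small illustrative sketch with made-up vectors. Cosine similarity is the dot product divided by the product of the two vector lengths, so it depends only on direction, while Euclidean distance and the dot product also depend on magnitude.

```python
import math

def dot_product(a, b):
    """Sum of the products of corresponding elements."""
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Straight-line distance between the endpoints of two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))

a = [1.0, 0.0]
b = [2.0, 0.0]   # same direction as a, but twice as long
c = [0.0, 3.0]   # perpendicular to a

print(cosine_similarity(a, b), euclidean_distance(a, b), dot_product(a, b))
print(cosine_similarity(a, c), euclidean_distance(a, c), dot_product(a, c))
```

Vectors `a` and `b` point the same way, so their cosine similarity is at its maximum even though they are a distance apart. `a` and `c` are perpendicular, so both their cosine similarity and their dot product are zero.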
For retrieval, we first build a search index for the knowledge base. The index stores the embeddings and can quickly retrieve the most similar chunks, even in a large database. An index can be created locally as follows:
from sklearn.neighbors import NearestNeighbors
embeddings = flattened_df['embeddings'].to_list()
# Create the search index
nbrs = NearestNeighbors(n_neighbors=5, algorithm='ball_tree').fit(embeddings)
# To query the index, you can use the kneighbors method
distances, indices = nbrs.kneighbors(embeddings)
After querying the database, you may need to sort the results by relevance. A reranking model uses machine learning to reorder search results from most to least relevant. Azure AI Search can rerank results for you with its semantic ranker. An example of reranking using nearest neighbours:
# Find the most similar documents
distances, indices = nbrs.kneighbors([query_vector])
# Print the three most similar documents, closest first
for distance, index in zip(distances[0][:3], indices[0][:3]):
    if 0 <= index < len(flattened_df):
        print(flattened_df['chunks'].iloc[index])
        print(flattened_df['path'].iloc[index])
        print(distance)
    else:
        print(f"Index {index} not found in DataFrame")
The last step is to bring in the LLM so that responses are grounded in our data. An example implementation:
import openai

user_input = "what is a perceptron?"

def chatbot(user_input):
    # Convert the question to a query vector
    # (create_embeddings is assumed to return the embedding vector for a text)
    query_vector = create_embeddings(user_input)
    # Find the most similar documents
    distances, indices = nbrs.kneighbors([query_vector])
    # Add the documents to the query to provide context
    history = []
    for index in indices[0]:
        history.append(flattened_df['chunks'].iloc[index])
    # Combine the history and the user input
    history.append(user_input)
    # Create a message object
    messages = [
        {"role": "system", "content": "You are an AI assistant that helps with AI questions."},
        {"role": "user", "content": "\n\n".join(history)}
    ]
    # Use chat completion to generate a response
    response = openai.chat.completions.create(
        model="gpt-4",
        temperature=0.7,
        max_tokens=800,
        messages=messages
    )
    return response.choices[0].message
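chatbot(user_input)
The application's responses can be evaluated on criteria such as:
- Quality of responses: whether they sound natural, fluent, and human-like
- Groundedness: whether the response is based on the supplied documents
- Relevance: whether the response matches the question asked
- Fluency: whether the response makes sense grammatically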
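As a rough illustration of how one of these criteria might be turned into a number, groundedness could be approximated by the share of response words that also occur in the retrieved chunks. The function name and the word-overlap heuristic are assumptions made for this sketch. It is not an established metric and is no substitute for human review or a dedicated evaluation tool.

```python
import re

def groundedness(response, source_chunks):
    """Share of distinct response words that also occur in the source chunks.

    A crude word-overlap heuristic, used here only for illustration.
    """
    response_words = set(re.findall(r"\w+", response.lower()))
    source_words = set(re.findall(r"\w+", " ".join(source_chunks).lower()))
    if not response_words:
        return 0.0
    return len(response_words & source_words) / len(response_words)

sources = ["A perceptron is a binary classifier."]
grounded = groundedness("A perceptron is a classifier", sources)   # every word occurs in the source
ungrounded = groundedness("Bananas are yellow", sources)           # no word occurs in the source
print(grounded, ungrounded)
```

A low score flags responses that introduce material not found in the supplied documents. A high score does not by itself prove that the response is correct.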
RAG and vector databases can improve an application in many use cases:
- Question answering: ground a chat in your company data so that employees can ask questions
- Recommendation systems: match the most similar items, such as movies or restaurants
- Chatbot services: store chat history and personalize the conversation for each user
- Image search: use vector embeddings for image recognition and anomaly detection
We have covered the fundamentals of RAG, from adding data to handling the user query and producing output. To simplify building RAG applications, you can use frameworks such as Semantic Kernel, LangChain, or AutoGen.
To continue learning about Retrieval Augmented Generation (RAG), try building the following:
- A front end for the application, using a framework of your choice
- A rebuilt version of the application that uses a framework such as LangChain or Semantic Kernel
Congratulations on completing the lesson 👏
After completing this lesson, check out the Generative AI Learning collection to keep building your generative AI knowledge!
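Disclaimer:
This document was translated using the AI translation service Co-op Translator. While we strive for accuracy, automated translations may contain errors or inaccuracies. Refer to the original document in its native language for accurate information. For critical information, professional human translation is recommended. We accept no liability for any misunderstandings or mistranslations arising from the use of this translation.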


