In the search applications lesson, we briefly covered how to integrate your own data into large language models (LLMs). This lesson goes deeper into the concept of grounding an LLM application in your data, the mechanics of that process, and the ways to store data, including both embeddings and text.
This lesson covers the following:
- An introduction to RAG: what it is and why it is used in AI (artificial intelligence).
- What vector databases are, and how to create one for our application.
- A practical example of integrating RAG into an application.
After completing this lesson, you will be able to:
- Explain the significance of RAG in data retrieval and processing.
- Set up a RAG application and ground an LLM in your data.
- Integrate RAG and vector databases effectively into LLM applications.
In this lesson, we add our own notes to the education startup so that the chatbot can draw on more information about different subjects. With these notes, learners can study more efficiently and understand different topics more easily, which makes revising for exams easier. To build the scenario, we will use:
- Azure OpenAI: the LLM used to create the chatbot
- AI for beginners' lesson on Neural Networks: the data the LLM is grounded on
- Azure AI Search and Azure Cosmos DB: the vector database used to store the data and create a search index

Users will be able to create practice quizzes from their notes, build revision flash cards, and summarize the notes into concise overviews. To get started, let's look at what RAG is and how it works.
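An LLM-powered chatbot generates responses to user prompts. It is designed to be interactive and to handle a wide range of topics, but its responses are limited to the context it is given and to its underlying training data. For example, GPT-4 has a knowledge cutoff of September 2021, so it knows nothing about events after that date. In addition, the data used to train LLMs does not include confidential information such as personal notes or a company's product manuals.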
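Suppose you want to deploy a chatbot that creates quizzes from your notes. It needs a connection to a knowledge base, and this is where RAG helps. RAG works as follows:
- Knowledge base: before retrieval, documents are ingested and preprocessed. Typically, large documents are split into smaller chunks, converted to text embeddings, and stored in a database.
- User query: the user asks a question.
- Retrieval: the question is embedded with the embedding model, relevant information is retrieved from the knowledge base, and it is incorporated into the prompt.
- Augmented generation: the LLM uses the retrieved data to enhance its response, so the answer draws not only on pre-training data but also on relevant information from the added context. The LLM then returns an answer to the user's question.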
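
The four steps above can be sketched end to end in a few lines. This is a minimal illustration, not the lesson's implementation: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the three chunks are made-up sample data, and the LLM call is left out so that only the prompt assembly is shown.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Step 1 - knowledge base: chunks stored alongside their embeddings
chunks = [
    "A perceptron is a binary classifier with a single layer of weights.",
    "Gradient descent updates weights to reduce the loss.",
    "Azure Cosmos DB can store documents and vectors.",
]
knowledge_base = [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(question, k=1):
    # Steps 2 and 3 - embed the user query and fetch the k closest chunks
    q = embed(question)
    ranked = sorted(knowledge_base, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(question, k=1):
    # Step 4 - augmented generation: retrieved text is placed in the prompt
    context = "\n".join(retrieve(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("what is a perceptron")
```

In a real application the resulting `prompt` would be sent to the LLM, and `embed` would be replaced by a call to an embedding model.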
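The RAG architecture is implemented with transformer models made up of two parts: an encoder and a decoder. For example, when a user asks a question, the input text is encoded into vectors that capture the meaning of the words. These vectors are matched against the document index, and new text is generated (decoded) based on the user's query and the retrieved documents. The LLM uses both parts of the encoder-decoder model to generate its output.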
According to the paper that proposed it, Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, there are two approaches to implementing RAG:
- RAG-Sequence: uses the retrieved documents to predict the best possible answer to the user's query.
- RAG-Token: uses the documents to generate the next token, and repeats this for each token to answer the user's query.
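
For reference, the paper expresses the difference between the two approaches as a matter of where the sum over retrieved documents sits. In the notation below, $x$ is the input, $y$ the output of $N$ tokens, $z$ a retrieved document, $p_\eta$ the retriever, and $p_\theta$ the generator. RAG-Sequence scores the whole answer against each document and then sums, while RAG-Token sums over documents separately for every token:

$$
p_{\text{RAG-Sequence}}(y \mid x) \approx \sum_{z \in \text{top-}k(p_\eta(\cdot \mid x))} p_\eta(z \mid x) \prod_{i=1}^{N} p_\theta(y_i \mid x, z, y_{1:i-1})
$$

$$
p_{\text{RAG-Token}}(y \mid x) \approx \prod_{i=1}^{N} \sum_{z \in \text{top-}k(p_\eta(\cdot \mid x))} p_\eta(z \mid x)\, p_\theta(y_i \mid x, z, y_{1:i-1})
$$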
RAG offers several benefits:
- Information richness: helps keep text responses up to date and current, which improves performance on domain-specific tasks.
- Reduced fabrication: uses verifiable data from the knowledge base to provide context for the user's query.
- Cost efficiency: more economical than fine-tuning an LLM.

Our application is based on our own personal data, namely the neural network lesson from the AI For Beginners curriculum.
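Unlike a traditional database, a vector database is a specialized database designed to store, manage, and search embedding vectors. It stores numerical representations of documents. Breaking data down into numerical embeddings makes it easier for an AI system to understand and process that data.

LLMs limit the number of tokens they accept as input, so the entire knowledge base cannot be passed in at once. Instead, we split the data into chunks, embed them, and return the chunks most relevant to the user's question together with the prompt. Chunking also reduces the number of tokens passed to the LLM, which lowers cost.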
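
To make the token limit concrete, here is a minimal sketch of picking the most relevant chunks that still fit a budget. The function name `select_chunks`, the sample scores, and the word-count approximation of tokens are all assumptions for illustration. A real application would count tokens with the tokenizer of the model it calls.

```python
def select_chunks(scored_chunks, max_tokens):
    """Keep the highest-scoring chunks whose combined size fits the budget.

    scored_chunks: list of (similarity_score, chunk_text) pairs.
    Token counts are approximated by whitespace-separated words.
    """
    selected, used = [], 0
    for score, chunk in sorted(scored_chunks, key=lambda pair: pair[0], reverse=True):
        cost = len(chunk.split())
        if used + cost > max_tokens:
            continue  # skip chunks that would overflow the budget
        selected.append(chunk)
        used += cost
    return selected, used

# Hypothetical similarity scores for three chunks
scored = [
    (0.91, "A perceptron computes a weighted sum of its inputs."),
    (0.40, "Cosmos DB is a globally distributed database service."),
    (0.85, "The perceptron applies a step activation function."),
]
selected, used = select_chunks(scored, max_tokens=16)
```

With a budget of 16 words, the two perceptron chunks (9 and 7 words) are kept and the lower-scoring database chunk is dropped.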
代衚çãªãã¯ãã«ããŒã¿ããŒã¹ã«ã¯ãAzure Cosmos DBãClarifyaiãPineconeãChromadbãScaNNãQdrantãDeepLakeãªã©ããããŸããAzure CLIã䜿ã£ãŠAzure Cosmos DBã¢ãã«ãäœæããã«ã¯ä»¥äžã®ã³ãã³ãã䜿ããŸãïŒ
az login
az group create -n <resource-group-name> -l <location>
az cosmosdb create -n <cosmos-db-name> -r <resource-group-name>
az cosmosdb list-keys -n <cosmos-db-name> -g <resource-group-name>ããŒã¿ãä¿åããåã«ãããã¹ãããã¯ãã«åã蟌ã¿ã«å€æããå¿ èŠããããŸãã倧ããªææžãé·ãããã¹ãã®å Žåãäºæ³ãããã¯ãšãªã«åºã¥ããŠãã£ã³ã¯åã§ããŸãããã£ã³ã¯åã¯æåäœã段èœåäœã§è¡ããŸãããã£ã³ã¯ã¯åšå²ã®åèªããæå³ãå°ãåºããããææžã¿ã€ãã«ããã£ã³ã¯ã®ååŸã®ããã¹ããªã©ã远å ã®ã³ã³ããã¹ããå ããããšãå¯èœã§ãããã£ã³ã¯åã®äŸã¯ä»¥äžã®éãã§ãïŒ
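```python
def split_text(text, max_length, min_length):
    """Split text into word-based chunks of roughly min_length to max_length characters."""
    words = text.split()
    chunks = []
    current_chunk = []
    for word in words:
        current_chunk.append(word)
        length = len(' '.join(current_chunk))
        if min_length < length < max_length:
            # The chunk is within the target range: close it
            chunks.append(' '.join(current_chunk))
            current_chunk = []
        elif length >= max_length:
            # A long word pushed the chunk past max_length: close it anyway
            # so that it cannot keep growing
            chunks.append(' '.join(current_chunk))
            current_chunk = []
    # If the last chunk didn't reach the minimum length, add it anyway
    if current_chunk:
        chunks.append(' '.join(current_chunk))
    return chunks
```
Once the text is chunked, it can be embedded using different embedding models. Available models include word2vec, ada-002 by OpenAI, Azure Computer Vision, and many more. Which model to choose depends on the languages you use, the type of content to encode (text, images, or audio), the size of input it can encode, and the length of the embedding output.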
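An example of text embedded with OpenAI's text-embedding-ada-002 model:
![An embedding of the word cat](../../../translated_images/cat.74cbd7946bc9ca380a8894c4de0c706a4f85b16296ffabbf52d6175df6bf841e.ja.png)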
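When a user asks a question, the retriever uses the query encoder to convert the question into a vector, then searches the document search index for vectors related to the input. Once the search is complete, the matching document chunks are passed to the LLM as text together with the question.

Retrieval is the process of quickly finding documents in the index that satisfy the search criteria. The goal of the retriever is to fetch documents that provide context to the LLM and ground its responses in your data.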
There are several ways to search the database:
- Keyword search: used for text searches.
- Semantic search: uses the semantic meaning of words.
- Vector search: converts documents to vector representations using an embedding model, then retrieves the documents whose vectors are closest to the user's question.
- Hybrid search: a combination of keyword search and vector search.
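
One challenge with retrieval is that when the database contains nothing similar to the query, the system still returns the best information it can find. You can mitigate this by setting a maximum distance for relevance, or by using hybrid search, which combines keyword and vector search. In this lesson we use hybrid search and store the data in a dataframe whose columns contain the chunks and their embeddings.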
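
Both mitigations can be sketched together in plain Python. The scoring scheme here is an assumption made for illustration: an equal-weight blend (`alpha=0.5`) of cosine similarity and a simple keyword-overlap score, with a `min_score` cutoff playing the role of the maximum relevance distance. Production search services use more sophisticated ranking, and the two-dimensional vectors are made-up sample data.

```python
import math

def keyword_score(query, text):
    """Fraction of query terms that appear in the text (toy keyword match)."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(term in words for term in terms) / len(terms) if terms else 0.0

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_search(query, query_vector, documents, alpha=0.5, min_score=0.3):
    """documents: list of (text, vector). Returns (score, text) pairs, best first."""
    results = []
    for text, vector in documents:
        score = (alpha * cosine_similarity(query_vector, vector)
                 + (1 - alpha) * keyword_score(query, text))
        if score >= min_score:  # drop results that are not relevant enough
            results.append((score, text))
    return sorted(results, reverse=True)

documents = [
    ("a perceptron is a linear classifier", [1.0, 0.0]),
    ("bread needs flour and yeast", [0.0, 1.0]),
]
hits = hybrid_search("perceptron classifier", [0.9, 0.1], documents)
misses = hybrid_search("quantum chemistry", [-1.0, -1.0], documents)
```

The first query returns only the perceptron document. The second query matches nothing well, so the cutoff yields an empty list instead of the least-bad document.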
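The retriever searches the knowledge base for embeddings that lie close together. The nearest neighbors represent similar texts. The user's question is first embedded and then matched against similar embeddings. The most common similarity measure is cosine similarity, which is based on the angle between two vectors.

Alternative measures include Euclidean distance, the straight-line distance between vector endpoints, and the dot product, the sum of the products of the corresponding elements of two vectors.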
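
The three measures are short enough to write out directly. The sketch below uses plain Python lists and made-up two-dimensional vectors. Real embeddings have hundreds or thousands of dimensions and would normally be handled with a numerical library.

```python
import math

def dot_product(a, b):
    """Sum of the products of corresponding elements."""
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Straight-line distance between the endpoints of two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (ignores their lengths)."""
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))

a = [1.0, 0.0]
b = [2.0, 0.0]  # same direction as a, twice the length
c = [0.0, 3.0]  # perpendicular to a
```

Here `a` and `b` point the same way, so their cosine similarity is 1 even though their Euclidean distance is 1 and their dot product is 2. `a` and `c` are perpendicular, so both their cosine similarity and dot product are 0.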

Before performing retrieval, we need to build a search index for the knowledge base. An index stores the embeddings and can quickly retrieve the most similar chunks, even in a large database. We can create an index locally using:
```python
from sklearn.neighbors import NearestNeighbors

# flattened_df is the dataframe of chunks and embeddings described above
embeddings = flattened_df['embeddings'].to_list()

# Create the search index
nbrs = NearestNeighbors(n_neighbors=5, algorithm='ball_tree').fit(embeddings)

# To query the index, you can use the kneighbors method
distances, indices = nbrs.kneighbors(embeddings)
```
After querying the database, you may need to sort the results by relevance. A reranking model uses machine learning to improve the relevance of search results by ordering them from most to least relevant. In Azure AI Search, reranking is done automatically by a semantic reranker. An example of ranking results using nearest neighbors:
```python
# Find the most similar documents
# (query_vector is the embedding of the user's question)
distances, indices = nbrs.kneighbors([query_vector])

# Print the three most similar documents, closest first
# (kneighbors returns neighbors sorted by increasing distance)
for distance, index in zip(distances[0][:3], indices[0][:3]):
    if index < len(flattened_df):
        print(flattened_df['chunks'].iloc[index])
        print(flattened_df['path'].iloc[index])
        print(distance)
    else:
        print(f"Index {index} not found in DataFrame")
```
Finally, we add the LLM so that we can get responses grounded in our data. It can be implemented as follows:
```python
import openai

user_input = "what is a perceptron?"

def chatbot(user_input):
    # Convert the question to a query vector
    # (create_embeddings is not defined in this section; it is assumed to
    # return the embedding of its input text)
    query_vector = create_embeddings(user_input)

    # Find the most similar documents
    distances, indices = nbrs.kneighbors([query_vector])

    # Add the retrieved documents to the query to provide context
    history = []
    for index in indices[0]:
        history.append(flattened_df['chunks'].iloc[index])

    # Combine the retrieved context and the user input
    history.append(user_input)

    # Create the message object; the user message carries both the
    # retrieved context and the question
    messages = [
        {"role": "system", "content": "You are an AI assistant that helps with AI questions."},
        {"role": "user", "content": "\n\n".join(history)}
    ]

    # Use chat completion to generate a response
    response = openai.chat.completions.create(
        model="gpt-4",
        temperature=0.7,
        max_tokens=800,
        messages=messages
    )

    return response.choices[0].message

chatbot(user_input)
```
The application can be evaluated with metrics such as:
- Quality of responses: whether they sound natural, fluent, and human-like.
- Groundedness: whether the response is based on the supplied documents.
- Relevance: whether the response matches and relates to the question asked.
- Fluency: whether the response makes sense grammatically.
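
As a rough illustration of the groundedness idea, the sketch below measures what fraction of the words in a response also appear in the retrieved chunks. This lexical-overlap score is only an assumed proxy chosen for simplicity, not the metric used by any particular evaluation tool. Practical evaluations usually rely on human review or an LLM-based judge.

```python
def groundedness(response, source_chunks):
    """Fraction of response words that also occur in the source chunks."""
    source_words = set()
    for chunk in source_chunks:
        source_words.update(chunk.lower().split())
    response_words = response.lower().split()
    if not response_words:
        return 0.0
    supported = sum(word in source_words for word in response_words)
    return supported / len(response_words)

sources = ["a perceptron is a linear binary classifier"]
grounded = groundedness("a perceptron is a binary classifier", sources)
ungrounded = groundedness("perceptrons were invented on mars", sources)
```

A response built entirely from the source text scores 1.0, while one that shares no words with it scores 0.0. Scores in between would need a threshold chosen for the application.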
颿°åŒã³åºãã§ã¢ããªãæ¹åã§ããããŸããŸãªãŠãŒã¹ã±ãŒã¹ããããŸãïŒ
-
質åå¿çïŒç€Ÿå ããŒã¿ãåºç€ã«ãããã£ããã§åŸæ¥å¡ã質åã§ããããã«ãã
-
ã¬ã³ã¡ã³ããŒã·ã§ã³ã·ã¹ãã ïŒæ ç»ãã¬ã¹ãã©ã³ãªã©ãæãé¡äŒŒãã䟡å€ããããã³ã°ããã·ã¹ãã ãäœæ
-
ãã£ããããããµãŒãã¹ïŒãã£ããå±¥æŽãä¿åãããŠãŒã¶ãŒããŒã¿ã«åºã¥ããŠäŒè©±ãããŒãœãã©ã€ãº
-
ãã¯ãã«åã蟌ã¿ã䜿ã£ãç»åæ€çŽ¢ïŒç»åèªèãç°åžžæ€ç¥ã«æçš

We have covered the fundamentals of RAG: adding our data to the application, the user query, and the output. To simplify building RAG applications, you can use frameworks such as Semantic Kernel, Langchain, or Autogen.
To continue learning about Retrieval Augmented Generation (RAG), try building the following:
- Build a front end for the application using the framework of your choice.
- Use a framework, either LangChain or Semantic Kernel, to recreate your application.

Congratulations on completing the lesson 👏
After completing this lesson, check out our Generative AI Learning collection to continue leveling up your Generative AI knowledge!
Disclaimer:
This document was translated using the AI translation service Co-op Translator. While we strive for accuracy, automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.


