| title | Alibaba Cloud MySQL integration |
|---|---|
| description | Integrate with the Alibaba Cloud MySQL vector store using LangChain Python. |
Alibaba Cloud MySQL is a fully managed relational database service that provides high availability, scalability, and security.
Alibaba Cloud MySQL provides deep integration for enterprise-level vector data processing. It natively supports storing and computing vector data of up to 16,383 dimensions. The service integrates mainstream vector operation functions and uses a highly optimized Hierarchical Navigable Small World (HNSW) algorithm to deliver efficient approximate nearest neighbor searches. This feature also supports creating indexes on full-dimension vector columns.
This guide provides a quick overview for getting started with the alibabacloud-mysql vector store. For a detailed listing of all alibabacloud-mysql vector store features, parameters, and configurations, head to the langchain-alibabacloud-mysql API reference.
To access the alibabacloud-mysql vector store, you'll need to create an Alibaba Cloud RDS for MySQL instance with minor version 8.0.36 or higher, enable the vector feature, make the instance network-accessible, and install the langchain-alibabacloud-mysql integration package.
To connect to your Alibaba Cloud RDS MySQL instance, you'll need to set the following environment variables:
- `ALIBABACLOUD_MYSQL_HOST`: Your RDS MySQL host address
- `ALIBABACLOUD_MYSQL_PORT`: MySQL port (default: 3306)
- `ALIBABACLOUD_MYSQL_USER`: MySQL username
- `ALIBABACLOUD_MYSQL_PASSWORD`: MySQL password
- `ALIBABACLOUD_MYSQL_DATABASE`: Database name
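For local testing, these variables can be exported in your shell before running your application. The values below are placeholders, not real connection details:

```shell
# Placeholder values -- replace with your instance's actual connection details
export ALIBABACLOUD_MYSQL_HOST="rm-example.mysql.rds.aliyuncs.com"
export ALIBABACLOUD_MYSQL_PORT="3306"
export ALIBABACLOUD_MYSQL_USER="your_user"
export ALIBABACLOUD_MYSQL_PASSWORD="your_password"
export ALIBABACLOUD_MYSQL_DATABASE="your_database"
```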
The LangChain alibabacloud-mysql integration lives in the langchain-alibabacloud-mysql package:
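Assuming the package is published under the name the text gives, installation follows the usual pip pattern:

```shell
pip install -qU langchain-alibabacloud-mysql
```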
Now we can instantiate the vector store with your RDS MySQL connection information:
```python
import os

from langchain_alibabacloud_mysql import AlibabaCloudMySQL
from langchain_community.embeddings import DashScopeEmbeddings

# Initialize DashScope embeddings (Alibaba Cloud's embedding service)
embeddings = DashScopeEmbeddings(
    model="text-embedding-v4",
    dashscope_api_key=os.environ.get("DASHSCOPE_API_KEY"),
)

# Or you can use OpenAI embeddings
# embeddings = OpenAIEmbeddings()

# Initialize vector store
vector_store = AlibabaCloudMySQL(
    host=os.environ.get("ALIBABACLOUD_MYSQL_HOST", "localhost"),
    port=int(os.environ.get("ALIBABACLOUD_MYSQL_PORT", "3306")),
    user=os.environ.get("ALIBABACLOUD_MYSQL_USER", "root"),
    password=os.environ.get("ALIBABACLOUD_MYSQL_PASSWORD", ""),
    database=os.environ.get("ALIBABACLOUD_MYSQL_DATABASE", "test"),
    embedding=embeddings,
    table_name="langchain_vectors",
    distance_strategy="cosine",  # or "euclidean"
    hnsw_m=6,  # HNSW index M parameter (3-200)
)
```

```python
from langchain_core.documents import Document

document_1 = Document(page_content="Alibaba", metadata={"source": "https://example.com"})
document_2 = Document(page_content="Cloud", metadata={"source": "https://example.com"})
document_3 = Document(page_content="RDS for MySQL", metadata={"source": "https://example.com"})

documents = [document_1, document_2, document_3]

vector_store.add_documents(documents=documents, ids=["1", "2", "3"])
```

```python
updated_document = Document(
    page_content="Alibaba Cloud", metadata={"source": "https://another-example.com"}
)

vector_store.update_documents(document_id="1", document=updated_document)
```

```python
vector_store.delete(ids=["3"])
```

Once your vector store has been created and the relevant documents have been added, you will most likely want to query it while running your chain or agent.
Performing a simple similarity search can be done as follows:
```python
results = vector_store.similarity_search(
    query="mysql", k=1, filter={"source": "https://example.com"}
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")
```

If you want to execute a similarity search and receive the corresponding scores, you can run:
```python
results = vector_store.similarity_search_with_score(
    query="mysql", k=1, filter={"source": "https://example.com"}
)
for doc, score in results:
    print(f"* [SIM={score:.3f}] {doc.page_content} [{doc.metadata}]")
```

You can also transform the vector store into a retriever for easier usage in your chains.
```python
retriever = vector_store.as_retriever(search_type="mmr", search_kwargs={"k": 1})

retriever.invoke("alibaba")
```

The Alibaba Cloud MySQL vector store supports most standard vector store features:
| Feature | Supported |
|---|---|
| Delete by ID | ✅ |
| Filtering | ✅ |
| Search by Vector | ✅ |
| Search with score | ✅ |
| Async | ✅ |
| Passes Standard Tests | ✅ |
| Multi Tenancy | ✅ |
| IDs in add Documents | ✅ |
You can filter search results by metadata using dictionary-style filters:
```python
# Search with metadata filter
results = vector_store.similarity_search(
    query="technology",
    k=5,
    filter={"category": "tech", "year": {"$gte": 2023}},
)
```

Supported filter operators:

- `$eq`: Equal to
- `$ne`: Not equal to
- `$gt`: Greater than
- `$gte`: Greater than or equal to
- `$lt`: Less than
- `$lte`: Less than or equal to
- `$in`: In list
- `$nin`: Not in list
- `$like`: LIKE pattern matching
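Since filters are plain Python dictionaries, operator expressions can be composed before the search call. A small sketch; the metadata keys here (`category`, `year`, `status`, `source`) are illustrative, not fields the integration requires:

```python
# Illustrative filter dictionaries combining the operators above.
recent_tech = {"category": {"$in": ["tech", "ai"]}, "year": {"$gte": 2023}}
not_archived = {"status": {"$ne": "archived"}}
pdf_sources = {"source": {"$like": "%.pdf"}}

# Each would be passed as the `filter` argument, e.g.:
# vector_store.similarity_search("query", k=5, filter=recent_tech)
```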
MMR search provides diverse results by balancing relevance and diversity:
```python
results = vector_store.max_marginal_relevance_search(
    query="artificial intelligence",
    k=4,
    fetch_k=20,  # Number of candidates to consider
    lambda_mult=0.5,  # 0 = max diversity, 1 = max relevance
)
```

Efficiently add multiple documents at once:
```python
texts = ["Document 1", "Document 2", "Document 3"]
metadatas = [
    {"source": "doc1.pdf"},
    {"source": "doc2.pdf"},
    {"source": "doc3.pdf"},
]
ids = vector_store.add_texts(texts, metadatas=metadatas)
```

Retrieve specific documents by their IDs:
```python
documents = vector_store.get_by_ids(["id1", "id2", "id3"])
for doc in documents:
    print(f"{doc.page_content} - {doc.metadata}")
```

Get the total number of vectors or clear all data:
```python
# Count total vectors
count = vector_store.count()
print(f"Total vectors: {count}")

# Clear all vectors
vector_store.clear()
```

The Alibaba Cloud MySQL vector store supports async operations for all major methods:
- `aadd_texts()` - Add texts asynchronously
- `aadd_documents()` - Add documents asynchronously
- `asimilarity_search()` - Similarity search asynchronously
- `asimilarity_search_with_score()` - Similarity search with scores asynchronously
- `amax_marginal_relevance_search()` - MMR search asynchronously
- `adelete()` - Delete vectors asynchronously
- `aget_by_ids()` - Get documents by IDs asynchronously
- `aclear()` - Clear all vectors asynchronously
- `acount()` - Count vectors asynchronously
- `aclose()` - Close connection pool asynchronously
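As a minimal sketch of chaining these coroutines with asyncio; the helper name and flow below are illustrative and assume an already-initialized vector store:

```python
import asyncio


async def ingest_and_search(vector_store, texts, query):
    """Hypothetical helper chaining the async methods listed above."""
    ids = await vector_store.aadd_texts(texts)  # insert documents
    results = await vector_store.asimilarity_search(query, k=2)  # query them
    await vector_store.aclose()  # release the connection pool
    return ids, results


# Driven from synchronous code with:
# asyncio.run(ingest_and_search(vector_store, ["some text"], "a query"))
```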
Retrieval-Augmented Generation (RAG) combines vector search with language model generation to provide contextual, accurate answers based on your documents.
Here's a complete example of building a RAG application with Alibaba Cloud MySQL:
```python
import os

from langchain_alibabacloud_mysql import AlibabaCloudMySQL
from langchain_community.embeddings import DashScopeEmbeddings
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.chat_models.tongyi import ChatTongyi
from langchain_classic.chains import create_retrieval_chain
from langchain_classic.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

# Step 1: Initialize embeddings and vector store
embeddings = DashScopeEmbeddings(
    model="text-embedding-v4",
    dashscope_api_key=os.environ.get("DASHSCOPE_API_KEY"),
)

vector_store = AlibabaCloudMySQL(
    host=os.environ.get("ALIBABACLOUD_MYSQL_HOST", "localhost"),
    port=int(os.environ.get("ALIBABACLOUD_MYSQL_PORT", "3306")),
    user=os.environ.get("ALIBABACLOUD_MYSQL_USER", "root"),
    password=os.environ.get("ALIBABACLOUD_MYSQL_PASSWORD", ""),
    database=os.environ.get("ALIBABACLOUD_MYSQL_DATABASE", "test"),
    embedding=embeddings,
    table_name="langchain_vectors_rag",
)

# Step 2: Load and split documents
loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)
splits = text_splitter.split_documents(docs)

# Step 3: Add documents to vector store
vector_store.add_documents(documents=splits)

# Step 4: Create retriever
retriever = vector_store.as_retriever(search_kwargs={"k": 3})

# Step 5: Create RAG chain
llm = ChatTongyi()
prompt = ChatPromptTemplate.from_template(
    """Answer the following question based only on the provided context:

Context: {context}

Question: {input}"""
)
document_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, document_chain)

# Step 6: Query
response = rag_chain.invoke({"input": "What is task decomposition?"})
print(response["answer"])
```

You can also use the vector store as a retrieval tool in an agent:
```python
from langchain.agents import create_agent
from langchain.tools import tool


@tool
def retrieve_context(query: str) -> str:
    """Retrieve information to help answer a query."""
    retrieved_docs = vector_store.similarity_search(query, k=2)
    return "\n\n".join(
        f"Source: {doc.metadata}\nContent: {doc.page_content}"
        for doc in retrieved_docs
    )


tools = [retrieve_context]
llm = ChatTongyi()
agent = create_agent(
    llm,
    tools,
    system_prompt="You have access to a tool that retrieves context. Use it to help answer user queries.",
)

response = agent.invoke({"messages": [{"role": "user", "content": "What is task decomposition?"}]})
```

For more RAG guides and patterns, see:
For a detailed RAG demo with Alibaba Cloud MySQL and more examples, see:
We will update the API reference soon; in the meantime, please refer to the langchain-alibabacloud-mysql package for more details.