docs/src/oss/python/integrations/vectorstores/singlestore.mdx at c18dd36adb3914df0b79de7cfb47b66cc347512c · langchain-ai/docs

title	SingleStoreVectorStore integration
description	Integrate with the SingleStoreVectorStore using LangChain Python.

SingleStore is a distributed SQL database for transactional and analytical workloads. You can run it in the cloud or on premises.

SingleStore supports vector storage and similarity search alongside SQL. It includes vector functions such as dot_product and euclidean_distance. For table design, indexing, and query patterns, see working with vector data in the SingleStore documentation.

You can also combine vector search with full-text indexing based on Lucene and filter on document metadata. Depending on your workload, you can prefilter on text or vectors, or combine scores (for example, with a weighted sum).

Use the following sections to connect SingleStore to LangChain.

Class	Package	JS support
`SingleStoreVectorStore`	`langchain_singlestore`	✅

**For the langchain-community version `SingleStoreDB` (deprecated), see the v0.2 documentation.**

Setup

To access SingleStore vector stores, you need to install the langchain-singlestore integration package:

pip install -qU "langchain-singlestore"

Initialization

To initialize SingleStoreVectorStore, you need an `Embeddings` object and connection parameters for the SingleStore database.

Required parameters

embedding (Embeddings): A text embedding model.

Optional parameters

distance_strategy (DistanceStrategy): Strategy for calculating vector distances. Defaults to DOT_PRODUCT. Options:
- DOT_PRODUCT: Computes the scalar product of two vectors.
- EUCLIDEAN_DISTANCE: Computes the Euclidean distance between two vectors.
table_name (str): Name of the table. Defaults to embeddings.
content_field (str): Field for storing content. Defaults to content.
metadata_field (str): Field for storing metadata. Defaults to metadata.
vector_field (str): Field for storing vectors. Defaults to vector.
id_field (str): Field for storing IDs. Defaults to id.
use_vector_index (bool): Enables vector indexing (requires SingleStore 8.5+). Defaults to False.
vector_index_name (str): Name of the vector index. Ignored if use_vector_index is False.
vector_index_options (dict): Options for the vector index. Ignored if use_vector_index is False.
vector_size (int): Size (dimensionality) of the vectors. Must match your embedding model when use_vector_index is True; the default assumes 1536-dimensional OpenAI embeddings.
use_full_text_search (bool): Enables full-text indexing on content. Defaults to False.
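The two distance strategies rank results in opposite directions: a dot product is a similarity (larger means closer), while Euclidean distance is a distance (smaller means closer). The pure-Python sketch below does not touch SingleStore; it computes both measures for toy vectors. It also illustrates that for unit-length vectors the two orderings agree, because the squared Euclidean distance equals 2 - 2 * (a · b).

```python
import math


def dot_product(a: list[float], b: list[float]) -> float:
    """Similarity: larger values mean the vectors are closer."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Distance: smaller values mean the vectors are closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


# Toy vectors standing in for real embeddings.
query = normalize([1.0, 2.0, 0.5])
candidates = {
    "a": normalize([1.0, 1.9, 0.4]),
    "b": normalize([-1.0, 0.5, 2.0]),
    "c": normalize([0.2, 1.0, 1.0]),
}

# DOT_PRODUCT: sort by descending similarity.
by_dot = sorted(candidates, key=lambda k: dot_product(query, candidates[k]), reverse=True)
# EUCLIDEAN_DISTANCE: sort by ascending distance.
by_l2 = sorted(candidates, key=lambda k: euclidean_distance(query, candidates[k]))

print(by_dot, by_l2)
```

If your embedding model does not return normalized vectors, the two strategies can rank results differently, so choose the one your model is designed for.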

Connection pool parameters

pool_size (int): Number of active connections in the pool. Defaults to 5.
max_overflow (int): Maximum connections beyond pool_size. Defaults to 10.
timeout (float): Connection timeout in seconds. Defaults to 30.
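Assuming the pool follows SQLAlchemy `QueuePool` semantics (an assumption here, not something this page states), at most pool_size + max_overflow connections are open at once, and a caller that needs another one waits up to timeout seconds before failing. The hypothetical `ToyConnectionPool` below models only those limits with the standard library; it is not the library's implementation.

```python
import threading


class ToyConnectionPool:
    """Hypothetical model of pool limits: pool_size + max_overflow slots."""

    def __init__(self, pool_size: int = 5, max_overflow: int = 10, timeout: float = 30.0):
        self.capacity = pool_size + max_overflow
        self.timeout = timeout
        self._slots = threading.BoundedSemaphore(self.capacity)

    def acquire(self) -> None:
        # Wait up to `timeout` seconds for a free slot, then give up.
        if not self._slots.acquire(timeout=self.timeout):
            raise TimeoutError(f"no connection available after {self.timeout}s")

    def release(self) -> None:
        self._slots.release()


pool = ToyConnectionPool(pool_size=5, max_overflow=10, timeout=0.1)
for _ in range(pool.capacity):  # 15 concurrent checkouts succeed
    pool.acquire()
try:
    pool.acquire()  # the 16th waits 0.1 s, then fails
except TimeoutError as exc:
    print("pool exhausted:", exc)
```

With the defaults, this model allows up to 15 concurrent connections. It does not capture other pool details, such as whether overflow connections are closed when returned.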

Database connection parameters

host (str): Hostname, IP, or URL for the database.
user (str): Database username.
password (str): Database password.
port (int): Database port. Defaults to 3306.
database (str): Database name.

Additional options

pure_python (bool): Enables pure Python mode.
local_infile (bool): Allows local file uploads.
charset (str): Character set for string values.
ssl_key, ssl_cert, ssl_ca (str): Paths to SSL files.
ssl_disabled (bool): Disables SSL.
ssl_verify_cert (bool): Verifies the server's certificate.
ssl_verify_identity (bool): Verifies the server's identity.
autocommit (bool): Enables autocommit.
results_type (str): Structure of query results (e.g., tuples, dicts).

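import os

from langchain_singlestore.vectorstores import SingleStoreVectorStore

# Connection string in the form user:password@host:port/database
os.environ["SINGLESTOREDB_URL"] = "root:pass@localhost:3306/db"

# `embeddings` is any LangChain Embeddings object, for example OpenAIEmbeddings()
vector_store = SingleStoreVectorStore(embedding=embeddings)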
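The SINGLESTOREDB_URL value in the example above packs the same values as the host, user, password, port, and database parameters into one user:password@host:port/database string. If you keep those values separately, a small helper like the hypothetical `build_singlestore_url` below can assemble the string. This sketch does not escape special characters (such as "@" or ":") in the user name or password.

```python
import os


def build_singlestore_url(
    user: str, password: str, host: str, port: int = 3306, database: str = ""
) -> str:
    """Hypothetical helper: compose a user:password@host:port/database string.

    Special characters in `user` or `password` are not escaped.
    """
    url = f"{user}:{password}@{host}:{port}"
    return f"{url}/{database}" if database else url


os.environ["SINGLESTOREDB_URL"] = build_singlestore_url(
    user="root", password="pass", host="localhost", database="db"
)
print(os.environ["SINGLESTOREDB_URL"])
```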

Manage vector store

SingleStoreVectorStore assumes that document IDs are integers. The following examples show how to add, update, and delete documents.

Add items to vector store

You can add documents to the vector store as follows:

pip install -qU langchain-core

from langchain_core.documents import Document

docs = [
    Document(
        page_content="""In the parched desert, a sudden rainstorm brought relief,
            as the droplets danced upon the thirsty earth, rejuvenating the landscape
            with the sweet scent of petrichor.""",
        metadata={"category": "rain"},
    ),
    Document(
        page_content="""Amidst the bustling cityscape, the rain fell relentlessly,
            creating a symphony of pitter-patter on the pavement, while umbrellas
            bloomed like colorful flowers in a sea of gray.""",
        metadata={"category": "rain"},
    ),
    Document(
        page_content="""High in the mountains, the rain transformed into a delicate
            mist, enveloping the peaks in a mystical veil, where each droplet seemed to
            whisper secrets to the ancient rocks below.""",
        metadata={"category": "rain"},
    ),
    Document(
        page_content="""Blanketing the countryside in a soft, pristine layer, the
            snowfall painted a serene tableau, muffling the world in a tranquil hush
            as delicate flakes settled upon the branches of trees like nature's own
            lacework.""",
        metadata={"category": "snow"},
    ),
    Document(
        page_content="""In the urban landscape, snow descended, transforming
            bustling streets into a winter wonderland, where the laughter of
            children echoed amidst the flurry of snowballs and the twinkle of
            holiday lights.""",
        metadata={"category": "snow"},
    ),
    Document(
        page_content="""Atop the rugged peaks, snow fell with an unyielding
            intensity, sculpting the landscape into a pristine alpine paradise,
            where the frozen crystals shimmered under the moonlight, casting a
            spell of enchantment over the wilderness below.""",
        metadata={"category": "snow"},
    ),
]


vector_store.add_documents(docs)

Update items in vector store

To update an existing document in the vector store, use the following code:

updated_document = Document(
    page_content="qux", metadata={"source": "https://another-example.com"}
)

vector_store.update_documents(document_id="1", document=updated_document)

Delete items from vector store

To delete documents from the vector store, use the following code:

vector_store.delete(ids=["3"])

Query vector store

Once you have created the vector store and added documents, you will most likely want to query it while your chain or agent runs.

Query directly

You can perform a simple similarity search as follows:

results = vector_store.similarity_search(query="trees in the snow", k=1)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

If you want to run a similarity search and receive the corresponding scores, you can run:

results = vector_store.similarity_search_with_score(query="trees in the snow", k=1)
for doc, score in results:
    print(f"* [SIM={score:.3f}] {doc.page_content} [{doc.metadata}]")
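Conceptually, an exact (non-indexed) similarity search with the default DOT_PRODUCT strategy scores every stored vector against the query embedding and keeps the k highest scores. The in-memory sketch below models that brute-force search with toy 3-dimensional vectors instead of real embeddings; the vector store performs the equivalent work inside the database. The helper name `top_k_dot_product` is hypothetical.

```python
import heapq


def top_k_dot_product(
    query: list[float], rows: list[tuple[str, list[float]]], k: int
) -> list[tuple[str, float]]:
    """Exact (brute-force) search: score every row, keep the k best."""
    scored = (
        (content, sum(q * v for q, v in zip(query, vector)))
        for content, vector in rows
    )
    # heapq.nlargest returns the k best pairs sorted by descending score.
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy vectors standing in for real embeddings.
rows = [
    ("snow on branches", [0.9, 0.1, 0.0]),
    ("rain in the city", [0.1, 0.9, 0.2]),
    ("mist over peaks", [0.3, 0.4, 0.8]),
]
query_vector = [0.8, 0.2, 0.1]

for content, score in top_k_dot_product(query_vector, rows, k=2):
    print(f"* [SIM={score:.3f}] {content}")
```

With DOT_PRODUCT, a higher score means a closer match, which is why the sketch keeps the largest values.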

Metadata filtering

SingleStore can prefilter search results on metadata fields, so a query only considers documents whose metadata matches the conditions you specify. This lets you narrow a search to the relevant subset of your data.

SingleStoreVectorStore supports both simple and advanced metadata filtering with query operators.

Simple metadata filtering

Use simple dictionary-style syntax for exact matches and backward compatibility:

# Filter by a single field
query = "trees branches"
results = vector_store.similarity_search(
    query, filter={"category": "snow"}
)

# Filter by multiple fields (implicit AND)
results = vector_store.similarity_search(
    query="landmarks",
    filter={"country": "France", "category": "museum"}
)
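Conceptually, a simple filter is shorthand for exact matches joined with AND, so {"country": "France", "category": "museum"} selects the same documents as an explicit $and of $eq conditions. The hypothetical helper below rewrites one form into the other to make that equivalence concrete; it is not part of langchain-singlestore.

```python
from typing import Any


def expand_simple_filter(simple: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: rewrite {"field": value, ...} as an explicit $and of $eq."""
    conditions = [{field: {"$eq": value}} for field, value in simple.items()]
    # A single condition needs no $and wrapper.
    return conditions[0] if len(conditions) == 1 else {"$and": conditions}


print(expand_simple_filter({"country": "France", "category": "museum"}))
# {'$and': [{'country': {'$eq': 'France'}}, {'category': {'$eq': 'museum'}}]}
```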

Advanced metadata filtering

Use advanced filters with operators like $eq, $gt, $in, $and, $or, and more for complex queries:

Comparison operators:

# Greater than, less than, and other comparisons
results = vector_store.similarity_search(
    query="old structures",
    k=10,
    filter={"year_built": {"$lt": 1900}}  # Built before 1900
)

# Other operators: $eq, $ne, $gt, $gte, $lte
results = vector_store.similarity_search(
    query="landmarks",
    filter={"year_built": {"$gte": 1800, "$lte": 1950}}
)

Collection operators:

# Check if value is in a list
results = vector_store.similarity_search(
    query="landmarks",
    k=10,
    filter={"country": {"$in": ["France", "UK"]}}
)

# Not in ($nin)
results = vector_store.similarity_search(
    query="museums",
    filter={"country": {"$nin": ["USA", "Canada"]}}
)

Existence check:

# Check if a field exists
results = vector_store.similarity_search(
    query="heritage sites",
    k=10,
    filter={"heritage_status": {"$exists": True}}
)

Logical operators:

# Combine multiple conditions with $and
results = vector_store.similarity_search(
    query="european landmarks",
    k=10,
    filter={
        "$and": [
            {"category": "landmark"},
            {"year_built": {"$gte": 1800}},
            {"country": {"$in": ["France", "UK"]}}
        ]
    }
)

# Use $or for alternative conditions
results = vector_store.similarity_search(
    query="cultural sites",
    filter={
        "$or": [
            {"category": "museum"},
            {"category": "landmark"}
        ]
    }
)

# Complex nested queries
results = vector_store.similarity_search(
    query="cultural sites",
    k=10,
    filter={
        "$or": [
            {
                "$and": [
                    {"category": "museum"},
                    {"country": "France"}
                ]
            },
            {
                "$and": [
                    {"category": "landmark"},
                    {"year_built": {"$lt": 1900}}
                ]
            }
        ]
    }
)
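To reason about what a filter selects, it can help to evaluate it against a metadata dictionary in plain Python. The sketch below is a hypothetical, in-memory model of the operators listed above ($eq, $ne, $gt, $gte, $lt, $lte, $in, $nin, $exists, $and, $or). The vector store itself applies filters in the database, so edge cases there (such as comparisons on missing fields or type coercion) may behave differently from this toy.

```python
from typing import Any

# Comparison operators: each takes (stored value, value from the filter).
_COMPARISONS = {
    "$eq": lambda actual, expected: actual == expected,
    "$ne": lambda actual, expected: actual != expected,
    "$gt": lambda actual, expected: actual > expected,
    "$gte": lambda actual, expected: actual >= expected,
    "$lt": lambda actual, expected: actual < expected,
    "$lte": lambda actual, expected: actual <= expected,
    "$in": lambda actual, expected: actual in expected,
    "$nin": lambda actual, expected: actual not in expected,
}

_MISSING = object()  # sentinel for "field not present in metadata"


def matches(metadata: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Return True if `metadata` satisfies the filter `flt` (toy semantics)."""
    for key, condition in flt.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in condition):
                return False
        elif isinstance(condition, dict):
            actual = metadata.get(key, _MISSING)
            for op, expected in condition.items():
                if op == "$exists":
                    if (actual is not _MISSING) != expected:
                        return False
                elif actual is _MISSING or not _COMPARISONS[op](actual, expected):
                    # Toy choice: a missing field fails every comparison.
                    return False
        elif metadata.get(key, _MISSING) != condition:
            # Plain value: exact match, as in the simple filter syntax.
            return False
    return True


# Hypothetical metadata records, unrelated to the weather documents added earlier.
records = [
    {"id": 1, "category": "museum", "country": "France", "year_built": 1850},
    {"id": 2, "category": "landmark", "country": "UK", "year_built": 1859,
     "heritage_status": "listed"},
    {"id": 3, "category": "landmark", "country": "France", "year_built": 1920},
]

# Same shape as the nested example above: French museums, or landmarks built before 1900.
nested = {
    "$or": [
        {"$and": [{"category": "museum"}, {"country": "France"}]},
        {"$and": [{"category": "landmark"}, {"year_built": {"$lt": 1900}}]},
    ]
}
print([r["id"] for r in records if matches(r, nested)])
```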

Vector index

With SingleStore 8.5 or later, you can speed up searches with approximate nearest neighbor (ANN) vector indexes. To enable this feature, set use_vector_index=True when you create the vector store object. If your vectors do not have the default OpenAI embedding size of 1536 dimensions, also set the vector_size parameter to match.
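A mismatch between vector_size and the embedding model's output dimension is easy to introduce when you switch models. One way to avoid it is to embed a probe string with the model's `embed_query` method (part of the LangChain `Embeddings` interface) and use the length of the result. In the sketch below, `StubEmbeddings` and `infer_vector_size` are hypothetical stand-ins so the code runs without a real model; the commented-out constructor call shows where the value would go.

```python
class StubEmbeddings:
    """Hypothetical stand-in for a LangChain Embeddings model."""

    def __init__(self, dimension: int) -> None:
        self.dimension = dimension

    def embed_query(self, text: str) -> list[float]:
        # A real model would return a learned embedding for `text`.
        return [0.0] * self.dimension


def infer_vector_size(embeddings) -> int:
    """Embed a probe string and return the embedding's dimensionality."""
    return len(embeddings.embed_query("dimension probe"))


embeddings = StubEmbeddings(dimension=768)
vector_size = infer_vector_size(embeddings)
print(vector_size)

# With a real model, pass the inferred size when enabling the index:
# vector_store = SingleStoreVectorStore(
#     embedding=embeddings, use_vector_index=True, vector_size=vector_size
# )
```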

Search strategies

SingleStore supports several search strategies:
- VECTOR_ONLY (default): Computes similarity scores directly between vectors with vector functions such as dot_product or euclidean_distance.
- TEXT_ONLY: Uses Lucene-based full-text search, which suits text-centric applications.
- FILTER_BY_TEXT: Filters results by text similarity first, then compares the remaining documents by vector similarity.
- FILTER_BY_VECTOR: Filters results by vector similarity first, then compares the remaining documents by text similarity.
- WEIGHTED_SUM: Computes the final similarity score as a weighted combination of vector and text similarity. It supports only dot_product distance.

FILTER_BY_TEXT, FILTER_BY_VECTOR, and WEIGHTED_SUM are hybrid strategies that combine vector and text search, so all three require a full-text index (use_full_text_search=True).
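As a conceptual model of WEIGHTED_SUM, each candidate's final score is text_weight * text_score + vector_weight * vector_score, and results are ranked by that combined value. The sketch below applies the weights used in the example that follows (text_weight=0.2, vector_weight=0.8) to made-up scores. How SingleStore computes and scales the individual text and vector scores is not modeled here.

```python
def weighted_sum(text_score: float, vector_score: float,
                 text_weight: float = 0.2, vector_weight: float = 0.8) -> float:
    """Combine text and vector similarity into one ranking score."""
    return text_weight * text_score + vector_weight * vector_score


# Made-up per-document scores: (text_score, vector_score).
candidates = {
    "desert rainstorm": (0.9, 0.7),
    "city rain": (0.3, 0.8),
    "mountain mist": (0.1, 0.4),
}

ranked = sorted(candidates, key=lambda name: weighted_sum(*candidates[name]), reverse=True)
for name in ranked:
    print(f"{name}: {weighted_sum(*candidates[name]):.2f}")
```

Here the strong text match lifts "desert rainstorm" above "city rain". Setting text_weight=0 would put "city rain" first, since the ranking would then depend on vector similarity alone.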

from langchain_singlestore.vectorstores import DistanceStrategy

docsearch = SingleStoreVectorStore.from_documents(
    docs,
    embeddings,
    distance_strategy=DistanceStrategy.DOT_PRODUCT,  # Use dot product for similarity search
    use_vector_index=True,  # Use vector index for faster search
    use_full_text_search=True,  # Use full text index
)

vectorResults = docsearch.similarity_search(
    "rainstorm in parched desert, rain",
    k=1,
    search_strategy=SingleStoreVectorStore.SearchStrategy.VECTOR_ONLY,
    filter={"category": "rain"},
)
print(vectorResults[0].page_content)

textResults = docsearch.similarity_search(
    "rainstorm in parched desert, rain",
    k=1,
    search_strategy=SingleStoreVectorStore.SearchStrategy.TEXT_ONLY,
)
print(textResults[0].page_content)

filteredByTextResults = docsearch.similarity_search(
    "rainstorm in parched desert, rain",
    k=1,
    search_strategy=SingleStoreVectorStore.SearchStrategy.FILTER_BY_TEXT,
    filter_threshold=0.1,
)
print(filteredByTextResults[0].page_content)

filteredByVectorResults = docsearch.similarity_search(
    "rainstorm in parched desert, rain",
    k=1,
    search_strategy=SingleStoreVectorStore.SearchStrategy.FILTER_BY_VECTOR,
    filter_threshold=0.1,
)
print(filteredByVectorResults[0].page_content)

weightedSumResults = docsearch.similarity_search(
    "rainstorm in parched desert, rain",
    k=1,
    search_strategy=SingleStoreVectorStore.SearchStrategy.WEIGHTED_SUM,
    text_weight=0.2,
    vector_weight=0.8,
)
print(weightedSumResults[0].page_content)
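FILTER_BY_TEXT and FILTER_BY_VECTOR work in two stages: one similarity measure decides which documents pass filter_threshold, and the other ranks the survivors. The sketch below models FILTER_BY_TEXT with made-up scores. It assumes that only documents whose text score is greater than the threshold survive, and then ranks them by vector score. Swapping the roles of the two scores gives FILTER_BY_VECTOR. This is a conceptual model with a hypothetical `filter_by_text` helper, not the query the store actually runs.

```python
def filter_by_text(
    scores: dict[str, tuple[float, float]], k: int, filter_threshold: float
) -> list[str]:
    """Keep docs with text_score > filter_threshold, then rank by vector_score.

    `scores` maps content to (text_score, vector_score). Here the scores are
    given as inputs; the real store computes them in the database.
    """
    survivors = {name: s for name, s in scores.items() if s[0] > filter_threshold}
    ranked = sorted(survivors, key=lambda name: survivors[name][1], reverse=True)
    return ranked[:k]


scores = {
    "desert rainstorm": (0.6, 0.7),
    "city rain": (0.05, 0.9),  # best vector match, but too little text overlap
    "mountain mist": (0.2, 0.4),
}
print(filter_by_text(scores, k=1, filter_threshold=0.1))
```

Although "city rain" has the highest vector score, it never reaches the ranking stage because its text score falls below the threshold.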

Query by turning into retriever

You can also transform the vector store into a retriever for easier usage in your chains.

retriever = vector_store.as_retriever(search_kwargs={"k": 1})
retriever.invoke("trees in the snow")

Multi-modal example: Leveraging CLIP and OpenClip embeddings

Multi-modal applications often need to search across images and text together. CLIP is a model that embeds both images and text into a shared semantic space, so similarity search can retrieve relevant content across modalities, for example images that match a text query.

This example uses OpenCLIP embeddings, an open-source implementation of CLIP, to store images in SingleStore. Because text and images share one embedding space, you can then query the stored images with a text description.

pip install -U langchain openai langchain-singlestore langchain-experimental

import os

from langchain_experimental.open_clip import OpenCLIPEmbeddings
from langchain_singlestore.vectorstores import SingleStoreVectorStore

os.environ["SINGLESTOREDB_URL"] = "root:pass@localhost:3306/db"

TEST_IMAGES_DIR = "../../modules/images"

docsearch = SingleStoreVectorStore(OpenCLIPEmbeddings())

image_uris = sorted(
    [
        os.path.join(TEST_IMAGES_DIR, image_name)
        for image_name in os.listdir(TEST_IMAGES_DIR)
        if image_name.endswith(".jpg")
    ]
)

# Add images
docsearch.add_images(uris=image_uris)
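The listing above only picks up files whose names end in lowercase ".jpg". If your directory also contains ".jpeg" files or uppercase extensions, a variant such as the hypothetical `list_image_uris` helper below matches extensions case-insensitively with `pathlib` and skips subdirectories. You can pass the resulting list to add_images in the same way.

```python
from pathlib import Path


def list_image_uris(directory: str, extensions=(".jpg", ".jpeg")) -> list[str]:
    """Hypothetical helper: sorted paths of files with a matching extension."""
    allowed = {ext.lower() for ext in extensions}
    return sorted(
        str(path)
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix.lower() in allowed
    )


# Usage with the names from the example above:
# image_uris = list_image_uris(TEST_IMAGES_DIR)
# docsearch.add_images(uris=image_uris)
```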

Usage for retrieval-augmented generation

For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:

API reference

For detailed documentation of all SingleStoreVectorStore features and configurations, see the GitHub repository: https://github.com/singlestore-labs/langchain-singlestore/
