langchain-oceanbase

This package contains the LangChain integration with OceanBase.

OceanBase Database is a distributed relational database developed entirely by Ant Group. It runs on clusters of commodity servers and, building on the Paxos protocol and its distributed architecture, provides high availability and linear scalability.

OceanBase can also store vectors. Using SQL, users can easily do the following (a short sketch follows the list):

  • Create a table containing vector type fields;
  • Create a vector index table based on the HNSW algorithm;
  • Perform vector approximate nearest neighbor queries;
  • ...
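A minimal sketch of these SQL operations, driven from Python with the standard pymysql client (OceanBase speaks the MySQL protocol on port 2881). The vector SQL statements shown here are illustrative only and the exact syntax varies by OceanBase version, so check the OceanBase documentation for what your server accepts:

import pymysql

# Connect through OceanBase's MySQL-compatible endpoint
# (same credentials as the Quick Start below).
conn = pymysql.connect(
    host="127.0.0.1", port=2881, user="root@test", password="", database="test"
)
with conn.cursor() as cur:
    # 1. Create a table containing a vector-typed column (3 dimensions here).
    cur.execute("CREATE TABLE IF NOT EXISTS items (id INT PRIMARY KEY, embedding VECTOR(3))")
    # 2. Build an HNSW vector index on that column (illustrative syntax).
    cur.execute("CREATE VECTOR INDEX vidx ON items(embedding) WITH (distance=L2, type=HNSW)")
    # 3. Insert a row and run an approximate nearest-neighbor query.
    cur.execute("INSERT INTO items VALUES (1, '[0.1, 0.2, 0.3]')")
    cur.execute(
        "SELECT id FROM items "
        "ORDER BY l2_distance(embedding, '[0.1, 0.2, 0.3]') APPROXIMATE LIMIT 1"
    )
    print(cur.fetchall())
conn.commit()
conn.close()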

Features

  • Built-in Embedding: An embedding function backed by the all-MiniLM-L6-v2 model (384 dimensions), with no API keys required. Well suited to quick prototyping and local development.
    • No API Keys Required: Uses local ONNX models, no external API calls needed
    • Quick Start: Perfect for rapid prototyping and testing
    • LangChain Compatible: Fully compatible with LangChain's Embeddings interface
    • Batch Processing: Supports efficient batch embedding generation
    • Automatic Integration: Can be automatically used in OceanbaseVectorStore by setting embedding_function=None
    • Technical Specs: Model all-MiniLM-L6-v2, 384 dimensions, ONNX Runtime inference
  • Vector Storage: Store embeddings from any LangChain embedding model in OceanBase with automatic table creation and index management.
  • Similarity Search: Perform efficient similarity searches on vector data with multiple distance metrics (L2, cosine, inner product).
  • Hybrid Search: Combine vector search with sparse vector search and full-text search for improved results with configurable weights.
  • Maximal Marginal Relevance: Filter for diversity in search results to avoid redundant information (see the sketch after this list).
  • Multiple Index Types: Support for HNSW, IVF, FLAT and other vector index types with automatic parameter optimization.
  • Sparse Embeddings: Native support for sparse vector embeddings with BM25-like functionality.
  • Advanced Filtering: Built-in support for metadata filtering and complex query conditions.
  • Async Support: Full support for async operations and high-concurrency scenarios.
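Several of these features surface through the standard LangChain VectorStore interface. A minimal sketch, assuming a populated OceanbaseVectorStore as in the Quick Start below; the shape of the filter argument is an assumption here, so consult the package's API reference:

# Assumes `vector_store` is an OceanbaseVectorStore that already
# contains documents (see the Quick Start below).

# Maximal Marginal Relevance: fetch 10 candidates, return the 3 most
# diverse results, trading off relevance against redundancy.
diverse = vector_store.max_marginal_relevance_search(
    "distributed databases", k=3, fetch_k=10
)

# Metadata filtering: the filter syntax below is an assumption; check
# the API reference for the supported shape.
filtered = vector_store.similarity_search(
    "distributed databases", k=3, filter={"source": "docs"}
)

# Async support: the usual async counterparts from the LangChain
# interface are available (run inside an event loop).
# results = await vector_store.asimilarity_search("distributed databases", k=3)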

Installation

pip install -U langchain-oceanbase

Requirements

  • Python >=3.11
  • langchain-core >=1.0.0
  • pyobvector >=0.2.0 (required; the OceanBase database client)
  • pyseekdb >=0.1.0 (optional, for built-in embedding functionality)

Platform Support

  • Linux: Full support (x86_64 and ARM64)
  • macOS/Windows: Supported (pyobvector works on all platforms)

Built-in Embedding Dependencies

The built-in embedding functionality (no API keys required) is provided by the optional pyseekdb dependency, which is installed automatically with this package. It provides:

  • Local ONNX-based embedding inference
  • Default embedding model: all-MiniLM-L6-v2 (384 dimensions)
  • No external API calls needed
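To confirm that the built-in embedding path is available in a given environment, a trivial import check is enough (this is just a sanity check, not part of the package's API):

# Sanity check: is the optional built-in embedding dependency importable?
try:
    import pyseekdb  # noqa: F401
    print("pyseekdb available: built-in embedding can be used")
except ImportError:
    print("pyseekdb missing: install it or pass your own embedding model")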

Setup

We recommend using Docker to deploy OceanBase:

docker run --name=oceanbase -e MODE=mini -e OB_SERVER_IP=127.0.0.1 -p 2881:2881 -d oceanbase/oceanbase-ce:latest

For AI Functions support, use OceanBase 4.4.1 or later:

docker run --name=oceanbase -e MODE=mini -e OB_SERVER_IP=127.0.0.1 -p 2881:2881 -d oceanbase/oceanbase-ce:4.4.1.0-100000032025101610
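Once the container is up (bootstrap can take a few minutes), you can verify connectivity with any MySQL-protocol client. A minimal check with pymysql, using the same credentials as the Quick Start below:

import pymysql

# OceanBase listens on the MySQL protocol at port 2881 in the Docker images above.
conn = pymysql.connect(
    host="127.0.0.1", port=2881, user="root@test", password="", database="test"
)
with conn.cursor() as cur:
    cur.execute("SELECT version()")
    print(cur.fetchone())
conn.close()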

See the OceanBase documentation for more ways to deploy an OceanBase cluster.

Usage

Quick Start

Using Built-in Embedding (No API Keys Required)

The simplest way to get started is using the built-in embedding function, which requires no API keys:

from langchain_oceanbase.vectorstores import OceanbaseVectorStore
from langchain_core.documents import Document

# Connection configuration
connection_args = {
    "host": "127.0.0.1",
    "port": "2881",
    "user": "root@test",
    "password": "",
    "db_name": "test",
}

# Use default embedding (set embedding_function=None)
vector_store = OceanbaseVectorStore(
    embedding_function=None,  # Automatically uses DefaultEmbeddingFunction
    table_name="langchain_vector",
    connection_args=connection_args,
    vidx_metric_type="l2",
    drop_old=True,
    embedding_dim=384,  # all-MiniLM-L6-v2 dimension
)

# Add documents
documents = [
    Document(page_content="Machine learning is a subset of artificial intelligence"),
    Document(page_content="Python is a popular programming language"),
    Document(page_content="OceanBase is a distributed relational database"),
]
ids = vector_store.add_documents(documents)

# Perform similarity search
results = vector_store.similarity_search("artificial intelligence", k=2)
for doc in results:
    print(f"* {doc.page_content}")
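The same store can also be built in one step with the standard LangChain from_documents classmethod, and scored search is available as well. A hedged sketch, assuming OceanbaseVectorStore follows the usual VectorStore conventions (including mapping the embedding argument to embedding_function):

# Build the store and insert documents in one step via the standard
# LangChain classmethod (assumption: `embedding=None` selects the
# built-in DefaultEmbeddingFunction, as with the constructor above).
vector_store = OceanbaseVectorStore.from_documents(
    documents,
    embedding=None,
    table_name="langchain_vector",
    connection_args=connection_args,
    vidx_metric_type="l2",
    drop_old=True,
    embedding_dim=384,
)

# similarity_search_with_score returns (Document, score) pairs; with the
# "l2" metric configured above, lower scores mean closer matches.
for doc, score in vector_store.similarity_search_with_score("artificial intelligence", k=2):
    print(f"{score:.4f}  {doc.page_content}")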

Key Benefits of Built-in Embedding:

  • ✅ No API keys or external services required
  • ✅ Works offline with local ONNX models
  • ✅ Fast batch processing
  • ✅ Perfect for prototyping and testing
  • ✅ Model files (~80MB) downloaded automatically on first use
