langchain-oceanbase

This package contains the LangChain integration with OceanBase.

OceanBase Database is a distributed relational database developed entirely by Ant Group. It runs on clusters of commodity servers and, building on the Paxos protocol and its distributed architecture, provides high availability and linear scalability.

OceanBase can also store vectors. Using SQL, users can easily do the following (a short sketch follows the list):

  • Create a table containing vector type fields;
  • Create a vector index table based on the HNSW algorithm;
  • Perform vector approximate nearest neighbor queries;
  • ...
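A minimal sketch of these SQL operations, driven from Python with the standard pymysql client (OceanBase speaks the MySQL protocol on port 2881). The vector SQL statements shown here are illustrative only and the exact syntax varies by OceanBase version, so check the OceanBase documentation for what your server accepts:

import pymysql

# Connect through OceanBase's MySQL-compatible endpoint
# (same credentials as the Quick Start below).
conn = pymysql.connect(
    host="127.0.0.1", port=2881, user="root@test", password="", database="test"
)
with conn.cursor() as cur:
    # 1. Create a table containing a vector-typed column (3 dimensions here).
    cur.execute("CREATE TABLE IF NOT EXISTS items (id INT PRIMARY KEY, embedding VECTOR(3))")
    # 2. Build an HNSW vector index on that column (illustrative syntax).
    cur.execute("CREATE VECTOR INDEX vidx ON items(embedding) WITH (distance=L2, type=HNSW)")
    # 3. Insert a row and run an approximate nearest-neighbor query.
    cur.execute("INSERT INTO items VALUES (1, '[0.1, 0.2, 0.3]')")
    cur.execute(
        "SELECT id FROM items "
        "ORDER BY l2_distance(embedding, '[0.1, 0.2, 0.3]') APPROXIMATE LIMIT 1"
    )
    print(cur.fetchall())
conn.commit()
conn.close()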

Features

  • Built-in Embedding: An embedding function backed by the all-MiniLM-L6-v2 model (384 dimensions), with no API keys required. Well suited to quick prototyping and local development.
    • No API Keys Required: Uses local ONNX models, no external API calls needed
    • Quick Start: Perfect for rapid prototyping and testing
    • LangChain Compatible: Fully compatible with LangChain's Embeddings interface
    • Batch Processing: Supports efficient batch embedding generation
    • Automatic Integration: Can be automatically used in OceanbaseVectorStore by setting embedding_function=None
    • Technical Specs: Model all-MiniLM-L6-v2, 384 dimensions, ONNX Runtime inference
  • Vector Storage: Store embeddings from any LangChain embedding model in OceanBase with automatic table creation and index management.
  • Similarity Search: Perform efficient similarity searches on vector data with multiple distance metrics (L2, cosine, inner product).
  • Hybrid Search: Combine vector search with sparse vector search and full-text search for improved results with configurable weights.
  • Maximal Marginal Relevance: Filter for diversity in search results to avoid redundant information (see the sketch after this list).
  • Multiple Index Types: Support for HNSW, IVF, FLAT and other vector index types with automatic parameter optimization.
  • Sparse Embeddings: Native support for sparse vector embeddings with BM25-like functionality.
  • Advanced Filtering: Built-in support for metadata filtering and complex query conditions.
  • Async Support: Full support for async operations and high-concurrency scenarios.
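Several of these features surface through the standard LangChain VectorStore interface. A minimal sketch, assuming a populated OceanbaseVectorStore as in the Quick Start below; the shape of the filter argument is an assumption here, so consult the package's API reference:

# Assumes `vector_store` is an OceanbaseVectorStore that already
# contains documents (see the Quick Start below).

# Maximal Marginal Relevance: fetch 10 candidates, return the 3 most
# diverse results, trading off relevance against redundancy.
diverse = vector_store.max_marginal_relevance_search(
    "distributed databases", k=3, fetch_k=10
)

# Metadata filtering: the filter syntax below is an assumption; check
# the API reference for the supported shape.
filtered = vector_store.similarity_search(
    "distributed databases", k=3, filter={"source": "docs"}
)

# Async support: the usual async counterparts from the LangChain
# interface are available (run inside an event loop).
# results = await vector_store.asimilarity_search("distributed databases", k=3)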

Installation

pip install -U langchain-oceanbase

Requirements

  • Python >=3.11
  • langchain-core >=1.0.0
  • pyobvector >=0.2.0 (required; the OceanBase database client)
  • pyseekdb >=0.1.0 (optional, for built-in embedding functionality)

Platform Support

  • Linux: Full support (x86_64 and ARM64)
  • macOS/Windows: Supported (pyobvector works on all platforms)

Built-in Embedding Dependencies

The built-in embedding functionality (no API keys required) is provided by the optional pyseekdb dependency, which is installed automatically with this package. It provides:

  • Local ONNX-based embedding inference
  • Default embedding model: all-MiniLM-L6-v2 (384 dimensions)
  • No external API calls needed
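To confirm that the built-in embedding path is available in a given environment, a trivial import check is enough (this is just a sanity check, not part of the package's API):

# Sanity check: is the optional built-in embedding dependency importable?
try:
    import pyseekdb  # noqa: F401
    print("pyseekdb available: built-in embedding can be used")
except ImportError:
    print("pyseekdb missing: install it or pass your own embedding model")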

Setup

We recommend using Docker to deploy OceanBase:

docker run --name=oceanbase -e MODE=mini -e OB_SERVER_IP=127.0.0.1 -p 2881:2881 -d oceanbase/oceanbase-ce:latest

For AI Functions support, use OceanBase 4.4.1 or later:

docker run --name=oceanbase -e MODE=mini -e OB_SERVER_IP=127.0.0.1 -p 2881:2881 -d oceanbase/oceanbase-ce:4.4.1.0-100000032025101610
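Once the container is up (bootstrap can take a few minutes), you can verify connectivity with any MySQL-protocol client. A minimal check with pymysql, using the same credentials as the Quick Start below:

import pymysql

# OceanBase listens on the MySQL protocol at port 2881 in the Docker images above.
conn = pymysql.connect(
    host="127.0.0.1", port=2881, user="root@test", password="", database="test"
)
with conn.cursor() as cur:
    cur.execute("SELECT version()")
    print(cur.fetchone())
conn.close()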

See the OceanBase documentation for more ways to deploy an OceanBase cluster.

Usage

Quick Start

Using Built-in Embedding (No API Keys Required)

The simplest way to get started is using the built-in embedding function, which requires no API keys:

from langchain_oceanbase.vectorstores import OceanbaseVectorStore
from langchain_core.documents import Document

# Connection configuration
connection_args = {
    "host": "127.0.0.1",
    "port": "2881",
    "user": "root@test",
    "password": "",
    "db_name": "test",
}

# Use default embedding (set embedding_function=None)
vector_store = OceanbaseVectorStore(
    embedding_function=None,  # Automatically uses DefaultEmbeddingFunction
    table_name="langchain_vector",
    connection_args=connection_args,
    vidx_metric_type="l2",
    drop_old=True,
    embedding_dim=384,  # all-MiniLM-L6-v2 dimension
)

# Add documents
documents = [
    Document(page_content="Machine learning is a subset of artificial intelligence"),
    Document(page_content="Python is a popular programming language"),
    Document(page_content="OceanBase is a distributed relational database"),
]
ids = vector_store.add_documents(documents)

# Perform similarity search
results = vector_store.similarity_search("artificial intelligence", k=2)
for doc in results:
    print(f"* {doc.page_content}")
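The same store can also be built in one step with the standard LangChain from_documents classmethod, and scored search is available as well. A hedged sketch, assuming OceanbaseVectorStore follows the usual VectorStore conventions (including mapping the embedding argument to embedding_function):

# Build the store and insert documents in one step via the standard
# LangChain classmethod (assumption: `embedding=None` selects the
# built-in DefaultEmbeddingFunction, as with the constructor above).
vector_store = OceanbaseVectorStore.from_documents(
    documents,
    embedding=None,
    table_name="langchain_vector",
    connection_args=connection_args,
    vidx_metric_type="l2",
    drop_old=True,
    embedding_dim=384,
)

# similarity_search_with_score returns (Document, score) pairs; with the
# "l2" metric configured above, lower scores mean closer matches.
for doc, score in vector_store.similarity_search_with_score("artificial intelligence", k=2):
    print(f"{score:.4f}  {doc.page_content}")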

Key Benefits of Built-in Embedding:

  • ✅ No API keys or external services required
  • ✅ Works offline with local ONNX models
  • ✅ Fast batch processing
  • ✅ Perfect for prototyping and testing
  • ✅ Model files (~80MB) downloaded automatically on first use
