
Feat/oracle vectorstore #2292

Open
HarshNaik0212 wants to merge 3 commits into arc53:main from HarshNaik0212:feat/oracle-vectorstore

@HarshNaik0212

Title

feat: Add Oracle 26ai Autonomous Database vector store (#981)


Description

Summary

Closes #981

This PR adds OracleVectorStore — a new vector store backend for DocsGPT using Oracle 26ai Autonomous Database with native VECTOR support. It follows the same architecture and interface as the existing PGVectorStore for consistency.


Architecture

The implementation uses a hybrid approach:

| Layer | Responsibility |
|---|---|
| LangChain OracleVS | Handles table creation and add_texts() operations (embedding generation and vector insertion). |
| Raw oracledb SQL | Implements get_chunks(), delete_index(), and delete_chunk() for direct and efficient database operations. |
| Manual embed_query() | Used in search() to generate the query embedding, bypassing the incompatibility between DocsGPT's EmbeddingsWrapper and LangChain. |

LangChain is used only where it genuinely simplifies things (table schema management, bulk inserts). Raw SQL is used for all filtering and deletion operations — the same direct approach as PGVectorStore — avoiding LangChain abstraction limitations.


Key Design Decisions

1. Why not pure LangChain?
DocsGPT's EmbeddingsWrapper does not inherit from LangChain's Embeddings base class. When it is passed to OracleVS as the embedding function and similarity_search(query=...) is called, LangChain treats it as a raw callable and produces a zero-dimension vector, causing:

DPY-4031: vector cannot contain zero dimensions

Fix: generate the query embedding ourselves via embed_query() (same as pgvector) and pass the raw vector to similarity_search_by_vector() — LangChain skips re-embedding entirely.
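The search path described above can be sketched as follows. This is a minimal illustration, not the code in oracle.py: the helper name search_with_own_embedding and the two Protocol classes are hypothetical, and the store is only assumed to expose a LangChain-style similarity_search_by_vector(embedding, k=..., **kwargs). The empty-vector guard is an extra assumption that turns DPY-4031 into a clearer Python error.

```python
from typing import Any, List, Protocol


class QueryEmbedder(Protocol):
    """Hypothetical interface: anything with embed_query(text) -> list of floats."""

    def embed_query(self, text: str) -> List[float]: ...


class VectorSearcher(Protocol):
    """Hypothetical interface: a LangChain-style store exposing similarity_search_by_vector."""

    def similarity_search_by_vector(self, embedding: List[float], k: int = 4, **kwargs: Any) -> list: ...


def search_with_own_embedding(
    embedder: QueryEmbedder, store: VectorSearcher, query: str, k: int = 4, **kwargs: Any
) -> list:
    # Embed the query here, so the store never calls the (non-LangChain) embedder itself.
    vector = [float(x) for x in embedder.embed_query(query)]
    # Fail fast with a readable error instead of Oracle's DPY-4031 zero-dimension error.
    if not vector:
        raise ValueError("embed_query() returned an empty vector")
    # Pass the raw vector on; this path performs no re-embedding.
    return store.similarity_search_by_vector(vector, k=k, **kwargs)
```

Because the store only ever receives a list of floats, it does not matter whether the embedder subclasses LangChain's Embeddings.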

2. Why not pure raw SQL?
LangChain's OracleVS handles table creation and schema management cleanly. It also performs embedding and INSERT in a single batched add_texts() call, which is the most complex operation, so relying on it removes a good deal of boilerplate.

3. Function-based index on source_id
LangChain stores metadata as a JSON CLOB column. Since there is no dedicated source_id column (unlike pgvector), a function-based index on JSON_VALUE(metadata, '$.source_id') is created automatically the first time the store is initialized. Queries that filter on the same JSON_VALUE expression can use this index, which should give filtering performance comparable to pgvector's B-tree index on source_id.
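A small sketch of how the index DDL and a matching filter query could be built. The table and index names in the usage are hypothetical, and the column names id, text and metadata are an assumption about the OracleVS schema. The key point is that the WHERE clause reuses the exact indexed expression; table names cannot be bind variables, so they are validated against Oracle's unquoted-identifier rules before being interpolated.

```python
import re

# Unquoted Oracle identifier: starts with a letter, then letters, digits, _, $ or #.
_IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_$#]{0,127}$")

# The indexed expression; filters must use exactly this text to benefit from the index.
SOURCE_ID_EXPR = "JSON_VALUE(metadata, '$.source_id')"


def _check_identifier(name: str) -> str:
    # Identifiers cannot be bound, so reject anything that is not a plain name.
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsafe SQL identifier: {name!r}")
    return name


def source_id_index_ddl(table: str, index: str) -> str:
    """DDL for the function-based index on metadata.source_id."""
    return f"CREATE INDEX {_check_identifier(index)} ON {_check_identifier(table)} ({SOURCE_ID_EXPR})"


def chunks_for_source_sql(table: str) -> str:
    """SELECT scoped to one source; the value is passed as the :source_id bind variable."""
    return (
        f"SELECT id, text, metadata FROM {_check_identifier(table)} "
        f"WHERE {SOURCE_ID_EXPR} = :source_id"
    )
```

With python-oracledb, the query string would then be executed as cursor.execute(sql, source_id=...), so the source value itself is always bound rather than concatenated.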

4. Wallet-based connection for Oracle 26ai Free Tier
Oracle Autonomous Database requires mTLS via a wallet. Credentials are passed as individual parameters (config_dir, wallet_location, wallet_password) to oracledb.connect() — a single connection string cannot carry all wallet params.
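The wallet connection can be sketched with python-oracledb's keyword arguments. The helper names are hypothetical, and pointing both config_dir (where tnsnames.ora is read) and wallet_location (where ewallet.pem is read) at the same unzipped folder is an assumption that matches the single ORACLE_WALLET_LOCATION setting in this PR. The oracledb import is deferred so the parameter builder can be exercised without a database.

```python
from typing import Any, Dict


def wallet_connect_params(
    user: str, password: str, dsn: str, wallet_dir: str, wallet_password: str
) -> Dict[str, Any]:
    """Keyword arguments for oracledb.connect() using an mTLS wallet (hypothetical helper)."""
    return {
        "user": user,
        "password": password,
        "dsn": dsn,                         # TNS alias from tnsnames.ora, e.g. docsgpt_high
        "config_dir": wallet_dir,           # folder containing tnsnames.ora
        "wallet_location": wallet_dir,      # folder containing ewallet.pem (assumed same folder)
        "wallet_password": wallet_password,
    }


def connect(params: Dict[str, Any]):
    # Imported lazily so building params does not require oracledb to be installed.
    import oracledb

    return oracledb.connect(**params)
```

Keeping the parameters in a dict makes it easy to log which alias and wallet folder are in use while leaving the passwords out of any connection string.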

5. Multi-tenant safe
All operations (search, get_chunks, delete_index, delete_chunk) are scoped to source_id. delete_index() deletes only the rows for that source, never the entire table.
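To illustrate the scoping rules locally, the sketch below uses SQLite, with json_extract standing in for Oracle's JSON_VALUE, so it runs without an Oracle instance. The functions only mirror the PR's method names and are simplified stand-ins, not its implementation. Every statement filters on source_id, delete_chunk() requires both the chunk id and the source to match (which also makes a second delete a no-op), and delete_index() removes rows while keeping the table.

```python
import json
import sqlite3

# SQLite stand-in: json_extract plays the role of Oracle's JSON_VALUE here.
SCOPE = "json_extract(metadata, '$.source_id') = ?"


def make_table(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, text TEXT, metadata TEXT)")


def add_chunk(conn: sqlite3.Connection, source_id: str, text: str) -> int:
    cur = conn.execute(
        "INSERT INTO chunks (text, metadata) VALUES (?, ?)",
        (text, json.dumps({"source_id": source_id})),
    )
    return cur.lastrowid


def get_chunks(conn: sqlite3.Connection, source_id: str) -> list:
    rows = conn.execute(f"SELECT text FROM chunks WHERE {SCOPE} ORDER BY id", (source_id,))
    return [row[0] for row in rows]


def delete_chunk(conn: sqlite3.Connection, source_id: str, chunk_id: int) -> bool:
    # Both conditions must match, so another source's chunk id is never touched.
    cur = conn.execute(f"DELETE FROM chunks WHERE id = ? AND {SCOPE}", (chunk_id, source_id))
    return cur.rowcount == 1


def delete_index(conn: sqlite3.Connection, source_id: str) -> int:
    # Deletes only this source's rows; the table itself is preserved.
    cur = conn.execute(f"DELETE FROM chunks WHERE {SCOPE}", (source_id,))
    return cur.rowcount
```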


Files Changed

  • application/vectorstore/oracle.py — new vector store implementation
  • application/vectorstore/vector_creator.py — registered "oracle" as a valid store type
  • application/core/settings.py — added 5 Oracle config fields
  • .env.oracle.example — reference config file for Oracle setup
  • test_oraclevectorstore.py — manual integration test

Oracle 26ai Free Tier — Quick Setup

1. Create account & database

2. Download wallet

  • Open your database → click Database Connection
  • Click Download Wallet → set a wallet password → download ZIP
  • Unzip to a folder (e.g. C:\Wallet_docsgpt or /app/wallet)
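The unzip step can be scripted and checked. This is a sketch under two assumptions: that the wallet ZIP stores its files at the top level, and that tnsnames.ora and ewallet.pem are the minimum files python-oracledb's thin mode needs for an alias-based mTLS connection. The function name unzip_wallet is hypothetical.

```python
import zipfile
from pathlib import Path
from typing import List

# Assumed minimum for thin-mode mTLS: the alias file and the PEM wallet.
REQUIRED_WALLET_FILES = ("tnsnames.ora", "ewallet.pem")


def unzip_wallet(zip_path: str, dest: str) -> List[str]:
    """Extract the wallet ZIP into dest and return any required files that are missing."""
    target = Path(dest)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target)
    return [name for name in REQUIRED_WALLET_FILES if not (target / name).is_file()]
```

An empty list means the folder can be used as ORACLE_WALLET_LOCATION.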

3. Get your DSN name

  • Open tnsnames.ora inside the wallet folder
  • Note the connection name ending in _high (e.g. docsgpt_high)
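Instead of reading tnsnames.ora by eye, the aliases can be listed with a short script. This sketch assumes each entry starts on its own line as "alias = (description=...)", which is how entries usually appear; the helper names are hypothetical.

```python
import re
from pathlib import Path
from typing import List

# An alias is a name at the start of a line followed by "=".
_ALIAS = re.compile(r"^\s*([A-Za-z][\w.]*)\s*=", re.MULTILINE)


def list_aliases(tnsnames_text: str) -> List[str]:
    """Return every connection alias defined in tnsnames.ora text, in file order."""
    return _ALIAS.findall(tnsnames_text)


def high_service_aliases(wallet_dir: str) -> List[str]:
    """Aliases ending in _high, the candidates for ORACLE_DSN."""
    text = (Path(wallet_dir) / "tnsnames.ora").read_text()
    return [alias for alias in list_aliases(text) if alias.lower().endswith("_high")]
```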

4. Configure .env

VECTOR_STORE=oracle
ORACLE_USER=ADMIN
ORACLE_PASSWORD=your_admin_password
ORACLE_DSN=docsgpt_high
ORACLE_WALLET_LOCATION=/path/to/wallet
ORACLE_WALLET_PASSWORD=your_wallet_password
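A missing variable is easier to diagnose before the first connection attempt. The sketch below reads the five ORACLE_* keys shown above and reports every one that is absent or empty. The OracleSettings dataclass and load_oracle_settings are hypothetical and are not the fields added to application/core/settings.py.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Same keys as the .env example; order matches the dataclass fields below.
_KEYS = (
    "ORACLE_USER",
    "ORACLE_PASSWORD",
    "ORACLE_DSN",
    "ORACLE_WALLET_LOCATION",
    "ORACLE_WALLET_PASSWORD",
)


@dataclass(frozen=True)
class OracleSettings:
    user: str
    password: str
    dsn: str
    wallet_location: str
    wallet_password: str


def load_oracle_settings(env: Optional[Mapping[str, str]] = None) -> OracleSettings:
    """Build settings from env (defaults to os.environ), listing all missing keys at once."""
    env = os.environ if env is None else env
    missing = [key for key in _KEYS if not env.get(key)]
    if missing:
        raise ValueError("missing Oracle settings: " + ", ".join(missing))
    return OracleSettings(*(env[key] for key in _KEYS))
```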

5. Install dependencies

pip install oracledb langchain-community langchain-core

Table and indexes are created automatically on first run.


Testing

Tested against Oracle 26ai Autonomous Database Free Tier. Integration test suite (tests/test_oraclevectorstore.py) covers:

  • Wallet-based connection
  • Table auto-creation by LangChain
  • Function-based source_id index creation
  • add_texts() batch insert
  • add_chunk() single insert
  • get_chunks() with metadata parsing (handles dict/CLOB/string)
  • search() cosine similarity via similarity_search_by_vector()
  • Multi-tenant isolation — Source A cannot see Source B's data
  • delete_chunk() with double-delete guard
  • delete_index() — rows deleted, table preserved
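The "handles dict/CLOB/string" case in get_chunks() can be sketched as a single normalizing helper. It is a hypothetical function, not the PR's code. It relies on python-oracledb LOB objects exposing read() for their contents; a dict is returned unchanged, bytes are decoded as UTF-8, and strings are parsed as JSON objects.

```python
import json
from typing import Any, Dict


def parse_metadata(value: Any) -> Dict[str, Any]:
    """Normalize a metadata column value (dict, LOB, bytes or str) into a dict."""
    if value is None:
        return {}
    if isinstance(value, dict):  # already decoded, e.g. from a native JSON column
        return value
    if hasattr(value, "read"):  # LOB-like object: fetch its full contents first
        value = value.read()
    if isinstance(value, bytes):
        value = value.decode("utf-8")
    if not isinstance(value, str):
        raise TypeError(f"unsupported metadata type: {type(value).__name__}")
    if not value.strip():
        return {}
    parsed = json.loads(value)
    if not isinstance(parsed, dict):
        raise TypeError("metadata JSON is not an object")
    return parsed
```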

Result: 10/10 tests passing

@vercel

vercel bot commented Mar 9, 2026

@HarshNaik0212 is attempting to deploy a commit to the Arc53 Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions github-actions bot added repo application Application labels Mar 9, 2026


Development

Successfully merging this pull request may close these issues.

Using Oracle Vector store
