@HarshNaik0212 is attempting to deploy a commit to the Arc53 Team on Vercel. A member of the Team first needs to authorize it.
Title
feat: Add Oracle 26ai Autonomous Database vector store (#981)
Description
Summary
Closes #981
This PR adds `OracleVectorStore`, a new vector store backend for DocsGPT using Oracle 26ai Autonomous Database with native `VECTOR` support. It follows the same architecture and interface as the existing `PGVectorStore` for consistency.
Architecture
The implementation uses a hybrid approach:
- `OracleVS`: `add_texts()` operations (embedding generation and vector insertion).
- `oracledb` SQL: `get_chunks()`, `delete_index()`, and `delete_chunk()`, for direct and efficient database operations.
- `embed_query()`: used in `search()` to generate the query embedding and bypass the incompatibility between `EmbeddingsWrapper` and LangChain.

LangChain is used only where it genuinely simplifies things (table schema management, bulk inserts). Raw SQL is used for all filtering and deletion operations, the same direct approach as `PGVectorStore`, which avoids LangChain's abstraction limitations.
Key Design Decisions
1. Why not pure LangChain?
DocsGPT's `EmbeddingsWrapper` does not inherit from LangChain's `Embeddings` base class. When passed to `similarity_search(query=...)`, LangChain mishandles it as a raw callable and produces a zero-dimension vector, causing an error.
Fix: generate the query embedding ourselves via `embed_query()` (same as pgvector) and pass the raw vector to `similarity_search_by_vector()`, so LangChain skips re-embedding entirely.
2. Why not pure raw SQL?
LangChain's `OracleVS` handles table creation and schema management cleanly. It also handles the embedding and INSERT in one batched call for `add_texts()`, which is the most complex operation. This reduces boilerplate.
3. Function-based index on `source_id`
LangChain stores metadata as a JSON CLOB column. Since there is no dedicated `source_id` column (unlike pgvector), a function-based index on `JSON_VALUE(metadata, '$.source_id')` is created automatically on first init, giving filtering performance equivalent to pgvector's B-tree index on `source_id`.
4. Wallet-based connection for Oracle 26ai Free Tier
Oracle Autonomous Database requires mTLS via a wallet. Credentials are passed as individual parameters (`config_dir`, `wallet_location`, `wallet_password`) to `oracledb.connect()`, because a single connection string cannot carry all wallet params.
5. Multi-tenant safe
All operations (`search`, `get_chunks`, `delete_index`, `delete_chunk`) are scoped to `source_id`; `delete_index()` deletes only rows for that source, never the entire table.
Files Changed
- `application/vectorstore/oracle.py`: new vector store implementation
- `application/vectorstore/vector_creator.py`: registered `"oracle"` as a valid store type
- `application/core/settings.py`: added 5 Oracle config fields
- `.env.oracle.example`: reference config file for Oracle setup
- `test_oraclevectorstore.py`: manual integration test

Oracle 26ai Free Tier: Quick Setup
1. Create account & database
`docsgpt`) and ADMIN password
2. Download wallet
`C:\Wallet_docsgpt` or `/app/wallet`)
3. Get your DSN name
`tnsnames.ora` inside the wallet folder
`_high` (e.g. `docsgpt_high`)
4. Configure `.env`
5. Install dependencies
Table and indexes are created automatically on first run.
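As a rough illustration of the wallet-based connection and the automatic index creation described above, the sketch below builds the keyword arguments for `oracledb.connect()` and the DDL for a function-based index on `JSON_VALUE(metadata, '$.source_id')`. The `ORACLE_*` variable names, the helper names, and the index naming scheme are assumptions made for this example; the settings fields and code in `application/vectorstore/oracle.py` may differ.

```python
import os


def build_connect_kwargs(env=os.environ):
    """Collect wallet parameters as individual oracledb.connect() arguments.

    The ORACLE_* names are hypothetical, not the PR's actual settings fields.
    """
    wallet_dir = env["ORACLE_WALLET_DIR"]
    return {
        "user": env.get("ORACLE_USER", "ADMIN"),
        "password": env["ORACLE_PASSWORD"],
        "dsn": env["ORACLE_DSN"],          # alias from tnsnames.ora, e.g. docsgpt_high
        "config_dir": wallet_dir,          # folder holding tnsnames.ora
        "wallet_location": wallet_dir,     # folder holding the wallet files
        "wallet_password": env["ORACLE_WALLET_PASSWORD"],
    }


def source_id_index_ddl(table_name):
    """Return DDL for a function-based index on the JSON source_id field."""
    # Identifiers cannot be bound as parameters, so validate before formatting.
    if not table_name.isidentifier():
        raise ValueError(f"unsafe table name: {table_name!r}")
    return (
        f"CREATE INDEX {table_name}_source_id_idx "
        f"ON {table_name} (JSON_VALUE(metadata, '$.source_id'))"
    )


def create_source_id_index(connect_kwargs, table_name):
    """Connect and create the index; errors if the index already exists."""
    import oracledb  # imported lazily so the helpers above work without it

    with oracledb.connect(**connect_kwargs) as conn:
        with conn.cursor() as cur:
            cur.execute(source_id_index_ddl(table_name))
```

A first-run hook could call `create_source_id_index(build_connect_kwargs(), table_name)` once and skip or catch the "already exists" error on later starts.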
Testing
Tested against Oracle 26ai Autonomous Database Free Tier. Integration test suite (`tests/test_oraclevectorstore.py`) covers:
- `source_id` index creation
- `add_texts()` batch insert
- `add_chunk()` single insert
- `get_chunks()` with metadata parsing (handles dict/CLOB/string)
- `search()` cosine similarity via `similarity_search_by_vector()`
- `delete_chunk()` with double-delete guard
- `delete_index()`: rows deleted, table preserved

Result: 10/10 tests passing
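The dict/CLOB/string handling mentioned for `get_chunks()` could look roughly like the helper below. This is a minimal sketch, not the code from the PR: `parse_metadata` is a hypothetical name, and it assumes a CLOB arrives as an object exposing `read()` (as `oracledb.LOB` does) while other rows may already be a `dict` or a JSON string.

```python
import json


def parse_metadata(raw):
    """Normalize a metadata value from a JSON CLOB column into a dict."""
    if raw is None:
        return {}
    if isinstance(raw, dict):
        return raw                      # driver already decoded the JSON
    if hasattr(raw, "read"):
        raw = raw.read()                # LOB-like object: pull the full text
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    try:
        parsed = json.loads(raw)
    except (TypeError, ValueError):
        return {}                       # unparseable metadata is treated as empty
    return parsed if isinstance(parsed, dict) else {}
```

Returning an empty dict for malformed values keeps one bad row from failing the whole listing; a stricter variant could log or raise instead.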