Support all vector databases #2415
Open
Labels
- priority: p1 (Important issue which blocks shipping the next release. Will be fixed prior to next release.)
- type: feature request (‘Nice-to-have’ improvement, new feature or different behavior or design.)
Description
Prerequisites
- Search the current open issues
What are you trying to do that currently feels hard or impossible?
We need to ensure that all vector-enabled databases support embeddedBy:. Currently, the blog mentions only PostgreSQL as supported.
Suggested Solution(s)
Add tests for all vector-enabled databases and ensure the vector is formatted and inserted correctly.
Databases Supporting Vectors
According to the documentation for each source, the following sources support vector storage or vector search:
- AlloyDB for PostgreSQL: Includes built-in support via the google_ml_integration and pgvector extensions for high-performance vector search.
- Cloud SQL for PostgreSQL: Supports vector storage and search through the pgvector extension.
- PostgreSQL: Standard PostgreSQL support via the pgvector extension.
- Spanner: Supports vector search using the COSINE_DISTANCE, EUCLIDEAN_DISTANCE, and DOT_PRODUCT functions, plus vector indexes for approximate search.
- BigQuery: Supports vector search via the VECTOR_SEARCH function and vector indexes.
- Firestore: Supports vector search and K-nearest neighbor (KNN) queries on document fields.
- MongoDB: Supports vector search through MongoDB Atlas Vector Search.
- Elasticsearch: A native vector database supporting dense and sparse vector types.
- Neo4j: Supports vector indexing and search via Cypher procedures.
- SingleStore: Built-in vector data types and functions for dot product and Euclidean distance.
- Redis / Valkey: Supports vector search using the RediSearch module (HNSW/Flat indexing).
- ClickHouse: Supports vector search using specialized distance functions and experimental ANN (Approximate Nearest Neighbor) indexes.
- Cassandra: Supports vector data types (since v5.0) and SAI (Storage-Attached Indexing).
Requirements for Enabling Vectors
To use vectors with these sources within the GenAI Toolbox, the following general and specific requirements must be met:
1. Database-Specific Requirements
- PostgreSQL-based (AlloyDB, Cloud SQL, Self-hosted):
  - Extension: You must enable the pgvector extension by running CREATE EXTENSION IF NOT EXISTS vector; in your database.
  - Column Type: Use the VECTOR(dimensions) data type for your embedding columns.
- Spanner:
  - Schema: Define columns with the ARRAY<FLOAT64> or ARRAY<FLOAT32> type.
  - Index: For approximate nearest-neighbor search on large tables, create a vector index with CREATE VECTOR INDEX.
- BigQuery:
  - Index: Create a vector index on your embedding column (typically an ARRAY<FLOAT64>) for large tables. VECTOR_SEARCH falls back to brute-force search when no index exists.
  - Inputs: The VECTOR_SEARCH function requires a base table and a query table (or a single embedding).
- Firestore:
  - Index: You must create a single-field index for the specific field containing the vector, with the index type set to Vector.
- MongoDB:
  - Deployment: Requires MongoDB Atlas (v6.0.11 or v7.0.2+).
  - Index: A "Vector Search Index" must be defined in the Atlas UI or via API.
Formats:
1. PostgreSQL / AlloyDB / Cloud SQL (Postgres)
- Format: String
- Syntax: '[0.1, 0.2, 0.3]'
- Notes: Because the pgvector extension defines a custom vector type, literal values in SQL queries must be wrapped in single quotes and brackets to be cast correctly.
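A minimal sketch of the string conversion described above, assuming the embedding arrives as a Python list of numbers. The helper name `to_pgvector_literal`, the table `items`, and the column `embedding` are hypothetical and not taken from the Toolbox source.

```python
import math
from typing import Sequence


def to_pgvector_literal(embedding: Sequence[float]) -> str:
    """Render an embedding in pgvector's bracketed text form, e.g. [0.1,0.2,0.3]."""
    if not embedding:
        raise ValueError("embedding must not be empty")
    values = [float(x) for x in embedding]
    # Guard against NaN/Infinity before the value reaches the database.
    if not all(math.isfinite(v) for v in values):
        raise ValueError("embedding must contain only finite numbers")
    # repr() keeps full float precision so the value round-trips.
    return "[" + ",".join(repr(v) for v in values) + "]"


# Hypothetical parameterized statement: the driver supplies the quoting,
# and the explicit cast turns the text parameter into a vector.
INSERT_SQL = "INSERT INTO items (embedding) VALUES (%s::vector)"
params = (to_pgvector_literal([0.1, 0.2, 0.3]),)
```

Passing the literal as a bound parameter, rather than splicing it into the SQL text, leaves the single quotes to the driver.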
2. Google Cloud Spanner
- Format: Array of Doubles/Floats
- Syntax: [0.1, 0.2, 0.3]
- Notes: Spanner uses the native ARRAY<FLOAT64> or ARRAY<FLOAT32> type. In SQL queries, these are passed as standard arrays without quotes.
3. BigQuery
- Format: Array of Floats
- Syntax: [0.1, 0.2, 0.3]
- Notes: BigQuery expects an ARRAY<FLOAT64>. When using VECTOR_SEARCH, the query vector is typically passed as a literal array or a parameter of type array.
4. Google Cloud Firestore
- Format: VectorValue Object (via SDK)
- Syntax: VectorValue([0.1, 0.2, 0.3])
- Notes: Firestore does not use SQL. When using the GenAI Toolbox/SDKs, vectors are passed as a native array/list which the library wraps into a VectorValue object for the document request.
5. MongoDB Atlas
- Format: BSON Array
- Syntax: [0.1, 0.2, 0.3]
- Notes: Within an aggregation pipeline ($vectorSearch), the queryVector field is a standard JSON-style array of numbers.
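As an illustration of the array format above, the following sketch builds a `$vectorSearch` aggregation pipeline as plain Python data. The function name and the example index and field names are hypothetical. The `index`, `path`, `numCandidates`, and `limit` fields are part of the `$vectorSearch` stage but are not mentioned in this issue, so treat their values here as placeholders.

```python
from typing import Any, Dict, List, Sequence


def build_vector_search_pipeline(
    query_vector: Sequence[float],
    index_name: str,
    path: str,
    limit: int = 10,
    num_candidates: int = 100,
) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline whose first stage is $vectorSearch."""
    if num_candidates < limit:
        raise ValueError("num_candidates must be >= limit")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                # The query vector is a plain array of numbers.
                "queryVector": [float(x) for x in query_vector],
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        # Expose the similarity score alongside each matched document.
        {"$project": {"_id": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]


# Hypothetical index and field names.
pipeline = build_vector_search_pipeline([0.1, 0.2, 0.3], "vector_index", "embedding")
```

The resulting list could be handed to a driver's aggregate call unchanged, since no vector-specific wrapping is needed.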
6. Redis / Valkey
- Format: Binary (Blob)
- Syntax: A byte-array representation of float32/64 values.
- Notes: Redis requires vectors to be stored and queried as raw binary data. Most libraries (and the Toolbox) handle the conversion from a standard list [0.1, 0.2...] to binary automatically.
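The list-to-binary conversion mentioned above might look like the sketch below. It assumes the index is declared with 32-bit floats and that a little-endian byte layout is expected. Both are assumptions to verify against the actual index definition, and the helper names are hypothetical.

```python
import struct
from typing import List, Sequence


def to_float32_bytes(embedding: Sequence[float]) -> bytes:
    """Pack an embedding into a little-endian float32 byte string."""
    return struct.pack(f"<{len(embedding)}f", *embedding)


def from_float32_bytes(blob: bytes) -> List[float]:
    """Unpack a little-endian float32 byte string back into a list of floats."""
    if len(blob) % 4 != 0:
        raise ValueError("blob length must be a multiple of 4 bytes")
    count = len(blob) // 4
    return list(struct.unpack(f"<{count}f", blob))


blob = to_float32_bytes([0.1, 0.2, 0.3])  # 3 values * 4 bytes each
```

Note that float32 packing is lossy for values such as 0.1, so a test comparing round-tripped vectors should use a tolerance rather than exact equality.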
7. Neo4j
- Format: List of Floats
- Syntax: [0.1, 0.2, 0.3]
- Notes: In Cypher queries, vectors are treated as standard Neo4j Lists. For example: CALL db.index.vector.queryNodes('index_name', 10, [0.1, 0.2, 0.3]).
8. SingleStore
- Format: JSON Array or Binary
- Syntax: JSON_ARRAY_PACK('[0.1, 0.2, 0.3]')
- Notes: SingleStore often uses JSON_ARRAY_PACK to convert a string-formatted array into a high-performance binary "blob" for vector operations.
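A short sketch of preparing the string argument for JSON_ARRAY_PACK, assuming the embedding is a Python list. The helper name, table, and column are hypothetical.

```python
import json
from typing import Sequence


def to_json_array_text(embedding: Sequence[float]) -> str:
    """Serialize an embedding as JSON array text suitable for JSON_ARRAY_PACK."""
    # allow_nan=False makes json.dumps raise ValueError on NaN/Infinity,
    # which are not valid JSON numbers.
    return json.dumps([float(x) for x in embedding], allow_nan=False)


# Hypothetical parameterized statement: the JSON text is bound as a string
# and packed into binary form by the database.
INSERT_SQL = "INSERT INTO items (embedding) VALUES (JSON_ARRAY_PACK(%s))"
params = (to_json_array_text([0.1, 0.2, 0.3]),)
```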
9. Elasticsearch
- Format: JSON Array
- Syntax: [0.1, 0.2, 0.3]
- Notes: The dense_vector field type accepts a standard JSON array of numbers.
Alternatives Considered
No response
Additional Details
No response