Support all vector databases #2415

@averikitsch

Description

Prerequisites

What are you trying to do that currently feels hard or impossible?

We need to ensure that all vector-enabled databases support embeddedBy:. Currently, the blog mentions only PostgreSQL as supported.

Suggested Solution(s)

Add tests for all vector-enabled databases and ensure the vector is formatted and inserted correctly.

Databases Supporting Vectors

According to the source documentation and the nature of these integrations for GenAI, the following sources specifically support vector search or storage:

  • AlloyDB for PostgreSQL: Includes built-in support via the google_ml_integration and pgvector extensions for high-performance vector search.
  • Cloud SQL for PostgreSQL: Supports vector storage and search through the pgvector extension.
  • PostgreSQL: Standard PostgreSQL support via the pgvector extension.
  • Spanner: Supports vector search using the COSINE_DISTANCE, EUCLIDEAN_DISTANCE, and DOT_PRODUCT functions (with APPROX_ variants for indexed search) and vector indexing.
  • BigQuery: Supports vector search via the VECTOR_SEARCH function and vector indexes.
  • Firestore: Supports vector search and K-nearest neighbor (KNN) queries on document fields.
  • MongoDB: Supports vector search through MongoDB Atlas Vector Search.
  • Elasticsearch: A native vector database supporting dense and sparse vector types.
  • Neo4j: Supports vector indexing and search via Cypher procedures.
  • SingleStore: Built-in vector data types and functions for dot product and Euclidean distance.
  • Redis / Valkey: Supports vector search with HNSW or FLAT indexes, through the RediSearch module on Redis and the valkey-search module on Valkey.
  • ClickHouse: Supports vector search using specialized distance functions and experimental ANN (Approximate Nearest Neighbor) indexes.
  • Cassandra: Supports vector data types (since v5.0) and SAI (Storage-Attached Indexing).

Requirements for Enabling Vectors

To use vectors with these sources within the GenAI Toolbox, the following general and specific requirements must be met:

1. Database-Specific Requirements

  • PostgreSQL-based (AlloyDB, Cloud SQL, Self-hosted):
    • Extension: You must enable the pgvector extension by running CREATE EXTENSION IF NOT EXISTS vector; in your database.
    • Column Type: Use the VECTOR(dimensions) data type for your embedding columns.
  • Spanner:
    • Schema: Define columns with the ARRAY<FLOAT64> or ARRAY<FLOAT32> type.
    • Index: Create a vector index with the CREATE VECTOR INDEX statement so that approximate nearest-neighbor queries perform well.
  • BigQuery:
    • Index: Creating a vector index on the embedding column (typically an ARRAY<FLOAT64>) is optional but recommended for large tables; without one, VECTOR_SEARCH falls back to brute-force search.
    • Metadata: The VECTOR_SEARCH function requires a base table and a query table (or a single embedding).
  • Firestore:
    • Index: You must create a single-field index for the specific field containing the vector with the index type set to Vector.
  • MongoDB:
    • Deployment: Requires a MongoDB Atlas cluster running MongoDB 6.0.11 or later, or 7.0.2 or later.
    • Index: A "Vector Search Index" must be defined in the Atlas UI or via API.
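
For the PostgreSQL-based requirements above, a test fixture could generate the setup statements along these lines. This is only a sketch: the function name, the table layout, and the `id` column are hypothetical, and the table and column names are interpolated directly, so they must come from trusted code rather than user input.

```python
from typing import List


def pgvector_setup_sql(table: str, column: str, dimensions: int) -> List[str]:
    """Return the statements that prepare a table for pgvector embeddings.

    `table` and `column` are hypothetical, trusted identifiers.
    """
    if dimensions <= 0:
        raise ValueError("dimensions must be a positive integer")
    return [
        # Enables the pgvector extension, as described in the requirements.
        "CREATE EXTENSION IF NOT EXISTS vector;",
        # Declares the embedding column with the VECTOR(dimensions) type.
        f"CREATE TABLE IF NOT EXISTS {table} "
        f"(id BIGSERIAL PRIMARY KEY, {column} VECTOR({dimensions}));",
    ]
```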

Formats:

1. PostgreSQL / AlloyDB / Cloud SQL (Postgres)

  • Format: String
  • Syntax: '[0.1, 0.2, 0.3]'
  • Notes: Because the pgvector extension defines a custom vector type, literal values in SQL queries must be wrapped in single quotes and brackets to be cast correctly.
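
A minimal sketch of producing that string form, assuming the embedding arrives as a plain list of floats. The helper name is hypothetical, and the rejection of NaN and infinite values reflects an assumption about what pgvector accepts.

```python
import math
from typing import Sequence


def to_pgvector_literal(values: Sequence[float]) -> str:
    """Render an embedding in pgvector's bracketed text form."""
    if not values:
        raise ValueError("embedding must not be empty")
    for v in values:
        if not math.isfinite(v):
            raise ValueError("embedding contains NaN or an infinite value")
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


# The result is meant to be bound as a query parameter and cast, for example:
#   SELECT id FROM items ORDER BY embedding <-> %s::vector LIMIT 5
# where `items` and `embedding` are hypothetical names.
```

Binding the string as a parameter leaves the quoting to the driver, so the single quotes are only needed when the value is written inline in SQL.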

2. Google Cloud Spanner

  • Format: Array of Doubles/Floats
  • Syntax: [0.1, 0.2, 0.3]
  • Notes: Spanner uses the native ARRAY<FLOAT64> or ARRAY<FLOAT32> type. In SQL queries, these are passed as standard arrays without quotes.
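
As a rough illustration, an unquoted, explicitly typed array literal could be rendered as follows. The helper name is hypothetical; in practice a query parameter of array type avoids building literals at all.

```python
import math
from typing import Sequence

_ALLOWED_ELEMENT_TYPES = {"FLOAT32", "FLOAT64"}


def to_googlesql_array(values: Sequence[float], element_type: str = "FLOAT64") -> str:
    """Render an embedding as a typed GoogleSQL array literal (no quotes)."""
    if element_type not in _ALLOWED_ELEMENT_TYPES:
        raise ValueError(f"unsupported element type: {element_type}")
    if any(not math.isfinite(v) for v in values):
        raise ValueError("embedding contains NaN or an infinite value")
    body = ", ".join(repr(float(v)) for v in values)
    return f"ARRAY<{element_type}>[{body}]"
```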

3. BigQuery

  • Format: Array of Floats
  • Syntax: [0.1, 0.2, 0.3]
  • Notes: BigQuery expects an ARRAY<FLOAT64>. When using VECTOR_SEARCH, the query vector is typically passed as a literal array or a parameter of type array.
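
The parameter form might look like the sketch below, which builds a VECTOR_SEARCH query that reads the query vector from a named parameter. The function name and the `@query_vector` parameter name are assumptions for illustration, and the table and column names are interpolated directly, so they must be trusted.

```python
_DISTANCE_TYPES = {"EUCLIDEAN", "COSINE", "DOT_PRODUCT"}


def vector_search_sql(base_table: str, column: str, top_k: int,
                      distance_type: str = "COSINE") -> str:
    """Build a VECTOR_SEARCH query that takes the query vector as @query_vector."""
    if distance_type not in _DISTANCE_TYPES:
        raise ValueError(f"unsupported distance type: {distance_type}")
    if top_k <= 0:
        raise ValueError("top_k must be a positive integer")
    return (
        "SELECT *\n"
        "FROM VECTOR_SEARCH(\n"
        f"  TABLE {base_table}, '{column}',\n"
        # The single query embedding is supplied as an ARRAY<FLOAT64> parameter.
        f"  (SELECT @query_vector AS {column}),\n"
        f"  top_k => {top_k}, distance_type => '{distance_type}')"
    )
```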

4. Google Cloud Firestore

  • Format: VectorValue Object (via SDK)
  • Syntax: VectorValue([0.1, 0.2, 0.3])
  • Notes: Firestore does not use SQL. When using the GenAI Toolbox/SDKs, vectors are passed as a native array/list which the library wraps into a VectorValue object for the document request.

5. MongoDB Atlas

  • Format: BSON Array
  • Syntax: [0.1, 0.2, 0.3]
  • Notes: Within an aggregation pipeline ($vectorSearch), the queryVector field is a standard JSON-style array of numbers.
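
A sketch of such a stage as a plain Python dictionary, ready to be placed first in an aggregation pipeline. The helper name and the default of ten candidates per requested result are assumptions, not recommendations from the source.

```python
from typing import Any, Dict, Optional, Sequence


def vector_search_stage(index: str, path: str, query_vector: Sequence[float],
                        limit: int,
                        num_candidates: Optional[int] = None) -> Dict[str, Any]:
    """Build a $vectorSearch stage with the query vector as a plain list."""
    if num_candidates is None:
        num_candidates = limit * 10  # assumed default for illustration
    if num_candidates < limit:
        raise ValueError("num_candidates must be at least limit")
    return {
        "$vectorSearch": {
            "index": index,
            "path": path,
            "queryVector": [float(v) for v in query_vector],
            "numCandidates": num_candidates,
            "limit": limit,
        }
    }
```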

6. Redis / Valkey

  • Format: Binary (Blob)
  • Syntax: A byte-array representation of float32/64 values.
  • Notes: Redis requires vectors to be stored and queried as raw binary data. Most libraries (and the Toolbox) handle the conversion from a standard list [0.1, 0.2...] to binary automatically.
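
To make the conversion concrete, the following sketch packs a list into little-endian float32 bytes using only the standard library. It assumes the index was declared with a FLOAT32 element type; the helper names are hypothetical.

```python
import struct
from typing import List, Sequence


def to_float32_blob(values: Sequence[float]) -> bytes:
    """Pack an embedding as little-endian float32 bytes (4 bytes per element)."""
    return struct.pack(f"<{len(values)}f", *values)


def from_float32_blob(blob: bytes) -> List[float]:
    """Unpack a float32 blob back into a list of Python floats."""
    if len(blob) % 4 != 0:
        raise ValueError("blob length is not a multiple of 4 bytes")
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))
```

A test can assert on the blob length (4 × dimensions) and on a round trip within float32 precision, since values such as 0.1 are not exactly representable.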

7. Neo4j

  • Format: List of Floats
  • Syntax: [0.1, 0.2, 0.3]
  • Notes: In Cypher queries, vectors are treated as standard Neo4j Lists. For example: CALL db.index.vector.queryNodes('index_name', 10, [0.1, 0.2, 0.3]).
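
The same call can take the vector as a query parameter instead of an inline list. A minimal sketch, in which the parameter names are hypothetical:

```python
from typing import Any, Dict, Sequence

# Parameterized form of the procedure call shown above.
VECTOR_QUERY = (
    "CALL db.index.vector.queryNodes($index, $k, $embedding) "
    "YIELD node, score RETURN node, score"
)


def vector_query_params(index: str, k: int,
                        embedding: Sequence[float]) -> Dict[str, Any]:
    """Build the parameter map; the embedding is sent as an ordinary list."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    return {"index": index, "k": k, "embedding": [float(v) for v in embedding]}
```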

8. SingleStore

  • Format: JSON Array or Binary
  • Syntax: JSON_ARRAY_PACK('[0.1, 0.2, 0.3]')
  • Notes: SingleStore often uses JSON_ARRAY_PACK to convert a string-formatted array into a high-performance binary "blob" for vector operations.
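
One way to feed JSON_ARRAY_PACK from application code is to serialize the list to a JSON string and bind it as a parameter. In this sketch the table `items`, the column `embedding`, and the `%s` placeholder style are assumptions.

```python
import json
from typing import Sequence

# Hypothetical query: the JSON string is packed into a binary vector server-side.
SEARCH_SQL = (
    "SELECT id, DOT_PRODUCT(embedding, JSON_ARRAY_PACK(%s)) AS score "
    "FROM items ORDER BY score DESC LIMIT %s"
)


def json_array_pack_arg(values: Sequence[float]) -> str:
    """Serialize an embedding to the string form JSON_ARRAY_PACK accepts."""
    return json.dumps([float(v) for v in values])
```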

9. Elasticsearch

  • Format: JSON Array
  • Syntax: [0.1, 0.2, 0.3]
  • Notes: The dense_vector field type accepts a standard JSON array of numbers.
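
For example, both the indexed document and a kNN search body carry the vector as an ordinary list. This is a sketch only: the helper names are hypothetical, and the `knn` body follows the shape used by the search API in recent Elasticsearch versions.

```python
from typing import Any, Dict, Sequence


def embedding_document(field: str, vector: Sequence[float]) -> Dict[str, Any]:
    """Document body that stores the embedding in a dense_vector field."""
    return {field: [float(v) for v in vector]}


def knn_search_body(field: str, query_vector: Sequence[float],
                    k: int, num_candidates: int) -> Dict[str, Any]:
    """Search body for an approximate kNN query against a dense_vector field."""
    if num_candidates < k:
        raise ValueError("num_candidates must be at least k")
    return {
        "knn": {
            "field": field,
            "query_vector": [float(v) for v in query_vector],
            "k": k,
            "num_candidates": num_candidates,
        }
    }
```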

Alternatives Considered

No response

Additional Details

No response

Metadata

Labels

priority: p1 (Important issue which blocks shipping the next release. Will be fixed prior to next release.)
type: feature request ('Nice-to-have' improvement, new feature or different behavior or design.)
