Vertex AI Vector Search Module

Provisions a Vertex AI Vector Search (Matching Engine) index with an endpoint and deployed index, optionally with a GCS bucket for durable embedding storage.

Designed for RAG (Retrieval-Augmented Generation) workloads where teams need to store and retrieve vector embeddings for semantic search.

For the Go client library that connects to the infrastructure this module provisions, see the RAG library documentation.

Features

  • Creates a Vertex AI Vector Search index with configurable dimensions, distance measure, and Tree-AH algorithm parameters
  • Deploys the index to a public endpoint with configurable machine type and replica count
  • Optionally creates a GCS bucket for durable embedding storage (dual-write for re-embedding on model upgrades)
  • IAM bindings for authorized service accounts (roles/aiplatform.user + GCS access)
  • Consistent labeling with team, product, and custom labels

Usage example

```hcl
module "build_failures_index" {
  source = "github.com/chainguard-dev/terraform-infra-reconcilers//modules/vertex-ai-vector-search"

  name       = "build-failures"
  project    = "my-gcp-project"
  region     = "us-central1"
  team       = "eng-sus-tools"
  dimensions = 3072 # gemini-embedding-001

  authorized_service_accounts = [
    google_service_account.rag_mcp.email,
  ]
}

# Use outputs in your MCP server CORPORA_CONFIG:
# module.build_failures_index.index_id
# module.build_failures_index.deployed_index_id
# module.build_failures_index.public_endpoint_domain_name
```
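As a sketch, the outputs can be rendered into the server's environment with `jsonencode`. The `CORPORA_CONFIG` schema is defined by your MCP server, not by this module, so the field names below are hypothetical:

```hcl
# Hypothetical wiring: the CORPORA_CONFIG field names here are illustrative.
locals {
  corpora_config = jsonencode({
    build-failures = {
      index_id          = module.build_failures_index.index_id
      deployed_index_id = module.build_failures_index.deployed_index_id
      endpoint_domain   = module.build_failures_index.public_endpoint_domain_name
    }
  })
}
```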

One index per corpus

Each embedding model produces vectors with a specific dimensionality (e.g., 3072 for gemini-embedding-001, 768 for text-embedding-005). Since a Matching Engine index is configured with fixed dimensions, each corpus that uses a different embedding model requires its own index.

Call this module once per corpus:

```hcl
module "build_failures" {
  source     = "github.com/chainguard-dev/terraform-infra-reconcilers//modules/vertex-ai-vector-search"
  name       = "build-failures"
  dimensions = 3072 # gemini-embedding-001
  # ...
}

module "advisories" {
  source     = "github.com/chainguard-dev/terraform-infra-reconcilers//modules/vertex-ai-vector-search"
  name       = "advisories"
  dimensions = 768 # text-embedding-005
  # ...
}
```

If multiple corpora share the same model and dimensions, you can use a single index with Vertex AI restrict tokens to partition data by corpus at the application level.
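For example, each datapoint uploaded to a shared index can carry a restrict tagging its corpus (the `corpus` namespace name is an assumption; use whatever namespace your application filters on):

```json
{"id": "bf-00042", "embedding": [0.12, -0.03, 0.55], "restricts": [{"namespace": "corpus", "allow": ["build-failures"]}]}
```

At query time, pass a matching restrict so only neighbors from that corpus are returned.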

Requirements

No requirements.

Providers

| Name | Version |
|------|---------|
| google | n/a |

Modules

No modules.

Resources

| Name | Type |
|------|------|
| google_project_iam_member.aiplatform_user | resource |
| google_storage_bucket.embeddings | resource |
| google_storage_bucket_iam_member.gcs_writer | resource |
| google_vertex_ai_index.this | resource |
| google_vertex_ai_index_endpoint.this | resource |
| google_vertex_ai_index_endpoint_deployed_index.this | resource |

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| approximate_neighbors_count | Default number of approximate neighbors to return during search. | `number` | `150` | no |
| authorized_service_accounts | List of Google service account emails to grant roles/aiplatform.user and GCS access. | `list(string)` | `[]` | no |
| create_gcs_bucket | Create a GCS bucket for durable embedding storage. Set to false to bring your own bucket. | `bool` | `true` | no |
| deletion_protection | When true, prevents the GCS bucket from being destroyed with objects in it. | `bool` | `true` | no |
| description | Human-readable description for the index. | `string` | `""` | no |
| dimensions | Number of dimensions for embedding vectors. Must match the embedding model (e.g. 3072 for gemini-embedding-001, 768 for text-embedding-005). | `number` | n/a | yes |
| distance_measure_type | Distance measure for vector similarity. One of: COSINE_DISTANCE, SQUARED_L2_DISTANCE, L1_DISTANCE, DOT_PRODUCT_DISTANCE. | `string` | `"COSINE_DISTANCE"` | no |
| feature_norm_type | Feature normalization type. Use UNIT_L2_NORM with COSINE_DISTANCE for best results. | `string` | `"UNIT_L2_NORM"` | no |
| gcs_bucket_name | Name for the GCS bucket. Defaults to '{project}-{name}-embeddings' when create_gcs_bucket is true. When create_gcs_bucket is false, the caller is responsible for managing IAM on their own bucket. | `string` | `""` | no |
| gcs_lifecycle_age_days | Number of days before objects in the embeddings bucket are deleted. Set to 0 to disable lifecycle rules. | `number` | `0` | no |
| labels | Additional labels to apply to resources. | `map(string)` | `{}` | no |
| leaf_node_embedding_count | Number of embeddings per leaf node in the Tree-AH index. More embeddings per leaf means a smaller index but slower search. | `number` | `1000` | no |
| leaf_nodes_to_search_percent | Percentage of leaf nodes to search (1-100). Higher means better recall, slower search. | `number` | `10` | no |
| machine_type | Machine type for serving the deployed index. | `string` | `"e2-standard-16"` | no |
| max_replica_count | Maximum number of replicas for the deployed index. | `number` | `1` | no |
| min_replica_count | Minimum number of replicas for the deployed index. | `number` | `1` | no |
| name | Base name for all resources (index, endpoint, bucket). Lowercase letters, numbers, and hyphens. | `string` | n/a | yes |
| product | Product label to apply to resources. | `string` | `"unknown"` | no |
| project | GCP project ID. | `string` | n/a | yes |
| region | GCP region for the index and endpoint. | `string` | n/a | yes |
| team | Team label to apply to resources (replaces the deprecated 'squad'). | `string` | n/a | yes |
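For instance, a recall-sensitive corpus might search a larger fraction of leaf nodes at the cost of latency. The values below are illustrative, not recommendations:

```hcl
module "advisories" {
  source     = "github.com/chainguard-dev/terraform-infra-reconcilers//modules/vertex-ai-vector-search"
  name       = "advisories"
  project    = "my-gcp-project"
  region     = "us-central1"
  team       = "eng-sus-tools"
  dimensions = 768 # text-embedding-005

  # Tree-AH tuning: smaller leaves and a wider search improve recall
  # at the cost of index size and query latency.
  leaf_node_embedding_count    = 500
  leaf_nodes_to_search_percent = 25
}
```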

Outputs

| Name | Description |
|------|-------------|
| deployed_index_id | ID of the deployed index within the endpoint. |
| gcs_bucket_name | Name of the GCS bucket for embedding storage. Empty if create_gcs_bucket is false. |
| index_endpoint_id | Fully-qualified resource name of the index endpoint. |
| index_id | Fully-qualified resource name of the Vertex AI index. |
| public_endpoint_domain_name | Public domain name for gRPC queries to the deployed index. |
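As a sketch of how these outputs fit together at query time, a REST `findNeighbors` call is sent to `https://{public_endpoint_domain_name}/v1/{index_endpoint_id}:findNeighbors` with a body like the one below. The `deployedIndexId` value is illustrative, and the vector is truncated to three components for brevity:

```json
{
  "deployedIndexId": "build_failures_deployed",
  "queries": [
    {
      "datapoint": {"featureVector": [0.12, -0.03, 0.55]},
      "neighborCount": 10
    }
  ]
}
```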