diff --git a/notebooks/official/generative_ai/voyage-4.ipynb b/notebooks/official/generative_ai/voyage-4.ipynb
new file mode 100644
index 000000000..f60195cfa
--- /dev/null
+++ b/notebooks/official/generative_ai/voyage-4.ipynb
@@ -0,0 +1,801 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cell-0",
+ "metadata": {
+ "id": "6b3283cdfd08"
+ },
+ "outputs": [],
+ "source": [
+ "# Copyright 2026 MongoDB, Inc\n",
+ "#\n",
+ "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
+ "# you may not use this file except in compliance with the License.\n",
+ "# You may obtain a copy of the License at\n",
+ "#\n",
+ "# https://www.apache.org/licenses/LICENSE-2.0\n",
+ "#\n",
+ "# Unless required by applicable law or agreed to in writing, software\n",
+ "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
+ "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
+ "# See the License for the specific language governing permissions and\n",
+ "# limitations under the License."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cell-1",
+ "metadata": {
+ "id": "d2b5eaffd266"
+ },
+ "source": [
+ "# Voyage 4 Embedding Models\n",
+ "\n",
+ "This notebook demonstrates how to deploy and use the Voyage 4 family of embedding models, featuring an **industry-first shared embedding space** that allows you to mix and match models for optimal cost and performance.\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ "  Open in Colab\n",
+ " \n",
+ " | \n",
+ " \n",
+ " \n",
+ "  Open in Colab Enterprise\n",
+ " \n",
+ " | \n",
+ " \n",
+ " \n",
+ "  Open in Workbench\n",
+ " \n",
+ " | \n",
+ " \n",
+ " \n",
+ "  View on GitHub\n",
+ " \n",
+ " | \n",
+ "
"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cell-2",
+ "metadata": {
+ "id": "a46455330445"
+ },
+ "source": [
+ "## Overview\n",
+ "\n",
+ "The **Voyage 4** family introduces an **industry-first shared embedding space** across all model sizes. This means embeddings from any Voyage 4 model are **interchangeable**—you can encode documents with one model and queries with another, enabling optimal cost-performance trade-offs.\n",
+ "\n",
+ "### Key Features\n",
+ "\n",
+ "* **Shared Embedding Space**: All Voyage 4 models (large, standard, lite) produce compatible embeddings, so you can mix models for documents vs. queries\n",
+ "* **Matryoshka Representation Learning (MRL)**: Variable-dimension embeddings (256, 512, 1024, 2048) from the same model\n",
+ "* **Quantization-Aware Training (QAT)**: Optimized for int8, uint8, binary, and ubinary formats with minimal quality loss\n",
+ "* **Maximum 32K tokens input**: Support for long documents\n",
+ "\n",
+ "### Model Family\n",
+ "\n",
+ "| Model | Description | Best For |\n",
+ "| :--- | :--- | :--- |\n",
+ "| **voyage-4-large** | State-of-the-art general-purpose and multilingual embedding optimized for retrieval quality | Document embeddings where quality matters most |\n",
+ "| **voyage-4** | General-purpose multilingual embedding model optimized for retrieval/search and AI applications | Balanced cost/quality trade-off |\n",
+ "| **voyage-4-lite** | Lightweight general-purpose embedding model optimized for low latency and cost | Query embeddings and cost-sensitive applications |\n",
+ "\n",
+ "### What you'll learn\n",
+ "\n",
+ "In this notebook, you will:\n",
+ "\n",
+ "* Deploy a Voyage 4 model to a Vertex AI endpoint\n",
+ "* Generate embeddings and perform semantic similarity\n",
+ "* Explore advanced parameters (dimensions, quantization)\n",
+ "\n",
+ "### Costs\n",
+ "\n",
+ "This tutorial uses billable components of Google Cloud:\n",
+ "\n",
+ "* Vertex AI Model Garden\n",
+ "* Vertex AI Prediction endpoints\n",
+ "\n",
+ "Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing) and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cell-3",
+ "metadata": {
+ "id": "4d998a5140b2"
+ },
+ "source": [
+ "## Get started"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cell-4",
+ "metadata": {
+ "id": "b92cb16aea9c"
+ },
+ "source": [
+ "### Install Vertex AI SDK for Python and other required packages\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cell-5",
+ "metadata": {
+ "id": "030faea19be1"
+ },
+ "outputs": [],
+ "source": [
+ "! pip3 install --upgrade --quiet google-cloud-aiplatform numpy"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cell-6",
+ "metadata": {
+ "id": "848322ec177e"
+ },
+ "source": [
+ "### Restart runtime (Colab only)\n",
+ "\n",
+ "To use the newly installed packages, you must restart the runtime on Google Colab."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cell-7",
+ "metadata": {
+ "id": "b8d49bb74a53"
+ },
+ "outputs": [],
+ "source": [
+ "import sys\n",
+ "\n",
+ "if \"google.colab\" in sys.modules:\n",
+ "\n",
+ " import IPython\n",
+ "\n",
+ " app = IPython.Application.instance()\n",
+ " app.kernel.do_shutdown(True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cell-8",
+ "metadata": {
+ "id": "780490bfb862"
+ },
+ "source": [
+ "\n",
+ "⚠️ The kernel is going to restart. Wait until it's finished before continuing to the next step. ⚠️\n",
+ "
\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cell-9",
+ "metadata": {
+ "id": "1117fcd212f8"
+ },
+ "source": [
+ "### Authenticate your notebook environment (Colab only)\n",
+ "\n",
+ "Authenticate your environment on Google Colab.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cell-10",
+ "metadata": {
+ "id": "015bf6d5da75"
+ },
+ "outputs": [],
+ "source": [
+ "import sys\n",
+ "\n",
+ "if \"google.colab\" in sys.modules:\n",
+ "\n",
+ " from google.colab import auth\n",
+ "\n",
+ " auth.authenticate_user()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cell-11",
+ "metadata": {
+ "id": "722a10c66085"
+ },
+ "source": [
+ "### Set Google Cloud project information and initialize Vertex AI SDK for Python\n",
+ "\n",
+ "To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com). Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cell-12",
+ "metadata": {
+ "id": "0f16c41d33fd"
+ },
+ "outputs": [],
+ "source": [
+ "# @title Setup Google Cloud project\n",
+ "\n",
+ "# Set your Google Cloud project ID and region below:\n",
+ "\n",
+ "import os\n",
+ "\n",
+ "import vertexai\n",
+ "\n",
+ "# @markdown Enter your project ID if not auto-detected:\n",
+ "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}\n",
+ "if not PROJECT_ID or PROJECT_ID == \"[your-project-id]\":\n",
+ " PROJECT_ID = os.environ.get(\"GOOGLE_CLOUD_PROJECT\")\n",
+ "\n",
+ "# @markdown Select your region:\n",
+ "LOCATION = \"us-central1\" # @param [\"us-central1\", \"us-east1\", \"us-west1\", \"europe-west1\", \"europe-west4\", \"asia-east1\", \"asia-southeast1\"]\n",
+ "\n",
+ "print(f\"Project ID: {PROJECT_ID}\")\n",
+ "print(f\"Location: {LOCATION}\")\n",
+ "\n",
+ "vertexai.init(project=PROJECT_ID, location=LOCATION)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cell-13",
+ "metadata": {
+ "id": "4fa0f29e38cf"
+ },
+ "source": [
+ "## Deploy model\n",
+ "\n",
+ "The Voyage 4 family features a **shared embedding space**, meaning embeddings from any Voyage 4 model (large, standard, lite) are interchangeable. This allows you to use different models for documents and queries while maintaining compatibility.\n",
+ "\n",
+ "For this notebook, we'll deploy a single endpoint. The three models are:\n",
+ "\n",
+ "* **voyage-4-large** — State-of-the-art retrieval quality, ideal for document embeddings\n",
+ "* **voyage-4** — Balanced for retrieval/search and AI applications\n",
+ "* **voyage-4-lite** — Optimized for low latency and cost, ideal for query embeddings"
+ ]
+ },
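+ {
+ "cell_type": "markdown",
+ "id": "cell-13b",
+ "metadata": {},
+ "source": [
+ "Because all Voyage 4 models share one embedding space, a common pattern is to index documents with **voyage-4-large** and embed queries with **voyage-4-lite**. The next cell is a minimal sketch of that pattern rather than a cell to run as-is: it assumes two endpoints, `doc_endpoint` (voyage-4-large) and `query_endpoint` (voyage-4-lite), each deployed with the same steps shown below for a single endpoint."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cell-13c",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Sketch: mix models across the shared embedding space.\n",
+ "# Assumption: `doc_endpoint` (voyage-4-large) and `query_endpoint` (voyage-4-lite)\n",
+ "# are two endpoints deployed with the same steps shown below for a single endpoint.\n",
+ "import json\n",
+ "\n",
+ "docs = [\"Machine learning enables computers to learn from data.\"]\n",
+ "query = \"How do computers learn from examples?\"\n",
+ "\n",
+ "doc_body = {\"input\": docs, \"output_dimension\": 1024, \"input_type\": \"document\"}\n",
+ "doc_response = doc_endpoint.invoke(\n",
+ " request_path=\"/embeddings\",\n",
+ " body=json.dumps(doc_body).encode(\"utf-8\"),\n",
+ " headers={\"Content-Type\": \"application/json\"},\n",
+ ")\n",
+ "doc_vec = doc_response.json()[\"data\"][0][\"embedding\"]\n",
+ "\n",
+ "query_body = {\"input\": [query], \"output_dimension\": 1024, \"input_type\": \"query\"}\n",
+ "query_response = query_endpoint.invoke(\n",
+ " request_path=\"/embeddings\",\n",
+ " body=json.dumps(query_body).encode(\"utf-8\"),\n",
+ " headers={\"Content-Type\": \"application/json\"},\n",
+ ")\n",
+ "query_vec = query_response.json()[\"data\"][0][\"embedding\"]\n",
+ "\n",
+ "# Because the space is shared, the two vectors are directly comparable.\n",
+ "print(f\"Document dim: {len(doc_vec)}, query dim: {len(query_vec)}\")"
+ ]
+ },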
+ {
+ "cell_type": "markdown",
+ "id": "cell-14",
+ "metadata": {
+ "id": "45286665838d"
+ },
+ "source": [
+ "### Initialize the Model\n",
+ "\n",
+ "Initialize the Voyage 4 model from Model Garden.\n",
+ "\n",
+ "Use the `list_deploy_options()` method to view the verified deployment configurations for your selected model."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cell-15",
+ "metadata": {
+ "id": "7675f3b3f3de"
+ },
+ "outputs": [],
+ "source": [
+ "from vertexai import model_garden\n",
+ "\n",
+ "# @title Select Model\n",
+ "# @markdown Choose the Voyage 4 model to deploy:\n",
+ "MODEL = \"voyage-4\" # @param [\"voyage-4-large\", \"voyage-4\", \"voyage-4-lite\"]\n",
+ "\n",
+ "# Default to voyage-4 if not set\n",
+ "if not MODEL:\n",
+ " MODEL = \"voyage-4\"\n",
+ "\n",
+ "MODEL_NAME = f\"mongodb/{MODEL}@latest\"\n",
+ "model = model_garden.OpenModel(MODEL_NAME)\n",
+ "\n",
+ "# Set accelerator based on model (voyage-4-large requires 80GB GPU)\n",
+ "if MODEL == \"voyage-4-large\":\n",
+ " MACHINE_TYPE = \"a2-ultragpu-1g\"\n",
+ " ACCELERATOR_TYPE = \"NVIDIA_A100_80GB\"\n",
+ "else:\n",
+ " MACHINE_TYPE = \"a2-highgpu-1g\"\n",
+ " ACCELERATOR_TYPE = \"NVIDIA_TESLA_A100\"\n",
+ "\n",
+ "print(f\"Selected model: {MODEL_NAME}\")\n",
+ "print(f\"Accelerator: {ACCELERATOR_TYPE} on {MACHINE_TYPE}\")\n",
+ "deploy_options = model.list_deploy_options(concise=True)\n",
+ "print(deploy_options)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cell-16",
+ "metadata": {
+ "id": "4e40b2f2141f"
+ },
+ "outputs": [],
+ "source": [
+ "# @title Deploy or connect to endpoint\n",
+ "# @markdown Choose whether to deploy a new model or use an existing endpoint:\n",
+ "\n",
+ "deployment_option = \"deploy_new\" # @param [\"deploy_new\", \"use_existing\"]\n",
+ "\n",
+ "# @markdown ---\n",
+ "# @markdown If using existing endpoint, provide the endpoint ID:\n",
+ "ENDPOINT_ID = \"\" # @param {type:\"string\"}\n",
+ "\n",
+ "if deployment_option == \"deploy_new\":\n",
+ " print(f\"Deploying {MODEL}...\")\n",
+ " print(f\"Using {ACCELERATOR_TYPE} on {MACHINE_TYPE}\")\n",
+ " endpoint = model.deploy(\n",
+ " machine_type=MACHINE_TYPE,\n",
+ " accelerator_type=ACCELERATOR_TYPE,\n",
+ " accelerator_count=1,\n",
+ " accept_eula=True,\n",
+ " use_dedicated_endpoint=True,\n",
+ " )\n",
+ " print(f\"Endpoint deployed: {endpoint.display_name}\")\n",
+ " print(f\"Endpoint resource name: {endpoint.resource_name}\")\n",
+ "else:\n",
+ " if not ENDPOINT_ID:\n",
+ " raise ValueError(\"Please provide an ENDPOINT_ID when using existing endpoint\")\n",
+ "\n",
+ " from google.cloud import aiplatform\n",
+ "\n",
+ " print(f\"Connecting to existing endpoint: {ENDPOINT_ID}\")\n",
+ " endpoint = aiplatform.Endpoint(\n",
+ " endpoint_name=f\"projects/{PROJECT_ID}/locations/{LOCATION}/endpoints/{ENDPOINT_ID}\"\n",
+ " )\n",
+ " print(f\"Using endpoint: {endpoint.display_name}\")\n",
+ " print(f\"Endpoint resource name: {endpoint.resource_name}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cell-20",
+ "metadata": {
+ "id": "0233e16ca856"
+ },
+ "source": [
+ "## Generate embeddings\n",
+ "\n",
+ "Now let's look at basic embedding generation with the Voyage 4 models."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cell-21",
+ "metadata": {
+ "id": "b7d228ce5961"
+ },
+ "outputs": [],
+ "source": [
+ "import json\n",
+ "\n",
+ "# Multiple texts to embed\n",
+ "texts = [\n",
+ " \"Machine learning enables computers to learn from data.\",\n",
+ " \"Natural language processing helps computers understand human language.\",\n",
+ " \"Computer vision allows machines to interpret visual information.\",\n",
+ " \"Deep learning uses neural networks with multiple layers.\",\n",
+ "]\n",
+ "\n",
+ "# Prepare the batch request and make invoke call\n",
+ "body = {\"input\": texts, \"output_dimension\": 1024, \"input_type\": \"document\"}\n",
+ "response = endpoint.invoke(\n",
+ " request_path=\"/embeddings\",\n",
+ " body=json.dumps(body).encode(\"utf-8\"),\n",
+ " headers={\"Content-Type\": \"application/json\"},\n",
+ ")\n",
+ "\n",
+ "# Extract embeddings\n",
+ "result = response.json()\n",
+ "embeddings = [item[\"embedding\"] for item in result[\"data\"]]\n",
+ "\n",
+ "print(f\"Number of texts embedded: {len(embeddings)}\")\n",
+ "print(f\"Embedding dimension: {len(embeddings[0])}\")\n",
+ "print(f\"\\nFirst embedding (first 5 values): {embeddings[0][:5]}\")\n",
+ "print(f\"Second embedding (first 5 values): {embeddings[1][:5]}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cell-22",
+ "metadata": {
+ "id": "a47938955d96"
+ },
+ "source": [
+ "### Semantic similarity\n",
+ "\n",
+ "Use embeddings to compute semantic similarity between text:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cell-23",
+ "metadata": {
+ "id": "d65defacbb93"
+ },
+ "outputs": [],
+ "source": [
+ "import json\n",
+ "\n",
+ "import numpy as np\n",
+ "\n",
+ "\n",
+ "def cosine_similarity(vec1, vec2):\n",
+ " \"\"\"Calculate cosine similarity between two vectors.\"\"\"\n",
+ " vec1 = np.array(vec1)\n",
+ " vec2 = np.array(vec2)\n",
+ " return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))\n",
+ "\n",
+ "\n",
+ "# Example texts\n",
+ "query = \"How do computers learn from examples?\"\n",
+ "documents = [\n",
+ " \"Machine learning enables computers to learn from data.\",\n",
+ " \"The weather today is sunny and warm.\",\n",
+ " \"Neural networks can recognize patterns in data.\",\n",
+ " \"I enjoy cooking Italian food.\",\n",
+ "]\n",
+ "\n",
+ "# Get embeddings - using invoke with /embeddings endpoint\n",
+ "all_texts = [query] + documents\n",
+ "body = {\"input\": all_texts, \"output_dimension\": 1024, \"input_type\": \"document\"}\n",
+ "response = endpoint.invoke(\n",
+ " request_path=\"/embeddings\",\n",
+ " body=json.dumps(body).encode(\"utf-8\"),\n",
+ " headers={\"Content-Type\": \"application/json\"},\n",
+ ")\n",
+ "result = response.json()\n",
+ "all_embeddings = [item[\"embedding\"] for item in result[\"data\"]]\n",
+ "\n",
+ "query_embedding = all_embeddings[0]\n",
+ "doc_embeddings = all_embeddings[1:]\n",
+ "\n",
+ "# Calculate similarities\n",
+ "print(f\"Query: {query}\\n\")\n",
+ "print(\"Similarity scores:\")\n",
+ "for i, doc in enumerate(documents):\n",
+ " similarity = cosine_similarity(query_embedding, doc_embeddings[i])\n",
+ " print(f\"{similarity:.4f} - {doc}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cell-26",
+ "metadata": {
+ "id": "c3b964fe9387"
+ },
+ "source": [
+ "## Advanced parameters\n",
+ "\n",
+ "Let's explore the advanced parameters that Voyage 4 models support to optimize your embeddings."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cell-27",
+ "metadata": {
+ "id": "865ef46609eb"
+ },
+ "source": [
+ "### Understanding input_type: Query vs Document\n",
+ "\n",
+ "The `input_type` parameter optimizes embeddings for retrieval tasks:\n",
+ "\n",
+ "* **`query`**: Use this when the text represents a search query or question. The model prepends \"Represent the query for retrieving supporting documents: \" to optimize for retrieval.\n",
+ "* **`document`**: Use this when the text represents a document or passage to be searched. The model prepends \"Represent the document for retrieval: \" to optimize for indexing.\n",
+ "* **`null`** (default): No special prompt is added. Use for general-purpose embeddings.\n",
+ "\n",
+ "**Best Practice**: For retrieval/search applications, use `input_type=\"query\"` for your search queries and `input_type=\"document\"` for the documents you're indexing."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cell-28",
+ "metadata": {
+ "id": "a12cf1ce0be7"
+ },
+ "outputs": [],
+ "source": [
+ "import json\n",
+ "\n",
+ "# Example: Using input_type for retrieval\n",
+ "query_text = \"What is machine learning?\"\n",
+ "document_texts = [\n",
+ " \"Machine learning enables computers to learn from data.\",\n",
+ " \"Natural language processing helps computers understand human language.\",\n",
+ " \"Computer vision allows machines to interpret visual information.\",\n",
+ "]\n",
+ "\n",
+ "# Generate query embedding with input_type=\"query\"\n",
+ "query_body = {\n",
+ " \"input\": [query_text],\n",
+ " \"output_dimension\": 1024,\n",
+ " \"input_type\": \"query\", # Optimize for search queries\n",
+ "}\n",
+ "query_response = endpoint.invoke(\n",
+ " request_path=\"/embeddings\",\n",
+ " body=json.dumps(query_body).encode(\"utf-8\"),\n",
+ " headers={\"Content-Type\": \"application/json\"},\n",
+ ")\n",
+ "query_result = query_response.json()\n",
+ "query_embedding = query_result[\"data\"][0][\"embedding\"]\n",
+ "\n",
+ "# Generate document embeddings with input_type=\"document\"\n",
+ "doc_body = {\n",
+ " \"input\": document_texts,\n",
+ " \"output_dimension\": 1024,\n",
+ " \"input_type\": \"document\", # Optimize for document indexing\n",
+ "}\n",
+ "doc_response = endpoint.invoke(\n",
+ " request_path=\"/embeddings\",\n",
+ " body=json.dumps(doc_body).encode(\"utf-8\"),\n",
+ " headers={\"Content-Type\": \"application/json\"},\n",
+ ")\n",
+ "doc_result = doc_response.json()\n",
+ "doc_embeddings = [item[\"embedding\"] for item in doc_result[\"data\"]]\n",
+ "\n",
+ "print(f\"Query: {query_text}\")\n",
+ "print(f\"Query embedding dimension: {len(query_embedding)}\")\n",
+ "print(f\"\\nNumber of documents embedded: {len(doc_embeddings)}\")\n",
+ "print(f\"Document embedding dimension: {len(doc_embeddings[0])}\")\n",
+ "print(f\"\\nQuery embedding (first 5 values): {query_embedding[:5]}\")\n",
+ "print(f\"First document embedding (first 5 values): {doc_embeddings[0][:5]}\")"
+ ]
+ },
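+ {
+ "cell_type": "markdown",
+ "id": "cell-28b",
+ "metadata": {},
+ "source": [
+ "As a quick check of the query/document convention above, the next cell ranks the documents against the query using the `cosine_similarity` helper defined earlier in this notebook."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cell-28c",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Rank the documents for the query using the embeddings generated above.\n",
+ "scored = [\n",
+ " (cosine_similarity(query_embedding, doc_emb), doc)\n",
+ " for doc_emb, doc in zip(doc_embeddings, document_texts)\n",
+ "]\n",
+ "\n",
+ "print(f\"Query: {query_text}\\n\")\n",
+ "for score, doc in sorted(scored, reverse=True):\n",
+ " print(f\"{score:.4f} - {doc}\")"
+ ]
+ },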
+ {
+ "cell_type": "markdown",
+ "id": "cell-29",
+ "metadata": {
+ "id": "055a68775299"
+ },
+ "source": [
+ "### Using different output dimensions (Matryoshka Representation Learning)\n",
+ "\n",
+ "Voyage 4 models support **Matryoshka Representation Learning (MRL)**, providing variable-dimension embeddings: 256, 512, 1024 (default), and 2048. Smaller dimensions reduce storage and computation costs, while larger dimensions may provide better accuracy."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cell-30",
+ "metadata": {
+ "id": "8e5ecdbb4d5c"
+ },
+ "outputs": [],
+ "source": [
+ "import json\n",
+ "\n",
+ "text = \"Machine learning enables computers to learn from data.\"\n",
+ "\n",
+ "# Test different output dimensions\n",
+ "dimensions = [256, 512, 1024, 2048]\n",
+ "\n",
+ "print(\"Comparing different output dimensions (MRL):\\n\")\n",
+ "for dim in dimensions:\n",
+ " body = {\"input\": [text], \"output_dimension\": dim, \"input_type\": \"document\"}\n",
+ " response = endpoint.invoke(\n",
+ " request_path=\"/embeddings\",\n",
+ " body=json.dumps(body).encode(\"utf-8\"),\n",
+ " headers={\"Content-Type\": \"application/json\"},\n",
+ " )\n",
+ " result = response.json()\n",
+ " embedding = result[\"data\"][0][\"embedding\"]\n",
+ "\n",
+ " print(f\"Dimension {dim}:\")\n",
+ " print(f\" Length: {len(embedding)}\")\n",
+ " print(f\" First 5 values: {embedding[:5]}\")\n",
+ " print(f\" Storage size: ~{len(embedding) * 4} bytes (float32)\\n\")"
+ ]
+ },
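+ {
+ "cell_type": "markdown",
+ "id": "cell-30b",
+ "metadata": {},
+ "source": [
+ "With MRL, lower-dimensional embeddings are typically nested prefixes of the higher-dimensional ones. The next cell is an optional probe (an informal check, not a documented guarantee): it truncates a 1024-dimension embedding to its first 256 values, renormalizes it, and compares it with a natively requested 256-dimension embedding."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cell-30c",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import json\n",
+ "\n",
+ "import numpy as np\n",
+ "\n",
+ "text = \"Machine learning enables computers to learn from data.\"\n",
+ "\n",
+ "# Request the same text at 1024 and 256 output dimensions.\n",
+ "embeddings_by_dim = {}\n",
+ "for dim in (1024, 256):\n",
+ " body = {\"input\": [text], \"output_dimension\": dim, \"input_type\": \"document\"}\n",
+ " response = endpoint.invoke(\n",
+ " request_path=\"/embeddings\",\n",
+ " body=json.dumps(body).encode(\"utf-8\"),\n",
+ " headers={\"Content-Type\": \"application/json\"},\n",
+ " )\n",
+ " embeddings_by_dim[dim] = np.array(response.json()[\"data\"][0][\"embedding\"])\n",
+ "\n",
+ "# Truncate the 1024-dim embedding to its first 256 values and renormalize,\n",
+ "# then compare it with the natively requested 256-dim embedding.\n",
+ "truncated = embeddings_by_dim[1024][:256]\n",
+ "truncated = truncated / np.linalg.norm(truncated)\n",
+ "native = embeddings_by_dim[256] / np.linalg.norm(embeddings_by_dim[256])\n",
+ "\n",
+ "similarity = float(np.dot(truncated, native))\n",
+ "print(f\"Cosine similarity (truncated 1024-dim vs native 256-dim): {similarity:.4f}\")"
+ ]
+ },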
+ {
+ "cell_type": "markdown",
+ "id": "cell-31",
+ "metadata": {
+ "id": "2ea3386e2fbb"
+ },
+ "source": [
+ "### Using different output data types (Quantization-Aware Training)\n",
+ "\n",
+ "Voyage 4 models support **Quantization-Aware Training (QAT)**, optimizing embeddings for multiple output data types:\n",
+ "\n",
+ "* **`float`** (default): 32-bit floating-point numbers, highest precision\n",
+ "* **`int8`**: 8-bit signed integers (-128 to 127), 4x smaller than float\n",
+ "* **`uint8`**: 8-bit unsigned integers (0 to 255), 4x smaller than float\n",
+ "* **`binary`**: Bit-packed signed integers (int8), 32x smaller than float\n",
+ "* **`ubinary`**: Bit-packed unsigned integers (uint8), 32x smaller than float\n",
+ "\n",
+ "Quantized formats trade some precision for significant storage savings, with minimal quality loss thanks to QAT."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cell-32",
+ "metadata": {
+ "id": "3f7f5abccd85"
+ },
+ "outputs": [],
+ "source": [
+ "import json\n",
+ "\n",
+ "text = \"Machine learning enables computers to learn from data.\"\n",
+ "\n",
+ "# Test different output data types\n",
+ "output_dtypes = [\"float\", \"int8\", \"uint8\", \"binary\", \"ubinary\"]\n",
+ "output_dimension = 1024\n",
+ "\n",
+ "print(\"Comparing different output data types (QAT):\\n\")\n",
+ "for dtype in output_dtypes:\n",
+ " body = {\n",
+ " \"input\": [text],\n",
+ " \"output_dimension\": output_dimension,\n",
+ " \"output_dtype\": dtype,\n",
+ " \"input_type\": \"document\",\n",
+ " }\n",
+ " response = endpoint.invoke(\n",
+ " request_path=\"/embeddings\",\n",
+ " body=json.dumps(body).encode(\"utf-8\"),\n",
+ " headers={\"Content-Type\": \"application/json\"},\n",
+ " )\n",
+ " result = response.json()\n",
+ " embedding = result[\"data\"][0][\"embedding\"]\n",
+ "\n",
+ " # Calculate actual storage size\n",
+ " if dtype == \"float\":\n",
+ " storage_bytes = len(embedding) * 4 # 4 bytes per float32\n",
+ " elif dtype in [\"int8\", \"uint8\"]:\n",
+ " storage_bytes = len(embedding) * 1 # 1 byte per int8/uint8\n",
+ " elif dtype in [\"binary\", \"ubinary\"]:\n",
+ " storage_bytes = len(embedding) * 1 # bit-packed, 1/8 of dimension\n",
+ "\n",
+ " print(f\"Output dtype: {dtype}\")\n",
+ " print(f\" Length: {len(embedding)}\")\n",
+ " print(f\" Value type: {type(embedding[0]).__name__}\")\n",
+ " print(f\" First 5 values: {embedding[:5]}\")\n",
+ " print(f\" Storage size: ~{storage_bytes} bytes\")\n",
+ "\n",
+ " # Calculate compression ratio vs float\n",
+ " if dtype != \"float\":\n",
+ " compression_ratio = (output_dimension * 4) / storage_bytes\n",
+ " print(f\" Compression: {compression_ratio:.1f}x smaller than float\")\n",
+ " print()"
+ ]
+ },
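+ {
+ "cell_type": "markdown",
+ "id": "cell-32b",
+ "metadata": {},
+ "source": [
+ "If you want to work with the bit-packed formats locally (for example, to compute Hamming distances), the packed integers can be expanded back into individual bits. The next cell is a minimal sketch for `ubinary`, assuming the response contains `output_dimension / 8` unsigned 8-bit values, each packing 8 bits; see the Voyage AI documentation linked in the next section for the authoritative description of the binary formats."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cell-32c",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import json\n",
+ "\n",
+ "import numpy as np\n",
+ "\n",
+ "text = \"Machine learning enables computers to learn from data.\"\n",
+ "\n",
+ "# Request a ubinary embedding (bit-packed unsigned 8-bit integers).\n",
+ "body = {\n",
+ " \"input\": [text],\n",
+ " \"output_dimension\": 1024,\n",
+ " \"output_dtype\": \"ubinary\",\n",
+ " \"input_type\": \"document\",\n",
+ "}\n",
+ "response = endpoint.invoke(\n",
+ " request_path=\"/embeddings\",\n",
+ " body=json.dumps(body).encode(\"utf-8\"),\n",
+ " headers={\"Content-Type\": \"application/json\"},\n",
+ ")\n",
+ "packed = np.array(response.json()[\"data\"][0][\"embedding\"], dtype=np.uint8)\n",
+ "\n",
+ "# np.unpackbits expands each packed byte into 8 bits (most-significant bit first).\n",
+ "bits = np.unpackbits(packed)\n",
+ "\n",
+ "print(f\"Packed values returned: {len(packed)}\")\n",
+ "print(f\"Unpacked bits: {len(bits)}\")\n",
+ "print(f\"First 16 bits: {bits[:16]}\")"
+ ]
+ },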
+ {
+ "cell_type": "markdown",
+ "id": "cell-33",
+ "metadata": {
+ "id": "130d571dd4a8"
+ },
+ "source": [
+ "### Combining output_dimension and output_dtype\n",
+ "\n",
+ "You can combine different dimensions and data types to optimize for your use case.\n",
+ "\n",
+ "Please refer to our guide for details on [offset binary](https://docs.voyageai.com/docs/flexible-dimensions-and-quantization#offset-binary) and [binary embeddings](https://docs.voyageai.com/docs/flexible-dimensions-and-quantization#quantization). "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cell-34",
+ "metadata": {
+ "id": "2738b2576859"
+ },
+ "outputs": [],
+ "source": [
+ "import json\n",
+ "\n",
+ "text = \"Machine learning enables computers to learn from data.\"\n",
+ "\n",
+ "# Example: Ultra-compact embeddings (256 dimensions + ubinary)\n",
+ "compact_body = {\n",
+ " \"input\": [text],\n",
+ " \"output_dimension\": 256,\n",
+ " \"output_dtype\": \"ubinary\", # Most compact format\n",
+ " \"input_type\": \"document\",\n",
+ "}\n",
+ "compact_response = endpoint.invoke(\n",
+ " request_path=\"/embeddings\",\n",
+ " body=json.dumps(compact_body).encode(\"utf-8\"),\n",
+ " headers={\"Content-Type\": \"application/json\"},\n",
+ ")\n",
+ "compact_result = compact_response.json()\n",
+ "compact_embedding = compact_result[\"data\"][0][\"embedding\"]\n",
+ "\n",
+ "# Example: High-precision embeddings (2048 dimensions + float)\n",
+ "precise_body = {\n",
+ " \"input\": [text],\n",
+ " \"output_dimension\": 2048,\n",
+ " \"output_dtype\": \"float\", # Highest precision\n",
+ " \"input_type\": \"document\",\n",
+ "}\n",
+ "precise_response = endpoint.invoke(\n",
+ " request_path=\"/embeddings\",\n",
+ " body=json.dumps(precise_body).encode(\"utf-8\"),\n",
+ " headers={\"Content-Type\": \"application/json\"},\n",
+ ")\n",
+ "precise_result = precise_response.json()\n",
+ "precise_embedding = precise_result[\"data\"][0][\"embedding\"]\n",
+ "\n",
+ "# Compare storage requirements\n",
+ "compact_storage = len(compact_embedding) * 1 # binary is bit-packed\n",
+ "precise_storage = len(precise_embedding) * 4 # float32\n",
+ "\n",
+ "print(\"Storage comparison:\\n\")\n",
+ "print(\"Ultra-compact (256-dim ubinary):\")\n",
+ "print(\" Dimension: 256\")\n",
+ "print(f\" Storage: ~{compact_storage} bytes\")\n",
+ "print(f\" First 5 values: {compact_embedding[:5]}\\n\")\n",
+ "\n",
+ "print(\"High-precision (2048-dim float):\")\n",
+ "print(f\" Dimension: {len(precise_embedding)}\")\n",
+ "print(f\" Storage: ~{precise_storage} bytes\")\n",
+ "print(f\" First 5 values: {precise_embedding[:5]}\\n\")\n",
+ "\n",
+ "print(f\"Storage ratio: {precise_storage / compact_storage:.1f}x\")\n",
+ "print(\"\\nFor 1 million vectors:\")\n",
+ "print(f\" Ultra-compact: ~{compact_storage * 1_000_000 / (1024**2):.1f} MB\")\n",
+ "print(f\" High-precision: ~{precise_storage * 1_000_000 / (1024**2):.1f} MB\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cell-35",
+ "metadata": {
+ "id": "3067b6759d2b"
+ },
+ "source": [
+ "## Cleaning up\n",
+ "\n",
+ "To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, delete the endpoint."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cell-36",
+ "metadata": {
+ "id": "f59817e807ee"
+ },
+ "outputs": [],
+ "source": [
+ "# Delete the endpoint (this will also undeploy all models)\n",
+ "print(f\"Deleting endpoint: {endpoint.display_name}\")\n",
+ "endpoint.delete(force=True)\n",
+ "print(\"Endpoint deleted successfully!\")"
+ ]
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "name": "voyage-4.ipynb",
+ "toc_visible": true
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}