diff --git a/notebooks/official/generative_ai/voyage-4.ipynb b/notebooks/official/generative_ai/voyage-4.ipynb
new file mode 100644
index 000000000..f60195cfa
--- /dev/null
+++ b/notebooks/official/generative_ai/voyage-4.ipynb
@@ -0,0 +1,801 @@
+{
+  "cells": [
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "cell-0",
+      "metadata": {
+        "id": "6b3283cdfd08"
+      },
+      "outputs": [],
+      "source": [
+        "# Copyright 2026 MongoDB, Inc\n",
+        "#\n",
+        "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
+        "# you may not use this file except in compliance with the License.\n",
+        "# You may obtain a copy of the License at\n",
+        "#\n",
+        "#     https://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing, software\n",
+        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
+        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
+        "# See the License for the specific language governing permissions and\n",
+        "# limitations under the License."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "cell-1",
+      "metadata": {
+        "id": "d2b5eaffd266"
+      },
+      "source": [
+        "# Voyage 4 Embedding Models\n",
+        "\n",
+        "This notebook demonstrates how to deploy and use the Voyage 4 family of embedding models, featuring an **industry-first shared embedding space** that allows you to mix and match models for optimal cost and performance.\n",
+        "\n",
+        "<table align=\"left\">\n",
+        "  <td style=\"text-align: center\">\n",
+        "    <a href=\"https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/generative_ai/voyage-4.ipynb\">\n",
+        "      <img src=\"https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg\" alt=\"Google Colaboratory logo\"><br> Open in Colab\n",
+        "    </a>\n",
+        "  </td>\n",
+        "  <td style=\"text-align: center\">\n",
+        "    <a href=\"https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fvertex-ai-samples%2Fmain%2Fnotebooks%2Fofficial%2Fgenerative_ai%2Fvoyage-4.ipynb\">\n",
+        "      <img width=\"32px\" src=\"https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN\" alt=\"Google Cloud Colab Enterprise logo\"><br> Open in Colab Enterprise\n",
+        "    </a>\n",
+        "  </td>\n",
+        "  <td style=\"text-align: center\">\n",
+        "    <a href=\"https://console.cloud.google.com/vertex-ai/notebooks/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/main/notebooks/official/generative_ai/voyage-4.ipynb\">\n",
+        "      <img src=\"https://www.gstatic.com/images/branding/gcpiconscolors/vertexai/v1/32px.svg\" alt=\"Vertex AI logo\"><br> Open in Workbench\n",
+        "    </a>\n",
+        "  </td>\n",
+        "  <td style=\"text-align: center\">\n",
+        "    <a href=\"https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/generative_ai/voyage-4.ipynb\">\n",
+        "      <img src=\"https://github.githubassets.com/assets/GitHub-Mark-ea2971cee799.png\" alt=\"GitHub logo\" width=\"32px\"><br> View on GitHub\n",
+        "    </a>\n",
+        "  </td>\n",
+        "</table>"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "cell-2",
+      "metadata": {
+        "id": "a46455330445"
+      },
+      "source": [
+        "## Overview\n",
+        "\n",
+        "The **Voyage 4** family introduces an **industry-first shared embedding space** across all model sizes. This means embeddings from any Voyage 4 model are **interchangeable**—you can encode documents with one model and queries with another, enabling optimal cost-performance trade-offs.\n",
+        "\n",
+        "### Key Features\n",
+        "\n",
+        "* **Shared Embedding Space**: All Voyage 4 models (large, standard, lite) produce compatible embeddings, so you can mix models for documents vs. queries\n",
+        "* **Matryoshka Representation Learning (MRL)**: Variable-dimension embeddings (256, 512, 1024, 2048) from the same model\n",
+        "* **Quantization-Aware Training (QAT)**: Optimized for int8, uint8, binary, and ubinary formats with minimal quality loss\n",
+        "* **Maximum 32K tokens input**: Support for long documents\n",
+        "\n",
+        "### Model Family\n",
+        "\n",
+        "| Model | Description | Best For |\n",
+        "| :--- | :--- | :--- |\n",
+        "| **voyage-4-large** | State-of-the-art general-purpose and multilingual embedding optimized for retrieval quality | Document embeddings where quality matters most |\n",
+        "| **voyage-4** | General-purpose multilingual embedding model optimized for retrieval/search and AI applications | Balanced cost/quality trade-off |\n",
+        "| **voyage-4-lite** | Lightweight general-purpose embedding model optimized for low latency and cost | Query embeddings and cost-sensitive applications |\n",
+        "\n",
+        "### What you'll learn\n",
+        "\n",
+        "In this notebook, you will:\n",
+        "\n",
+        "* Deploy a Voyage 4 model to a Vertex AI endpoint\n",
+        "* Generate embeddings and perform semantic similarity\n",
+        "* Explore advanced parameters (dimensions, quantization)\n",
+        "\n",
+        "### Costs\n",
+        "\n",
+        "This tutorial uses billable components of Google Cloud:\n",
+        "\n",
+        "* Vertex AI Model Garden\n",
+        "* Vertex AI Prediction endpoints\n",
+        "\n",
+        "Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing) and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "cell-3",
+      "metadata": {
+        "id": "4d998a5140b2"
+      },
+      "source": [
+        "## Get started"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "cell-4",
+      "metadata": {
+        "id": "b92cb16aea9c"
+      },
+      "source": [
+        "### Install Vertex AI SDK for Python and other required packages\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "cell-5",
+      "metadata": {
+        "id": "030faea19be1"
+      },
+      "outputs": [],
+      "source": [
+        "! pip3 install --upgrade --quiet google-cloud-aiplatform numpy"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "cell-6",
+      "metadata": {
+        "id": "848322ec177e"
+      },
+      "source": [
+        "### Restart runtime (Colab only)\n",
+        "\n",
+        "To use the newly installed packages, you must restart the runtime on Google Colab."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "cell-7",
+      "metadata": {
+        "id": "b8d49bb74a53"
+      },
+      "outputs": [],
+      "source": [
+        "import sys\n",
+        "\n",
+        "if \"google.colab\" in sys.modules:\n",
+        "\n",
+        "    import IPython\n",
+        "\n",
+        "    app = IPython.Application.instance()\n",
+        "    app.kernel.do_shutdown(True)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "cell-8",
+      "metadata": {
+        "id": "780490bfb862"
+      },
+      "source": [
+        "<div class=\"alert alert-block alert-warning\">\n",
+        "<b>⚠️ The kernel is going to restart. Wait until it's finished before continuing to the next step. ⚠️</b>\n",
+        "</div>\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "cell-9",
+      "metadata": {
+        "id": "1117fcd212f8"
+      },
+      "source": [
+        "### Authenticate your notebook environment (Colab only)\n",
+        "\n",
+        "Authenticate your environment on Google Colab.\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "cell-10",
+      "metadata": {
+        "id": "015bf6d5da75"
+      },
+      "outputs": [],
+      "source": [
+        "import sys\n",
+        "\n",
+        "if \"google.colab\" in sys.modules:\n",
+        "\n",
+        "    from google.colab import auth\n",
+        "\n",
+        "    auth.authenticate_user()"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "cell-11",
+      "metadata": {
+        "id": "722a10c66085"
+      },
+      "source": [
+        "### Set Google Cloud project information and initialize Vertex AI SDK for Python\n",
+        "\n",
+        "To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com). Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "cell-12",
+      "metadata": {
+        "id": "0f16c41d33fd"
+      },
+      "outputs": [],
+      "source": [
+        "# @title Setup Google Cloud project\n",
+        "\n",
+        "# Set your Google Cloud project ID and region below:\n",
+        "\n",
+        "import os\n",
+        "\n",
+        "import vertexai\n",
+        "\n",
+        "# @markdown Enter your project ID if not auto-detected:\n",
+        "PROJECT_ID = \"[your-project-id]\"  # @param {type:\"string\"}\n",
+        "if not PROJECT_ID or PROJECT_ID == \"[your-project-id]\":\n",
+        "    PROJECT_ID = os.environ.get(\"GOOGLE_CLOUD_PROJECT\")\n",
+        "\n",
+        "# @markdown Select your region:\n",
+        "LOCATION = \"us-central1\"  # @param [\"us-central1\", \"us-east1\", \"us-west1\", \"europe-west1\", \"europe-west4\", \"asia-east1\", \"asia-southeast1\"]\n",
+        "\n",
+        "print(f\"Project ID: {PROJECT_ID}\")\n",
+        "print(f\"Location: {LOCATION}\")\n",
+        "\n",
+        "vertexai.init(project=PROJECT_ID, location=LOCATION)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "cell-13",
+      "metadata": {
+        "id": "4fa0f29e38cf"
+      },
+      "source": [
+        "## Deploy model\n",
+        "\n",
+        "The Voyage 4 family features a **shared embedding space**, meaning embeddings from any Voyage 4 model (large, standard, lite) are interchangeable. This allows you to use different models for documents and queries while maintaining compatibility.\n",
+        "\n",
+        "For this notebook, we'll deploy a single endpoint. The three models are:\n",
+        "\n",
+        "* **voyage-4-large** — State-of-the-art retrieval quality, ideal for document embeddings\n",
+        "* **voyage-4** — Balanced for retrieval/search and AI applications\n",
+        "* **voyage-4-lite** — Optimized for low latency and cost, ideal for query embeddings"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "cell-14",
+      "metadata": {
+        "id": "45286665838d"
+      },
+      "source": [
+        "### Initialize the Model\n",
+        "\n",
+        "Initialize the Voyage 4 model from Model Garden.\n",
+        "\n",
+        "Use the `list_deploy_options()` method to view the verified deployment configurations for your selected model."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "cell-15",
+      "metadata": {
+        "id": "7675f3b3f3de"
+      },
+      "outputs": [],
+      "source": [
+        "from vertexai import model_garden\n",
+        "\n",
+        "# @title Select Model\n",
+        "# @markdown Choose the Voyage 4 model to deploy:\n",
+        "MODEL = \"voyage-4\"  # @param [\"voyage-4-large\", \"voyage-4\", \"voyage-4-lite\"]\n",
+        "\n",
+        "# Default to voyage-4 if not set\n",
+        "if not MODEL:\n",
+        "    MODEL = \"voyage-4\"\n",
+        "\n",
+        "MODEL_NAME = f\"mongodb/{MODEL}@latest\"\n",
+        "model = model_garden.OpenModel(MODEL_NAME)\n",
+        "\n",
+        "# Set accelerator based on model (voyage-4-large requires 80GB GPU)\n",
+        "if MODEL == \"voyage-4-large\":\n",
+        "    MACHINE_TYPE = \"a2-ultragpu-1g\"\n",
+        "    ACCELERATOR_TYPE = \"NVIDIA_A100_80GB\"\n",
+        "else:\n",
+        "    MACHINE_TYPE = \"a2-highgpu-1g\"\n",
+        "    ACCELERATOR_TYPE = \"NVIDIA_TESLA_A100\"\n",
+        "\n",
+        "print(f\"Selected model: {MODEL_NAME}\")\n",
+        "print(f\"Accelerator: {ACCELERATOR_TYPE} on {MACHINE_TYPE}\")\n",
+        "deploy_options = model.list_deploy_options(concise=True)\n",
+        "print(deploy_options)"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "cell-16",
+      "metadata": {
+        "id": "4e40b2f2141f"
+      },
+      "outputs": [],
+      "source": [
+        "# @title Deploy or connect to endpoint\n",
+        "# @markdown Choose whether to deploy a new model or use an existing endpoint:\n",
+        "\n",
+        "deployment_option = \"deploy_new\"  # @param [\"deploy_new\", \"use_existing\"]\n",
+        "\n",
+        "# @markdown ---\n",
+        "# @markdown If using existing endpoint, provide the endpoint ID:\n",
+        "ENDPOINT_ID = \"\"  # @param {type:\"string\"}\n",
+        "\n",
+        "if deployment_option == \"deploy_new\":\n",
+        "    print(f\"Deploying {MODEL}...\")\n",
+        "    print(f\"Using {ACCELERATOR_TYPE} on {MACHINE_TYPE}\")\n",
+        "    endpoint = model.deploy(\n",
+        "        machine_type=MACHINE_TYPE,\n",
+        "        accelerator_type=ACCELERATOR_TYPE,\n",
+        "        accelerator_count=1,\n",
+        "        accept_eula=True,\n",
+        "        use_dedicated_endpoint=True,\n",
+        "    )\n",
+        "    print(f\"Endpoint deployed: {endpoint.display_name}\")\n",
+        "    print(f\"Endpoint resource name: {endpoint.resource_name}\")\n",
+        "else:\n",
+        "    if not ENDPOINT_ID:\n",
+        "        raise ValueError(\"Please provide an ENDPOINT_ID when using existing endpoint\")\n",
+        "\n",
+        "    from google.cloud import aiplatform\n",
+        "\n",
+        "    print(f\"Connecting to existing endpoint: {ENDPOINT_ID}\")\n",
+        "    endpoint = aiplatform.Endpoint(\n",
+        "        endpoint_name=f\"projects/{PROJECT_ID}/locations/{LOCATION}/endpoints/{ENDPOINT_ID}\"\n",
+        "    )\n",
+        "    print(f\"Using endpoint: {endpoint.display_name}\")\n",
+        "    print(f\"Endpoint resource name: {endpoint.resource_name}\")"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "cell-20",
+      "metadata": {
+        "id": "0233e16ca856"
+      },
+      "source": [
+        "## Generate embeddings\n",
+        "\n",
+        "Now let's look at basic embedding generation with the Voyage 4 models."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "cell-21",
+      "metadata": {
+        "id": "b7d228ce5961"
+      },
+      "outputs": [],
+      "source": [
+        "import json\n",
+        "\n",
+        "# Multiple texts to embed\n",
+        "texts = [\n",
+        "    \"Machine learning enables computers to learn from data.\",\n",
+        "    \"Natural language processing helps computers understand human language.\",\n",
+        "    \"Computer vision allows machines to interpret visual information.\",\n",
+        "    \"Deep learning uses neural networks with multiple layers.\",\n",
+        "]\n",
+        "\n",
+        "# Prepare the batch request and make invoke call\n",
+        "body = {\"input\": texts, \"output_dimension\": 1024, \"input_type\": \"document\"}\n",
+        "response = endpoint.invoke(\n",
+        "    request_path=\"/embeddings\",\n",
+        "    body=json.dumps(body).encode(\"utf-8\"),\n",
+        "    headers={\"Content-Type\": \"application/json\"},\n",
+        ")\n",
+        "\n",
+        "# Extract embeddings\n",
+        "result = response.json()\n",
+        "embeddings = [item[\"embedding\"] for item in result[\"data\"]]\n",
+        "\n",
+        "print(f\"Number of texts embedded: {len(embeddings)}\")\n",
+        "print(f\"Embedding dimension: {len(embeddings[0])}\")\n",
+        "print(f\"\\nFirst embedding (first 5 values): {embeddings[0][:5]}\")\n",
+        "print(f\"Second embedding (first 5 values): {embeddings[1][:5]}\")"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "cell-22",
+      "metadata": {
+        "id": "a47938955d96"
+      },
+      "source": [
+        "### Semantic similarity\n",
+        "\n",
+        "Use embeddings to compute semantic similarity between text:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "cell-23",
+      "metadata": {
+        "id": "d65defacbb93"
+      },
+      "outputs": [],
+      "source": [
+        "import json\n",
+        "\n",
+        "import numpy as np\n",
+        "\n",
+        "\n",
+        "def cosine_similarity(vec1, vec2):\n",
+        "    \"\"\"Calculate cosine similarity between two vectors.\"\"\"\n",
+        "    vec1 = np.array(vec1)\n",
+        "    vec2 = np.array(vec2)\n",
+        "    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))\n",
+        "\n",
+        "\n",
+        "# Example texts\n",
+        "query = \"How do computers learn from examples?\"\n",
+        "documents = [\n",
+        "    \"Machine learning enables computers to learn from data.\",\n",
+        "    \"The weather today is sunny and warm.\",\n",
+        "    \"Neural networks can recognize patterns in data.\",\n",
+        "    \"I enjoy cooking Italian food.\",\n",
+        "]\n",
+        "\n",
+        "# Get embeddings - using invoke with /embeddings endpoint\n",
+        "all_texts = [query] + documents\n",
+        "body = {\"input\": all_texts, \"output_dimension\": 1024, \"input_type\": \"document\"}\n",
+        "response = endpoint.invoke(\n",
+        "    request_path=\"/embeddings\",\n",
+        "    body=json.dumps(body).encode(\"utf-8\"),\n",
+        "    headers={\"Content-Type\": \"application/json\"},\n",
+        ")\n",
+        "result = response.json()\n",
+        "all_embeddings = [item[\"embedding\"] for item in result[\"data\"]]\n",
+        "\n",
+        "query_embedding = all_embeddings[0]\n",
+        "doc_embeddings = all_embeddings[1:]\n",
+        "\n",
+        "# Calculate similarities\n",
+        "print(f\"Query: {query}\\n\")\n",
+        "print(\"Similarity scores:\")\n",
+        "for i, doc in enumerate(documents):\n",
+        "    similarity = cosine_similarity(query_embedding, doc_embeddings[i])\n",
+        "    print(f\"{similarity:.4f} - {doc}\")"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "cell-26",
+      "metadata": {
+        "id": "c3b964fe9387"
+      },
+      "source": [
+        "## Advanced parameters\n",
+        "\n",
+        "Let's explore the advanced parameters that Voyage 4 models support to optimize your embeddings."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "cell-27",
+      "metadata": {
+        "id": "865ef46609eb"
+      },
+      "source": [
+        "### Understanding input_type: Query vs Document\n",
+        "\n",
+        "The `input_type` parameter optimizes embeddings for retrieval tasks:\n",
+        "\n",
+        "* **`query`**: Use this when the text represents a search query or question. The model prepends \"Represent the query for retrieving supporting documents: \" to optimize for retrieval.\n",
+        "* **`document`**: Use this when the text represents a document or passage to be searched. The model prepends \"Represent the document for retrieval: \" to optimize for indexing.\n",
+        "* **`null`** (default): No special prompt is added. Use for general-purpose embeddings.\n",
+        "\n",
+        "**Best Practice**: For retrieval/search applications, use `input_type=\"query\"` for your search queries and `input_type=\"document\"` for the documents you're indexing."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "cell-28",
+      "metadata": {
+        "id": "a12cf1ce0be7"
+      },
+      "outputs": [],
+      "source": [
+        "import json\n",
+        "\n",
+        "# Example: Using input_type for retrieval\n",
+        "query_text = \"What is machine learning?\"\n",
+        "document_texts = [\n",
+        "    \"Machine learning enables computers to learn from data.\",\n",
+        "    \"Natural language processing helps computers understand human language.\",\n",
+        "    \"Computer vision allows machines to interpret visual information.\",\n",
+        "]\n",
+        "\n",
+        "# Generate query embedding with input_type=\"query\"\n",
+        "query_body = {\n",
+        "    \"input\": [query_text],\n",
+        "    \"output_dimension\": 1024,\n",
+        "    \"input_type\": \"query\",  # Optimize for search queries\n",
+        "}\n",
+        "query_response = endpoint.invoke(\n",
+        "    request_path=\"/embeddings\",\n",
+        "    body=json.dumps(query_body).encode(\"utf-8\"),\n",
+        "    headers={\"Content-Type\": \"application/json\"},\n",
+        ")\n",
+        "query_result = query_response.json()\n",
+        "query_embedding = query_result[\"data\"][0][\"embedding\"]\n",
+        "\n",
+        "# Generate document embeddings with input_type=\"document\"\n",
+        "doc_body = {\n",
+        "    \"input\": document_texts,\n",
+        "    \"output_dimension\": 1024,\n",
+        "    \"input_type\": \"document\",  # Optimize for document indexing\n",
+        "}\n",
+        "doc_response = endpoint.invoke(\n",
+        "    request_path=\"/embeddings\",\n",
+        "    body=json.dumps(doc_body).encode(\"utf-8\"),\n",
+        "    headers={\"Content-Type\": \"application/json\"},\n",
+        ")\n",
+        "doc_result = doc_response.json()\n",
+        "doc_embeddings = [item[\"embedding\"] for item in doc_result[\"data\"]]\n",
+        "\n",
+        "print(f\"Query: {query_text}\")\n",
+        "print(f\"Query embedding dimension: {len(query_embedding)}\")\n",
+        "print(f\"\\nNumber of documents embedded: {len(doc_embeddings)}\")\n",
+        "print(f\"Document embedding dimension: {len(doc_embeddings[0])}\")\n",
+        "print(f\"\\nQuery embedding (first 5 values): {query_embedding[:5]}\")\n",
+        "print(f\"First document embedding (first 5 values): {doc_embeddings[0][:5]}\")"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "cell-29",
+      "metadata": {
+        "id": "055a68775299"
+      },
+      "source": [
+        "### Using different output dimensions (Matryoshka Representation Learning)\n",
+        "\n",
+        "Voyage 4 models support **Matryoshka Representation Learning (MRL)**, providing variable-dimension embeddings: 256, 512, 1024 (default), and 2048. Smaller dimensions reduce storage and computation costs, while larger dimensions may provide better accuracy."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "cell-30",
+      "metadata": {
+        "id": "8e5ecdbb4d5c"
+      },
+      "outputs": [],
+      "source": [
+        "import json\n",
+        "\n",
+        "text = \"Machine learning enables computers to learn from data.\"\n",
+        "\n",
+        "# Test different output dimensions\n",
+        "dimensions = [256, 512, 1024, 2048]\n",
+        "\n",
+        "print(\"Comparing different output dimensions (MRL):\\n\")\n",
+        "for dim in dimensions:\n",
+        "    body = {\"input\": [text], \"output_dimension\": dim, \"input_type\": \"document\"}\n",
+        "    response = endpoint.invoke(\n",
+        "        request_path=\"/embeddings\",\n",
+        "        body=json.dumps(body).encode(\"utf-8\"),\n",
+        "        headers={\"Content-Type\": \"application/json\"},\n",
+        "    )\n",
+        "    result = response.json()\n",
+        "    embedding = result[\"data\"][0][\"embedding\"]\n",
+        "\n",
+        "    print(f\"Dimension {dim}:\")\n",
+        "    print(f\"  Length: {len(embedding)}\")\n",
+        "    print(f\"  First 5 values: {embedding[:5]}\")\n",
+        "    print(f\"  Storage size: ~{len(embedding) * 4} bytes (float32)\\n\")"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "cell-31",
+      "metadata": {
+        "id": "2ea3386e2fbb"
+      },
+      "source": [
+        "### Using different output data types (Quantization-Aware Training)\n",
+        "\n",
+        "Voyage 4 models support **Quantization-Aware Training (QAT)**, optimizing embeddings for multiple output data types:\n",
+        "\n",
+        "* **`float`** (default): 32-bit floating-point numbers, highest precision\n",
+        "* **`int8`**: 8-bit signed integers (-128 to 127), 4x smaller than float\n",
+        "* **`uint8`**: 8-bit unsigned integers (0 to 255), 4x smaller than float\n",
+        "* **`binary`**: Bit-packed signed integers (int8), 32x smaller than float\n",
+        "* **`ubinary`**: Bit-packed unsigned integers (uint8), 32x smaller than float\n",
+        "\n",
+        "Quantized formats trade some precision for significant storage savings, with minimal quality loss thanks to QAT."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "cell-32",
+      "metadata": {
+        "id": "3f7f5abccd85"
+      },
+      "outputs": [],
+      "source": [
+        "import json\n",
+        "\n",
+        "text = \"Machine learning enables computers to learn from data.\"\n",
+        "\n",
+        "# Test different output data types\n",
+        "output_dtypes = [\"float\", \"int8\", \"uint8\", \"binary\", \"ubinary\"]\n",
+        "output_dimension = 1024\n",
+        "\n",
+        "print(\"Comparing different output data types (QAT):\\n\")\n",
+        "for dtype in output_dtypes:\n",
+        "    body = {\n",
+        "        \"input\": [text],\n",
+        "        \"output_dimension\": output_dimension,\n",
+        "        \"output_dtype\": dtype,\n",
+        "        \"input_type\": \"document\",\n",
+        "    }\n",
+        "    response = endpoint.invoke(\n",
+        "        request_path=\"/embeddings\",\n",
+        "        body=json.dumps(body).encode(\"utf-8\"),\n",
+        "        headers={\"Content-Type\": \"application/json\"},\n",
+        "    )\n",
+        "    result = response.json()\n",
+        "    embedding = result[\"data\"][0][\"embedding\"]\n",
+        "\n",
+        "    # Calculate actual storage size\n",
+        "    if dtype == \"float\":\n",
+        "        storage_bytes = len(embedding) * 4  # 4 bytes per float32\n",
+        "    elif dtype in [\"int8\", \"uint8\"]:\n",
+        "        storage_bytes = len(embedding) * 1  # 1 byte per int8/uint8\n",
+        "    elif dtype in [\"binary\", \"ubinary\"]:\n",
+        "        storage_bytes = len(embedding) * 1  # bit-packed, 1/8 of dimension\n",
+        "\n",
+        "    print(f\"Output dtype: {dtype}\")\n",
+        "    print(f\"  Length: {len(embedding)}\")\n",
+        "    print(f\"  Value type: {type(embedding[0]).__name__}\")\n",
+        "    print(f\"  First 5 values: {embedding[:5]}\")\n",
+        "    print(f\"  Storage size: ~{storage_bytes} bytes\")\n",
+        "\n",
+        "    # Calculate compression ratio vs float\n",
+        "    if dtype != \"float\":\n",
+        "        compression_ratio = (output_dimension * 4) / storage_bytes\n",
+        "        print(f\"  Compression: {compression_ratio:.1f}x smaller than float\")\n",
+        "    print()"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "cell-33",
+      "metadata": {
+        "id": "130d571dd4a8"
+      },
+      "source": [
+        "### Combining output_dimension and output_dtype\n",
+        "\n",
+        "You can combine different dimensions and data types to optimize for your use case.\n",
+        "\n",
+        "Please refer to our guide for details on [offset binary](https://docs.voyageai.com/docs/flexible-dimensions-and-quantization#offset-binary) and [binary embeddings](https://docs.voyageai.com/docs/flexible-dimensions-and-quantization#quantization). "
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "cell-34",
+      "metadata": {
+        "id": "2738b2576859"
+      },
+      "outputs": [],
+      "source": [
+        "import json\n",
+        "\n",
+        "text = \"Machine learning enables computers to learn from data.\"\n",
+        "\n",
+        "# Example: Ultra-compact embeddings (256 dimensions + ubinary)\n",
+        "compact_body = {\n",
+        "    \"input\": [text],\n",
+        "    \"output_dimension\": 256,\n",
+        "    \"output_dtype\": \"ubinary\",  # Most compact format\n",
+        "    \"input_type\": \"document\",\n",
+        "}\n",
+        "compact_response = endpoint.invoke(\n",
+        "    request_path=\"/embeddings\",\n",
+        "    body=json.dumps(compact_body).encode(\"utf-8\"),\n",
+        "    headers={\"Content-Type\": \"application/json\"},\n",
+        ")\n",
+        "compact_result = compact_response.json()\n",
+        "compact_embedding = compact_result[\"data\"][0][\"embedding\"]\n",
+        "\n",
+        "# Example: High-precision embeddings (2048 dimensions + float)\n",
+        "precise_body = {\n",
+        "    \"input\": [text],\n",
+        "    \"output_dimension\": 2048,\n",
+        "    \"output_dtype\": \"float\",  # Highest precision\n",
+        "    \"input_type\": \"document\",\n",
+        "}\n",
+        "precise_response = endpoint.invoke(\n",
+        "    request_path=\"/embeddings\",\n",
+        "    body=json.dumps(precise_body).encode(\"utf-8\"),\n",
+        "    headers={\"Content-Type\": \"application/json\"},\n",
+        ")\n",
+        "precise_result = precise_response.json()\n",
+        "precise_embedding = precise_result[\"data\"][0][\"embedding\"]\n",
+        "\n",
+        "# Compare storage requirements\n",
+        "compact_storage = len(compact_embedding) * 1  # binary is bit-packed\n",
+        "precise_storage = len(precise_embedding) * 4  # float32\n",
+        "\n",
+        "print(\"Storage comparison:\\n\")\n",
+        "print(\"Ultra-compact (256-dim ubinary):\")\n",
+        "print(\"  Dimension: 256\")\n",
+        "print(f\"  Storage: ~{compact_storage} bytes\")\n",
+        "print(f\"  First 5 values: {compact_embedding[:5]}\\n\")\n",
+        "\n",
+        "print(\"High-precision (2048-dim float):\")\n",
+        "print(f\"  Dimension: {len(precise_embedding)}\")\n",
+        "print(f\"  Storage: ~{precise_storage} bytes\")\n",
+        "print(f\"  First 5 values: {precise_embedding[:5]}\\n\")\n",
+        "\n",
+        "print(f\"Storage ratio: {precise_storage / compact_storage:.1f}x\")\n",
+        "print(\"\\nFor 1 million vectors:\")\n",
+        "print(f\"  Ultra-compact: ~{compact_storage * 1_000_000 / (1024**2):.1f} MB\")\n",
+        "print(f\"  High-precision: ~{precise_storage * 1_000_000 / (1024**2):.1f} MB\")"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "id": "cell-35",
+      "metadata": {
+        "id": "3067b6759d2b"
+      },
+      "source": [
+        "## Cleaning up\n",
+        "\n",
+        "To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, delete the endpoint."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "id": "cell-36",
+      "metadata": {
+        "id": "f59817e807ee"
+      },
+      "outputs": [],
+      "source": [
+        "# Delete the endpoint (this will also undeploy all models)\n",
+        "print(f\"Deleting endpoint: {endpoint.display_name}\")\n",
+        "endpoint.delete(force=True)\n",
+        "print(\"Endpoint deleted successfully!\")"
+      ]
+    }
+  ],
+  "metadata": {
+    "colab": {
+      "name": "voyage-4.ipynb",
+      "toc_visible": true
+    },
+    "kernelspec": {
+      "display_name": "Python 3",
+      "name": "python3"
+    }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 0
+}