diff --git a/notebooks/official/generative_ai/voyage-multimodal-3.5.ipynb b/notebooks/official/generative_ai/voyage-multimodal-3.5.ipynb
new file mode 100644
index 000000000..925bfb2f4
--- /dev/null
+++ b/notebooks/official/generative_ai/voyage-multimodal-3.5.ipynb
@@ -0,0 +1,1324 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "6b3283cdfd08"
+ },
+ "outputs": [],
+ "source": [
+ "# Copyright 2026 MongoDB, Inc\n",
+ "#\n",
+ "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
+ "# you may not use this file except in compliance with the License.\n",
+ "# You may obtain a copy of the License at\n",
+ "#\n",
+ "# https://www.apache.org/licenses/LICENSE-2.0\n",
+ "#\n",
+ "# Unless required by applicable law or agreed to in writing, software\n",
+ "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
+ "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
+ "# See the License for the specific language governing permissions and\n",
+ "# limitations under the License."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "36b86ddc6744"
+ },
+ "source": [
+ "# Voyage Multimodal 3.5\n",
+ "\n",
+ "This notebook demonstrates how to deploy and use the Voyage Multimodal 3.5 embedding model.\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ "  Open in Colab\n",
+ " \n",
+ " | \n",
+ " \n",
+ " \n",
+ "  Open in Colab Enterprise\n",
+ " \n",
+ " | \n",
+ " \n",
+ " \n",
+ "  Open in Workbench\n",
+ " \n",
+ " | \n",
+ " \n",
+ " \n",
+ "  View on GitHub\n",
+ " \n",
+ " | \n",
+ "
"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "d3d3e7fcda4f"
+ },
+ "source": [
+ "## Overview\n",
+ "\n",
+ "**Voyage Multimodal 3.5** is a state-of-the-art multimodal embedding model designed for cross-modal semantic search, retrieval-augmented generation (RAG), and intelligent AI applications. This model provides:\n",
+ "\n",
+ "* **Multimodal Understanding**: Vectorize text, images, and video individually or interleaved together\n",
+ "* **Cross-Modal Search**: Excellent performance for mixed-modality searches involving text and visual content\n",
+ "* **Flexible Dimensions**: Support for 256, 512, 1024, and 2048 dimensions via Matryoshka learning\n",
+ "* **Quantization Options**: Multiple quantization formats for optimal storage and performance\n",
+ "* **Maximum 32K tokens input**: Support for long documents and multiple media items\n",
+ "\n",
+ "### What you'll learn\n",
+ "\n",
+ "In this notebook, you will:\n",
+ "\n",
+ "* Deploy the Voyage Multimodal 3.5 model to a Vertex AI endpoint\n",
+ "* Generate embeddings for text, images, and video\n",
+ "* Create multimodal embeddings combining text and images\n",
+ "* Use embeddings for cross-modal semantic similarity\n",
+ "* Clean up resources after use\n",
+ "\n",
+ "### Costs\n",
+ "\n",
+ "This tutorial uses billable components of Google Cloud:\n",
+ "\n",
+ "* Vertex AI Model Garden\n",
+ "* Vertex AI Prediction endpoints\n",
+ "\n",
+ "Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing) and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "4d998a5140b2"
+ },
+ "source": [
+ "## Get started"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "b92cb16aea9c"
+ },
+ "source": [
+ "### Install Vertex AI SDK for Python and other required packages\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "030faea19be1"
+ },
+ "outputs": [],
+ "source": [
+ "! pip3 install --upgrade --quiet google-cloud-aiplatform numpy"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "848322ec177e"
+ },
+ "source": [
+ "### Restart runtime (Colab only)\n",
+ "\n",
+ "To use the newly installed packages, you must restart the runtime on Google Colab."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "b8d49bb74a53"
+ },
+ "outputs": [],
+ "source": [
+ "import sys\n",
+ "\n",
+ "if \"google.colab\" in sys.modules:\n",
+ "\n",
+ " import IPython\n",
+ "\n",
+ " app = IPython.Application.instance()\n",
+ " app.kernel.do_shutdown(True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "780490bfb862"
+ },
+ "source": [
+ "\n",
+ "⚠️ The kernel is going to restart. Wait until it's finished before continuing to the next step. ⚠️\n",
+ "
\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "1117fcd212f8"
+ },
+ "source": [
+ "### Authenticate your notebook environment (Colab only)\n",
+ "\n",
+ "Authenticate your environment on Google Colab.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "015bf6d5da75"
+ },
+ "outputs": [],
+ "source": [
+ "import sys\n",
+ "\n",
+ "if \"google.colab\" in sys.modules:\n",
+ "\n",
+ " from google.colab import auth\n",
+ "\n",
+ " auth.authenticate_user()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "722a10c66085"
+ },
+ "source": [
+ "### Set Google Cloud project information and initialize Vertex AI SDK for Python\n",
+ "\n",
+ "To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com). Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "ead0f677c004"
+ },
+ "outputs": [],
+ "source": [
+ "# @title Setup Google Cloud project\n",
+ "\n",
+ "# Set your Google Cloud project ID and region below:\n",
+ "\n",
+ "import os\n",
+ "\n",
+ "import vertexai\n",
+ "\n",
+ "# @markdown Enter your project ID if not auto-detected:\n",
+ "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}\n",
+ "if not PROJECT_ID or PROJECT_ID == \"[your-project-id]\":\n",
+ " PROJECT_ID = os.environ.get(\"GOOGLE_CLOUD_PROJECT\")\n",
+ "\n",
+ "# @markdown Select your region:\n",
+ "LOCATION = \"us-central1\" # @param [\"us-central1\", \"us-east1\", \"us-west1\", \"europe-west1\", \"europe-west4\", \"asia-east1\", \"asia-southeast1\"]\n",
+ "\n",
+ "print(f\"Project ID: {PROJECT_ID}\")\n",
+ "print(f\"Location: {LOCATION}\")\n",
+ "\n",
+ "vertexai.init(project=PROJECT_ID, location=LOCATION)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "23acca3ed72b"
+ },
+ "source": [
+ "## Deploy model"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "790c7cc43b1c"
+ },
+ "source": [
+ "### Initialize the Model\n",
+ "\n",
+ "Initialize the Voyage Multimodal 3.5 model from Model Garden.\n",
+ "\n",
+ "Use the `list_deploy_options()` method to view the verified deployment configurations for your selected model. This helps ensure you have sufficient resources (e.g., GPU quota) available to deploy it."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "56d52ebdc7c3"
+ },
+ "outputs": [],
+ "source": [
+ "from vertexai import model_garden\n",
+ "\n",
+ "MODEL_NAME = \"mongodb/voyage-multimodal-3.5@latest\"\n",
+ "model = model_garden.OpenModel(MODEL_NAME)\n",
+ "\n",
+ "deploy_options = model.list_deploy_options(concise=True)\n",
+ "print(deploy_options)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "c2f4541829d7"
+ },
+ "source": [
+ "### Deploy the Model\n",
+ "\n",
+ "Now that you've reviewed the deployment options, use the `deploy()` method to serve the Voyage Multimodal 3.5 model to a Vertex AI endpoint. Deployment time may vary depending on infrastructure requirements.\n",
+ "\n",
+ "You can either deploy a new model or use an existing endpoint. Set `use_dedicated_endpoint` to `True` as voyage-multimodal-3.5 requires a [dedicated endpoint](https://cloud.google.com/vertex-ai/docs/general/deployment#create-dedicated-endpoint)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "c797fed2dd9a"
+ },
+ "outputs": [],
+ "source": [
+ "# @title Deploy or connect to endpoint\n",
+ "# @markdown Choose whether to deploy a new model or use an existing endpoint:\n",
+ "\n",
+ "deployment_option = \"deploy_new\" # @param [\"deploy_new\", \"use_existing\"]\n",
+ "\n",
+ "# @markdown ---\n",
+ "# @markdown If using existing endpoint, provide the endpoint ID:\n",
+ "ENDPOINT_ID = \"\" # @param {type:\"string\"}\n",
+ "\n",
+ "if deployment_option == \"deploy_new\":\n",
+ " print(\"Deploying new model...\")\n",
+ " endpoint = model.deploy(\n",
+ " accept_eula=True,\n",
+ " use_dedicated_endpoint=True,\n",
+ " )\n",
+ " print(f\"Endpoint deployed: {endpoint.display_name}\")\n",
+ " print(f\"Endpoint resource name: {endpoint.resource_name}\")\n",
+ "else:\n",
+ " if not ENDPOINT_ID:\n",
+ " raise ValueError(\"Please provide an ENDPOINT_ID when using existing endpoint\")\n",
+ "\n",
+ " from google.cloud import aiplatform\n",
+ "\n",
+ " print(f\"Connecting to existing endpoint: {ENDPOINT_ID}\")\n",
+ " endpoint = aiplatform.Endpoint(\n",
+ " endpoint_name=f\"projects/{PROJECT_ID}/locations/{LOCATION}/endpoints/{ENDPOINT_ID}\"\n",
+ " )\n",
+ " print(f\"Using endpoint: {endpoint.display_name}\")\n",
+ " print(f\"Endpoint resource name: {endpoint.resource_name}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "5c2a64096489"
+ },
+ "source": [
+ "### Advanced Deployment Configuration (Optional)\n",
+ "\n",
+ "To further customize your deployment, you can configure:\n",
+ "\n",
+ "- **Compute Resources**: Machine type, replica count (min/max), accelerator type and quantity.\n",
+ "- **Infrastructure**: Use Spot VMs, reservation affinity, or dedicated endpoints.\n",
+ "- **Serving Container**: Customize container image, ports, health checks, and environment variables.\n",
+ "\n",
+ "See the [Model Garden SDK README](https://github.com/googleapis/python-aiplatform/blob/main/vertexai/model_garden/README.md) for advanced configuration options."
+ ]
+ },
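+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The following optional cell is an illustrative sketch of a customized deployment. The argument names are taken from the Model Garden SDK README linked above, but the machine type, accelerator, and replica values are placeholders: choose a configuration from the `list_deploy_options()` output and your available quota. The call is commented out so that running the notebook top-to-bottom does not create a second endpoint."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Optional: customized deployment (sketch). The machine/accelerator values\n",
+ "# below are illustrative placeholders; use a configuration returned by\n",
+ "# list_deploy_options() for this model and your available quota.\n",
+ "\n",
+ "# custom_endpoint = model.deploy(\n",
+ "#     accept_eula=True,\n",
+ "#     machine_type=\"g2-standard-12\",  # placeholder machine type\n",
+ "#     accelerator_type=\"NVIDIA_L4\",  # placeholder accelerator\n",
+ "#     accelerator_count=1,\n",
+ "#     min_replica_count=1,\n",
+ "#     max_replica_count=1,\n",
+ "#     spot=False,  # set to True to deploy on Spot VMs\n",
+ "#     use_dedicated_endpoint=True,  # required for voyage-multimodal-3.5\n",
+ "# )"
+ ]
+ },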
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "f49e74ef5207"
+ },
+ "source": [
+ "## Generate embeddings with Voyage Multimodal 3.5\n",
+ "\n",
+ "Now that the model is deployed, you can generate embeddings for text, images, video, or any combination of these modalities.\n",
+ "\n",
+ "The multimodal API uses a different input format than text-only models. Each input is an object with a `content` array containing typed elements:\n",
+ "\n",
+ "- **Text**: `{\"type\": \"text\", \"text\": \"your text here\"}`\n",
+ "- **Image URL**: `{\"type\": \"image_url\", \"image_url\": \"https://...\"}`\n",
+ "- **Image Base64**: `{\"type\": \"image_base64\", \"image_base64\": \"data:image/jpeg;base64,...\"}`\n",
+ "- **Video URL**: `{\"type\": \"video_url\", \"video_url\": \"https://...\"}`\n",
+ "- **Video Base64**: `{\"type\": \"video_base64\", \"video_base64\": \"data:video/mp4;base64,...\"}`"
+ ]
+ },
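+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The sections below call the endpoint directly so that every request is fully visible. If you prefer less boilerplate, a small helper such as the hypothetical `get_multimodal_embeddings()` below, which simply wraps the same `endpoint.invoke()` call used throughout this notebook, can be reused for every request."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import json\n",
+ "\n",
+ "\n",
+ "def get_multimodal_embeddings(inputs, **params):\n",
+ "    \"\"\"Convenience wrapper (sketch) around the deployed endpoint.\n",
+ "\n",
+ "    `inputs` is a list of {\"content\": [...]} objects; extra keyword arguments\n",
+ "    (for example input_type, output_dimension, output_dtype) are passed\n",
+ "    through to the request body unchanged.\n",
+ "    \"\"\"\n",
+ "    body = {\"model\": \"voyage-multimodal-3.5\", \"inputs\": inputs, **params}\n",
+ "    response = endpoint.invoke(\n",
+ "        request_path=\"/multimodalembeddings\",\n",
+ "        body=json.dumps(body).encode(\"utf-8\"),\n",
+ "        headers={\"Content-Type\": \"application/json\"},\n",
+ "    )\n",
+ "    result = response.json()\n",
+ "    embeddings = [item[\"embedding\"] for item in result[\"data\"]]\n",
+ "    return embeddings, result.get(\"usage\", {})\n",
+ "\n",
+ "\n",
+ "# Example usage:\n",
+ "# embeddings, usage = get_multimodal_embeddings(\n",
+ "#     [{\"content\": [{\"type\": \"text\", \"text\": \"Hello, world!\"}]}],\n",
+ "#     input_type=\"document\",\n",
+ "# )"
+ ]
+ },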
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "3debc925a049"
+ },
+ "source": [
+ "### Text embeddings\n",
+ "\n",
+ "Generate embeddings for text inputs:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "3fb5010b814e"
+ },
+ "outputs": [],
+ "source": [
+ "import json\n",
+ "\n",
+ "# Text inputs to embed\n",
+ "texts = [\n",
+ " \"A photo of a golden retriever playing in the park.\",\n",
+ " \"Machine learning enables computers to learn from data.\",\n",
+ " \"A beautiful sunset over the ocean with orange and purple skies.\",\n",
+ " \"The quarterly financial report shows strong revenue growth.\",\n",
+ "]\n",
+ "\n",
+ "# Format inputs for multimodal API\n",
+ "inputs = [{\"content\": [{\"type\": \"text\", \"text\": t}]} for t in texts]\n",
+ "\n",
+ "# Prepare the request\n",
+ "body = {\"model\": \"voyage-multimodal-3.5\", \"inputs\": inputs, \"input_type\": \"document\"}\n",
+ "\n",
+ "response = endpoint.invoke(\n",
+ " request_path=\"/multimodalembeddings\",\n",
+ " body=json.dumps(body).encode(\"utf-8\"),\n",
+ " headers={\"Content-Type\": \"application/json\"},\n",
+ ")\n",
+ "\n",
+ "# Extract embeddings\n",
+ "result = response.json()\n",
+ "embeddings = [item[\"embedding\"] for item in result[\"data\"]]\n",
+ "\n",
+ "print(f\"Number of texts embedded: {len(embeddings)}\")\n",
+ "print(f\"Embedding dimension: {len(embeddings[0])}\")\n",
+ "print(f\"\\nFirst embedding (first 5 values): {embeddings[0][:5]}\")\n",
+ "print(f\"\\nUsage: {result.get('usage', {})}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "2444542a881d"
+ },
+ "source": [
+ "### Image embeddings\n",
+ "\n",
+ "Generate embeddings for images. You can provide images via URL or base64-encoded data.\n",
+ "\n",
+ "#### Using image URLs"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "f76afde2df11"
+ },
+ "outputs": [],
+ "source": [
+ "import json\n",
+ "\n",
+ "# Example image from Voyage AI's documentation\n",
+ "image_url = \"https://raw.githubusercontent.com/voyage-ai/voyage-multimodal-3/refs/heads/main/images/banana.jpg\"\n",
+ "\n",
+ "# Format input with image URL\n",
+ "inputs = [{\"content\": [{\"type\": \"image_url\", \"image_url\": image_url}]}]\n",
+ "\n",
+ "body = {\"model\": \"voyage-multimodal-3.5\", \"inputs\": inputs, \"input_type\": \"document\"}\n",
+ "\n",
+ "response = endpoint.invoke(\n",
+ " request_path=\"/multimodalembeddings\",\n",
+ " body=json.dumps(body).encode(\"utf-8\"),\n",
+ " headers={\"Content-Type\": \"application/json\"},\n",
+ ")\n",
+ "\n",
+ "result = response.json()\n",
+ "image_embedding = result[\"data\"][0][\"embedding\"]\n",
+ "\n",
+ "print(f\"Embedding dimension: {len(image_embedding)}\")\n",
+ "print(f\"Embedding (first 5 values): {image_embedding[:5]}\")\n",
+ "print(f\"\\nUsage: {result.get('usage', {})}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "c4c375633532"
+ },
+ "source": [
+ "#### Using base64-encoded images\n",
+ "\n",
+ "For local images, use Google Colab's file upload interface:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "2045aa376a85"
+ },
+ "outputs": [],
+ "source": [
+ "import base64\n",
+ "import json\n",
+ "import sys\n",
+ "\n",
+ "\n",
+ "def encode_image_base64(image_bytes: bytes, filename: str) -> str:\n",
+ " \"\"\"Encode image bytes as a base64 data URI.\"\"\"\n",
+ " # Determine MIME type from extension\n",
+ " extension = filename.lower().split(\".\")[-1]\n",
+ " mime_types = {\n",
+ " \"jpg\": \"image/jpeg\",\n",
+ " \"jpeg\": \"image/jpeg\",\n",
+ " \"png\": \"image/png\",\n",
+ " \"gif\": \"image/gif\",\n",
+ " \"webp\": \"image/webp\",\n",
+ " }\n",
+ " mime_type = mime_types.get(extension, \"image/jpeg\")\n",
+ "\n",
+ " b64_str = base64.b64encode(image_bytes).decode(\"ascii\")\n",
+ " return f\"data:{mime_type};base64,{b64_str}\"\n",
+ "\n",
+ "\n",
+ "# Upload image file (Colab only)\n",
+ "if \"google.colab\" in sys.modules:\n",
+ " from google.colab import files\n",
+ "\n",
+ " print(\"Please upload an image file (JPG, PNG, etc.):\")\n",
+ " uploaded = files.upload()\n",
+ "\n",
+ " if uploaded:\n",
+ " # Get the first uploaded file\n",
+ " filename = list(uploaded.keys())[0]\n",
+ " image_bytes = uploaded[filename]\n",
+ "\n",
+ " # Encode and generate embedding\n",
+ " image_base64 = encode_image_base64(image_bytes, filename)\n",
+ "\n",
+ " body = {\n",
+ " \"model\": \"voyage-multimodal-3.5\",\n",
+ " \"inputs\": [\n",
+ " {\"content\": [{\"type\": \"image_base64\", \"image_base64\": image_base64}]}\n",
+ " ],\n",
+ " \"input_type\": \"document\",\n",
+ " }\n",
+ "\n",
+ " response = endpoint.invoke(\n",
+ " request_path=\"/multimodalembeddings\",\n",
+ " body=json.dumps(body).encode(\"utf-8\"),\n",
+ " headers={\"Content-Type\": \"application/json\"},\n",
+ " )\n",
+ "\n",
+ " result = response.json()\n",
+ " embedding = result[\"data\"][0][\"embedding\"]\n",
+ "\n",
+ " print(f\"\\nEmbedding dimension: {len(embedding)}\")\n",
+ " print(f\"Embedding (first 5 values): {embedding[:5]}\")\n",
+ " print(f\"\\nUsage: {result.get('usage', {})}\")\n",
+ "else:\n",
+ " print(\"File upload is only available in Google Colab.\")\n",
+ " print(\n",
+ " \"For other environments, use the encode_image_base64() helper function with file bytes.\"\n",
+ " )"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "62f2249de9dd"
+ },
+ "source": [
+ "### Video embeddings\n",
+ "\n",
+ "Generate embeddings for video content. Videos must be:\n",
+ "- **Format**: MP4 container\n",
+ "- **Size**: Maximum 20 MB\n",
+ "- **Frames**: At least 2 frames\n",
+ "\n",
+ "#### Using video URLs"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "0aca2573178b"
+ },
+ "outputs": [],
+ "source": [
+ "import json\n",
+ "\n",
+ "# Example video URL (Cooking video, ~500kb)\n",
+ "video_url = \"https://file.garden/aTiKu4GB_i5vfop6/example_video_01.mp4\"\n",
+ "\n",
+ "# Format input with video URL\n",
+ "inputs = [{\"content\": [{\"type\": \"video_url\", \"video_url\": video_url}]}]\n",
+ "\n",
+ "body = {\"model\": \"voyage-multimodal-3.5\", \"inputs\": inputs, \"input_type\": \"document\"}\n",
+ "\n",
+ "response = endpoint.invoke(\n",
+ " request_path=\"/multimodalembeddings\",\n",
+ " body=json.dumps(body).encode(\"utf-8\"),\n",
+ " headers={\"Content-Type\": \"application/json\"},\n",
+ ")\n",
+ "\n",
+ "result = response.json()\n",
+ "video_embedding = result[\"data\"][0][\"embedding\"]\n",
+ "usage = result.get(\"usage\", {})\n",
+ "\n",
+ "print(f\"Embedding dimension: {len(video_embedding)}\")\n",
+ "print(f\"Embedding (first 5 values): {video_embedding[:5]}\")\n",
+ "print(\"\\nUsage:\")\n",
+ "print(f\" Total tokens: {usage.get('total_tokens')}\")\n",
+ "print(f\" Video pixels: {usage.get('video_pixels')}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "bc129e861b42"
+ },
+ "source": [
+ "#### Using base64-encoded videos\n",
+ "\n",
+ "For local videos, use Google Colab's file upload interface:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "c9fa7e05270e"
+ },
+ "outputs": [],
+ "source": [
+ "import json\n",
+ "import sys\n",
+ "\n",
+ "\n",
+ "def encode_video_base64(video_bytes: bytes) -> str:\n",
+ " \"\"\"Encode video bytes as a base64 data URI.\"\"\"\n",
+ " b64_str = base64.b64encode(video_bytes).decode(\"ascii\")\n",
+ " return f\"data:video/mp4;base64,{b64_str}\"\n",
+ "\n",
+ "\n",
+ "# Upload video file (Colab only)\n",
+ "if \"google.colab\" in sys.modules:\n",
+ " from google.colab import files\n",
+ "\n",
+ " print(\"Please upload an MP4 video file (max 20 MB):\")\n",
+ " uploaded = files.upload()\n",
+ "\n",
+ " if uploaded:\n",
+ " # Get the first uploaded file\n",
+ " filename = list(uploaded.keys())[0]\n",
+ " video_bytes = uploaded[filename]\n",
+ "\n",
+ " file_size_mb = len(video_bytes) / (1024 * 1024)\n",
+ " print(f\"\\nUploaded: {filename} ({file_size_mb:.2f} MB)\")\n",
+ "\n",
+ " if file_size_mb > 20:\n",
+ " print(\"Warning: File exceeds 20 MB limit and may be rejected by the API.\")\n",
+ "\n",
+ " # Encode and generate embedding\n",
+ " video_base64 = encode_video_base64(video_bytes)\n",
+ "\n",
+ " body = {\n",
+ " \"model\": \"voyage-multimodal-3.5\",\n",
+ " \"inputs\": [\n",
+ " {\"content\": [{\"type\": \"video_base64\", \"video_base64\": video_base64}]}\n",
+ " ],\n",
+ " \"input_type\": \"document\",\n",
+ " }\n",
+ "\n",
+ " response = endpoint.invoke(\n",
+ " request_path=\"/multimodalembeddings\",\n",
+ " body=json.dumps(body).encode(\"utf-8\"),\n",
+ " headers={\"Content-Type\": \"application/json\"},\n",
+ " )\n",
+ "\n",
+ " result = response.json()\n",
+ " embedding = result[\"data\"][0][\"embedding\"]\n",
+ " usage = result.get(\"usage\", {})\n",
+ "\n",
+ " print(f\"\\nEmbedding dimension: {len(embedding)}\")\n",
+ " print(f\"Embedding (first 5 values): {embedding[:5]}\")\n",
+ " print(f\"Total tokens: {usage.get('total_tokens')}\")\n",
+ " print(f\"Video pixels: {usage.get('video_pixels')}\")\n",
+ "else:\n",
+ " print(\"File upload is only available in Google Colab.\")\n",
+ " print(\n",
+ " \"For other environments, use the encode_video_base64() helper function with video bytes.\"\n",
+ " )"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "43d7d809f46d"
+ },
+ "source": [
+ "### Multimodal embeddings (text + images + video)\n",
+ "\n",
+ "A key feature of Voyage Multimodal 3.5 is the ability to create embeddings from interleaved text, images, and video. This is useful for rich documents that combine multiple modalities."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "062b0a413e05"
+ },
+ "outputs": [],
+ "source": [
+ "import json\n",
+ "\n",
+ "# Create a multimodal input combining text, image, and video\n",
+ "multimodal_input = {\n",
+ " \"content\": [\n",
+ " {\"type\": \"text\", \"text\": \"This is a banana.\"},\n",
+ " {\n",
+ " \"type\": \"image_url\",\n",
+ " \"image_url\": \"https://raw.githubusercontent.com/voyage-ai/voyage-multimodal-3/refs/heads/main/images/banana.jpg\",\n",
+ " },\n",
+ " {\n",
+ " \"type\": \"video_url\",\n",
+ " \"video_url\": \"https://file.garden/aTiKu4GB_i5vfop6/example_video_01.mp4\",\n",
+ " },\n",
+ " ]\n",
+ "}\n",
+ "\n",
+ "body = {\n",
+ " \"model\": \"voyage-multimodal-3.5\",\n",
+ " \"inputs\": [multimodal_input],\n",
+ " \"input_type\": \"document\",\n",
+ "}\n",
+ "\n",
+ "response = endpoint.invoke(\n",
+ " request_path=\"/multimodalembeddings\",\n",
+ " body=json.dumps(body).encode(\"utf-8\"),\n",
+ " headers={\"Content-Type\": \"application/json\"},\n",
+ ")\n",
+ "\n",
+ "result = response.json()\n",
+ "multimodal_embedding = result[\"data\"][0][\"embedding\"]\n",
+ "usage = result.get(\"usage\", {})\n",
+ "\n",
+ "print(f\"Multimodal embedding dimension: {len(multimodal_embedding)}\")\n",
+ "print(f\"Embedding (first 5 values): {multimodal_embedding[:5]}\")\n",
+ "print(\"\\nUsage:\")\n",
+ "print(f\" Text tokens: {usage.get('text_tokens')}\")\n",
+ "print(f\" Image pixels: {usage.get('image_pixels')}\")\n",
+ "print(f\" Video pixels: {usage.get('video_pixels')}\")\n",
+ "print(f\" Total tokens: {usage.get('total_tokens')}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "b977262dff32"
+ },
+ "source": [
+ "### Cross-modal semantic similarity\n",
+ "\n",
+ "One of the most powerful features of multimodal embeddings is the ability to search across modalities. You can use a text query to find relevant images or videos."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "0c44772f12f0"
+ },
+ "outputs": [],
+ "source": [
+ "import json\n",
+ "\n",
+ "import numpy as np\n",
+ "\n",
+ "\n",
+ "def cosine_similarity(vec1, vec2):\n",
+ " \"\"\"Calculate cosine similarity between two vectors.\"\"\"\n",
+ " vec1 = np.array(vec1)\n",
+ " vec2 = np.array(vec2)\n",
+ " return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))\n",
+ "\n",
+ "\n",
+ "# Text queries\n",
+ "queries = [\"A yellow fruit\", \"A green vegetable\"]\n",
+ "\n",
+ "# Documents to search (image and video)\n",
+ "documents = [\n",
+ " {\n",
+ " \"type\": \"image_url\",\n",
+ " \"image_url\": \"https://raw.githubusercontent.com/voyage-ai/voyage-multimodal-3/refs/heads/main/images/banana.jpg\",\n",
+ " \"description\": \"Banana image\",\n",
+ " },\n",
+ " {\n",
+ " \"type\": \"video_url\",\n",
+ " \"video_url\": \"https://file.garden/aTiKu4GB_i5vfop6/example_video_01.mp4\",\n",
+ " \"description\": \"Cooking video\",\n",
+ " },\n",
+ "]\n",
+ "\n",
+ "# Get document embeddings once (use input_type=\"document\" for documents to be searched)\n",
+ "doc_inputs = []\n",
+ "for doc in documents:\n",
+ " media_type = doc[\"type\"]\n",
+ " media_url = doc[media_type]\n",
+ " content_item = {\"type\": media_type, media_type: media_url}\n",
+ " doc_inputs.append({\"content\": [content_item]})\n",
+ "\n",
+ "doc_body = {\n",
+ " \"model\": \"voyage-multimodal-3.5\",\n",
+ " \"inputs\": doc_inputs,\n",
+ " \"input_type\": \"document\",\n",
+ "}\n",
+ "doc_response = endpoint.invoke(\n",
+ " request_path=\"/multimodalembeddings\",\n",
+ " body=json.dumps(doc_body).encode(\"utf-8\"),\n",
+ " headers={\"Content-Type\": \"application/json\"},\n",
+ ")\n",
+ "doc_embeddings = [item[\"embedding\"] for item in doc_response.json()[\"data\"]]\n",
+ "\n",
+ "# Test each query against the documents\n",
+ "for query_text in queries:\n",
+ " # Get query embedding (use input_type=\"query\" for search queries)\n",
+ " query_body = {\n",
+ " \"model\": \"voyage-multimodal-3.5\",\n",
+ " \"inputs\": [{\"content\": [{\"type\": \"text\", \"text\": query_text}]}],\n",
+ " \"input_type\": \"query\",\n",
+ " }\n",
+ " query_response = endpoint.invoke(\n",
+ " request_path=\"/multimodalembeddings\",\n",
+ " body=json.dumps(query_body).encode(\"utf-8\"),\n",
+ " headers={\"Content-Type\": \"application/json\"},\n",
+ " )\n",
+ " query_embedding = query_response.json()[\"data\"][0][\"embedding\"]\n",
+ "\n",
+ " # Calculate cross-modal similarities\n",
+ " print(f'Query: \"{query_text}\"')\n",
+ " print(\"Cross-modal similarity scores:\")\n",
+ " for doc, embedding in zip(documents, doc_embeddings):\n",
+ " similarity = cosine_similarity(query_embedding, embedding)\n",
+ " print(f\" {similarity:.4f} - {doc['description']}\")\n",
+ " print()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "02eaf30726e8"
+ },
+ "source": [
+ "## Advanced parameters\n",
+ "\n",
+ "Voyage Multimodal 3.5 supports several parameters to customize embedding generation."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "3de91676bceb"
+ },
+ "source": [
+ "### Understanding input_type: Query vs Document\n",
+ "\n",
+ "The `input_type` parameter optimizes embeddings for retrieval tasks:\n",
+ "\n",
+ "* **`query`**: Use this when the input represents a search query. The model prepends \"Represent the query for retrieving supporting documents: \" to optimize for retrieval.\n",
+ "* **`document`**: Use this when the input represents content to be indexed. The model prepends \"Represent the document for retrieval: \" to optimize for indexing.\n",
+ "* **`null`** (default): No special prompt is added. Use for general-purpose embeddings.\n",
+ "\n",
+ "**Best Practice**: For retrieval applications, use `input_type=\"query\"` for search queries and `input_type=\"document\"` for the content you're indexing. Embeddings generated with and without the `input_type` argument are compatible."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "aff33ce1d594"
+ },
+ "outputs": [],
+ "source": [
+ "import json\n",
+ "\n",
+ "# Example: Different input types for retrieval\n",
+ "search_query = \"What does a banana look like?\"\n",
+ "\n",
+ "# Query embedding (for search)\n",
+ "query_body = {\n",
+ " \"model\": \"voyage-multimodal-3.5\",\n",
+ " \"inputs\": [{\"content\": [{\"type\": \"text\", \"text\": search_query}]}],\n",
+ " \"input_type\": \"query\", # Optimized for search queries\n",
+ "}\n",
+ "query_response = endpoint.invoke(\n",
+ " request_path=\"/multimodalembeddings\",\n",
+ " body=json.dumps(query_body).encode(\"utf-8\"),\n",
+ " headers={\"Content-Type\": \"application/json\"},\n",
+ ")\n",
+ "query_result = query_response.json()\n",
+ "\n",
+ "# Document embedding (for indexing)\n",
+ "doc_body = {\n",
+ " \"model\": \"voyage-multimodal-3.5\",\n",
+ " \"inputs\": [{\"content\": [{\"type\": \"text\", \"text\": search_query}]}],\n",
+ " \"input_type\": \"document\", # Optimized for documents\n",
+ "}\n",
+ "doc_response = endpoint.invoke(\n",
+ " request_path=\"/multimodalembeddings\",\n",
+ " body=json.dumps(doc_body).encode(\"utf-8\"),\n",
+ " headers={\"Content-Type\": \"application/json\"},\n",
+ ")\n",
+ "doc_result = doc_response.json()\n",
+ "\n",
+ "# General-purpose embedding (no input_type)\n",
+ "general_body = {\n",
+ " \"model\": \"voyage-multimodal-3.5\",\n",
+ " \"inputs\": [{\"content\": [{\"type\": \"text\", \"text\": search_query}]}],\n",
+ " # input_type defaults to null\n",
+ "}\n",
+ "general_response = endpoint.invoke(\n",
+ " request_path=\"/multimodalembeddings\",\n",
+ " body=json.dumps(general_body).encode(\"utf-8\"),\n",
+ " headers={\"Content-Type\": \"application/json\"},\n",
+ ")\n",
+ "general_result = general_response.json()\n",
+ "\n",
+ "print(f'Text: \"{search_query}\"\\n')\n",
+ "print(f\"Query embedding (first 5): {query_result['data'][0]['embedding'][:5]}\")\n",
+ "print(f\"Document embedding (first 5): {doc_result['data'][0]['embedding'][:5]}\")\n",
+ "print(f\"General embedding (first 5): {general_result['data'][0]['embedding'][:5]}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "e50706d6244a"
+ },
+ "source": [
+ "### Truncation\n",
+ "\n",
+ "The `truncation` parameter controls how the model handles inputs that exceed the context window (32,000 tokens):\n",
+ "\n",
+ "* **`true`** (default): Automatically truncate inputs that exceed the context limit. If truncation happens in the middle of an image, the entire image will be discarded.\n",
+ "* **`false`**: Return an error if any input exceeds the context limit.\n",
+ "\n",
+ "When truncation occurs, you may see a warning in the response headers."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "d3710b176d36"
+ },
+ "outputs": [],
+ "source": [
+ "import json\n",
+ "\n",
+ "# Example: Create input that exceeds context limit to trigger truncation\n",
+ "# We'll repeat a video URL multiple times to exceed 32k tokens\n",
+ "video_url = \"https://file.garden/aTiKu4GB_i5vfop6/example_video_01.mp4\"\n",
+ "\n",
+ "# Create input with 4 videos (should exceed 32k token limit)\n",
+ "truncation_input = {\n",
+ " \"content\": [\n",
+ " {\"type\": \"video_url\", \"video_url\": video_url},\n",
+ " {\"type\": \"video_url\", \"video_url\": video_url},\n",
+ " {\"type\": \"video_url\", \"video_url\": video_url},\n",
+ " {\"type\": \"video_url\", \"video_url\": video_url},\n",
+ " ]\n",
+ "}\n",
+ "\n",
+ "body = {\n",
+ " \"model\": \"voyage-multimodal-3.5\",\n",
+ " \"inputs\": [truncation_input],\n",
+ " \"input_type\": \"document\",\n",
+ " \"truncation\": True, # Enable automatic truncation (this is the default)\n",
+ "}\n",
+ "\n",
+ "response = endpoint.invoke(\n",
+ " request_path=\"/multimodalembeddings\",\n",
+ " body=json.dumps(body).encode(\"utf-8\"),\n",
+ " headers={\"Content-Type\": \"application/json\"},\n",
+ ")\n",
+ "\n",
+ "result = response.json()\n",
+ "usage = result.get(\"usage\", {})\n",
+ "\n",
+ "print(\"Embedding generated with truncation enabled\")\n",
+ "print(f\"Dimension: {len(result['data'][0]['embedding'])}\")\n",
+ "print(\"\\nUsage:\")\n",
+ "print(f\" Total tokens: {usage.get('total_tokens')}\")\n",
+ "print(f\" Video pixels: {usage.get('video_pixels')}\")\n",
+ "\n",
+ "# Check response headers for truncation warning\n",
+ "if hasattr(response, \"headers\"):\n",
+ " warning = response.headers.get(\"x-api-warning\", response.headers.get(\"warning\"))\n",
+ " if warning:\n",
+ " print(f\"\\nTruncation warning: {warning}\")\n",
+ " else:\n",
+ " print(\"\\nNo truncation warning detected (may have fit within limit)\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "991a709ced51"
+ },
+ "source": [
+ "### Output encoding\n",
+ "\n",
+ "The `output_encoding` parameter controls the format of the embedding output:\n",
+ "\n",
+ "* **`null`** (default): Embeddings are returned as a list of floating-point numbers.\n",
+ "* **`base64`**: Embeddings are returned as a Base64-encoded string representing a NumPy array of single-precision floats. This can be more efficient for large batch operations."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "d1a982c1fd55"
+ },
+ "outputs": [],
+ "source": [
+ "import json\n",
+ "\n",
+ "text = \"A beautiful landscape photo.\"\n",
+ "\n",
+ "# Default output (list of floats)\n",
+ "default_body = {\n",
+ " \"model\": \"voyage-multimodal-3.5\",\n",
+ " \"inputs\": [{\"content\": [{\"type\": \"text\", \"text\": text}]}],\n",
+ " \"input_type\": \"document\",\n",
+ "}\n",
+ "default_response = endpoint.invoke(\n",
+ " request_path=\"/multimodalembeddings\",\n",
+ " body=json.dumps(default_body).encode(\"utf-8\"),\n",
+ " headers={\"Content-Type\": \"application/json\"},\n",
+ ")\n",
+ "default_embedding = default_response.json()[\"data\"][0][\"embedding\"]\n",
+ "\n",
+ "# Base64-encoded output\n",
+ "base64_body = {\n",
+ " \"model\": \"voyage-multimodal-3.5\",\n",
+ " \"inputs\": [{\"content\": [{\"type\": \"text\", \"text\": text}]}],\n",
+ " \"input_type\": \"document\",\n",
+ " \"output_encoding\": \"base64\",\n",
+ "}\n",
+ "base64_response = endpoint.invoke(\n",
+ " request_path=\"/multimodalembeddings\",\n",
+ " body=json.dumps(base64_body).encode(\"utf-8\"),\n",
+ " headers={\"Content-Type\": \"application/json\"},\n",
+ ")\n",
+ "base64_embedding = base64_response.json()[\"data\"][0][\"embedding\"]\n",
+ "\n",
+ "# Decode the base64 embedding\n",
+ "decoded_embedding = np.frombuffer(base64.b64decode(base64_embedding), dtype=np.float32)\n",
+ "\n",
+ "print(\"Default output (list of floats):\")\n",
+ "print(f\" Type: {type(default_embedding)}\")\n",
+ "print(f\" Length: {len(default_embedding)}\")\n",
+ "print(f\" First 5 values: {default_embedding[:5]}\")\n",
+ "\n",
+ "print(\"\\nBase64 output:\")\n",
+ "print(f\" Type: {type(base64_embedding)}\")\n",
+ "print(f\" Length: {len(base64_embedding)} characters\")\n",
+ "print(f\" Decoded length: {len(decoded_embedding)}\")\n",
+ "print(f\" Decoded first 5 values: {decoded_embedding[:5].tolist()}\")\n",
+ "\n",
+ "# Verify they match\n",
+ "print(f\"\\nEmbeddings match: {np.allclose(default_embedding, decoded_embedding)}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "d256ca3e497d"
+ },
+ "source": [
+ "### Using different output dimensions\n",
+ "\n",
+ "Voyage Multimodal 3.5 supports multiple output dimensions: 256, 512, 1024 (default), and 2048. Smaller dimensions reduce storage and computation costs, while larger dimensions may provide better accuracy."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "edcfb14b5b50"
+ },
+ "outputs": [],
+ "source": [
+ "import json\n",
+ "\n",
+ "# Use an image URL for testing different dimensions\n",
+ "image_url = \"https://raw.githubusercontent.com/voyage-ai/voyage-multimodal-3/refs/heads/main/images/banana.jpg\"\n",
+ "\n",
+ "# Test different output dimensions\n",
+ "dimensions = [256, 512, 1024, 2048]\n",
+ "\n",
+ "print(\"Comparing different output dimensions:\\n\")\n",
+ "for dim in dimensions:\n",
+ " body = {\n",
+ " \"model\": \"voyage-multimodal-3.5\",\n",
+ " \"inputs\": [\n",
+ " {\n",
+ " \"content\": [\n",
+ " {\n",
+ " \"type\": \"text\",\n",
+ " \"text\": \"A photo of a banana on a white background.\",\n",
+ " },\n",
+ " {\"type\": \"image_url\", \"image_url\": image_url},\n",
+ " ]\n",
+ " }\n",
+ " ],\n",
+ " \"output_dimension\": dim,\n",
+ " \"input_type\": \"document\",\n",
+ " }\n",
+ " response = endpoint.invoke(\n",
+ " request_path=\"/multimodalembeddings\",\n",
+ " body=json.dumps(body).encode(\"utf-8\"),\n",
+ " headers={\"Content-Type\": \"application/json\"},\n",
+ " )\n",
+ " result = response.json()\n",
+ " embedding = result[\"data\"][0][\"embedding\"]\n",
+ "\n",
+ " print(f\"Dimension {dim}:\")\n",
+ " print(f\" Length: {len(embedding)}\")\n",
+ " print(f\" First 5 values: {embedding[:5]}\")\n",
+ " print(f\" Storage size: ~{len(embedding) * 4} bytes (float32)\\n\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "a9898eb27810"
+ },
+ "source": [
+ "### Using different output data types\n",
+ "\n",
+ "Voyage Multimodal 3.5 supports multiple output data types to optimize for storage and performance:\n",
+ "\n",
+ "* **`float`** (default): 32-bit floating-point numbers, highest precision\n",
+ "* **`int8`**: 8-bit signed integers (-128 to 127), 4x smaller than float\n",
+ "* **`uint8`**: 8-bit unsigned integers (0 to 255), 4x smaller than float\n",
+ "* **`binary`**: Bit-packed signed integers (int8), 32x smaller than float\n",
+ "* **`ubinary`**: Bit-packed unsigned integers (uint8), 32x smaller than float\n",
+ "\n",
+ "Quantized formats (int8, uint8, binary, ubinary) trade some precision for significant storage savings."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "e84d93b0bb13"
+ },
+ "outputs": [],
+ "source": [
+ "import json\n",
+ "\n",
+ "# Use an image URL for testing different data types\n",
+ "image_url = \"https://raw.githubusercontent.com/voyage-ai/voyage-multimodal-3/refs/heads/main/images/banana.jpg\"\n",
+ "\n",
+ "# Test different output data types\n",
+ "output_dtypes = [\"float\", \"int8\", \"uint8\", \"binary\", \"ubinary\"]\n",
+ "\n",
+ "print(\"Comparing different output data types:\\n\")\n",
+ "for dtype in output_dtypes:\n",
+ " body = {\n",
+ " \"model\": \"voyage-multimodal-3.5\",\n",
+ " \"inputs\": [\n",
+ " {\n",
+ " \"content\": [\n",
+ " {\n",
+ " \"type\": \"text\",\n",
+ " \"text\": \"A photo of a banana on a white background.\",\n",
+ " },\n",
+ " {\"type\": \"image_url\", \"image_url\": image_url},\n",
+ " ]\n",
+ " }\n",
+ " ],\n",
+ " \"output_dimension\": 1024,\n",
+ " \"output_dtype\": dtype,\n",
+ " \"input_type\": \"document\",\n",
+ " }\n",
+ " response = endpoint.invoke(\n",
+ " request_path=\"/multimodalembeddings\",\n",
+ " body=json.dumps(body).encode(\"utf-8\"),\n",
+ " headers={\"Content-Type\": \"application/json\"},\n",
+ " )\n",
+ " result = response.json()\n",
+ " embedding = result[\"data\"][0][\"embedding\"]\n",
+ "\n",
+ " # Calculate actual storage size\n",
+ " if dtype == \"float\":\n",
+ " storage_bytes = len(embedding) * 4 # 4 bytes per float32\n",
+ " elif dtype in [\"int8\", \"uint8\"]:\n",
+ " storage_bytes = len(embedding) * 1 # 1 byte per int8/uint8\n",
+ " elif dtype in [\"binary\", \"ubinary\"]:\n",
+ " storage_bytes = len(embedding) * 1 # bit-packed, 1/8 of dimension\n",
+ "\n",
+ " print(f\"Output dtype: {dtype}\")\n",
+ " print(f\" Length: {len(embedding)}\")\n",
+ " print(f\" Value type: {type(embedding[0]).__name__}\")\n",
+ " print(f\" First 5 values: {embedding[:5]}\")\n",
+ " print(f\" Storage size: ~{storage_bytes} bytes\")\n",
+ "\n",
+ " # Calculate compression ratio vs float\n",
+ " if dtype != \"float\":\n",
+ " compression_ratio = (1024 * 4) / storage_bytes\n",
+ " print(f\" Compression: {compression_ratio:.1f}x smaller than float\")\n",
+ " print()"
+ ]
+ },
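+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To work with a bit-packed embedding directly, you can unpack the returned integers into one bit per dimension. The sketch below assumes a `ubinary` embedding, where each unsigned 8-bit value packs 8 dimensions, so a 256-dimension embedding arrives as 32 integers."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import json\n",
+ "\n",
+ "import numpy as np\n",
+ "\n",
+ "# Sketch: request a compact ubinary embedding and unpack it into individual bits.\n",
+ "body = {\n",
+ "    \"model\": \"voyage-multimodal-3.5\",\n",
+ "    \"inputs\": [{\"content\": [{\"type\": \"text\", \"text\": \"A photo of a banana.\"}]}],\n",
+ "    \"output_dimension\": 256,\n",
+ "    \"output_dtype\": \"ubinary\",\n",
+ "    \"input_type\": \"document\",\n",
+ "}\n",
+ "response = endpoint.invoke(\n",
+ "    request_path=\"/multimodalembeddings\",\n",
+ "    body=json.dumps(body).encode(\"utf-8\"),\n",
+ "    headers={\"Content-Type\": \"application/json\"},\n",
+ ")\n",
+ "packed = response.json()[\"data\"][0][\"embedding\"]\n",
+ "\n",
+ "# Each uint8 value packs 8 dimensions, so 256 dimensions arrive as 32 integers.\n",
+ "bits = np.unpackbits(np.array(packed, dtype=np.uint8))\n",
+ "\n",
+ "print(f\"Packed length: {len(packed)} uint8 values\")\n",
+ "print(f\"Unpacked length: {len(bits)} bits\")\n",
+ "print(f\"First 16 bits: {bits[:16].tolist()}\")"
+ ]
+ },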
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "a53ec669e084"
+ },
+ "source": [
+ "### Combining output_dimension and output_dtype\n",
+ "\n",
+ "You can combine different dimensions and data types to optimize for your use case.\n",
+ "\n",
+ "Please refer to our guide for details on [offset binary](https://docs.voyageai.com/docs/flexible-dimensions-and-quantization#offset-binary) and [binary embeddings](https://docs.voyageai.com/docs/flexible-dimensions-and-quantization#quantization)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "0da84ee9f088"
+ },
+ "outputs": [],
+ "source": [
+ "import json\n",
+ "\n",
+ "# Use an image URL for the comparison\n",
+ "image_url = \"https://raw.githubusercontent.com/voyage-ai/voyage-multimodal-3/refs/heads/main/images/banana.jpg\"\n",
+ "\n",
+ "# Example: Ultra-compact embeddings (256 dimensions + ubinary)\n",
+ "compact_body = {\n",
+ " \"model\": \"voyage-multimodal-3.5\",\n",
+ " \"inputs\": [\n",
+ " {\n",
+ " \"content\": [\n",
+ " {\"type\": \"text\", \"text\": \"A photo of a banana on a white background.\"},\n",
+ " {\"type\": \"image_url\", \"image_url\": image_url},\n",
+ " ]\n",
+ " }\n",
+ " ],\n",
+ " \"output_dimension\": 256,\n",
+ " \"output_dtype\": \"ubinary\", # Most compact format\n",
+ " \"input_type\": \"document\",\n",
+ "}\n",
+ "compact_response = endpoint.invoke(\n",
+ " request_path=\"/multimodalembeddings\",\n",
+ " body=json.dumps(compact_body).encode(\"utf-8\"),\n",
+ " headers={\"Content-Type\": \"application/json\"},\n",
+ ")\n",
+ "compact_result = compact_response.json()\n",
+ "compact_embedding = compact_result[\"data\"][0][\"embedding\"]\n",
+ "\n",
+ "# Example: High-precision embeddings (2048 dimensions + float)\n",
+ "precise_body = {\n",
+ " \"model\": \"voyage-multimodal-3.5\",\n",
+ " \"inputs\": [\n",
+ " {\n",
+ " \"content\": [\n",
+ " {\"type\": \"text\", \"text\": \"A photo of a banana on a white background.\"},\n",
+ " {\"type\": \"image_url\", \"image_url\": image_url},\n",
+ " ]\n",
+ " }\n",
+ " ],\n",
+ " \"output_dimension\": 2048,\n",
+ " \"output_dtype\": \"float\", # Highest precision\n",
+ " \"input_type\": \"document\",\n",
+ "}\n",
+ "precise_response = endpoint.invoke(\n",
+ " request_path=\"/multimodalembeddings\",\n",
+ " body=json.dumps(precise_body).encode(\"utf-8\"),\n",
+ " headers={\"Content-Type\": \"application/json\"},\n",
+ ")\n",
+ "precise_result = precise_response.json()\n",
+ "precise_embedding = precise_result[\"data\"][0][\"embedding\"]\n",
+ "\n",
+ "# Compare storage requirements\n",
+ "compact_storage = len(compact_embedding) * 1 # binary is bit-packed\n",
+ "precise_storage = len(precise_embedding) * 4 # float32\n",
+ "\n",
+ "print(\"Storage comparison:\\n\")\n",
+ "print(\"Ultra-compact (256-dim ubinary):\")\n",
+ "print(\" Dimension: 256\")\n",
+ "print(f\" Storage: ~{compact_storage} bytes\")\n",
+ "print(f\" First 5 values: {compact_embedding[:5]}\\n\")\n",
+ "\n",
+ "print(\"High-precision (2048-dim float):\")\n",
+ "print(f\" Dimension: {len(precise_embedding)}\")\n",
+ "print(f\" Storage: ~{precise_storage} bytes\")\n",
+ "print(f\" First 5 values: {precise_embedding[:5]}\\n\")\n",
+ "\n",
+ "print(f\"Storage ratio: {precise_storage / compact_storage:.1f}x\")\n",
+ "print(\"\\nFor 1 million vectors:\")\n",
+ "print(f\" Ultra-compact: ~{compact_storage * 1_000_000 / (1024**2):.1f} MB\")\n",
+ "print(f\" High-precision: ~{precise_storage * 1_000_000 / (1024**2):.1f} MB\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "435b202aa7d7"
+ },
+ "source": [
+ "## Cleaning up\n",
+ "\n",
+ "To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, delete the endpoint and undeploy the model."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "e8a61891590c"
+ },
+ "outputs": [],
+ "source": [
+ "# Delete the endpoint (this will also undeploy all models)\n",
+ "print(f\"Deleting endpoint: {endpoint.display_name}\")\n",
+ "endpoint.delete(force=True)\n",
+ "print(\"Endpoint deleted successfully!\")"
+ ]
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "name": "voyage-multimodal-3.5.ipynb",
+ "toc_visible": true
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}