diff --git a/notebooks/official/generative_ai/voyage-4.ipynb b/notebooks/official/generative_ai/voyage-4.ipynb new file mode 100644 index 000000000..f60195cfa --- /dev/null +++ b/notebooks/official/generative_ai/voyage-4.ipynb @@ -0,0 +1,801 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": null, + "id": "cell-0", + "metadata": { + "id": "6b3283cdfd08" + }, + "outputs": [], + "source": [ + "# Copyright 2026 MongoDB, Inc\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "id": "cell-1", + "metadata": { + "id": "d2b5eaffd266" + }, + "source": [ + "# Voyage 4 Embedding Models\n", + "\n", + "This notebook demonstrates how to deploy and use the Voyage 4 family of embedding models, featuring an **industry-first shared embedding space** that allows you to mix and match models for optimal cost and performance.\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + "
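The deployment cells below stand up a single endpoint, but the shared embedding space is what makes a split configuration possible: index documents with `voyage-4-large` and embed queries with `voyage-4-lite`. The sketch below is illustrative only and is not executed in this notebook; it assumes you have deployed *two* endpoints with `model.deploy()` (the names `large_endpoint` and `lite_endpoint` are hypothetical), and it reuses the `/embeddings` invoke pattern demonstrated later in the notebook.

```python
# Illustrative sketch -- not run in this notebook. Assumes `large_endpoint`
# (serving voyage-4-large) and `lite_endpoint` (serving voyage-4-lite) were
# deployed with model.deploy() as shown below; both names are hypothetical.
import json

import numpy as np

headers = {"Content-Type": "application/json"}

# Index documents once with the higher-quality model.
doc_body = {
    "input": ["Machine learning enables computers to learn from data."],
    "output_dimension": 1024,
    "input_type": "document",
}
doc_resp = large_endpoint.invoke(
    request_path="/embeddings",
    body=json.dumps(doc_body).encode("utf-8"),
    headers=headers,
)
doc_vecs = [item["embedding"] for item in doc_resp.json()["data"]]

# Embed queries with the cheaper, lower-latency model.
query_body = {
    "input": ["How do computers learn?"],
    "output_dimension": 1024,
    "input_type": "query",
}
query_resp = lite_endpoint.invoke(
    request_path="/embeddings",
    body=json.dumps(query_body).encode("utf-8"),
    headers=headers,
)
query_vec = np.array(query_resp.json()["data"][0]["embedding"])

# Because all Voyage 4 models share one embedding space, vectors from the two
# models are directly comparable with cosine similarity.
for doc, vec in zip(doc_body["input"], doc_vecs):
    vec = np.array(vec)
    score = np.dot(query_vec, vec) / (np.linalg.norm(query_vec) * np.linalg.norm(vec))
    print(f"{score:.4f} - {doc}")
```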
\n", + " \n", + " \"Google
Open in Colab\n", + "
\n", + "
\n", + " \n", + " \"Google
Open in Colab Enterprise\n", + "
\n", + "
\n", + " \n", + " \"Vertex
Open in Workbench\n", + "
\n", + "
\n", + " \n", + " \"GitHub
View on GitHub\n", + "
\n", + "
" + ] + }, + { + "cell_type": "markdown", + "id": "cell-2", + "metadata": { + "id": "a46455330445" + }, + "source": [ + "## Overview\n", + "\n", + "The **Voyage 4** family introduces an **industry-first shared embedding space** across all model sizes. This means embeddings from any Voyage 4 model are **interchangeable**—you can encode documents with one model and queries with another, enabling optimal cost-performance trade-offs.\n", + "\n", + "### Key Features\n", + "\n", + "* **Shared Embedding Space**: All Voyage 4 models (large, standard, lite) produce compatible embeddings, so you can mix models for documents vs. queries\n", + "* **Matryoshka Representation Learning (MRL)**: Variable-dimension embeddings (256, 512, 1024, 2048) from the same model\n", + "* **Quantization-Aware Training (QAT)**: Optimized for int8, uint8, binary, and ubinary formats with minimal quality loss\n", + "* **Maximum 32K tokens input**: Support for long documents\n", + "\n", + "### Model Family\n", + "\n", + "| Model | Description | Best For |\n", + "| :--- | :--- | :--- |\n", + "| **voyage-4-large** | State-of-the-art general-purpose and multilingual embedding optimized for retrieval quality | Document embeddings where quality matters most |\n", + "| **voyage-4** | General-purpose multilingual embedding model optimized for retrieval/search and AI applications | Balanced cost/quality trade-off |\n", + "| **voyage-4-lite** | Lightweight general-purpose embedding model optimized for low latency and cost | Query embeddings and cost-sensitive applications |\n", + "\n", + "### What you'll learn\n", + "\n", + "In this notebook, you will:\n", + "\n", + "* Deploy a Voyage 4 model to a Vertex AI endpoint\n", + "* Generate embeddings and perform semantic similarity\n", + "* Explore advanced parameters (dimensions, quantization)\n", + "\n", + "### Costs\n", + "\n", + "This tutorial uses billable components of Google Cloud:\n", + "\n", + "* Vertex AI Model Garden\n", + "* Vertex AI Prediction endpoints\n", + "\n", + "Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing) and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage." + ] + }, + { + "cell_type": "markdown", + "id": "cell-3", + "metadata": { + "id": "4d998a5140b2" + }, + "source": [ + "## Get started" + ] + }, + { + "cell_type": "markdown", + "id": "cell-4", + "metadata": { + "id": "b92cb16aea9c" + }, + "source": [ + "### Install Vertex AI SDK for Python and other required packages\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cell-5", + "metadata": { + "id": "030faea19be1" + }, + "outputs": [], + "source": [ + "! pip3 install --upgrade --quiet google-cloud-aiplatform numpy" + ] + }, + { + "cell_type": "markdown", + "id": "cell-6", + "metadata": { + "id": "848322ec177e" + }, + "source": [ + "### Restart runtime (Colab only)\n", + "\n", + "To use the newly installed packages, you must restart the runtime on Google Colab." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cell-7", + "metadata": { + "id": "b8d49bb74a53" + }, + "outputs": [], + "source": [ + "import sys\n", + "\n", + "if \"google.colab\" in sys.modules:\n", + "\n", + " import IPython\n", + "\n", + " app = IPython.Application.instance()\n", + " app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "id": "cell-8", + "metadata": { + "id": "780490bfb862" + }, + "source": [ + "
\n", + "⚠️ The kernel is going to restart. Wait until it's finished before continuing to the next step. ⚠️\n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "cell-9", + "metadata": { + "id": "1117fcd212f8" + }, + "source": [ + "### Authenticate your notebook environment (Colab only)\n", + "\n", + "Authenticate your environment on Google Colab.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cell-10", + "metadata": { + "id": "015bf6d5da75" + }, + "outputs": [], + "source": [ + "import sys\n", + "\n", + "if \"google.colab\" in sys.modules:\n", + "\n", + " from google.colab import auth\n", + "\n", + " auth.authenticate_user()" + ] + }, + { + "cell_type": "markdown", + "id": "cell-11", + "metadata": { + "id": "722a10c66085" + }, + "source": [ + "### Set Google Cloud project information and initialize Vertex AI SDK for Python\n", + "\n", + "To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com). Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cell-12", + "metadata": { + "id": "0f16c41d33fd" + }, + "outputs": [], + "source": [ + "# @title Setup Google Cloud project\n", + "\n", + "# Set your Google Cloud project ID and region below:\n", + "\n", + "import os\n", + "\n", + "import vertexai\n", + "\n", + "# @markdown Enter your project ID if not auto-detected:\n", + "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}\n", + "if not PROJECT_ID or PROJECT_ID == \"[your-project-id]\":\n", + " PROJECT_ID = os.environ.get(\"GOOGLE_CLOUD_PROJECT\")\n", + "\n", + "# @markdown Select your region:\n", + "LOCATION = \"us-central1\" # @param [\"us-central1\", \"us-east1\", \"us-west1\", \"europe-west1\", \"europe-west4\", \"asia-east1\", \"asia-southeast1\"]\n", + "\n", + "print(f\"Project ID: {PROJECT_ID}\")\n", + "print(f\"Location: {LOCATION}\")\n", + "\n", + "vertexai.init(project=PROJECT_ID, location=LOCATION)" + ] + }, + { + "cell_type": "markdown", + "id": "cell-13", + "metadata": { + "id": "4fa0f29e38cf" + }, + "source": [ + "## Deploy model\n", + "\n", + "The Voyage 4 family features a **shared embedding space**, meaning embeddings from any Voyage 4 model (large, standard, lite) are interchangeable. This allows you to use different models for documents and queries while maintaining compatibility.\n", + "\n", + "For this notebook, we'll deploy a single endpoint. The three models are:\n", + "\n", + "* **voyage-4-large** — State-of-the-art retrieval quality, ideal for document embeddings\n", + "* **voyage-4** — Balanced for retrieval/search and AI applications\n", + "* **voyage-4-lite** — Optimized for low latency and cost, ideal for query embeddings" + ] + }, + { + "cell_type": "markdown", + "id": "cell-14", + "metadata": { + "id": "45286665838d" + }, + "source": [ + "### Initialize the Model\n", + "\n", + "Initialize the Voyage 4 model from Model Garden.\n", + "\n", + "Use the `list_deploy_options()` method to view the verified deployment configurations for your selected model." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cell-15", + "metadata": { + "id": "7675f3b3f3de" + }, + "outputs": [], + "source": [ + "from vertexai import model_garden\n", + "\n", + "# @title Select Model\n", + "# @markdown Choose the Voyage 4 model to deploy:\n", + "MODEL = \"voyage-4\" # @param [\"voyage-4-large\", \"voyage-4\", \"voyage-4-lite\"]\n", + "\n", + "# Default to voyage-4 if not set\n", + "if not MODEL:\n", + " MODEL = \"voyage-4\"\n", + "\n", + "MODEL_NAME = f\"mongodb/{MODEL}@latest\"\n", + "model = model_garden.OpenModel(MODEL_NAME)\n", + "\n", + "# Set accelerator based on model (voyage-4-large requires 80GB GPU)\n", + "if MODEL == \"voyage-4-large\":\n", + " MACHINE_TYPE = \"a2-ultragpu-1g\"\n", + " ACCELERATOR_TYPE = \"NVIDIA_A100_80GB\"\n", + "else:\n", + " MACHINE_TYPE = \"a2-highgpu-1g\"\n", + " ACCELERATOR_TYPE = \"NVIDIA_TESLA_A100\"\n", + "\n", + "print(f\"Selected model: {MODEL_NAME}\")\n", + "print(f\"Accelerator: {ACCELERATOR_TYPE} on {MACHINE_TYPE}\")\n", + "deploy_options = model.list_deploy_options(concise=True)\n", + "print(deploy_options)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cell-16", + "metadata": { + "id": "4e40b2f2141f" + }, + "outputs": [], + "source": [ + "# @title Deploy or connect to endpoint\n", + "# @markdown Choose whether to deploy a new model or use an existing endpoint:\n", + "\n", + "deployment_option = \"deploy_new\" # @param [\"deploy_new\", \"use_existing\"]\n", + "\n", + "# @markdown ---\n", + "# @markdown If using existing endpoint, provide the endpoint ID:\n", + "ENDPOINT_ID = \"\" # @param {type:\"string\"}\n", + "\n", + "if deployment_option == \"deploy_new\":\n", + " print(f\"Deploying {MODEL}...\")\n", + " print(f\"Using {ACCELERATOR_TYPE} on {MACHINE_TYPE}\")\n", + " endpoint = model.deploy(\n", + " machine_type=MACHINE_TYPE,\n", + " accelerator_type=ACCELERATOR_TYPE,\n", + " accelerator_count=1,\n", + " accept_eula=True,\n", + " use_dedicated_endpoint=True,\n", + " )\n", + " print(f\"Endpoint deployed: {endpoint.display_name}\")\n", + " print(f\"Endpoint resource name: {endpoint.resource_name}\")\n", + "else:\n", + " if not ENDPOINT_ID:\n", + " raise ValueError(\"Please provide an ENDPOINT_ID when using existing endpoint\")\n", + "\n", + " from google.cloud import aiplatform\n", + "\n", + " print(f\"Connecting to existing endpoint: {ENDPOINT_ID}\")\n", + " endpoint = aiplatform.Endpoint(\n", + " endpoint_name=f\"projects/{PROJECT_ID}/locations/{LOCATION}/endpoints/{ENDPOINT_ID}\"\n", + " )\n", + " print(f\"Using endpoint: {endpoint.display_name}\")\n", + " print(f\"Endpoint resource name: {endpoint.resource_name}\")" + ] + }, + { + "cell_type": "markdown", + "id": "cell-20", + "metadata": { + "id": "0233e16ca856" + }, + "source": [ + "## Generate embeddings\n", + "\n", + "Now let's look at basic embedding generation with the Voyage 4 models." 
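The cells that follow call the endpoint's `/embeddings` route directly through `endpoint.invoke`. As a convenience only (not part of the Vertex AI SDK), you could fold that boilerplate into a small helper; `get_embeddings` below is an illustrative name, and it simply wraps the same request/response handling used throughout this notebook.

```python
import json


def get_embeddings(endpoint, texts, input_type="document", output_dimension=1024):
    """Return a list of embedding vectors from a deployed Voyage 4 endpoint.

    Convenience wrapper around the endpoint.invoke() pattern used in the cells
    below; the function name and defaults are illustrative, not an official API.
    """
    body = {
        "input": list(texts),
        "output_dimension": output_dimension,
        "input_type": input_type,
    }
    response = endpoint.invoke(
        request_path="/embeddings",
        body=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return [item["embedding"] for item in response.json()["data"]]


# Example usage (uses the `endpoint` object created above):
# query_vector = get_embeddings(endpoint, ["What is machine learning?"], input_type="query")[0]
```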
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cell-21", + "metadata": { + "id": "b7d228ce5961" + }, + "outputs": [], + "source": [ + "import json\n", + "\n", + "# Multiple texts to embed\n", + "texts = [\n", + " \"Machine learning enables computers to learn from data.\",\n", + " \"Natural language processing helps computers understand human language.\",\n", + " \"Computer vision allows machines to interpret visual information.\",\n", + " \"Deep learning uses neural networks with multiple layers.\",\n", + "]\n", + "\n", + "# Prepare the batch request and make invoke call\n", + "body = {\"input\": texts, \"output_dimension\": 1024, \"input_type\": \"document\"}\n", + "response = endpoint.invoke(\n", + " request_path=\"/embeddings\",\n", + " body=json.dumps(body).encode(\"utf-8\"),\n", + " headers={\"Content-Type\": \"application/json\"},\n", + ")\n", + "\n", + "# Extract embeddings\n", + "result = response.json()\n", + "embeddings = [item[\"embedding\"] for item in result[\"data\"]]\n", + "\n", + "print(f\"Number of texts embedded: {len(embeddings)}\")\n", + "print(f\"Embedding dimension: {len(embeddings[0])}\")\n", + "print(f\"\\nFirst embedding (first 5 values): {embeddings[0][:5]}\")\n", + "print(f\"Second embedding (first 5 values): {embeddings[1][:5]}\")" + ] + }, + { + "cell_type": "markdown", + "id": "cell-22", + "metadata": { + "id": "a47938955d96" + }, + "source": [ + "### Semantic similarity\n", + "\n", + "Use embeddings to compute semantic similarity between text:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cell-23", + "metadata": { + "id": "d65defacbb93" + }, + "outputs": [], + "source": [ + "import json\n", + "\n", + "import numpy as np\n", + "\n", + "\n", + "def cosine_similarity(vec1, vec2):\n", + " \"\"\"Calculate cosine similarity between two vectors.\"\"\"\n", + " vec1 = np.array(vec1)\n", + " vec2 = np.array(vec2)\n", + " return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))\n", + "\n", + "\n", + "# Example texts\n", + "query = \"How do computers learn from examples?\"\n", + "documents = [\n", + " \"Machine learning enables computers to learn from data.\",\n", + " \"The weather today is sunny and warm.\",\n", + " \"Neural networks can recognize patterns in data.\",\n", + " \"I enjoy cooking Italian food.\",\n", + "]\n", + "\n", + "# Get embeddings - using invoke with /embeddings endpoint\n", + "all_texts = [query] + documents\n", + "body = {\"input\": all_texts, \"output_dimension\": 1024, \"input_type\": \"document\"}\n", + "response = endpoint.invoke(\n", + " request_path=\"/embeddings\",\n", + " body=json.dumps(body).encode(\"utf-8\"),\n", + " headers={\"Content-Type\": \"application/json\"},\n", + ")\n", + "result = response.json()\n", + "all_embeddings = [item[\"embedding\"] for item in result[\"data\"]]\n", + "\n", + "query_embedding = all_embeddings[0]\n", + "doc_embeddings = all_embeddings[1:]\n", + "\n", + "# Calculate similarities\n", + "print(f\"Query: {query}\\n\")\n", + "print(\"Similarity scores:\")\n", + "for i, doc in enumerate(documents):\n", + " similarity = cosine_similarity(query_embedding, doc_embeddings[i])\n", + " print(f\"{similarity:.4f} - {doc}\")" + ] + }, + { + "cell_type": "markdown", + "id": "cell-26", + "metadata": { + "id": "c3b964fe9387" + }, + "source": [ + "## Advanced parameters\n", + "\n", + "Let's explore the advanced parameters that Voyage 4 models support to optimize your embeddings." 
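The three parameters covered below (`input_type`, `output_dimension`, `output_dtype`) can all be set on a single request. The sketch below uses illustrative values only; each parameter is explained individually in the sections that follow.

```python
import json

# Illustrative request combining the advanced parameters explored below.
body = {
    "input": ["Machine learning enables computers to learn from data."],
    "input_type": "document",   # "query", "document", or omitted
    "output_dimension": 512,    # 256, 512, 1024 (default), or 2048
    "output_dtype": "int8",     # "float" (default), "int8", "uint8", "binary", "ubinary"
}
response = endpoint.invoke(
    request_path="/embeddings",
    body=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
embedding = response.json()["data"][0]["embedding"]
print(f"Dimension: {len(embedding)}, first 5 values: {embedding[:5]}")
```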
+ ] + }, + { + "cell_type": "markdown", + "id": "cell-27", + "metadata": { + "id": "865ef46609eb" + }, + "source": [ + "### Understanding input_type: Query vs Document\n", + "\n", + "The `input_type` parameter optimizes embeddings for retrieval tasks:\n", + "\n", + "* **`query`**: Use this when the text represents a search query or question. The model prepends \"Represent the query for retrieving supporting documents: \" to optimize for retrieval.\n", + "* **`document`**: Use this when the text represents a document or passage to be searched. The model prepends \"Represent the document for retrieval: \" to optimize for indexing.\n", + "* **`null`** (default): No special prompt is added. Use for general-purpose embeddings.\n", + "\n", + "**Best Practice**: For retrieval/search applications, use `input_type=\"query\"` for your search queries and `input_type=\"document\"` for the documents you're indexing." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cell-28", + "metadata": { + "id": "a12cf1ce0be7" + }, + "outputs": [], + "source": [ + "import json\n", + "\n", + "# Example: Using input_type for retrieval\n", + "query_text = \"What is machine learning?\"\n", + "document_texts = [\n", + " \"Machine learning enables computers to learn from data.\",\n", + " \"Natural language processing helps computers understand human language.\",\n", + " \"Computer vision allows machines to interpret visual information.\",\n", + "]\n", + "\n", + "# Generate query embedding with input_type=\"query\"\n", + "query_body = {\n", + " \"input\": [query_text],\n", + " \"output_dimension\": 1024,\n", + " \"input_type\": \"query\", # Optimize for search queries\n", + "}\n", + "query_response = endpoint.invoke(\n", + " request_path=\"/embeddings\",\n", + " body=json.dumps(query_body).encode(\"utf-8\"),\n", + " headers={\"Content-Type\": \"application/json\"},\n", + ")\n", + "query_result = query_response.json()\n", + "query_embedding = query_result[\"data\"][0][\"embedding\"]\n", + "\n", + "# Generate document embeddings with input_type=\"document\"\n", + "doc_body = {\n", + " \"input\": document_texts,\n", + " \"output_dimension\": 1024,\n", + " \"input_type\": \"document\", # Optimize for document indexing\n", + "}\n", + "doc_response = endpoint.invoke(\n", + " request_path=\"/embeddings\",\n", + " body=json.dumps(doc_body).encode(\"utf-8\"),\n", + " headers={\"Content-Type\": \"application/json\"},\n", + ")\n", + "doc_result = doc_response.json()\n", + "doc_embeddings = [item[\"embedding\"] for item in doc_result[\"data\"]]\n", + "\n", + "print(f\"Query: {query_text}\")\n", + "print(f\"Query embedding dimension: {len(query_embedding)}\")\n", + "print(f\"\\nNumber of documents embedded: {len(doc_embeddings)}\")\n", + "print(f\"Document embedding dimension: {len(doc_embeddings[0])}\")\n", + "print(f\"\\nQuery embedding (first 5 values): {query_embedding[:5]}\")\n", + "print(f\"First document embedding (first 5 values): {doc_embeddings[0][:5]}\")" + ] + }, + { + "cell_type": "markdown", + "id": "cell-29", + "metadata": { + "id": "055a68775299" + }, + "source": [ + "### Using different output dimensions (Matryoshka Representation Learning)\n", + "\n", + "Voyage 4 models support **Matryoshka Representation Learning (MRL)**, providing variable-dimension embeddings: 256, 512, 1024 (default), and 2048. Smaller dimensions reduce storage and computation costs, while larger dimensions may provide better accuracy." 
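A practical consequence of MRL is that the leading coordinates of a higher-dimensional embedding are expected to behave like a lower-dimensional embedding of the same text. The exploratory check below rests on that assumption about the served model (it is not something this notebook guarantees): it requests the same text at 2048 and 256 dimensions, truncates the larger vector to its first 256 values, and compares the two with cosine similarity. A score near 1 would suggest the truncation is usable.

```python
import json

import numpy as np


def embed_once(text, dim):
    """Embed a single text at the requested output_dimension."""
    body = {"input": [text], "output_dimension": dim, "input_type": "document"}
    response = endpoint.invoke(
        request_path="/embeddings",
        body=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return np.array(response.json()["data"][0]["embedding"])


text = "Machine learning enables computers to learn from data."
full = embed_once(text, 2048)   # high-dimensional embedding
native = embed_once(text, 256)  # natively requested low-dimensional embedding
truncated = full[:256]          # leading 256 coordinates of the 2048-dim vector

similarity = np.dot(truncated, native) / (np.linalg.norm(truncated) * np.linalg.norm(native))
print(f"Cosine similarity between truncated 2048-dim and native 256-dim: {similarity:.4f}")
```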
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cell-30", + "metadata": { + "id": "8e5ecdbb4d5c" + }, + "outputs": [], + "source": [ + "import json\n", + "\n", + "text = \"Machine learning enables computers to learn from data.\"\n", + "\n", + "# Test different output dimensions\n", + "dimensions = [256, 512, 1024, 2048]\n", + "\n", + "print(\"Comparing different output dimensions (MRL):\\n\")\n", + "for dim in dimensions:\n", + " body = {\"input\": [text], \"output_dimension\": dim, \"input_type\": \"document\"}\n", + " response = endpoint.invoke(\n", + " request_path=\"/embeddings\",\n", + " body=json.dumps(body).encode(\"utf-8\"),\n", + " headers={\"Content-Type\": \"application/json\"},\n", + " )\n", + " result = response.json()\n", + " embedding = result[\"data\"][0][\"embedding\"]\n", + "\n", + " print(f\"Dimension {dim}:\")\n", + " print(f\" Length: {len(embedding)}\")\n", + " print(f\" First 5 values: {embedding[:5]}\")\n", + " print(f\" Storage size: ~{len(embedding) * 4} bytes (float32)\\n\")" + ] + }, + { + "cell_type": "markdown", + "id": "cell-31", + "metadata": { + "id": "2ea3386e2fbb" + }, + "source": [ + "### Using different output data types (Quantization-Aware Training)\n", + "\n", + "Voyage 4 models support **Quantization-Aware Training (QAT)**, optimizing embeddings for multiple output data types:\n", + "\n", + "* **`float`** (default): 32-bit floating-point numbers, highest precision\n", + "* **`int8`**: 8-bit signed integers (-128 to 127), 4x smaller than float\n", + "* **`uint8`**: 8-bit unsigned integers (0 to 255), 4x smaller than float\n", + "* **`binary`**: Bit-packed signed integers (int8), 32x smaller than float\n", + "* **`ubinary`**: Bit-packed unsigned integers (uint8), 32x smaller than float\n", + "\n", + "Quantized formats trade some precision for significant storage savings, with minimal quality loss thanks to QAT." 
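With bit-packed outputs, each returned value encodes 8 bits of the embedding, so similarity is typically computed over the unpacked bits (Hamming distance) rather than with floating-point dot products. The sketch below assumes the `ubinary` values are standard packed bytes that `numpy.unpackbits` can expand; see the Voyage documentation linked in the next section for the exact packing and offset-binary conventions.

```python
import json

import numpy as np


def embed_ubinary(text):
    """Get a bit-packed (ubinary) embedding and unpack it into individual bits."""
    body = {
        "input": [text],
        "output_dimension": 1024,
        "output_dtype": "ubinary",
        "input_type": "document",
    }
    response = endpoint.invoke(
        request_path="/embeddings",
        body=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    packed = np.array(response.json()["data"][0]["embedding"], dtype=np.uint8)
    return np.unpackbits(packed)  # 1024 / 8 = 128 packed bytes -> 1024 bits


bits_a = embed_ubinary("Machine learning enables computers to learn from data.")
bits_b = embed_ubinary("Neural networks can recognize patterns in data.")

hamming = int(np.count_nonzero(bits_a != bits_b))
print(f"Bits per embedding: {bits_a.size}")
print(f"Hamming distance: {hamming} differing bits")
print(f"Bitwise agreement: {1 - hamming / bits_a.size:.3f}")
```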
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cell-32", + "metadata": { + "id": "3f7f5abccd85" + }, + "outputs": [], + "source": [ + "import json\n", + "\n", + "text = \"Machine learning enables computers to learn from data.\"\n", + "\n", + "# Test different output data types\n", + "output_dtypes = [\"float\", \"int8\", \"uint8\", \"binary\", \"ubinary\"]\n", + "output_dimension = 1024\n", + "\n", + "print(\"Comparing different output data types (QAT):\\n\")\n", + "for dtype in output_dtypes:\n", + " body = {\n", + " \"input\": [text],\n", + " \"output_dimension\": output_dimension,\n", + " \"output_dtype\": dtype,\n", + " \"input_type\": \"document\",\n", + " }\n", + " response = endpoint.invoke(\n", + " request_path=\"/embeddings\",\n", + " body=json.dumps(body).encode(\"utf-8\"),\n", + " headers={\"Content-Type\": \"application/json\"},\n", + " )\n", + " result = response.json()\n", + " embedding = result[\"data\"][0][\"embedding\"]\n", + "\n", + " # Calculate actual storage size\n", + " if dtype == \"float\":\n", + " storage_bytes = len(embedding) * 4 # 4 bytes per float32\n", + " elif dtype in [\"int8\", \"uint8\"]:\n", + " storage_bytes = len(embedding) * 1 # 1 byte per int8/uint8\n", + " elif dtype in [\"binary\", \"ubinary\"]:\n", + " storage_bytes = len(embedding) * 1 # bit-packed, 1/8 of dimension\n", + "\n", + " print(f\"Output dtype: {dtype}\")\n", + " print(f\" Length: {len(embedding)}\")\n", + " print(f\" Value type: {type(embedding[0]).__name__}\")\n", + " print(f\" First 5 values: {embedding[:5]}\")\n", + " print(f\" Storage size: ~{storage_bytes} bytes\")\n", + "\n", + " # Calculate compression ratio vs float\n", + " if dtype != \"float\":\n", + " compression_ratio = (output_dimension * 4) / storage_bytes\n", + " print(f\" Compression: {compression_ratio:.1f}x smaller than float\")\n", + " print()" + ] + }, + { + "cell_type": "markdown", + "id": "cell-33", + "metadata": { + "id": "130d571dd4a8" + }, + "source": [ + "### Combining output_dimension and output_dtype\n", + "\n", + "You can combine different dimensions and data types to optimize for your use case.\n", + "\n", + "Please refer to our guide for details on [offset binary](https://docs.voyageai.com/docs/flexible-dimensions-and-quantization#offset-binary) and [binary embeddings](https://docs.voyageai.com/docs/flexible-dimensions-and-quantization#quantization). 
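The linked guide describes `binary` as the offset-binary counterpart of `ubinary`. The check below explores that relationship under the assumption (ours, not a claim made in this notebook) that each signed `binary` byte equals the corresponding unsigned `ubinary` byte minus 128; if that assumption does not hold for the served model, the printed comparison will simply report a mismatch.

```python
import json

import numpy as np

text = "Machine learning enables computers to learn from data."


def embed_packed(dtype):
    """Get a bit-packed embedding of `text` with the requested output_dtype."""
    body = {
        "input": [text],
        "output_dimension": 1024,
        "output_dtype": dtype,
        "input_type": "document",
    }
    response = endpoint.invoke(
        request_path="/embeddings",
        body=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return np.array(response.json()["data"][0]["embedding"], dtype=np.int64)


binary_vals = embed_packed("binary")    # signed, bit-packed
ubinary_vals = embed_packed("ubinary")  # unsigned, bit-packed

# Assumed relationship: signed byte = unsigned byte - 128 (offset binary).
matches = bool(np.array_equal(binary_vals + 128, ubinary_vals))
print(f"binary + 128 == ubinary for all {binary_vals.size} packed bytes: {matches}")
```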
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cell-34", + "metadata": { + "id": "2738b2576859" + }, + "outputs": [], + "source": [ + "import json\n", + "\n", + "text = \"Machine learning enables computers to learn from data.\"\n", + "\n", + "# Example: Ultra-compact embeddings (256 dimensions + ubinary)\n", + "compact_body = {\n", + " \"input\": [text],\n", + " \"output_dimension\": 256,\n", + " \"output_dtype\": \"ubinary\", # Most compact format\n", + " \"input_type\": \"document\",\n", + "}\n", + "compact_response = endpoint.invoke(\n", + " request_path=\"/embeddings\",\n", + " body=json.dumps(compact_body).encode(\"utf-8\"),\n", + " headers={\"Content-Type\": \"application/json\"},\n", + ")\n", + "compact_result = compact_response.json()\n", + "compact_embedding = compact_result[\"data\"][0][\"embedding\"]\n", + "\n", + "# Example: High-precision embeddings (2048 dimensions + float)\n", + "precise_body = {\n", + " \"input\": [text],\n", + " \"output_dimension\": 2048,\n", + " \"output_dtype\": \"float\", # Highest precision\n", + " \"input_type\": \"document\",\n", + "}\n", + "precise_response = endpoint.invoke(\n", + " request_path=\"/embeddings\",\n", + " body=json.dumps(precise_body).encode(\"utf-8\"),\n", + " headers={\"Content-Type\": \"application/json\"},\n", + ")\n", + "precise_result = precise_response.json()\n", + "precise_embedding = precise_result[\"data\"][0][\"embedding\"]\n", + "\n", + "# Compare storage requirements\n", + "compact_storage = len(compact_embedding) * 1 # binary is bit-packed\n", + "precise_storage = len(precise_embedding) * 4 # float32\n", + "\n", + "print(\"Storage comparison:\\n\")\n", + "print(\"Ultra-compact (256-dim ubinary):\")\n", + "print(\" Dimension: 256\")\n", + "print(f\" Storage: ~{compact_storage} bytes\")\n", + "print(f\" First 5 values: {compact_embedding[:5]}\\n\")\n", + "\n", + "print(\"High-precision (2048-dim float):\")\n", + "print(f\" Dimension: {len(precise_embedding)}\")\n", + "print(f\" Storage: ~{precise_storage} bytes\")\n", + "print(f\" First 5 values: {precise_embedding[:5]}\\n\")\n", + "\n", + "print(f\"Storage ratio: {precise_storage / compact_storage:.1f}x\")\n", + "print(\"\\nFor 1 million vectors:\")\n", + "print(f\" Ultra-compact: ~{compact_storage * 1_000_000 / (1024**2):.1f} MB\")\n", + "print(f\" High-precision: ~{precise_storage * 1_000_000 / (1024**2):.1f} MB\")" + ] + }, + { + "cell_type": "markdown", + "id": "cell-35", + "metadata": { + "id": "3067b6759d2b" + }, + "source": [ + "## Cleaning up\n", + "\n", + "To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, delete the endpoint." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cell-36", + "metadata": { + "id": "f59817e807ee" + }, + "outputs": [], + "source": [ + "# Delete the endpoint (this will also undeploy all models)\n", + "print(f\"Deleting endpoint: {endpoint.display_name}\")\n", + "endpoint.delete(force=True)\n", + "print(\"Endpoint deleted successfully!\")" + ] + } + ], + "metadata": { + "colab": { + "name": "voyage-4.ipynb", + "toc_visible": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +}