Update VoyageAI docs

fzowl · fzowl · commit ee14951676d6 · 2025-11-14T16:55:25.000+01:00
diff --git a/api-reference/how-to/embedding.mdx b/api-reference/how-to/embedding.mdx
@@ -68,7 +68,7 @@ To use the Ingest CLI or Ingest Python library to generate embeddings, do the fo
    - `langchain-huggingface`. [Choose a model](https://huggingface.co/models?other=embeddings), or use the default model [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2).
    - `langchain-openai`. [Choose a model](https://platform.openai.com/docs/guides/embeddings/embedding-models), or use the default model `text-embedding-ada-002`.
    - `langchain-vertexai`. [Choose a model](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings-api), or use the default model `textembedding-gecko@001`.
-   - `langchain-voyageai`.  [Choose a model](https://docs.voyageai.com/docs/embeddings). No default model is provided.
+   - `langchain-voyageai`. [Choose a model](https://docs.voyageai.com/docs/embeddings). No default model is provided. Recommended models include `voyage-3.5` (latest general-purpose), `voyage-3.5-lite` (lightweight with higher token limits), `voyage-3-large` (enhanced performance), and `voyage-context-3` (for contextualized embeddings).
    - `mixedbread-ai`. [Choose a model](https://www.mixedbread.ai/docs/embeddings/models), or use the default model [mixedbread-ai/mxbai-embed-large-v1](https://www.mixedbread.ai/docs/embeddings/mxbai-embed-large-v1).
    - `octoai`. [Choose a model](https://octo.ai/blog/supercharge-rag-performance-using-octoai-and-unstructured-embeddings/), or use the default model `thenlper/gte-large`.
 
diff --git a/open-source/core-functionality/embedding.mdx b/open-source/core-functionality/embedding.mdx
@@ -207,9 +207,9 @@ print(embedding_encoder.is_unit_vector(), embedding_encoder.num_of_dimensions())
 
 ## `VoyageAIEmbeddingEncoder`
 
-The `VoyageAIEmbeddingEncoder` class connects to the VoyageAI to obtain embeddings for pieces of text.
+The `VoyageAIEmbeddingEncoder` class uses the Voyage AI Python client to connect to the Voyage AI API and obtain embeddings for pieces of text.
 
-`embed_documents` will receive a list of Elements, and return an updated list which includes the `embeddings` attribute for each Element.
+`embed_documents` receives a list of Elements and returns an updated list that includes the `embeddings` attribute for each Element. The encoder automatically splits requests into batches based on the model's token limits.
 
 `embed_query` will receive a query as a string, and return a list of floats which is the embedding vector for the given query string.
 
@@ -219,31 +219,103 @@ The `VoyageAIEmbeddingEncoder` class connects to the VoyageAI to obtain embeddin
 
 The following code block shows an example of how to use `VoyageAIEmbeddingEncoder`. You will see the updated elements list (with the `embeddings` attribute included for each element), the embedding vector for the query string, and some metadata properties about the embedding model.
 
-To use Voyage AI you will need to pass Voyage AI API Key (obtained from [https://dash.voyageai.com/](https://dash.voyageai.com/)) as the `api_key` parameter.
+### Configuration Parameters
+
+To use Voyage AI, configure `VoyageAIEmbeddingConfig` with the following parameters. Only `api_key` and `model_name` are required:
+
+- **`api_key`** (required): Voyage AI API Key obtained from [https://dash.voyageai.com/](https://dash.voyageai.com/)
+- **`model_name`** (required): The embedding model to use. Available models include:
+  - `voyage-3.5` - Latest general-purpose model with 1024 dimensions by default
+  - `voyage-3.5-lite` - Lightweight model with 1024 dimensions by default and higher token limits
+  - `voyage-3-large` - Large model with enhanced performance
+  - `voyage-context-3` - Contextualized embedding model for document-level context
+  - `voyage-3`, `voyage-3-lite` - Previous generation models
+  - `voyage-2`, `voyage-02` - Legacy models
+  - Additional specialized models: `voyage-code-3`, `voyage-code-2`, `voyage-finance-2`, `voyage-law-2`, `voyage-multilingual-2`, `voyage-large-2`, `voyage-large-2-instruct`
+
+  For the complete list of available models, see [https://docs.voyageai.com/docs/embeddings](https://docs.voyageai.com/docs/embeddings).
+
+- **`show_progress_bar`** (optional, default: `False`): Display a progress bar during batch processing
+- **`batch_size`** (optional): Override the default batch size for embedding requests
+- **`truncation`** (optional): Enable automatic truncation of inputs that exceed token limits
+- **`output_dimension`** (optional): Specify a custom output dimension (model-dependent)
 
-The `model_name` parameter is mandatory, please check the available models at [https://docs.voyageai.com/docs/embeddings](https://docs.voyageai.com/docs/embeddings)
+### Basic Example
 
 ```python
 import os
 
 from unstructured.documents.elements import Text
 from unstructured.embed.voyageai import VoyageAIEmbeddingConfig, VoyageAIEmbeddingEncoder
 
+# Basic configuration with required parameters
 embedding_encoder = VoyageAIEmbeddingEncoder(
     config=VoyageAIEmbeddingConfig(
         api_key=os.environ["VOYAGE_API_KEY"],
-        model_name="voyage-3"
+        model_name="voyage-3.5"
     )
 )
+
+# Embed documents
 elements = embedding_encoder.embed_documents(
     elements=[Text("This is sentence 1"), Text("This is sentence 2")],
 )
 
+# Embed a query
 query = "This is the query"
 query_embedding = embedding_encoder.embed_query(query=query)
 
+# Print results
 [print(e, e.embeddings) for e in elements]
 print(query, query_embedding)
 print(embedding_encoder.is_unit_vector, embedding_encoder.num_of_dimensions)
+```
+
+### Advanced Example with Custom Options
+
+```python
+import os
+
+from unstructured.documents.elements import Text
+from unstructured.embed.voyageai import VoyageAIEmbeddingConfig, VoyageAIEmbeddingEncoder
+
+# Advanced configuration with optional parameters
+embedding_encoder = VoyageAIEmbeddingEncoder(
+    config=VoyageAIEmbeddingConfig(
+        api_key=os.environ["VOYAGE_API_KEY"],
+        model_name="voyage-3.5",
+        show_progress_bar=True,  # Display progress during processing
+        truncation=True,         # Automatically truncate long texts
+        output_dimension=512,    # Request 512-dimensional vectors instead of the model default
+    )
+)
 
+# Process a larger batch of documents
+elements = embedding_encoder.embed_documents(
+    elements=[Text(f"Document {i}") for i in range(100)],
+)
+```
+
+### Using Contextual Embeddings
+
+The `voyage-context-3` model provides contextualized embeddings that consider document-level context:
+
+```python
+import os
+
+from unstructured.documents.elements import Text
+from unstructured.embed.voyageai import VoyageAIEmbeddingConfig, VoyageAIEmbeddingEncoder
+
+# Configure for contextual embeddings
+embedding_encoder = VoyageAIEmbeddingEncoder(
+    config=VoyageAIEmbeddingConfig(
+        api_key=os.environ["VOYAGE_API_KEY"],
+        model_name="voyage-context-3"
+    )
+)
+
+# Embed documents with contextual understanding
+elements = embedding_encoder.embed_documents(
+    elements=[Text("Context-aware sentence 1"), Text("Context-aware sentence 2")],
+)
 ```
diff --git a/platform/workflows-automation.mdx b/platform/workflows-automation.mdx
@@ -110,10 +110,16 @@ To create a workflow:
 
         - **Anthropic**: Use Anthropic to generate embeddings. Also choose the embedding model to use, from one of the following:
 
-          - **voyage-2**: [Learn more](https://docs.anthropic.com/en/docs/build-with-claude/embeddings#available-voyage-models).
-          - **voyage-large-2**: [Learn more](https://docs.anthropic.com/en/docs/build-with-claude/embeddings#available-voyage-models).
-          - **voyage-code-2**: [Learn more](https://docs.anthropic.com/en/docs/build-with-claude/embeddings#available-voyage-models).
-          - **voyage-lite-02-instruct**: [Learn more](https://docs.anthropic.com/en/docs/build-with-claude/embeddings#available-voyage-models).
+          - **voyage-3.5**: Latest general-purpose model with 1024 dimensions. [Learn more](https://docs.anthropic.com/en/docs/build-with-claude/embeddings#available-voyage-models).
+          - **voyage-3.5-lite**: Lightweight model with 1024 dimensions and higher token limits. [Learn more](https://docs.anthropic.com/en/docs/build-with-claude/embeddings#available-voyage-models).
+          - **voyage-3-large**: Enhanced performance model. [Learn more](https://docs.anthropic.com/en/docs/build-with-claude/embeddings#available-voyage-models).
+          - **voyage-3**: General-purpose model (120K token limit). [Learn more](https://docs.anthropic.com/en/docs/build-with-claude/embeddings#available-voyage-models).
+          - **voyage-3-lite**: Lightweight model (120K token limit). [Learn more](https://docs.anthropic.com/en/docs/build-with-claude/embeddings#available-voyage-models).
+          - **voyage-context-3**: Contextualized embedding model for document-level context. [Learn more](https://docs.anthropic.com/en/docs/build-with-claude/embeddings#available-voyage-models).
+          - **voyage-2**: Legacy general-purpose model. [Learn more](https://docs.anthropic.com/en/docs/build-with-claude/embeddings#available-voyage-models).
+          - **voyage-large-2**: Legacy large model. [Learn more](https://docs.anthropic.com/en/docs/build-with-claude/embeddings#available-voyage-models).
+          - **voyage-code-2**: Legacy code-specialized model. [Learn more](https://docs.anthropic.com/en/docs/build-with-claude/embeddings#available-voyage-models).
+          - **voyage-lite-02-instruct**: Legacy lightweight instruction-tuned model. [Learn more](https://docs.anthropic.com/en/docs/build-with-claude/embeddings#available-voyage-models).
         
         - **Hugging Face**: Use Hugging Face to generate embeddings. Also choose the embedding model to use, from one of the following: