diff --git a/api-reference/workflow/workflows.mdx b/api-reference/workflow/workflows.mdx
index d0d222d0..4ec9737a 100644
--- a/api-reference/workflow/workflows.mdx
+++ b/api-reference/workflow/workflows.mdx
@@ -1923,11 +1923,20 @@ Allowed values for `subtype` and `model_name` include:
- `"subtype": "voyageai"`
+ - `"model_name": "voyage-context-3"`
+ - `"model_name": "voyage-3.5"`
+ - `"model_name": "voyage-3.5-lite"`
- `"model_name": "voyage-3"`
- `"model_name": "voyage-3-large"`
- `"model_name": "voyage-3-lite"`
+ - `"model_name": "voyage-3-m-exp"`
+ - `"model_name": "voyage-2"`
+ - `"model_name": "voyage-02"`
+ - `"model_name": "voyage-large-2"`
+ - `"model_name": "voyage-large-2-instruct"`
- `"model_name": "voyage-code-3"`
+ - `"model_name": "voyage-code-2"`
- `"model_name": "voyage-finance-2"`
- `"model_name": "voyage-law-2"`
- - `"model_name": "voyage-code-2"`
+ - `"model_name": "voyage-multilingual-2"`
- `"model_name": "voyage-multimodal-3"`
\ No newline at end of file
diff --git a/open-source/how-to/embedding.mdx b/open-source/how-to/embedding.mdx
index 99d168f6..23d56f60 100644
--- a/open-source/how-to/embedding.mdx
+++ b/open-source/how-to/embedding.mdx
@@ -57,7 +57,17 @@ To use the Ingest CLI or Ingest Python library to generate embeddings, do the fo
- `openai` for [OpenAI](https://openai.com/). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/openai/).
- `togetherai` for [Together.ai](https://www.together.ai/). [Learn more](https://docs.together.ai/docs/embedding-models).
- `vertexai` for [Google Vertex AI PaLM](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/google_vertex_ai_palm/).
- - `voyageai` for [Voyage AI](https://www.voyageai.com/). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/voyageai/).
+ - `voyageai` for [Voyage AI](https://www.voyageai.com/). [Learn more](https://docs.voyageai.com/docs/embeddings).
+
+
+ Voyage AI offers multiple embedding models optimized for different use cases:
+ - **voyage-3.5** and **voyage-3.5-lite**: Latest models with high token limits (320k and 1M tokens respectively)
+ - **voyage-context-3**: Specialized model for contextualized embeddings that capture relationships between documents
+ - **voyage-code-3** and **voyage-code-2**: Optimized for code embeddings
+ - **voyage-finance-2**, **voyage-law-2**, **voyage-multilingual-2**: Domain-specific models
+   - **voyage-multimodal-3**: Supports multimodal (text and image) embeddings
+   - Additional models are available; see the [Voyage AI documentation](https://docs.voyageai.com/docs/embeddings) for the full list
+
2. Run the following command to install the required Python package for the embedding provider:
@@ -86,7 +96,15 @@ To use the Ingest CLI or Ingest Python library to generate embeddings, do the fo
- `openai`. [Choose a model](https://platform.openai.com/docs/guides/embeddings/embedding-models), or use the default model `text-embedding-ada-002`.
- `togetherai`. [Choose a model](https://docs.together.ai/docs/embedding-models), or use the default model `togethercomputer/m2-bert-80M-32k-retrieval`.
- `vertexai`. [Choose a model](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings-api), or use the default model `text-embedding-05`.
- - `voyageai`. [Choose a model](https://docs.voyageai.com/docs/embeddings). No default model is provided.
+ - `voyageai`. [Choose a model](https://docs.voyageai.com/docs/embeddings). No default model is provided. Available models include:
+ - **voyage-3.5**: High-performance model with 320k token limit and 1024 dimensions
+ - **voyage-3.5-lite**: Lightweight model with 1M token limit and 512 dimensions
+ - **voyage-context-3**: Contextualized embedding model with 32k token limit
+ - **voyage-3**, **voyage-3-large**, **voyage-3-lite**: General-purpose models
+ - **voyage-2**, **voyage-02**: Previous generation models
+ - **voyage-code-3**, **voyage-code-2**: Code-specialized models
+ - **voyage-finance-2**, **voyage-law-2**, **voyage-multilingual-2**: Domain-specific models
+ - **voyage-multimodal-3**: Multimodal embedding support
4. Note the special settings to connect to the provider:
@@ -157,3 +175,63 @@ To use the Ingest CLI or Ingest Python library to generate embeddings, do the fo
- Set `embedding_aws_region` to the corresponding AWS Region identifier.
+
+## Voyage AI Advanced Features
+
+Voyage AI embeddings offer several advanced capabilities beyond standard embedding generation:
+
+### Contextualized Embeddings
+
+The `voyage-context-3` model provides contextualized embeddings that capture relationships between documents in a batch. This is particularly useful for RAG applications where understanding document relationships improves retrieval accuracy.
+
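Since `voyage-context-3` embeds all chunks of a document together, the chunks must first be grouped by source document into a nested list-of-lists. A minimal sketch of that preparation step, assuming each chunk carries a source-document identifier; the commented-out `contextualized_embed` call mirrors the Voyage AI Python client and is an assumption here, not part of the Ingest integration:

```python
from collections import defaultdict

def group_chunks_by_document(chunks):
    """Group flat (doc_id, text) pairs into the nested shape that
    contextualized embedding expects: one inner list per document."""
    grouped = defaultdict(list)
    for doc_id, text in chunks:
        grouped[doc_id].append(text)
    return list(grouped.values())

chunks = [
    ("a.pdf", "Intro paragraph"),
    ("a.pdf", "Method details"),
    ("b.pdf", "Summary"),
]
inputs = group_chunks_by_document(chunks)
# inputs == [["Intro paragraph", "Method details"], ["Summary"]]

# Assumed client call shape (requires a VOYAGE_API_KEY to actually run):
# import voyageai
# vo = voyageai.Client()
# result = vo.contextualized_embed(inputs=inputs, model="voyage-context-3")
```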
+### Automatic Batching
+
+The Voyage AI integration automatically batches embedding requests based on:
+- Model-specific token limits (ranging from 32k to 1M tokens depending on the model)
+- Maximum batch size of 1000 documents per request
+- Efficient token counting to optimize API usage
+
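The batching rules above amount to greedy packing under two constraints. The sketch below is an illustration, not the integration's actual code, and it approximates token counts with a whitespace split rather than the provider's tokenizer:

```python
def batch_documents(docs, max_tokens=120_000, max_batch_size=1000):
    """Greedily pack documents into batches so each batch stays under
    the model's total token budget and the per-request document cap."""
    batches, current, current_tokens = [], [], 0
    for doc in docs:
        n = len(doc.split())  # stand-in for a real tokenizer
        if current and (current_tokens + n > max_tokens
                        or len(current) >= max_batch_size):
            batches.append(current)
            current, current_tokens = [], 0
        current.append(doc)
        current_tokens += n
    if current:
        batches.append(current)
    return batches

docs = ["word " * 50, "word " * 40, "word " * 80]
print([len(b) for b in batch_documents(docs, max_tokens=100)])  # → [2, 1]
```

With a 100-token budget, the 50- and 40-token documents share one batch, and the 80-token document starts a new one.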
+### Output Dimension Control
+
+You can specify a custom `output_dimension` parameter to reduce the dimensionality of embeddings, which can:
+- Reduce storage requirements
+- Speed up similarity search
+- Maintain embedding quality for many use cases
+
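The storage saving scales roughly linearly with the dimension. A back-of-the-envelope sketch for float32 vectors in a flat index (real vector stores add per-record overhead, and the `output_dimension` values a given Voyage AI model accepts vary by model):

```python
def index_size_bytes(num_vectors, dimension, bytes_per_value=4):
    """Raw storage for float32 embeddings in a flat vector index."""
    return num_vectors * dimension * bytes_per_value

full = index_size_bytes(1_000_000, 1024)    # 1M vectors at full dimension
reduced = index_size_bytes(1_000_000, 256)  # same vectors, reduced dimension
print(full // 2**20, "MiB vs", reduced // 2**20, "MiB")  # → 3906 MiB vs 976 MiB
```

Cutting the dimension from 1024 to 256 cuts raw index size by 4x, which also speeds up distance computations during similarity search.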
+### Progress Tracking
+
+Enable `show_progress_bar` to monitor embedding progress for large document collections. This requires installing `tqdm`: `pip install tqdm`.
+
+### Example: Using Voyage AI with the Ingest CLI
+
+```bash
+unstructured-ingest \
+ local \
+ --input-path /path/to/documents \
+ --output-dir /path/to/output \
+ --embedding-provider voyageai \
+ --embedding-api-key $VOYAGE_API_KEY \
+ --embedding-model-name voyage-3.5 \
+ --num-processes 2
+```
+
+### Example: Using Voyage AI with Contextualized Embeddings
+
+```bash
+unstructured-ingest \
+ local \
+ --input-path /path/to/documents \
+ --output-dir /path/to/output \
+ --embedding-provider voyageai \
+ --embedding-api-key $VOYAGE_API_KEY \
+ --embedding-model-name voyage-context-3 \
+ --num-processes 2
+```
+
+### Choosing the Right Voyage AI Model
+
+- **voyage-3.5**: Best for general-purpose embeddings with high token limits
+- **voyage-3.5-lite**: Optimal for very large documents or when you need maximum token capacity
+- **voyage-context-3**: Use when document relationships matter for your retrieval task
+- **voyage-code-3**: Specifically optimized for code and technical documentation
+- **Domain-specific models**: Choose **voyage-finance-2**, **voyage-law-2**, or **voyage-multilingual-2** for specialized domains
diff --git a/snippets/general-shared-text/chunk-limits-embedding-models.mdx b/snippets/general-shared-text/chunk-limits-embedding-models.mdx
index 333b1ee4..55238538 100644
--- a/snippets/general-shared-text/chunk-limits-embedding-models.mdx
+++ b/snippets/general-shared-text/chunk-limits-embedding-models.mdx
@@ -19,13 +19,22 @@ as listed in the following table's last column.
| _Together AI_ | | | |
| M2-Bert 80M 32K Retrieval | 768 | 8192 | 28672 |
| _Voyage AI_ | | | |
-| Voyage 3 | 1024 | 32000 | 112000 |
-| Voyage 3 Large | 1024 | 32000 | 112000 |
-| Voyage 3 Lite | 512 | 32000 | 112000 |
-| Voyage Code 2 | 1536 | 16000| 56000 |
-| Voyage Code 3 | 1024 | 32000 | 112000 |
-| Voyage Finance 2 | 1024 | 32000| 112000 |
-| Voyage Law 2 | 1024 | 16000 | 56000 |
-| Voyage Multimodal 3 | 1024 | 32000 | 112000 |
+| Voyage Context 3 | 1024 | 32000 | 112000 |
+| Voyage 3.5 | 1024 | 320000 | 1120000 |
+| Voyage 3.5 Lite | 512 | 1000000 | 3500000 |
+| Voyage 3 | 1024 | 120000 | 420000 |
+| Voyage 3 Large | 1024 | 120000 | 420000 |
+| Voyage 3 Lite | 512 | 120000 | 420000 |
+| Voyage 3 M Exp | 1024 | 120000 | 420000 |
+| Voyage 2 | 1024 | 320000 | 1120000 |
+| Voyage 02 | 1024 | 320000 | 1120000 |
+| Voyage Large 2 | 1024 | 120000 | 420000 |
+| Voyage Large 2 Instruct | 1024 | 120000 | 420000 |
+| Voyage Code 3 | 1024 | 120000 | 420000 |
+| Voyage Code 2 | 1536 | 120000 | 420000 |
+| Voyage Finance 2 | 1024 | 120000 | 420000 |
+| Voyage Law 2 | 1024 | 120000 | 420000 |
+| Voyage Multilingual 2 | 1024 | 120000 | 420000 |
+| Voyage Multimodal 3 | 1024 | 120000 | 420000 |
* This is an approximate value, determined by multiplying the embedding model's token limit by 3.5.
\ No newline at end of file