Commit f7f95ce

committed
Update VoyageAI docs
1 parent 425961e commit f7f95ce

File tree

3 files changed

+107
-11
lines changed

api-reference/workflow/workflows.mdx

Lines changed: 10 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -1923,11 +1923,20 @@ Allowed values for `subtype` and `model_name` include:
19231923

19241924
- `"subtype": "voyageai"`
19251925

1926+
- `"model_name": "voyage-context-3"`
1927+
- `"model_name": "voyage-3.5"`
1928+
- `"model_name": "voyage-3.5-lite"`
19261929
- `"model_name": "voyage-3"`
19271930
- `"model_name": "voyage-3-large"`
19281931
- `"model_name": "voyage-3-lite"`
1932+
- `"model_name": "voyage-3-m-exp"`
1933+
- `"model_name": "voyage-2"`
1934+
- `"model_name": "voyage-02"`
1935+
- `"model_name": "voyage-large-2"`
1936+
- `"model_name": "voyage-large-2-instruct"`
19291937
- `"model_name": "voyage-code-3"`
1938+
- `"model_name": "voyage-code-2"`
19301939
- `"model_name": "voyage-finance-2"`
19311940
- `"model_name": "voyage-law-2"`
1932-
- `"model_name": "voyage-code-2"`
1941+
- `"model_name": "voyage-multilingual-2"`
19331942
- `"model_name": "voyage-multimodal-3"`

open-source/how-to/embedding.mdx

Lines changed: 80 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -57,7 +57,17 @@ To use the Ingest CLI or Ingest Python library to generate embeddings, do the fo
5757
- `openai` for [OpenAI](https://openai.com/). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/openai/).
5858
- `togetherai` for [Together.ai](https://www.together.ai/). [Learn more](https://docs.together.ai/docs/embedding-models).
5959
- `vertexai` for [Google Vertex AI PaLM](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/google_vertex_ai_palm/).
60-
- `voyageai` for [Voyage AI](https://www.voyageai.com/). [Learn more](https://python.langchain.com/v0.2/docs/integrations/text_embedding/voyageai/).
60+
- `voyageai` for [Voyage AI](https://www.voyageai.com/). [Learn more](https://docs.voyageai.com/docs/embeddings).
61+
62+
<Note>
63+
Voyage AI offers multiple embedding models optimized for different use cases:
64+
- **voyage-3.5** and **voyage-3.5-lite**: Latest models with high token limits (320k and 1M tokens respectively)
65+
- **voyage-context-3**: Specialized model for contextualized embeddings that capture relationships between documents
66+
- **voyage-code-3** and **voyage-code-2**: Optimized for code embeddings
67+
- **voyage-finance-2**, **voyage-law-2**, **voyage-multilingual-2**: Domain-specific models
68+
- **voyage-multimodal-3**: Supports multimodal embeddings
69+
- Additional models available for various use cases
70+
</Note>
6171

6272
2. Run the following command to install the required Python package for the embedding provider:
6373

@@ -86,7 +96,15 @@ To use the Ingest CLI or Ingest Python library to generate embeddings, do the fo
8696
- `openai`. [Choose a model](https://platform.openai.com/docs/guides/embeddings/embedding-models), or use the default model `text-embedding-ada-002`.
8797
- `togetherai`. [Choose a model](https://docs.together.ai/docs/embedding-models), or use the default model `togethercomputer/m2-bert-80M-32k-retrieval`.
8898
- `vertexai`. [Choose a model](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings-api), or use the default model `text-embedding-05`.
89-
- `voyageai`. [Choose a model](https://docs.voyageai.com/docs/embeddings). No default model is provided.
99+
- `voyageai`. [Choose a model](https://docs.voyageai.com/docs/embeddings). No default model is provided. Available models include:
100+
- **voyage-3.5**: High-performance model with 320k token limit and 1024 dimensions
101+
- **voyage-3.5-lite**: Lightweight model with 1M token limit and 512 dimensions
102+
- **voyage-context-3**: Contextualized embedding model with 32k token limit
103+
- **voyage-3**, **voyage-3-large**, **voyage-3-lite**: General-purpose models
104+
- **voyage-2**, **voyage-02**: Previous generation models
105+
- **voyage-code-3**, **voyage-code-2**: Code-specialized models
106+
- **voyage-finance-2**, **voyage-law-2**, **voyage-multilingual-2**: Domain-specific models
107+
- **voyage-multimodal-3**: Multimodal embedding support
90108

91109
4. Note the special settings to connect to the provider:
92110

@@ -157,3 +175,63 @@ To use the Ingest CLI or Ingest Python library to generate embeddings, do the fo
157175
- Set `embedding_aws_region` to the corresponding AWS Region identifier.
158176
</Accordion>
159177
</AccordionGroup>
178+
179+
## VoyageAI Advanced Features
180+
181+
VoyageAI embeddings offer several advanced capabilities beyond standard embedding generation:
182+
183+
### Contextualized Embeddings
184+
185+
The `voyage-context-3` model provides contextualized embeddings that capture relationships between documents in a batch. This is particularly useful for RAG applications where understanding document relationships improves retrieval accuracy.
186+
187+
### Automatic Batching
188+
189+
VoyageAI integration automatically handles batching based on:
190+
- Model-specific token limits (ranging from 32k to 1M tokens depending on the model)
191+
- Maximum batch size of 1000 documents per request
192+
- Efficient token counting to optimize API usage
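The batching rules listed above can be pictured with a minimal sketch. This is not the integration's actual code: `batch_by_tokens` and `whitespace_token_count` are hypothetical names, and the whitespace counter is only a stand-in for a real tokenizer. The sketch assumes a greedy strategy that closes a batch when adding the next document would exceed the token limit or the 1000-document cap.

```python
from typing import Callable, Iterable, Iterator, List

MAX_BATCH_SIZE = 1000  # documents per request, as stated in the list above


def batch_by_tokens(
    texts: Iterable[str],
    token_limit: int,
    count_tokens: Callable[[str], int],
    max_batch_size: int = MAX_BATCH_SIZE,
) -> Iterator[List[str]]:
    """Greedily group texts so each batch respects both limits.

    Hypothetical helper: a single text larger than token_limit is
    still emitted, alone in its own batch.
    """
    batch: List[str] = []
    batch_tokens = 0
    for text in texts:
        n = count_tokens(text)
        over_tokens = batch_tokens + n > token_limit
        over_size = len(batch) >= max_batch_size
        if batch and (over_tokens or over_size):
            yield batch
            batch, batch_tokens = [], 0
        batch.append(text)
        batch_tokens += n
    if batch:
        yield batch


def whitespace_token_count(text: str) -> int:
    # Stand-in only: a real tokenizer would give different counts.
    return len(text.split())


docs = ["one two three", "four five", "six", "seven eight nine ten"]
batches = list(
    batch_by_tokens(docs, token_limit=5, count_tokens=whitespace_token_count)
)
print(batches)
```

With a token limit of 5, the four sample documents split into two batches of two documents each. In practice the token limit would be the model-specific value.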
193+
194+
### Output Dimension Control
195+
196+
You can specify a custom `output_dimension` parameter to reduce the dimensionality of embeddings, which can:
197+
- Reduce storage requirements
198+
- Speed up similarity search
199+
- Maintain embedding quality for many use cases
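To make the storage benefit concrete, here is a rough back-of-the-envelope sketch. It assumes vectors are stored as 32-bit floats with no index overhead, and the reduced dimension of 256 is an illustrative value only; which `output_dimension` values a given model accepts is not stated here. The function name `embedding_storage_bytes` is hypothetical.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors stored as 32-bit floats


def embedding_storage_bytes(
    num_vectors: int,
    dimensions: int,
    bytes_per_value: int = BYTES_PER_FLOAT32,
) -> int:
    """Raw size of the vectors alone, ignoring index and metadata overhead."""
    return num_vectors * dimensions * bytes_per_value


full = embedding_storage_bytes(1_000_000, 1024)    # full-size vectors
reduced = embedding_storage_bytes(1_000_000, 256)  # illustrative reduced size

print(f"1024 dims: {full / 1e9:.2f} GB")
print(f" 256 dims: {reduced / 1e9:.2f} GB")
print(f"reduction: {full / reduced:.0f}x")
```

Under these assumptions, raw storage scales linearly with the dimension, so a quarter of the dimensions needs a quarter of the space.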
200+
201+
### Progress Tracking
202+
203+
Enable `show_progress_bar` to monitor embedding progress for large document collections. This requires installing `tqdm`: `pip install tqdm`.
204+
205+
### Example: Using VoyageAI with Ingest CLI
206+
207+
```bash
208+
unstructured-ingest \
209+
local \
210+
--input-path /path/to/documents \
211+
--output-dir /path/to/output \
212+
--embedding-provider voyageai \
213+
--embedding-api-key $VOYAGE_API_KEY \
214+
--embedding-model-name voyage-3.5 \
215+
--num-processes 2
216+
```
217+
218+
### Example: Using VoyageAI with Contextualized Embeddings
219+
220+
```bash
221+
unstructured-ingest \
222+
local \
223+
--input-path /path/to/documents \
224+
--output-dir /path/to/output \
225+
--embedding-provider voyageai \
226+
--embedding-api-key $VOYAGE_API_KEY \
227+
--embedding-model-name voyage-context-3 \
228+
--num-processes 2
229+
```
230+
231+
### Choosing the Right VoyageAI Model
232+
233+
- **voyage-3.5**: Best for general-purpose embeddings with high token limits
234+
- **voyage-3.5-lite**: Optimal for very large documents or when you need maximum token capacity
235+
- **voyage-context-3**: Use when document relationships matter for your retrieval task
236+
- **voyage-code-3**: Specifically optimized for code and technical documentation
237+
- **Domain-specific models**: Choose finance-2, law-2, or multilingual-2 for specialized domains

snippets/general-shared-text/chunk-limits-embedding-models.mdx

Lines changed: 17 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -19,13 +19,22 @@ as listed in the following table's last column.
1919
| _Together AI_ | | | |
2020
| M2-Bert 80M 32K Retrieval | 768 | 8192 | 28672 |
2121
| _Voyage AI_ | | | |
22-
| Voyage 3 | 1024 | 32000 | 112000 |
23-
| Voyage 3 Large | 1024 | 32000 | 112000 |
24-
| Voyage 3 Lite | 512 | 32000 | 112000 |
25-
| Voyage Code 2 | 1536 | 16000| 56000 |
26-
| Voyage Code 3 | 1024 | 32000 | 112000 |
27-
| Voyage Finance 2 | 1024 | 32000| 112000 |
28-
| Voyage Law 2 | 1024 | 16000 | 56000 |
29-
| Voyage Multimodal 3 | 1024 | 32000 | 112000 |
22+
| Voyage Context 3 | 1024 | 32000 | 112000 |
23+
| Voyage 3.5 | 1024 | 320000 | 1120000 |
24+
| Voyage 3.5 Lite | 512 | 1000000 | 3500000 |
25+
| Voyage 3 | 1024 | 120000 | 420000 |
26+
| Voyage 3 Large | 1024 | 120000 | 420000 |
27+
| Voyage 3 Lite | 512 | 120000 | 420000 |
28+
| Voyage 3 M Exp | 1024 | 120000 | 420000 |
29+
| Voyage 2 | 1024 | 320000 | 1120000 |
30+
| Voyage 02 | 1024 | 320000 | 1120000 |
31+
| Voyage Large 2 | 1024 | 120000 | 420000 |
32+
| Voyage Large 2 Instruct | 1024 | 120000 | 420000 |
33+
| Voyage Code 3 | 1024 | 120000 | 420000 |
34+
| Voyage Code 2 | 1536 | 120000 | 420000 |
35+
| Voyage Finance 2 | 1024 | 120000 | 420000 |
36+
| Voyage Law 2 | 1024 | 120000 | 420000 |
37+
| Voyage Multilingual 2 | 1024 | 120000 | 420000 |
38+
| Voyage Multimodal 3 | 1024 | 120000 | 420000 |
3039

3140
<sup>*</sup> This is an approximate value, determined by multiplying the embedding model's token limit by 3.5.
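The footnote's rule can be checked with a short sketch. The helper name `approx_char_limit` is hypothetical; the token limits and expected values are copied from rows of the table above.

```python
CHARS_PER_TOKEN = 3.5  # multiplier stated in the footnote


def approx_char_limit(token_limit: int) -> int:
    """Approximate character limit for a model's token limit."""
    return round(token_limit * CHARS_PER_TOKEN)


# Token limits taken from the table rows in this diff.
token_limits = {
    "M2-Bert 80M 32K Retrieval": 8192,
    "Voyage Context 3": 32000,
    "Voyage 3": 120000,
    "Voyage 3.5": 320000,
    "Voyage 3.5 Lite": 1000000,
}

for model, tokens in token_limits.items():
    print(f"{model}: {tokens} tokens ~ {approx_char_limit(tokens)} characters")
```

Each computed value matches the last column of the corresponding table row.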
