Commit ee14951

Update VoyageAI docs
1 parent 8f46509 commit ee14951

3 files changed, +88 -10 lines changed
api-reference/how-to/embedding.mdx

Lines changed: 1 addition & 1 deletion
@@ -68,7 +68,7 @@ To use the Ingest CLI or Ingest Python library to generate embeddings, do the fo
 - `langchain-huggingface`. [Choose a model](https://huggingface.co/models?other=embeddings), or use the default model [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2).
 - `langchain-openai`. [Choose a model](https://platform.openai.com/docs/guides/embeddings/embedding-models), or use the default model `text-embedding-ada-002`.
 - `langchain-vertexai`. [Choose a model](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings-api), or use the default model `textembedding-gecko@001`.
-- `langchain-voyageai`. [Choose a model](https://docs.voyageai.com/docs/embeddings). No default model is provided.
+- `langchain-voyageai`. [Choose a model](https://docs.voyageai.com/docs/embeddings). No default model is provided. Recommended models include `voyage-3.5` (latest general-purpose), `voyage-3.5-lite` (lightweight with higher token limits), `voyage-3-large` (enhanced performance), and `voyage-context-3` (for contextualized embeddings).
 - `mixedbread-ai`. [Choose a model](https://www.mixedbread.ai/docs/embeddings/models), or use the default model [mixedbread-ai/mxbai-embed-large-v1](https://www.mixedbread.ai/docs/embeddings/mxbai-embed-large-v1).
 - `octoai`. [Choose a model](https://octo.ai/blog/supercharge-rag-performance-using-octoai-and-unstructured-embeddings/), or use the default model `thenlper/gte-large`.
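
Because `langchain-voyageai` has no default model, a caller has to supply the model name explicitly. The following is a minimal sketch of failing fast when either setting is missing; it is not part of the Ingest CLI or the documented API. The helper name `load_voyage_settings` and the `VOYAGE_MODEL_NAME` variable are hypothetical, chosen only for illustration.

```python
import os
from typing import Dict, Mapping


def load_voyage_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read the API key and model name from a mapping and fail fast if either is missing.

    Hypothetical helper: VOYAGE_MODEL_NAME is an assumed variable name, not one
    defined by the Ingest CLI or the Voyage AI client.
    """
    api_key = env.get("VOYAGE_API_KEY", "").strip()
    model_name = env.get("VOYAGE_MODEL_NAME", "").strip()

    # Collect every missing setting so the error reports all of them at once.
    missing = [
        name
        for name, value in (
            ("VOYAGE_API_KEY", api_key),
            ("VOYAGE_MODEL_NAME", model_name),
        )
        if not value
    ]
    if missing:
        raise ValueError("Missing required settings: " + ", ".join(missing))

    return {"api_key": api_key, "model_name": model_name}
```

Passing the mapping as an argument keeps the check easy to exercise without touching the real process environment.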

open-source/core-functionality/embedding.mdx

Lines changed: 77 additions & 5 deletions
@@ -207,9 +207,9 @@ print(embedding_encoder.is_unit_vector(), embedding_encoder.num_of_dimensions())

 ## `VoyageAIEmbeddingEncoder`

-The `VoyageAIEmbeddingEncoder` class connects to the VoyageAI to obtain embeddings for pieces of text.
+The `VoyageAIEmbeddingEncoder` class connects to the VoyageAI API to obtain embeddings for pieces of text using the VoyageAI Python client.

-`embed_documents` will receive a list of Elements, and return an updated list which includes the `embeddings` attribute for each Element.
+`embed_documents` will receive a list of Elements, and return an updated list which includes the `embeddings` attribute for each Element. The encoder automatically handles batching based on token limits for optimal performance.

 `embed_query` will receive a query as a string, and return a list of floats which is the embedding vector for the given query string.
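
The added sentence about batching based on token limits can be illustrated with a small sketch. This is not the encoder's actual implementation, which the commit does not show. It assumes a greedy strategy, and both `count_tokens` (a whitespace split standing in for a real tokenizer) and `max_tokens_per_batch` are hypothetical names.

```python
from typing import Callable, Iterator, List


def count_tokens(text: str) -> int:
    """Hypothetical tokenizer stand-in: counts whitespace-separated words."""
    return len(text.split())


def batch_by_token_budget(
    texts: List[str],
    max_tokens_per_batch: int,
    token_counter: Callable[[str], int] = count_tokens,
) -> Iterator[List[str]]:
    """Yield consecutive batches whose total token count stays within the budget.

    A single text that exceeds the budget on its own is yielded as its own batch.
    """
    batch: List[str] = []
    batch_tokens = 0
    for text in texts:
        n_tokens = token_counter(text)
        # Close the current batch if adding this text would exceed the budget.
        if batch and batch_tokens + n_tokens > max_tokens_per_batch:
            yield batch
            batch, batch_tokens = [], 0
        batch.append(text)
        batch_tokens += n_tokens
    if batch:
        yield batch
```

Each yielded batch would then correspond to one embedding request, which keeps the input order intact and makes the number of requests depend on text length as well as on the number of texts.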

@@ -219,31 +219,103 @@ The `VoyageAIEmbeddingEncoder` class connects to the VoyageAI to obtain embeddin

 The following code block shows an example of how to use `VoyageAIEmbeddingEncoder`. You will see the updated elements list (with the `embeddings` attribute included for each element), the embedding vector for the query string, and some metadata properties about the embedding model.

-To use Voyage AI you will need to pass Voyage AI API Key (obtained from [https://dash.voyageai.com/](https://dash.voyageai.com/)) as the `api_key` parameter.
+### Configuration Parameters
+
+To use Voyage AI you will need to pass the following parameters to `VoyageAIEmbeddingConfig`:
+
+- **`api_key`** (required): Voyage AI API Key obtained from [https://dash.voyageai.com/](https://dash.voyageai.com/)
+- **`model_name`** (required): The embedding model to use. Available models include:
+  - `voyage-3.5` - Latest general-purpose model with 1024 dimensions
+  - `voyage-3.5-lite` - Lightweight model with 512 dimensions and higher token limits
+  - `voyage-3-large` - Large model with enhanced performance
+  - `voyage-context-3` - Contextualized embedding model for document-level context
+  - `voyage-3`, `voyage-3-lite` - Previous generation models
+  - `voyage-2`, `voyage-02` - Legacy models
+  - Additional specialized models: `voyage-code-3`, `voyage-code-2`, `voyage-finance-2`, `voyage-law-2`, `voyage-multilingual-2`, `voyage-large-2`, `voyage-large-2-instruct`
+
+  For the complete list of available models, see [https://docs.voyageai.com/docs/embeddings](https://docs.voyageai.com/docs/embeddings)
+
+- **`show_progress_bar`** (optional, default: `False`): Display a progress bar during batch processing
+- **`batch_size`** (optional): Override the default batch size for embedding requests
+- **`truncation`** (optional): Enable automatic truncation of inputs that exceed token limits
+- **`output_dimension`** (optional): Specify a custom output dimension (model-dependent)

-The `model_name` parameter is mandatory, please check the available models at [https://docs.voyageai.com/docs/embeddings](https://docs.voyageai.com/docs/embeddings)
+### Basic Example

 ```python
 import os

 from unstructured.documents.elements import Text
 from unstructured.embed.voyageai import VoyageAIEmbeddingConfig, VoyageAIEmbeddingEncoder

+# Basic configuration with required parameters
 embedding_encoder = VoyageAIEmbeddingEncoder(
     config=VoyageAIEmbeddingConfig(
         api_key=os.environ["VOYAGE_API_KEY"],
-        model_name="voyage-3"
+        model_name="voyage-3.5"
     )
 )
+
+# Embed documents
 elements = embedding_encoder.embed_documents(
     elements=[Text("This is sentence 1"), Text("This is sentence 2")],
 )

+# Embed a query
 query = "This is the query"
 query_embedding = embedding_encoder.embed_query(query=query)

+# Print results
 [print(e, e.embeddings) for e in elements]
 print(query, query_embedding)
 print(embedding_encoder.is_unit_vector, embedding_encoder.num_of_dimensions)
+```
+
+### Advanced Example with Custom Options
+
+```python
+import os
+
+from unstructured.documents.elements import Text
+from unstructured.embed.voyageai import VoyageAIEmbeddingConfig, VoyageAIEmbeddingEncoder
+
+# Advanced configuration with optional parameters
+embedding_encoder = VoyageAIEmbeddingEncoder(
+    config=VoyageAIEmbeddingConfig(
+        api_key=os.environ["VOYAGE_API_KEY"],
+        model_name="voyage-3.5",
+        show_progress_bar=True,  # Display progress during processing
+        truncation=True,  # Automatically truncate long texts
+        output_dimension=512  # Use reduced dimensions
+    )
+)

+# Process a larger batch of documents
+elements = embedding_encoder.embed_documents(
+    elements=[Text(f"Document {i}") for i in range(100)],
+)
+```
+
+### Using Contextual Embeddings
+
+The `voyage-context-3` model provides contextualized embeddings that consider document-level context:
+
+```python
+import os
+
+from unstructured.documents.elements import Text
+from unstructured.embed.voyageai import VoyageAIEmbeddingConfig, VoyageAIEmbeddingEncoder
+
+# Configure for contextual embeddings
+embedding_encoder = VoyageAIEmbeddingEncoder(
+    config=VoyageAIEmbeddingConfig(
+        api_key=os.environ["VOYAGE_API_KEY"],
+        model_name="voyage-context-3"
+    )
+)
+
+# Embed documents with contextual understanding
+elements = embedding_encoder.embed_documents(
+    elements=[Text("Context-aware sentence 1"), Text("Context-aware sentence 2")],
+)
 ```
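
The examples in this hunk produce a query vector (a list of floats) and an `embeddings` attribute on each element, but stop short of comparing them. A common next step is ranking documents by cosine similarity to the query. The sketch below uses plain Python lists and makes no calls to `unstructured` or Voyage AI; the function names are illustrative, not part of either library.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_embedding: Sequence[float],
    doc_embeddings: Sequence[Sequence[float]],
) -> List[Tuple[int, float]]:
    """Return (document index, similarity) pairs, most similar first."""
    scores = [
        (index, cosine_similarity(query_embedding, embedding))
        for index, embedding in enumerate(doc_embeddings)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

With the variables from the basic example, this could be called as `rank_by_similarity(query_embedding, [e.embeddings for e in elements])`. When the model returns unit vectors, the cosine similarity reduces to a plain dot product.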

platform/workflows-automation.mdx

Lines changed: 10 additions & 4 deletions
@@ -110,10 +110,16 @@ To create a workflow:

 - **Anthropic**: Use Anthropic to generate embeddings. Also choose the embedding model to use, from one of the following:

-  - **voyage-2**: [Learn more](https://docs.anthropic.com/en/docs/build-with-claude/embeddings#available-voyage-models).
-  - **voyage-large-2**: [Learn more](https://docs.anthropic.com/en/docs/build-with-claude/embeddings#available-voyage-models).
-  - **voyage-code-2**: [Learn more](https://docs.anthropic.com/en/docs/build-with-claude/embeddings#available-voyage-models).
-  - **voyage-lite-02-instruct**: [Learn more](https://docs.anthropic.com/en/docs/build-with-claude/embeddings#available-voyage-models).
+  - **voyage-3.5**: Latest general-purpose model with 1024 dimensions. [Learn more](https://docs.anthropic.com/en/docs/build-with-claude/embeddings#available-voyage-models).
+  - **voyage-3.5-lite**: Lightweight model with 512 dimensions and higher token limits. [Learn more](https://docs.anthropic.com/en/docs/build-with-claude/embeddings#available-voyage-models).
+  - **voyage-3-large**: Enhanced performance model. [Learn more](https://docs.anthropic.com/en/docs/build-with-claude/embeddings#available-voyage-models).
+  - **voyage-3**: General-purpose model (120K token limit). [Learn more](https://docs.anthropic.com/en/docs/build-with-claude/embeddings#available-voyage-models).
+  - **voyage-3-lite**: Lightweight model (120K token limit). [Learn more](https://docs.anthropic.com/en/docs/build-with-claude/embeddings#available-voyage-models).
+  - **voyage-context-3**: Contextualized embedding model for document-level context. [Learn more](https://docs.anthropic.com/en/docs/build-with-claude/embeddings#available-voyage-models).
+  - **voyage-2**: Legacy general-purpose model. [Learn more](https://docs.anthropic.com/en/docs/build-with-claude/embeddings#available-voyage-models).
+  - **voyage-large-2**: Legacy large model. [Learn more](https://docs.anthropic.com/en/docs/build-with-claude/embeddings#available-voyage-models).
+  - **voyage-code-2**: Legacy code-specialized model. [Learn more](https://docs.anthropic.com/en/docs/build-with-claude/embeddings#available-voyage-models).
+  - **voyage-lite-02-instruct**: Legacy lightweight instruction-tuned model. [Learn more](https://docs.anthropic.com/en/docs/build-with-claude/embeddings#available-voyage-models).

 - **Hugging Face**: Use Hugging Face to generate embeddings. Also choose the embedding model to use, from one of the following:
