File: `api-reference/how-to/embedding.mdx`
To use the Ingest CLI or Ingest Python library to generate embeddings, do the fo…

- `langchain-huggingface`. [Choose a model](https://huggingface.co/models?other=embeddings), or use the default model [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2).
- `langchain-openai`. [Choose a model](https://platform.openai.com/docs/guides/embeddings/embedding-models), or use the default model `text-embedding-ada-002`.
- `langchain-vertexai`. [Choose a model](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings-api), or use the default model `textembedding-gecko@001`.
- `langchain-voyageai`. [Choose a model](https://docs.voyageai.com/docs/embeddings). No default model is provided. Recommended models include `voyage-3.5` (latest general-purpose), `voyage-3.5-lite` (lightweight with higher token limits), `voyage-3-large` (enhanced performance), and `voyage-context-3` (for contextualized embeddings).
- `mixedbread-ai`. [Choose a model](https://www.mixedbread.ai/docs/embeddings/models), or use the default model [mixedbread-ai/mxbai-embed-large-v1](https://www.mixedbread.ai/docs/embeddings/mxbai-embed-large-v1).
- `octoai`. [Choose a model](https://octo.ai/blog/supercharge-rag-performance-using-octoai-and-unstructured-embeddings/), or use the default model `thenlper/gte-large`.
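The provider defaults listed above amount to a lookup table. The sketch below is a hypothetical helper: `DEFAULT_MODELS` and `resolve_model_name` are illustrative names and are not part of the Ingest CLI or library. It returns the model you chose if you chose one. Otherwise it returns the provider's listed default. It raises an error for `langchain-voyageai`, which has no default.

```python
from typing import Dict, Optional

# Hypothetical table mirroring the defaults listed above.
# None means the provider has no default, so a model must be chosen explicitly.
DEFAULT_MODELS: Dict[str, Optional[str]] = {
    "langchain-huggingface": "sentence-transformers/all-MiniLM-L6-v2",
    "langchain-openai": "text-embedding-ada-002",
    "langchain-vertexai": "textembedding-gecko@001",
    "langchain-voyageai": None,
    "mixedbread-ai": "mixedbread-ai/mxbai-embed-large-v1",
    "octoai": "thenlper/gte-large",
}


def resolve_model_name(provider: str, model_name: Optional[str] = None) -> str:
    """Return the explicitly chosen model, or fall back to the provider's default."""
    if provider not in DEFAULT_MODELS:
        raise ValueError(f"Unknown embedding provider: {provider!r}")
    if model_name:
        return model_name
    default = DEFAULT_MODELS[provider]
    if default is None:
        raise ValueError(f"{provider!r} has no default model; pass a model name")
    return default


print(resolve_model_name("langchain-openai"))
print(resolve_model_name("langchain-voyageai", "voyage-3.5"))
```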
The `VoyageAIEmbeddingEncoder` class connects to the VoyageAI API to obtain embeddings for pieces of text using the VoyageAI Python client.

`embed_documents` receives a list of Elements and returns an updated list in which each Element includes the `embeddings` attribute. The encoder automatically splits the input into batches that respect the model's token limits.
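Token-limit batching can be pictured as greedy packing. Texts go into the current batch until the next text would push the batch over a token budget or a batch-size cap. The sketch below is illustrative only. `batch_by_token_limit`, the whitespace-based `count_tokens`, and the limit values are assumptions. The encoder's actual batching logic, tokenizer, and limits may differ.

```python
from typing import Callable, Iterator, List


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def batch_by_token_limit(
    texts: List[str],
    max_tokens: int,
    max_batch_size: int,
    counter: Callable[[str], int] = count_tokens,
) -> Iterator[List[str]]:
    """Greedily group texts so each batch stays within both limits.

    A single text that exceeds max_tokens on its own still gets its own batch;
    this sketch does not split or truncate it.
    """
    batch: List[str] = []
    batch_tokens = 0
    for text in texts:
        tokens = counter(text)
        # Close the current batch if adding this text would break either limit.
        if batch and (batch_tokens + tokens > max_tokens or len(batch) >= max_batch_size):
            yield batch
            batch, batch_tokens = [], 0
        batch.append(text)
        batch_tokens += tokens
    if batch:
        yield batch


texts = ["one two three", "four five", "six", "seven eight nine ten"]
for b in batch_by_token_limit(texts, max_tokens=5, max_batch_size=10):
    print(b)
```

Greedy packing keeps the order of the input. That makes it easy to map the returned embeddings back to the original elements.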
`embed_query` receives a query as a string and returns a list of floats: the embedding vector for the given query string.
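A common use of the query vector is to rank elements by how similar they are to the query. The sketch below assumes that the query vector and each element's `embeddings` are plain lists of floats, as described above. `cosine_similarity` and `rank_by_similarity` are illustrative helpers, not part of `unstructured`. The toy 3-dimensional vectors stand in for real model output.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vec: Sequence[float], docs: List[Tuple[str, Sequence[float]]]
) -> List[Tuple[str, float]]:
    """Return (text, score) pairs sorted from most to least similar to the query."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in docs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings.
query_vec = [1.0, 0.0, 0.0]
docs = [
    ("This is sentence 1", [0.9, 0.1, 0.0]),
    ("This is sentence 2", [0.0, 1.0, 0.0]),
]
for text, score in rank_by_similarity(query_vec, docs):
    print(f"{score:.3f}  {text}")
```

With real output, `docs` might be built as `[(str(e), e.embeddings) for e in elements]`, on the assumption that each element's text is what you want to display.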
The following code block shows an example of how to use `VoyageAIEmbeddingEncoder`. You will see the updated elements list (with the `embeddings` attribute included for each element), the embedding vector for the query string, and some metadata properties about the embedding model.
### Configuration Parameters

To use Voyage AI, pass the following parameters to `VoyageAIEmbeddingConfig`:

- **`api_key`** (required): Voyage AI API key, obtained from [https://dash.voyageai.com/](https://dash.voyageai.com/).
- **`model_name`** (required): The embedding model to use. Available models include:
  - `voyage-3.5`: Latest general-purpose model with 1024 dimensions.
  - `voyage-3.5-lite`: Lightweight model with 512 dimensions and higher token limits.
  - `voyage-3-large`: Large model with enhanced performance.
  - `voyage-context-3`: Contextualized embedding model for document-level context.

  For the complete list of available models, see [https://docs.voyageai.com/docs/embeddings](https://docs.voyageai.com/docs/embeddings).
- **`show_progress_bar`** (optional, default: `False`): Display a progress bar during batch processing.
- **`batch_size`** (optional): Override the default batch size for embedding requests.
- **`truncation`** (optional): Enable automatic truncation of inputs that exceed token limits.
- **`output_dimension`** (optional): Specify a custom output dimension (model-dependent).
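One way to keep the API key out of source code is to assemble the keyword arguments before you construct `VoyageAIEmbeddingConfig`. This also lets you forward only the optional settings you actually set. `build_voyage_config_kwargs` below is a hypothetical helper, not part of `unstructured`. It reads the key from the `VOYAGE_API_KEY` environment variable, which is the same variable the example uses. It drops any optional parameter left as `None`.

```python
import os
from typing import Any, Dict, Optional


def build_voyage_config_kwargs(
    model_name: str,
    *,
    show_progress_bar: bool = False,
    batch_size: Optional[int] = None,
    truncation: Optional[bool] = None,
    output_dimension: Optional[int] = None,
    env_var: str = "VOYAGE_API_KEY",
) -> Dict[str, Any]:
    """Collect keyword arguments for VoyageAIEmbeddingConfig (hypothetical helper)."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Voyage AI API key")
    if not model_name:
        # model_name is required: no default Voyage AI model is provided.
        raise ValueError("model_name is required")
    kwargs: Dict[str, Any] = {
        "api_key": api_key,
        "model_name": model_name,
        "show_progress_bar": show_progress_bar,
    }
    optional = {
        "batch_size": batch_size,
        "truncation": truncation,
        "output_dimension": output_dimension,
    }
    # Forward only the optional settings that were explicitly provided.
    kwargs.update({key: value for key, value in optional.items() if value is not None})
    return kwargs
```

You would then call something like `VoyageAIEmbeddingConfig(**build_voyage_config_kwargs("voyage-3.5", truncation=True))`. This assumes the config accepts exactly the parameter names listed above.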
### Basic Example
```python
import os

from unstructured.documents.elements import Text
from unstructured.embed.voyageai import VoyageAIEmbeddingConfig, VoyageAIEmbeddingEncoder

# Basic configuration with required parameters
embedding_encoder = VoyageAIEmbeddingEncoder(
    config=VoyageAIEmbeddingConfig(
        api_key=os.environ["VOYAGE_API_KEY"],
        model_name="voyage-3.5",
    )
)

# Embed documents: each returned element carries an `embeddings` attribute
elements = embedding_encoder.embed_documents(
    elements=[Text("This is sentence 1"), Text("This is sentence 2")],
)

# Embed a query string: returns a list of floats
query = "This is the query"
query_embedding = embedding_encoder.embed_query(query)

for element in elements:
    print(element, element.embeddings)
print(query, query_embedding)
```

- **voyage-3.5**: Latest general-purpose model with 1024 dimensions. [Learn more](https://docs.anthropic.com/en/docs/build-with-claude/embeddings#available-voyage-models).
- **voyage-3.5-lite**: Lightweight model with 512 dimensions and higher token limits. [Learn more](https://docs.anthropic.com/en/docs/build-with-claude/embeddings#available-voyage-models).
- **voyage-3**: General-purpose model (120K token limit). [Learn more](https://docs.anthropic.com/en/docs/build-with-claude/embeddings#available-voyage-models).
- **voyage-3-lite**: Lightweight model (120K token limit). [Learn more](https://docs.anthropic.com/en/docs/build-with-claude/embeddings#available-voyage-models).
- **voyage-context-3**: Contextualized embedding model for document-level context. [Learn more](https://docs.anthropic.com/en/docs/build-with-claude/embeddings#available-voyage-models).