Describe the bug
In Pipeline -> EmbedderConfig, every embedding model documented at
https://docs.unstructured.io/open-source/core-functionality/embedding#voyageaiembeddingencoder
is supported except Voyage, which raises an error saying it is not a recognized encoder.
To Reproduce
from unstructured.ingest.v2.pipeline.pipeline import Pipeline
from unstructured.ingest.v2.interfaces import ProcessorConfig
from unstructured.ingest.v2.processes.connectors.fsspec.s3 import (
    S3IndexerConfig,
    S3DownloaderConfig,
    S3ConnectionConfig,
    S3AccessConfig,
    S3UploaderConfig,
)
from unstructured.ingest.v2.processes.partitioner import PartitionerConfig
from unstructured.ingest.v2.processes.chunker import ChunkerConfig
from unstructured.ingest.v2.processes.embedder import EmbedderConfig

pipeline = Pipeline.from_configs(
    context=ProcessorConfig(),
    indexer_config=S3IndexerConfig(remote_url=INPUT_S3_FILE),
    downloader_config=S3DownloaderConfig(download_dir="s3-ingest-download"),
    source_connection_config=S3ConnectionConfig(
        access_config=S3AccessConfig(
            key="AWS_ACCESS_KEY_ID",
            secret="AWS_SECRET_ACCESS_KEY",
            token="AWS_SESSION_TOKEN",
        )
    ),
    partitioner_config=PartitionerConfig(
        partition_by_api=True,
        api_key="UNSTRUCTURED_API_KEY_AUTH",
        partition_endpoint="UNSTRUCTURED_SERVER_URL",
        strategy="auto",
    ),
    chunker_config=ChunkerConfig(
        chunking_strategy="by_title",
        chunk_combine_text_under_n_chars=100,
        chunk_include_orig_elements=False,
        chunk_max_characters=4000,
    ),
    # This is the configuration that triggers the error.
    embedder_config=EmbedderConfig(
        embedding_provider="Voyage",
        embedding_api_key="VOYAGE_API_KEY",
        embedding_model_name="voyage-law-2",
    ),
    destination_connection_config=S3ConnectionConfig(
        access_config=S3AccessConfig(
            key="AWS_ACCESS_KEY_ID",
            secret="AWS_SECRET_ACCESS_KEY",
            token="AWS_SESSION_TOKEN",
        )
    ),
    uploader_config=S3UploaderConfig(remote_url=OUTPUT_S3_FILEPATH),
)
Expected behavior
VoyageAIEmbeddingEncoder / Voyage should be accepted as a valid embedding_provider value.
If support is not intended, the documentation should indicate that this functionality is only available when run outside the pipeline.
Environment Info
Python 3.11
ValueError: Voyage not a recognized encoder
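For illustration only, here is a minimal sketch of the kind of provider-name lookup that could produce a message like the one above. It is not the library's actual code: `SUPPORTED_PROVIDERS`, `resolve_encoder`, and the provider names in the set are hypothetical placeholders. It only shows why a provider that appears in the documentation but is missing from the pipeline's lookup would fail this way.

```python
# Hypothetical registry of provider names the pipeline knows about.
# These names are placeholders, not the library's real list.
SUPPORTED_PROVIDERS = {"provider-a", "provider-b"}


def resolve_encoder(provider: str) -> str:
    """Return the provider name if it is registered, else raise ValueError."""
    if provider not in SUPPORTED_PROVIDERS:
        # Mirrors the wording of the error reported in this issue.
        raise ValueError(f"{provider} not a recognized encoder")
    return provider


def describe(provider: str) -> str:
    """Return either the resolved provider or the error message as text."""
    try:
        return resolve_encoder(provider)
    except ValueError as exc:
        return str(exc)


if __name__ == "__main__":
    print(describe("provider-a"))
    print(describe("Voyage"))
```

Under this assumption, the fix would be either to register Voyage in the lookup or to document that it is unavailable inside the pipeline.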