Describe the bug
When running the HyDE (Advanced RAG) code provided in the documentation and blog, a type error is raised in the Hypothetical Document Embedder pipeline at the connection between the adapter and the embedder. The OutputAdapter returns the list of documents as a string, but SentenceTransformersDocumentEmbedder() expects a list of Document objects.
The output type declared in the OutputAdapter is therefore incorrect: it specifies List[Document], but the value actually produced is a string.
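The mismatch can be illustrated without Haystack. The sketch below is not the library's code: it assumes the adapter renders the template result to text and then tries to parse that text back into a Python value, which is one plausible way a declared List[Document] can end up as a string. `Doc` and `render_then_parse` are hypothetical names used only for this illustration.

```python
import ast
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a Document; not the real Haystack class."""
    content: str


def render_then_parse(value):
    """Assumed behaviour: render the value to text, then try to parse it back."""
    rendered = str(value)  # what a text-based template engine would emit
    try:
        return ast.literal_eval(rendered)
    except (ValueError, SyntaxError):
        # The text is not a Python literal, so the caller gets plain text.
        return rendered


docs = [Doc(content="hypothetical answer 1"), Doc(content="hypothetical answer 2")]
result = render_then_parse(docs)
print(type(result).__name__)  # prints: str
```

Under that assumption, the list survives only as its textual representation, which matches the TypeError raised by the embedder.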
Error message
TypeError: SentenceTransformersDocumentEmbedder expects a list of Documents as input.In case you want to embed a list of strings, please use the SentenceTransformersTextEmbedder.
Expected behavior
The pipeline to embed the documents with the Hypothetical Document Embedder should run without error, and generate the hypothetical embeddings.
Additional context
The error occurs when the code is copied directly from the tutorials, and also when the code is adapted to use an Ollama generator with local PDFs as the data.
The error can be fixed by calling the .to_dict() method on each Document in the custom_filters, and then rebuilding the Documents with the .from_dict() method before they reach SentenceTransformersDocumentEmbedder(). I would be happy to create a pull request with this change.
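A minimal sketch of that round trip, in plain Python rather than Haystack: `Doc` is a hypothetical stand-in that only mirrors the to_dict() and from_dict() method names mentioned above, and `build_doc_dicts` is a hypothetical custom filter. It assumes the adapter output passes through text and is parsed back as a Python literal, which works for plain dicts.

```python
import ast
from dataclasses import asdict, dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a Document; not the real Haystack class."""
    content: str

    def to_dict(self):
        return asdict(self)

    @classmethod
    def from_dict(cls, data):
        return cls(**data)


def build_doc_dicts(answers):
    """Hypothetical custom filter: emit plain dicts instead of objects."""
    return [Doc(content=answer).to_dict() for answer in answers]


# Dicts of strings render to text that can be parsed back losslessly.
rendered = str(build_doc_dicts(["answer one", "answer two"]))
parsed = ast.literal_eval(rendered)

# Rebuild the objects just before they are handed to the embedder.
docs = [Doc.from_dict(item) for item in parsed]
print(docs)
```

The design choice here is to keep only literal-friendly types (dicts, lists, strings) on the text side of the boundary and restore the objects on the consuming side.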
julian-risch added the P1 label (High priority, add to the next sprint) and removed the P2 label (Medium priority, add to the next sprint if no P1 available) on Sep 7, 2024
To Reproduce
Copy and run the code from either of these tutorials: https://docs.haystack.deepset.ai/docs/hypothetical-document-embeddings-hyde and https://haystack.deepset.ai/blog/optimizing-retrieval-with-hyde
FAQ Check
System: