TypeError when using Advanced RAG #8249

Open
liviaj29 opened this issue Aug 19, 2024 · 2 comments
Labels
community-triage P1 High priority, add to the next sprint

Comments

@liviaj29

Describe the bug
When running the HyDE (Advanced RAG) code from the documentation and the blog post, a TypeError is raised in the Hypothetical Document Embedder pipeline, at the connection from the OutputAdapter to the embedder. The OutputAdapter returns the list of documents as a string, but SentenceTransformersDocumentEmbedder expects a list of Document objects.

The output type declared on the OutputAdapter is incorrect: it specifies List[Document], but the value it actually returns is a string.

Error message
TypeError: SentenceTransformersDocumentEmbedder expects a list of Documents as input.In case you want to embed a list of strings, please use the SentenceTransformersTextEmbedder.
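The failure can be illustrated without Haystack. The following is a stdlib-only approximation, assuming that OutputAdapter renders its Jinja template to text and then makes a best-effort ast.literal_eval on the result. `Document`, `render_then_literal_eval`, and `check_documents` are hypothetical stand-ins, not Haystack APIs. A list of Document objects renders to its repr, literal_eval cannot rebuild the objects, and the plain string reaches the embedder's type check.

```python
import ast
import contextlib
from dataclasses import dataclass


@dataclass
class Document:
    """Hypothetical stand-in for haystack.Document (content field only)."""
    content: str


def render_then_literal_eval(value):
    """Assumed adapter behaviour: render '{{ value }}' to text, then try ast.literal_eval."""
    rendered = str(value)  # template rendering always produces text
    result = rendered
    # literal_eval only understands Python literals; anything else stays a string
    with contextlib.suppress(Exception):
        result = ast.literal_eval(rendered)
    return result


def check_documents(documents):
    """Hypothetical guard modelled on the embedder's input validation."""
    if not isinstance(documents, list) or (documents and not isinstance(documents[0], Document)):
        raise TypeError("expects a list of Documents as input")
    return len(documents)


docs = [Document(content="hypothetical answer 1"), Document(content="hypothetical answer 2")]
adapted = render_then_literal_eval(docs)
print(type(adapted).__name__)  # str: the Document objects did not survive the round trip
try:
    check_documents(adapted)
except TypeError as err:
    print("TypeError:", err)
```

Because the rendered repr contains constructor calls such as `Document(...)`, literal_eval rejects it and the adapter output silently stays a string, which matches the declared-vs-actual type mismatch described above.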

Expected behavior
The pipeline that embeds the hypothetical documents with the Hypothetical Document Embedder should run without error and produce the hypothetical document embeddings.
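For reference, here is a minimal sketch of the final step the pipeline should reach. It assumes, as in the HyDE tutorials, that the embeddings of the hypothetical documents are averaged element-wise into a single query vector. `average_embeddings` is a hypothetical helper and the vectors are toy values.

```python
from typing import List


def average_embeddings(embeddings: List[List[float]]) -> List[float]:
    """Element-wise mean of equal-length embedding vectors (assumed HyDE aggregation)."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    # zip(*embeddings) yields one tuple per dimension across all documents
    return [sum(column) / len(embeddings) for column in zip(*embeddings)]


# Toy 3-dimensional vectors standing in for two hypothetical-document embeddings
doc_embeddings = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
print(average_embeddings(doc_embeddings))  # [2.0, 1.0, 1.0]
```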

Additional context
The error occurs when the code is copied directly from the tutorials, and also when it is adapted to use an Ollama generator and local PDFs as the data.

The error can be worked around by calling the .to_dict() method on each Document inside the custom_filters function, and then using the .from_dict() method in SentenceTransformersDocumentEmbedder to rebuild the Documents. I would be happy to open a pull request with this change.
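A stdlib-only sketch of why this workaround helps, under the same assumption that the adapter output goes through a string render followed by ast.literal_eval: a list of plain dicts is a Python literal, so it survives the round trip, and the Documents can be rebuilt on the embedder side. `Document` here is a hypothetical dataclass with its own `to_dict`/`from_dict`, not haystack.Document, and `build_doc_dicts`/`restore_documents` are illustrative names.

```python
import ast
import contextlib
from dataclasses import asdict, dataclass


@dataclass
class Document:
    """Hypothetical stand-in for haystack.Document with dict (de)serialization."""
    content: str

    def to_dict(self):
        return asdict(self)

    @classmethod
    def from_dict(cls, data):
        return cls(**data)


def render_then_literal_eval(value):
    """Assumed adapter behaviour: render to text, then best-effort ast.literal_eval."""
    rendered = str(value)
    result = rendered
    with contextlib.suppress(Exception):
        result = ast.literal_eval(rendered)
    return result


# Workaround, step 1 (custom filter side): emit plain dicts instead of Document objects
def build_doc_dicts(replies):
    return [Document(content=r).to_dict() for r in replies]


# Workaround, step 2 (embedder side): rebuild Documents from the dicts
def restore_documents(items):
    return [Document.from_dict(item) for item in items]


replies = ["hypothetical answer 1", "hypothetical answer 2"]
adapted = render_then_literal_eval(build_doc_dicts(replies))  # a real list of dicts
restored = restore_documents(adapted)
print(restored)
```

The trade-off is that the embedder has to know to deserialize its input, so this patches the symptom; the declared List[Document] output type is still not honoured by the adapter itself.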

To Reproduce
Copy and run the code from either of these tutorials: https://docs.haystack.deepset.ai/docs/hypothetical-document-embeddings-hyde and https://haystack.deepset.ai/blog/optimizing-retrieval-with-hyde


System:

  • OS: Ubuntu
  • GPU/CPU: NVIDIA GeForce RTX 3070
  • Haystack version (commit or version number): 2.3.1
  • DocumentStore: ChromaDocumentStore/InMemory
  • Reader: N/A
  • Retriever: ChromaEmbeddingRetriever/InMemory
@anakin87
Member

anakin87 commented Aug 19, 2024

Related to #8176 and #8161. Should be fixed in the upcoming 2.5.0 release.

@julian-risch julian-risch added the P2 Medium priority, add to the next sprint if no P1 available label Aug 26, 2024
@julian-risch julian-risch added P1 High priority, add to the next sprint and removed P2 Medium priority, add to the next sprint if no P1 available labels Sep 7, 2024
@julian-risch
Member

The Haystack 2.5.0 release is out (https://github.com/deepset-ai/haystack/releases/tag/v2.5.0), so we can follow up here.
