Conversation
Co-authored-by: Anup Kumar, PhD <anup.rulez@gmail.com>
tools/rag/download_embeddings.py
from sentence_transformers import SentenceTransformer

models = [
Can we take these models as inputs to the tool? Otherwise we are stuck with these models, and users cannot use their favorite embedding models if they want to. It is now really easy to accept models as inputs: the entire Hugging Face hub is available inside Galaxy and can be imported directly from the file uploader into any history, and from there into the tool. Also, a better embedding model may become available in the future.
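One way the suggestion could be implemented (a minimal sketch, not the actual tool code; the model ids and flag name below are illustrative assumptions): replace the hard-coded `models` list with a repeatable CLI parameter, falling back to a default when none is given.

```python
import argparse

# Hypothetical CLI sketch for download_embeddings.py: accept embedding
# model ids/paths as inputs instead of a hard-coded list.
parser = argparse.ArgumentParser(description="download_embeddings sketch")
parser.add_argument(
    "--model",
    action="append",  # flag may be repeated to collect several models
    default=None,
    help="Hugging Face model id or local path; repeatable",
)

# Simulated command line; in Galaxy these values would come from the tool form.
args = parser.parse_args([
    "--model", "sentence-transformers/all-MiniLM-L6-v2",
    "--model", "BAAI/bge-small-en-v1.5",
])

# Fall back to one default model when the user supplies none.
models = args.model or ["sentence-transformers/all-MiniLM-L6-v2"]
print(models)
```

Each entry in `models` could then be passed to `SentenceTransformer(...)` exactly as the current hard-coded list is.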
I’m not sure if I fully understand your suggestion.
Do you mean that the tool should accept any Hugging Face embedding model as a dataset input (e.g., uploaded or provided within a Galaxy history)?
If so, is there already an existing Galaxy tool that can download a Hugging Face model and make it available as a dataset, or would this require adding a separate tool for that purpose?
What was wrong with the huggingface `.loc` file?
FOR CONTRIBUTOR:
Description
This PR adds a new Galaxy tool: RAG Retriever.
The tool performs document retrieval for Retrieval-Augmented Generation (RAG) workflows using LlamaIndex and HuggingFace sentence-transformers embeddings. It extracts the most relevant text chunks from input documents based on semantic similarity to a user query.
The output is a context file that can be used as input for downstream LLM tools.
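The ranking step can be illustrated with a toy sketch (hand-rolled cosine similarity over made-up 3-dimensional vectors; in the actual tool, embeddings come from the sentence-transformers model and indexing/retrieval are handled by LlamaIndex):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, chunk_vecs, chunks, k=2):
    # Rank chunks by similarity to the query vector, return the best k.
    scored = sorted(
        zip(chunks, chunk_vecs),
        key=lambda cv: cosine(query_vec, cv[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in scored[:k]]

# Toy "embeddings" standing in for real model output.
chunks = ["chunk about cats", "chunk about stars", "chunk about dogs"]
vecs = [[1.0, 0.1, 0.0], [0.0, 0.0, 1.0], [0.9, 0.2, 0.1]]
query = [1.0, 0.0, 0.0]  # query vector closest to the cat/dog chunks

print(top_k(query, vecs, chunks, k=2))
# → ['chunk about cats', 'chunk about dogs']
```

The selected chunks are what ends up in the context file handed to the downstream LLM tool.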
Example Workflow
This tool is intended to be used together with the LLM Hub tool.
Typical workflow:
1. Upload the input documents to a Galaxy history
2. Upload a model archive (.tar.gz/.tgz) containing a HuggingFace sentence-transformers model
3. Run RAG Retriever with a user question
4. Pass the resulting rag_context.txt file to the LLM Hub tool

Pipeline overview:
Documents + Embedding Model Archive + Question → RAG Retriever → Context → LLM Hub → Answer
Example Galaxy workflow combining RAG Retriever and LLM Hub:
Notes
The embedding model is provided as a .tar.gz/.tgz archive input.
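For reference, the archive handling can be sketched with the standard library alone (directory layout and model name below are assumptions; the tool would point its embedding loader, e.g. sentence-transformers, at the extracted directory):

```python
import pathlib
import tarfile
import tempfile

tmp = pathlib.Path(tempfile.mkdtemp())

# Hypothetical model directory: a sentence-transformers model is just a
# folder of files (config.json, weights, tokenizer files, ...).
model_dir = tmp / "all-MiniLM-L6-v2"
model_dir.mkdir()
(model_dir / "config.json").write_text("{}")

# Pack it the way the tool expects its archive input (.tar.gz/.tgz).
archive = tmp / "model.tar.gz"
with tarfile.open(archive, "w:gz") as tar:
    tar.add(model_dir, arcname=model_dir.name)

# Tool side: unpack the archive, then load the model from the
# extracted path instead of downloading it from the Hugging Face hub.
dest = tmp / "extracted"
with tarfile.open(archive, "r:gz") as tar:
    tar.extractall(dest)

print((dest / "all-MiniLM-L6-v2" / "config.json").exists())
# → True
```

Packing a locally downloaded model this way lets the tool run without network access at job time.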