Rag tool by LucaBEMan · Pull Request #1818 · bgruening/galaxytools

LucaBEMan · 2026-03-19T13:47:00Z

FOR CONTRIBUTOR:

I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.
License permits unrestricted use (educational + commercial)
This PR adds a new tool or tool collection
This PR updates an existing tool or tool collection
This PR does something else (explain below)

Description

This PR adds a new Galaxy tool: RAG Retriever.

The tool performs document retrieval for Retrieval-Augmented Generation (RAG) workflows using LlamaIndex and HuggingFace sentence-transformers embeddings. It extracts the most relevant text chunks from input documents based on semantic similarity to a user query.

The output is a context file that can be used as input for downstream LLM tools.
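
As a rough illustration of the retrieval step described above, the sketch below ranks text chunks by cosine similarity to a query vector and joins the best ones into a context string. It is not the tool's actual code, which relies on LlamaIndex and sentence-transformers embeddings. The names `cosine` and `top_k_chunks` and the toy 3-dimensional vectors are assumptions made for this example; real vectors would come from the embedding model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vec, chunk_vecs, chunks, k=2):
    """Return the k chunks whose vectors are most similar to the query."""
    scored = sorted(
        zip(chunks, chunk_vecs),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in scored[:k]]


# Toy vectors standing in for real sentence embeddings (assumption).
chunks = [
    "Galaxy is a workflow platform.",
    "Bananas are yellow.",
    "Galaxy tools are wrapped in XML.",
]
chunk_vecs = [[0.9, 0.1, 0.0], [0.0, 0.1, 0.9], [0.8, 0.3, 0.1]]
query_vec = [1.0, 0.2, 0.0]

# The joined text plays the role of the context file passed downstream.
context = "\n\n".join(top_k_chunks(query_vec, chunk_vecs, chunks, k=2))
```

Here the two Galaxy-related chunks are selected and the unrelated one is dropped, which is the behaviour the context file is meant to capture.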

Example Workflow

This tool is intended to be used together with the LLM Hub tool.

Typical workflow:

1. Upload one or more documents (PDF, JSON, TXT, CSV, Markdown, etc.)
2. Upload an embedding model archive (.tar.gz / .tgz) containing a HuggingFace sentence-transformers model
3. Run RAG Retriever to extract relevant context chunks
4. Pass the generated rag_context.txt file to the LLM Hub tool
5. Ask a question using the retrieved context

Pipeline overview:

Documents + Embedding Model Archive + Question → RAG Retriever → Context → LLM Hub → Answer

Example Galaxy workflow combining RAG Retriever and LLM Hub:

Notes

The tool does not generate answers itself, but prepares high-quality context for LLMs.
Supports multiple input formats including PDF, JSON, TXT, CSV, and Markdown.
Accepts a HuggingFace sentence-transformers embedding model as a .tar.gz / .tgz archive input.
The archive-based model input preserves the original model directory structure required for loading custom embedding models.
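
To make the archive-based input more concrete, here is a minimal sketch of how a .tar.gz / .tgz model archive could be unpacked while keeping its directory layout. It is not taken from the tool's source. The function name `extract_model_archive` is hypothetical, and the sketch assumes the model root can be recognised by a `modules.json` file, which sentence-transformers models normally contain.

```python
import tarfile
from pathlib import Path


def extract_model_archive(archive_path, dest_dir):
    """Extract a gzip-compressed tar archive and return the model directory."""
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        # Basic check: refuse member paths that would land outside dest.
        for member in tar.getmembers():
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest)
    # Assumption: the model root is the directory holding modules.json.
    for marker in sorted(dest.rglob("modules.json")):
        return marker.parent
    raise FileNotFoundError("no modules.json found in archive")
```

The returned path could then be handed to `SentenceTransformer(str(model_dir))`, which accepts a local directory in place of a model name.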

tools/rag/.shed.yml

tools/rag/rag_retriever.xml

Co-authored-by: Anup Kumar, PhD <anup.rulez@gmail.com>

anuprulez · 2026-03-20T10:21:05Z

Running planemo lint on the tool's XML reports the following errors and warnings:

.. ERROR (XSD): Invalid XML: Element 'param', attribute 'separator': The attribute 'separator' is not allowed.
.. CHECK (TestsNoValid): 1 test(s) found.
.. INFO (OutputsNumber): 1 outputs found.
.. INFO (InputsNum): Found 4 input parameters.
.. WARNING (HelpInvalidRST): Invalid reStructuredText found in help - [<string>:59: (WARNING/2) Title underline too short.

How it works
.....
].
.. CHECK (HelpPresent): Tool contains help section.
.. CHECK (ToolIDValid): Tool defines an id [rag_retriever].
.. CHECK (ToolNameValid): Tool defines a name [RAG Retriever].
.. CHECK (ToolProfileValid): Tool specifies profile version [24.2].
.. CHECK (ToolVersionValid): Tool defines a version [1.0.0].
.. ERROR (ValidDatatypes): Unknown datatype [jsonl] used in param element
.. ERROR (ValidDatatypes): Unknown datatype [md] used in param element
.. INFO (CommandInfo): Tool contains a command.
.. WARNING (CitationsMissing): No citations found, consider adding citations to your tool.
Failed linting

anuprulez · 2026-03-20T10:32:39Z

tools/rag/download_embeddings.py

+
+from sentence_transformers import SentenceTransformer
+
+models = [


Can we take these models as inputs to the tool? Otherwise we are stuck with these models, and users cannot use their preferred embedding models. It is now easy to accept such models: all of Hugging Face is available inside Galaxy, so a model can be imported through the file uploader into any history and then passed to the tool. A better embedding model may also become available in the future.

LucaBEMan · Mar 20, 2026

I’m not sure I fully understand your suggestion.
Do you mean that the tool should accept any Hugging Face embedding model as a dataset input (e.g., uploaded or provided within a Galaxy history)?
If so, is there already an existing Galaxy tool that can download a Hugging Face model and make it available as a dataset, or would this require adding a separate tool for that purpose?

bgruening · 2026-03-20T21:50:19Z

tools/rag/tool-data/huggingface.loc.sample.loc.sample

what was wrong with the huggingface loc file?

LucaBEMan and others added 4 commits March 18, 2026 16:20

add rag tool

3ebf316

formatting

cbc6efd

Merge branch 'master' into rag-tool

cd0b86f

rename shed.yml

da5b0a4

anuprulez reviewed Mar 19, 2026

tools/rag/.shed.yml (outdated)

anuprulez reviewed Mar 19, 2026

tools/rag/rag_retriever.xml (outdated)

LucaBEMan and others added 5 commits March 19, 2026 16:04

Update tools/rag/rag_retriever.xml

59c8403

Co-authored-by: Anup Kumar, PhD <anup.rulez@gmail.com>

Update tools/rag/.shed.yml

6091e09

Co-authored-by: Anup Kumar, PhD <anup.rulez@gmail.com>

Use local HuggingFace models via data table and add download script

0583885

fix linting error

78bab49

fix linting error again

eb9abe2

anuprulez reviewed Mar 20, 2026

embedding model as .tgz input and delete embedding model selection with loc file

4f9e8b3

bgruening reviewed Mar 20, 2026

LucaBEMan added 3 commits March 20, 2026 23:01

fix lint error

fc16627

add citation

b3f62f3

Fix planemo lint warnings: RST headers, citations, shed.yml categories

fcc6c8e

LucaBEMan wants to merge 13 commits into bgruening:master from LucaBEMan:rag-tool

3 participants

