Logan Proteins minor fix by SaimMomin12 · Pull Request #1860 · bgruening/galaxytools

SaimMomin12 · 2026-05-06T07:18:39Z

FOR CONTRIBUTOR:

I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.
License permits unrestricted use (educational + commercial)
This PR adds a new tool or tool collection
This PR updates an existing tool or tool collection
This PR does something else (explain below)

There are two labels that allow to ignore specific (false positive) tool linter errors:

skip-version-check: Use it if only a subset of the tools has been updated in a suite.
skip-url-check: Use it if github CI sees 403 errors, but the URLs work.

…n search_database.xml

bgruening

Thanks for the fix! The three changes here are correct and well-scoped:

.shed.yml URL typo (search_proteins → logan_proteins) ✅
@VERSION_SUFFIX@ bump ✅
FASTA-extension symlink in search_database.xml ✅

While reviewing I went through the rest of the wrapper and found a few things worth folding in (or addressing in a quick follow-up). Requesting changes mostly because of #1 and #2 below — they're closely related to the fix in this PR and the same class of bug.

Most likely needs to land here

embed_query.xml very likely has the same FASTA-extension bug this PR fixes for search_database.xml.
At embed_query.xml line 14–15, --query_sequences '$query_sequences' is passed the raw Galaxy .dat path. If embed_query.py infers FASTA from the suffix (which the sibling fix strongly implies), embedding will fail on real datasets the same way search_database.py did. Suggest applying the same ln -s '$query_sequences' query_sequences.fasta pattern here.
embed_query.xml help text describes a UI toggle that doesn't exist.
Line 55 says "Force CPU usage: By default, CPU will be used. Disable this option to use GPU instead if available." — but the only mechanism is the GALAXY_LOGAN_PROTEIN_SEARCH_FORCE_CPU env var read at lines 8–12. There's no <param> exposing this. Either add a boolean param, or reword the help so users know this is an admin/job-destination setting.

Worth fixing while you're in here (cheap)

tool-data/faiss_database.loc.sample header is boilerplate from another tool — it claims the file is for "metagenomics files." Replace with text that actually describes the FAISS protein database loc format. Also align the example row with the canonical 4-column format used in test-data/faiss_database.loc (faiss-demo-db-20260203\tFAISS Test Database\t1.2.0\t/path); the current sample row uses a different value style.
test-data/queries.fasta contains a nucleotide sequence labeled test_protein_2. Line 4 starts ATGGCTAGCAAAGGAGAAGAA… (looks like GFP CDS). It's labeled as a protein but isn't one. Either rename the record to indicate it's a nucleotide test case, or replace it with a real protein sequence — otherwise it's a copy-paste footgun for users using this as a reference.
outfmt param in search_database.xml is marked optional="true" but always has a default. Lines 31–38 set value="0" selected="true", so the #if $outfmt: guard at line 17 is effectively dead code — 0 is always passed. Either drop optional="true" or remove the default if "unset" should mean "use upstream default."
-F flag in embed_query.xml line 17 is undocumented. A short trailing comment ( or whatever it actually means) would help future maintainers.

Follow-up PR material (don't block this one)

Test coverage is thin. The search_database test at lines 47–57 only asserts "No results, exiting" because the bundled FAISS DB at test-data/test-db/faiss/db.txt is a 0-byte stub. So we currently verify the script bails gracefully on an empty index — not that the FAISS+MMseqs2 pipeline actually returns matches. A small real index that produces ≥1 hit would substantially raise the safety floor; today, a regression in the actual search logic could land green.
macros.xml citations are thin. Only the upstream search_protein GitHub repo is cited. Please add at minimum:
- FAISS — Johnson et al., 10.1109/TBDATA.2019.2921572
- MMseqs2 — Steinegger & Söding, 10.1038/nbt.3988
- The GLM2 / gLM2 model citation (whichever paper the container actually uses)
Docker-only requirements. macros.xml lines 5–9 declare only quay.io/bgruening/logan-protein:1.2.0 — no <requirement type="package">. Galaxy instances without Docker support can't install this tool. Likely unavoidable given the GLM2 + torch + FAISS + MMseqs2 stack, but worth a one-liner in .shed.yml / help so admins know what to expect.
search_database.xml help mentions intermediate files that aren't exposed as outputs (lines 94–99: query_results.tsv, unique_centroids.fasta, matches.top_hit, …). Either expose the useful ones as <data> outputs, or trim the help so users aren't told about files they can't access.
Suite naming inconsistency — directory is logan_proteins (plural), suite is suite_logan_protein (singular), tool IDs use logan_protein_* (singular). Tool IDs are stable so don't change them; just flagging for new files.

Happy to approve once #1 and #2 are addressed (or with a quick justification that #1 isn't actually a problem). Everything else can be its own PR.

Update version suffix in macros.xml and adjust query_sequences path i…

37120a3

…n search_database.xml

SaimMomin12 requested a review from bgruening May 6, 2026 07:18

Fix remote repository URL in logan_proteins .shed.yml

cc5eec9

bgruening requested changes May 6, 2026

View reviewed changes

SaimMomin12 and others added 2 commits May 7, 2026 13:56

Update logan_proteins tool version to 1.2.2

be8377b

Merge branch 'master' into tools/fix-search-proteins

bec89de

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Logan Proteins minor fix#1860

Logan Proteins minor fix#1860
SaimMomin12 wants to merge 4 commits intobgruening:masterfrom
SaimMomin12:tools/fix-search-proteins

SaimMomin12 commented May 6, 2026

Uh oh!

bgruening left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

SaimMomin12 commented May 6, 2026

Uh oh!

bgruening left a comment

Choose a reason for hiding this comment

Most likely needs to land here

Worth fixing while you're in here (cheap)

Follow-up PR material (don't block this one)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants