new module: stitchr #10812

Open
Clara0611 wants to merge 7 commits into nf-core:master from Clara0611:stitchr

Conversation

Clara0611 commented Mar 13, 2026

New module: stitchr

Stitchr is a tool that generates full nucleotide and amino acid TCR sequences from V-, J- and CDR3 information.

Test data is still missing; waiting for the test data PR.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've added a new tool - have you followed the module conventions in the contribution docs
  • If necessary, include test data in your PR. -> Waiting for testdata PR
  • Remove all TODO statements.
  • Broadcast software version numbers to topic: versions - See version_topics
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements. -> Not possible, added Dockerfile
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • For modules:
      • nf-core modules test <MODULE> --profile docker

Clara0611 marked this pull request as draft March 13, 2026 10:18
Author

Draft PR - still waiting for testdata merge (here)

Clara0611 marked this pull request as ready for review March 13, 2026 10:57
Contributor

Looks like these are just Python libraries, so there is no need for a Dockerfile. Easiest to use Seqera containers instead: https://nf-co.re/docs/tutorials/nf-core_components/using_seqera_containers (also, please add an environment.yml file to support conda as well).

Author

Hi, thank you for taking the time to look at my module!
Unfortunately, Stitchr is not available via conda (bioconda, conda-forge), only via pip. I left the environment.yml file out as well, because the instructions for adding a module (https://nf-co.re/docs/guidelines/components/modules, section 7.2) specify that this file is only required if the tool is available via conda. Of course I would be happy to add the (empty) file back in, if that is the preferred style.
The reason for including our own container is that stitchr requires a data download (and further manipulation of this data), handled by an internal wrapper script. This script requires root permission, which is not available in the container (with the given global configs). I've tried to circumvent this issue in several ways, e.g. with a local config, but couldn't find a way to make it work. The external container already has the necessary data baked in. However, if you have an immediate idea on something I might try, I'd be very open to that.

Contributor

for the first part: you can install pip-only dependencies via the environment.yml see for example

Contributor

for the second part: why not make the download command a separate module then?

Author

Clara0611 Mar 23, 2026

Thanks for the example! I've added conda just now, but without the container, the species data is missing, so the module cannot run with conda in its current state. Sorry, I forgot that this was an additional concern with conda when I first replied.

About the separate module: We discussed this locally and decided against it, because the download wrapper script (stitchrdl) is so closely intertwined with the rest of the tool and is not really something that would be run on its own. Also, we fear that this might simply move the problem: with two modules, we would first need to run stitchrdl inside a download module (which doesn't work due to permission issues - there is no option to determine where data is saved, stitchrdl deposits the data where stitchr expects to find it), and then move the data to the correct place again, this time manually, in the second module (here, moving over root could perhaps be avoided). Maybe there's something I'm missing though, as this is my first contribution.

Member

Jumping in at the request of @mashehu !

In metagenomics modules, we've had a lot of cases of tools coming with databases etc. In the vast majority of cases (actually all, off the top of my head), we've ended up splitting data from the execution.

This is partly because the databases are very large - and you don't want to repeatedly pull large containers on each execution node (as this can be very slow), and also embedding data inside the container means you then cannot update the data.

Can you maybe go into more detail what the technical issue is (for someone who is not familiar with the tool/the conversation 😅 )?

I don't understand what you mean by 'due to permission issues', for example. Could you, for example, just mount the work directory into the container at the path where stitchr wants the data to go, and download there?
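
To try a mount like this, one first needs to know which directory the tool writes to. A minimal sketch for finding a candidate path from inside the environment, assuming the data folder is a top-level `Data` directory under `site-packages` (this layout is inferred from the traceback posted later in this thread, not from the stitchr documentation; the function name `candidate_data_dir` is hypothetical):

```python
import os
import sysconfig


def candidate_data_dir(subdir: str = "Data") -> str:
    """Return <site-packages>/<subdir> for the active Python environment.

    Assumption: the tool stores its data directly under site-packages,
    as the path in the traceback in this thread suggests.
    """
    site_packages = sysconfig.get_paths()["purelib"]
    return os.path.join(site_packages, subdir)


if __name__ == "__main__":
    target = candidate_data_dir()
    print("Candidate mount target:", target)
    print("Already exists:", os.path.isdir(target))
    # The parent must be writable for a download to create the folder here.
    print("Parent writable:", os.access(os.path.dirname(target), os.W_OK))
```

Running this inside the container would show whether the target path is writable for the user the process runs as, which is the precondition for downloading into it or bind-mounting over it.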

Author

Hi, thanks for taking the time :)

That makes sense, I'll split the download off into another submodule.

On the topic of permission errors: The issue is that stitchr provides a wrapper script for data downloading, which is not flexible in terms of the destination folder. This is not a problem when running the script with conda, outside of a container. However, when running the script inside of a container, it produces a permission error when it tries to move the downloaded data to the appropriate folder. From what I understand, this is because it tries to move the data into a folder that is immutable in a container setting (/opt/...)? I'm fairly new to containers, and couldn't find a way to circumvent this - if you know of one, I'm very happy to try!

I just reproduced the error by running stitchrdl with a seqera container and profile docker:
"
OSError: [Errno 18] Invalid cross-device link: 'HUMAN' -> '/opt/conda/lib/python3.14/site-packages/Data/HUMAN'

During handling of the above exception, another exception occurred:

PermissionError: [Errno 13] Permission denied: '/opt/conda/lib/python3.14/site-packages/Data/HUMAN'

"
It would probably not be allowed to create modules that only support conda, and don't provide a container at all?
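
The two chained errors above match the documented behaviour of Python's `shutil.move`: it first attempts `os.rename`, which fails with Errno 18 when source and destination are on different filesystems, and then falls back to copying, which needs write permission on the destination. Whether stitchrdl actually calls `shutil.move` is an assumption here, not something confirmed in this thread. A minimal diagnostic sketch (the function name `diagnose_move` is hypothetical) that checks both conditions for a given source and destination directory:

```python
import os
import tempfile


def diagnose_move(src: str, dst_dir: str) -> dict:
    """Report whether moving `src` into `dst_dir` can succeed.

    - same_device: os.rename only works within one filesystem.
    - dst_writable: both rename and the copy fallback need write
      (and search) permission on the destination directory.
    """
    same_device = os.stat(src).st_dev == os.stat(dst_dir).st_dev
    dst_writable = os.access(dst_dir, os.W_OK | os.X_OK)
    return {
        "same_device": same_device,
        "dst_writable": dst_writable,
        "rename_possible": same_device and dst_writable,
        "copy_fallback_possible": dst_writable,
    }


if __name__ == "__main__":
    # Demo on throwaway directories; in the container one would pass the
    # downloaded folder and the site-packages Data directory instead.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "HUMAN")
        dst = os.path.join(tmp, "Data")
        os.mkdir(src)
        os.mkdir(dst)
        print(diagnose_move(src, dst))
```

If `same_device` is false and `dst_writable` is false for the real paths, that would reproduce exactly the pair of errors shown: the rename fails across devices, and the copy fallback is then denied.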

3 participants