@ljwharbers ljwharbers commented Aug 14, 2025

The following key changes have been made to enable both cDNA and DNA sample processing with the same pipeline.

Modules added:

  • The following modules have been added: flexiplex/discovery, flexiplex/filter, flexiplex/assign and flexiformatter. The flexiplex modules extract, filter and assign barcodes. flexiformatter moves the barcodes from the read name to the BAM tags.

Subworkflow added:

  • Demultiplexing has been moved into two subworkflows, demultiplex_blaze.nf and demultiplex_flexiplex.nf, where the demultiplexing steps are executed.
  • One subworkflow has been added: align_deduplicate_dna.nf, which runs minimap2 alignment followed by picard MarkDuplicates and BAM_SORT_STATS_SAMTOOLS.

Input changes:

  • Samplesheet now accepts an additional column type which can be either dna or cdna. It is an optional column and defaults to cdna to maintain compatibility with older samplesheets.
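
The optional `type` column and its `cdna` default can be illustrated with a minimal Python sketch. This is not the pipeline's own samplesheet validation; the function name `read_samplesheet` and the `sample` and `fastq` column names in the usage note are hypothetical and chosen only for illustration.

```python
import csv
import io

VALID_TYPES = {"cdna", "dna"}


def read_samplesheet(handle):
    """Read samplesheet rows, filling a missing or empty 'type' with 'cdna'."""
    rows = []
    for row in csv.DictReader(handle):
        # Older samplesheets have no 'type' column at all, so row.get() returns None.
        sample_type = (row.get("type") or "cdna").strip().lower()
        if sample_type not in VALID_TYPES:
            raise ValueError(f"type must be 'dna' or 'cdna', got {sample_type!r}")
        row["type"] = sample_type
        rows.append(row)
    return rows
```

With this sketch, a samplesheet containing only `sample,fastq` columns yields rows whose `type` is `cdna`, which is the backwards-compatible behaviour described above.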

Config changes:

  • The whitelist parameter is removed and replaced by two new parameters, whitelist_dna and whitelist_cdna, to specify a DNA and a cDNA whitelist, respectively.
  • Whitelists included in assets/ are now .gz instead of .zip for easier compatibility with blaze/flexiplex; a single gunzip module decompresses the file if its name ends with .gz.
  • New config parameter: demux_tool, to specify whether to use flexiplex or blaze for demultiplexing cDNA samples.

QC changes:

  • The script used by READ_COUNTS() has been edited so that it accepts both flexiplex and blaze output as input.

TODO:
Update readmes and ...?

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/scnanoseq branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

atrull314 and others added 30 commits October 7, 2024 09:21
Performing merge for first release
…ode_formats. Also change dlogic for whitelist gunzipping

The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.

TODO: Should here be added which output is cDNA/DNA specific?
Collaborator

I think that makes sense, to put CDNA-ONLY, DNA-ONLY, or SHARED, or something to that effect in the sections and maybe put a blurb at the top that there are notes about which steps are type-specific.

Contributor Author

Agreed, maybe easiest to just use * = cDNA-specific output and ** = DNA-specific output, or something similar. I will add this in the coming week and tag you once I finish output.md.

docs/output.md Outdated
The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:

- [Preprocessing](#preprocessing)
- [Nanofilt](#nanofilt) - Read Quality Filtering and Trimming
Collaborator

We'll need to get rid of the Nanofilt section and replace it with Chopper

Contributor Author

Looking at this, is the bit under output files correct anyways? We do not publish any of these files so should we include this? Will also need to adjust some other things in the output.md, will put it on my TODO.

docs/usage.md Outdated
Please note that while the above command specifies both transcriptome and genome fasta files, only one is needed for the pipeline and is dependent on which quantifier you wish to use. Furthermore, if you have any DNA samples, the `genome_fasta` is required.
Additionally, for the `quantifier` parameter in the above command, we've listed the quantifiers as a comma-delimited string. It is possible to only use one quantifier, and can be accomplished by just providing the name of the quantifying tool you wish to run as a single value, i.e. providing `oarfish` if you only wish to run `oarfish`.

The pipeline supports barcode identification and extraction through both `flexiplex` and `blaze`; select the tool with the `demux_tool` parameter. The barcode format can be specified through the `barcode_format` parameter. When working with completely custom barcode structures, you can additionally specify these with the `custom_flexiplex_barcode_dna` and `custom_flexiplex_barcode_cdna` parameters (ensure that `flexiplex` is the barcode calling tool). Each takes a string formatted as follows: `"-x CTACACGACGCTCTTCCGATCT -b ???????????????? -u ?????????? -x TTTCTTATATGGG -f 8 -e 2"`. For more information, check the documentation: https://davidsongroup.github.io/flexiplex/
Collaborator

We'll want to put a couple sentences to more explicitly state that dna sample_types only support flexiplex as a barcode caller.

Collaborator

Actually related, is this because BLAZE can't be used on the dna sample types?

Contributor Author

True, good call. And indeed, BLAZE does not support DNA; it basically only supports the most commonly used 10x barcodes. Flexiplex supports everything I've encountered so far.
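
The constraint discussed here (BLAZE for cDNA only, flexiplex for both) could be expressed roughly as follows. This is a sketch, not the pipeline's logic: the function name `resolve_demux_tool` is hypothetical, and the choice to silently use flexiplex for DNA samples, rather than raise an error, is an assumption based on the PR description saying `demux_tool` applies to cDNA samples.

```python
def resolve_demux_tool(sample_type, demux_tool="flexiplex"):
    """Return the barcode caller to run for one sample.

    Assumption: `demux_tool` only affects cDNA samples, and DNA samples
    always go through flexiplex because BLAZE does not support them.
    """
    if demux_tool not in ("flexiplex", "blaze"):
        raise ValueError(f"unknown demux_tool: {demux_tool!r}")
    if sample_type == "dna":
        return "flexiplex"
    if sample_type == "cdna":
        return demux_tool
    raise ValueError(f"unknown sample type: {sample_type!r}")
```

If a hard failure is preferred for the `dna` plus `blaze` combination, the `dna` branch would raise instead of returning `flexiplex`.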

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.
Collaborator

@atrull314 atrull314 Sep 18, 2025

I know you've added your flexiformatter tool, which is a public GitHub repo; it might be worth adding the link to the repo here. We've got a couple of tools like that (e.g. pigz) that don't have a true citation to add here.

nextflow.config Outdated
barcode_format = null
custom_flexiplex_barcode_dna = null
custom_flexiplex_barcode_cdna = null
demux_tool = 'flexiplex' // Options: flexiplex, blaze
Collaborator

I know there's only one option for dna, but it may be good to go ahead and implement demux_tool_cdna and demux_tool_dna if there are going to be different subsets for each type. But this depends on whether BLAZE can run on the dna samples or not.

include { MINIMAP2_ALIGN } from '../../modules/nf-core/minimap2/align'
include { PICARD_MARKDUPLICATES } from '../../modules/nf-core/picard/markduplicates'
include { BAM_SORT_STATS_SAMTOOLS } from '../../subworkflows/nf-core/bam_sort_stats_samtools'
include { FLEXIFORMATTER } from '../../modules/local/flexiformatter'
Collaborator

Is this import needed?

Contributor Author

Great catch, the import is indeed needed, but I was missing the actual flexiformatter process call (it moves the barcode and UMI from the read name to the BAM tags). Will adjust, thanks!
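
For readers unfamiliar with that step, here is a minimal sketch of moving a barcode and UMI from a read name into tags on a single SAM text line. It is not flexiformatter's code. It assumes read names of the form `BARCODE_UMI#read_id` (or `BARCODE#read_id` for barcode-only reads) and uses the 10x-style `CB` and `UB` tag names; both are assumptions, and the function name is hypothetical.

```python
def move_barcode_to_tags(sam_line):
    """Move 'BARCODE_UMI#' from the read name into CB/UB tags of one SAM line."""
    if sam_line.startswith("@"):
        return sam_line  # header lines pass through unchanged
    fields = sam_line.rstrip("\n").split("\t")
    prefix, sep, read_id = fields[0].partition("#")
    if not sep:
        return sam_line  # no barcode prefix found, leave the record as is
    barcode, _, umi = prefix.partition("_")
    fields[0] = read_id
    fields.append(f"CB:Z:{barcode}")
    if umi:  # barcode-only reads carry no UMI, so no UB tag is written
        fields.append(f"UB:Z:{umi}")
    return "\t".join(fields) + "\n"
```

The barcode-only branch matters because reads without a UMI otherwise end up with an empty tag value, which downstream SAM parsers may reject.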

ch_flexiplex_barcodes = Channel.empty()

// Unzip the whitelist if needed
if (whitelist.extension == "gz"){
Collaborator

Does flexiplex work with .zip files?

Contributor Author

Nope, it does not work with .zip files. Additionally, BLAZE also works with .gzip, and I am sure that .gzip is more commonly used for these files than .zip. I also changed the input test files to match this.
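
The "gunzip only if the file ends with .gz" rule from the diff above can be sketched in Python as follows. The pipeline does this with a Nextflow gunzip module; this standalone version only mirrors the branching, and the function name `prepare_whitelist` and the `workdir` argument are hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def prepare_whitelist(path, workdir):
    """Return a plain-text whitelist path, decompressing .gz input if needed."""
    path = Path(path)
    if path.suffix != ".gz":
        return path  # already uncompressed, use as is
    out = Path(workdir) / path.stem  # e.g. whitelist.txt.gz -> whitelist.txt
    with gzip.open(path, "rb") as src, open(out, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out
```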

Collaborator

Got it, I was thinking of adding a central demultiplex workflow that calls out the demux_flexiplex or demux_blaze based on the demux_tool because there's a decent bit of shared code between the two -- so wanted to check on that

@atrull314
Collaborator

Hey @ljwharbers,

I've been running the pipeline using the test profile. Currently got it working using BLAZE as the barcode caller, however, using flexiplex causes an error at the flexiformatter step (this is also the error with the automated tests). It looks like it may be caused by the print statements in the flexiformatter code (at lines 37 and 42), so may be worth outputting that information directly to a log or to stderr?

Here's the error for reference:

Command error:
  /usr/local/lib/python3.13/site-packages/simplesam.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
    from pkg_resources import get_distribution
  [W::sam_read1_sam] Parse error at line 11
  samtools sort: truncated file. Aborting
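
If the cause is indeed print statements mixing with the SAM stream, the usual remedy is to keep records on stdout and send diagnostics to stderr. The sketch below shows that separation in general terms. It assumes the tool streams SAM text to stdout for `samtools sort` to read, which the parse error suggests but the thread does not confirm; `emit_records` is a hypothetical name, not a flexiformatter function.

```python
import sys


def emit_records(records, out=sys.stdout, log=sys.stderr):
    """Write SAM records to `out` and all diagnostics to `log`."""
    written = 0
    for record in records:
        out.write(record if record.endswith("\n") else record + "\n")
        written += 1
    # Diagnostics go to stderr so the piped SAM stream stays parseable.
    print(f"wrote {written} records", file=log)
    return written
```

Python's `logging` module writes to stderr by default as well, so switching the print calls to `logging.info` would achieve the same separation.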

@ljwharbers
Contributor Author

Oops looks like that one indeed slipped through the cracks when I updated flexiformatter to work with barcode only (no umi) reads. Will fix and do the testing. I'll also work on it today and tomorrow so should be able to resolve some/most of the outstanding work still.

@atrull314
Collaborator

I know we talked in slack, but just to put it here as well.

One of the other items we'll need is test data for the DNA portions of the pipeline (and updates to the test.configs or perhaps even new test configs/profiles), with the downsampled and normal FASTQs going here: https://github.com/nf-core/test-datasets/tree/scnanoseq
