Release v2.3.0 #117

Open
charles-plessy wants to merge 42 commits into master from dev
Collaborator

@charles-plessy charles-plessy commented Jun 3, 2026

Modules will be updated to new versions in release 3.0.0, together with strict syntax conversion.

v2.3.0 "Umi budou" - [June 3rd 2026]

Added

  • New --multi_cram option to produce a multi-query CRAM file combining all the alignments (#60).
  • New --multiqc_thumbs option to produce alignment thumbnails in the MultiQC report (#93).
  • New --strand option to index only one strand of the genome, which reduces memory usage at the expense of speed, and suppresses -/+ alignments (#97).
  • New --query and --queryName convenience options to skip samplesheet creation when there is only one query genome to align (#112).
  • In the GFF export format, the target genome sequence lengths are now exported in ##sequence-region fields (#70).
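
As an illustration of the last item, here is a minimal sketch of how `##sequence-region` lines can be derived from sequence lengths. It assumes the standard GFF3 form `##sequence-region seqid start end` with 1-based inclusive coordinates; the function name and the example sequence names and lengths are hypothetical and are not taken from the pipeline's code.

```python
def sequence_region_lines(lengths):
    """Format GFF3 ##sequence-region lines from a {seqid: length} mapping.

    Hypothetical helper for illustration only; not the pipeline's implementation.
    """
    # GFF3 coordinates are 1-based and inclusive, so a whole sequence spans 1..length.
    return [f"##sequence-region {seqid} 1 {length}" for seqid, length in lengths.items()]


# Made-up target sequence names and lengths.
target_lengths = {"chr1": 1500, "chr2": 820}
header = ["##gff-version 3"] + sequence_region_lines(target_lengths)
print("\n".join(header))
```

A downstream reader can then recover the target sequence lengths from the GFF header alone, without opening the genome FASTA file.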

Fixed

  • Using the nf-core version of the FASTA_BGZIP_INDEX_DICT_SAMTOOLS subworkflow that we just contributed.
  • Check for input file existence in the parameter schema (#73).

Parameters

| Old parameter | New parameter    |
|---------------|------------------|
|               | --multi_cram     |
|               | --multiqc_thumbs |
|               | --query          |
|               | --queryName      |
|               | --strand         |

Dependencies

| Dependency        | Old version | New version |
|-------------------|-------------|-------------|
| SAMTOOLS_BGZIP    | 1.21        |             |
| SAMTOOLS_DICT     | 1.21        | 1.23.1      |
| SAMTOOLS_FAIDX    | 1.21        | 1.23.1      |
| SAMTOOLS_MERGE    |             | 1.23.1      |
| HTSLIB_BGZIPTABIX |             | 1.23.1      |

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool, have you followed the pipeline conventions in the contribution docs?
  • If necessary, also make a PR on the nf-core/pairgenomealign branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

charles-plessy and others added 30 commits May 25, 2026 11:11
Closes #97

To speed up alignment, both strands of the target genome are indexed.
This doubles memory usage and may produce output files containing `-/+`
alignments, which are not supported by some downstream pipelines. To
disable this behavior, pass the `--strand forward` option.
Adds a new option `--multiqc_thumbs` that defines a pixel size for
alignment thumbnails to be displayed in the MultiQC report.  Defaults
to zero for no plots.

Closes #93
The option `-w` is not available on Macintosh.

Thanks @piplus2 for catching this issue.
Optional alignment thumbnails in the MultiQC report.
Closes #112

This is inspired by nf-co.re/demultiplex, which also allows bypassing
--input and providing single files directly.
Add a `--query` option for when there is only one query
The merged CRAM file is neither a pangenome nor a multiple sequence
alignment, but I find it very useful.

Temporary CRAM files are produced but not exported.  Their header
indicates only the name of the query genomes in the read group fields.

The files are merged in a single CRAM file, where each read group
represents one genome.  Each target-query alignment is a one-to-one
relationship so a base in the target is aligned at most once to each
query.
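
A toy sketch of what this one-to-one property implies for coverage: if each query contributes at most once per target base, the depth at any position cannot exceed the number of query genomes. The interval representation (0-based, half-open) and all names below are assumptions made for illustration; this does not read CRAM files.

```python
def query_depth(target_length, alignments):
    """Count how many queries cover each target position.

    alignments maps a query name to a list of (start, end) intervals on the
    target, 0-based and half-open. Hypothetical data model for illustration.
    """
    depth = [0] * target_length
    for intervals in alignments.values():
        # Collect positions in a set so each query counts at most once per base.
        covered = set()
        for start, end in intervals:
            covered.update(range(start, end))
        for pos in covered:
            depth[pos] += 1
    return depth


# Two made-up queries aligned to a 10-base target.
alignments = {"queryA": [(0, 6), (8, 10)], "queryB": [(4, 9)]}
depth = query_depth(10, alignments)
print(depth)
```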

Care is taken to ensure that the path to the reference genome is
relative to the current directory.  The multi-query CRAM file is written
to the same directory as its index and the bgzipped genome, which is
indexed too.

Thus the multi-query CRAM file can be loaded and visualised in IGV.
The coverage plot shows how many query genomes align to the target at a
given location.  The expanded track view shows all the sequence
differences.  The order of the genomes can be stabilised, but IGV
enforces alphanumeric sorting.  You can work around this limitation
by prefixing the sample IDs with numbers in the sample sheet.

Custom scripts can be (and have been) written to slice pieces out of the
multi-query CRAM file and turn these pieces into real MSAs…
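
The sample-ID workaround mentioned above can be sketched as follows: zero-padded numeric prefixes make an alphanumeric sort reproduce the order you want. The helper name, the underscore separator, and the sample IDs are illustrative assumptions, not pipeline conventions.

```python
def prefix_sample_ids(ordered_ids):
    """Prefix IDs with zero-padded ranks so that sorting keeps the given order.

    Hypothetical helper; the "_" separator is an arbitrary choice.
    """
    # Pad to the width of the largest rank so that "10" sorts after "09".
    width = len(str(len(ordered_ids)))
    return [f"{rank:0{width}d}_{sid}" for rank, sid in enumerate(ordered_ids, start=1)]


# Made-up sample IDs in the order they should appear in the viewer.
desired_order = ["human", "chimp", "gorilla"]
prefixed = prefix_sample_ids(desired_order)
print(prefixed)
```

The prefixed IDs would then be used in place of the original ones in the sample sheet.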
Will change to CRAM 3.1 in pairgenomealign 3.0.0.
Co-authored-by: Joon Klaps <joon.klaps@kuleuven.be>
Co-authored-by: Joon Klaps <joon.klaps@kuleuven.be>
Co-authored-by: Joon Klaps <joon.klaps@kuleuven.be>
Co-authored-by: Joon Klaps <joon.klaps@kuleuven.be>
charles-plessy and others added 12 commits June 2, 2026 09:59
Co-authored-by: Joon Klaps <joon.klaps@kuleuven.be>
…which I submitted recently based on the local version.
New `--multi_cram` option to produce a multi-query CRAM file combining all the alignments
Co-authored-by: Mateus de Oliveira Lopes <lopes3137@gmail.com>
Co-authored-by: James A. Fellows Yates <jfy133@gmail.com>
@charles-plessy charles-plessy changed the title Dev Release v2.3.0 Jun 3, 2026
@charles-plessy
Collaborator Author

Hi @muffato, as you have an interest in genomics and CRAM files, I was wondering if you would be interested in reviewing this PR, where I use the new FASTA_BGZIP_INDEX_DICT_SAMTOOLS subworkflow and do many more exciting things, such as representing pairwise genome alignments of multiple queries to a reference in a single CRAM file!
