Many small fixes for milestone 2.2.0 #69

Merged
charles-plessy merged 9 commits into dev from milestone_2.2.0
May 25, 2025

@charles-plessy

Collaborator

Here is a second batch of commits for milestone 2.2.0. They are mainly low-hanging-fruit fixes. Here is a copy of the new changelog entries:

  • SAM/BAM/CRAM alignment files are sorted, and their headers include all sequences of the target genome.
  • Report ungapped percent identity (#46).
  • Update full-size test genomes to feature more T2T assemblies (#59).
  • Restore BED format support (#56).
  • Document the multiqc_train.txt and multiqc_last_o2o.txt files, which aggregate alignment statistics (#52).
  • Point the test configs samplesheets to nf-core/test-datasets in order to run the AWS full tests (#62).
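
For readers unfamiliar with the "ungapped percent identity" entry above, here is a minimal Python sketch of one common definition: the share of identical bases among alignment columns that contain no gap. This is an assumption for illustration only; the exact formula the pipeline reports is not stated in this PR, and the function name `ungapped_percent_identity` is hypothetical.

```python
def ungapped_percent_identity(target_row: str, query_row: str, gap: str = "-") -> float:
    """Percent of identical bases among alignment columns without a gap.

    Assumed definition (not taken from the pipeline): columns where either
    row holds the gap character are ignored in both numerator and denominator.
    """
    if len(target_row) != len(query_row):
        raise ValueError("aligned rows must have the same length")

    matches = 0
    ungapped_columns = 0
    for t, q in zip(target_row, query_row):
        if t == gap or q == gap:
            continue  # skip gap columns entirely
        ungapped_columns += 1
        if t.upper() == q.upper():  # treat soft-masked (lowercase) bases as equal
            matches += 1

    if ungapped_columns == 0:
        return 0.0  # no comparable column; avoid dividing by zero
    return 100.0 * matches / ungapped_columns


if __name__ == "__main__":
    # Two toy aligned rows with one gap in each.
    print(ungapped_percent_identity("ACGT-ACGT", "ACGTTAC-A"))
```

Under this definition, gaps do not lower the score, so the value is generally higher than or equal to a gapped percent identity computed over all columns.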

The next (and hopefully final) PR will be more focused and technical.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs?
  • If necessary, also make a PR on the nf-core/pairgenomealign branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

Also, the samplesheet_full.csv was updated to feature more T2T genomes.

Closes #62
Closes #56 for BED support.

SAM, BAM, and CRAM files are sorted for the convenience of the user,
who may for instance want to index them and use them as genome browser
tracks.

For the BAM and CRAM files, a sequence dictionary is computed before
export, to include all the sequences from the _target_ genome including
those that do not have an alignment to a given _query_, so that BAM or
CRAM files from multiple queries can be merged together without breaking
the sort order.

An assembly tag is also added, in case it is useful in the future.
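
The reasoning in the commit message above can be sketched in a few lines of Python. This is not the pipeline's code: it is an illustration, under stated assumptions, of why a header listing every target sequence keeps files from different queries mergeable. The function names, the toy record tuples, and the assembly name are all hypothetical; the `@HD`, `@SQ`, `SN`, `LN`, `AS`, and `SO:coordinate` fields are standard SAM header fields.

```python
def build_sam_header(target_lengths: dict[str, int], assembly: str) -> str:
    """Build a SAM header listing every target sequence, aligned or not."""
    lines = ["@HD\tVN:1.6\tSO:coordinate"]
    for name, length in target_lengths.items():
        # AS carries the assembly identifier, as mentioned in the commit message.
        lines.append(f"@SQ\tSN:{name}\tLN:{length}\tAS:{assembly}")
    return "\n".join(lines) + "\n"


def sort_records(records: list[tuple[str, int, str]],
                 target_lengths: dict[str, int]) -> list[tuple[str, int, str]]:
    """Coordinate-sort toy records (target name, position, query name).

    Records are ordered by the rank of their target sequence in the
    dictionary, then by position, which is what coordinate order means.
    """
    rank = {name: i for i, name in enumerate(target_lengths)}
    return sorted(records, key=lambda r: (rank[r[0]], r[1]))


if __name__ == "__main__":
    # Hypothetical target genome; chrM has no alignment in either query.
    targets = {"chr1": 1000, "chr2": 500, "chrM": 16}

    query_a = [("chr2", 10, "a1"), ("chr1", 50, "a2")]
    query_b = [("chr1", 5, "b1")]  # nothing on chr2 for this query

    header = build_sam_header(targets, "hypothetical_assembly")
    print(header, end="")

    # Both files share one sequence order, so their sorted records can be
    # interleaved without reordering either file.
    print(sort_records(query_a + query_b, targets))
```

If each file listed only the sequences it has alignments to, the two headers would differ and the rank of a given sequence could change between files, which is the situation the shared sequence dictionary avoids.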

@sateeshperi sateeshperi left a comment

Contributor

lgtm

@charles-plessy charles-plessy merged commit 58ae6a8 into dev May 25, 2025
6 checks passed