
Option to extract columns from the TSV before compressing and indexing it #197

Open
muffato wants to merge 4 commits into main from tsv_cut_bgzip_tabix


Conversation

@muffato
Member

@muffato muffato commented Mar 4, 2026

This PR addresses sanger-tol/sequencecomposition#47 (comment).

Jim made me realise that there's no need to carry the patch in my pipeline: since the module comes from sanger-tol, the change can be made here instead.

This adds optional parameters to BGZIPTABIX to extract selected columns and skip header lines from the TSV, and to override the output file extension.
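To make the intended behaviour concrete for reviewers, here is a minimal Python sketch of the transformation. It is not the module's implementation (that is a Nextflow process); it only illustrates the semantics under stated assumptions: column numbers are 1-based and, as with cut -f, kept in file order; header_lines counts leading lines to drop; the extension only changes the output file name. The helpers extract_columns and output_names are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def extract_columns(
    lines: Iterable[str],
    column_numbers: Optional[List[int]] = None,
    header_lines: int = 0,
) -> List[str]:
    """Drop the first `header_lines` lines, then keep only the 1-based
    `column_numbers` of each tab-separated line.

    Assumption: like `cut -f`, kept columns come out in file order,
    whatever order they were requested in, and every line has at least
    max(column_numbers) fields. With no column_numbers, lines pass
    through unchanged apart from the stripped trailing newline.
    """
    keep = sorted(set(column_numbers)) if column_numbers else None
    out: List[str] = []
    for i, line in enumerate(lines):
        if i < header_lines:
            continue  # skip the leading header lines
        fields = line.rstrip("\n").split("\t")
        if keep is not None:
            fields = [fields[n - 1] for n in keep]  # 1-based -> 0-based
        out.append("\t".join(fields))
    return out


def output_names(prefix: str, extension: str) -> Tuple[str, str]:
    """Hypothetical naming scheme: the bgzipped file, plus the .tbi
    index that tabix writes alongside it."""
    compressed = f"{prefix}.{extension}.gz"
    return compressed, compressed + ".tbi"


if __name__ == "__main__":
    # Hypothetical TSV with one header line and five columns.
    tsv = [
        "#chrom\tstart\tend\tname\tscore\n",
        "chr1\t10\t20\trep1\t5\n",
        "chr2\t30\t40\trep2\t7\n",
    ]
    # TSV -> BED-like output: skip 1 header line, keep columns 1-3.
    for row in extract_columns(tsv, column_numbers=[1, 2, 3], header_lines=1):
        print(row)
    print(output_names("sample", "bed"))
```

Keeping the columns in file order mirrors what cut -f does, so a shell implementation and this sketch would agree on output ordering.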

PR checklist

Closes #XXX

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool, have you followed the module conventions in the contribution docs?
  • If necessary, include test data in your PR.
  • Remove all TODO statements.
  • Broadcast software version numbers to topic: versions (see version_topics).
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • If you've added or modified a sub-workflow, ensure all sub-analyses are emitted as individual channels. Outputs may also be collected together by analysis type or input sample, for instance to support downstream analysis or publishing.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the tests work with either Docker or Singularity. Conda CI tests can be quite flaky:
    • For modules:
      • nf-core modules test <MODULE> --profile docker
      • nf-core modules test <MODULE> --profile singularity
      • nf-core modules test <MODULE> --profile conda
    • For subworkflows:
      • nf-core subworkflows test <SUBWORKFLOW> --profile docker
      • nf-core subworkflows test <SUBWORKFLOW> --profile singularity
      • nf-core subworkflows test <SUBWORKFLOW> --profile conda

@muffato muffato self-assigned this Mar 4, 2026
Copilot AI review requested due to automatic review settings March 4, 2026 14:34
Contributor

Copilot AI left a comment


Pull request overview

Adds optional pre-processing to the BGZIPTABIX module so it can extract specific columns and skip header lines from tab-delimited inputs (e.g. converting TSV → BED) before bgzipping and indexing, and updates downstream usage and tests accordingly.

Changes:

  • Extend BGZIPTABIX to accept column_numbers, header_lines, and extension inputs and use them to transform data before bgzip/tabix.
  • Update SOFT_MASKED_FASTA_REPEATS to pass the new (optional) parameter tuple into BGZIPTABIX.
  • Add a new module test covering TSV→BED extraction and update snapshots.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Changes per file:

  • subworkflows/sanger-tol/soft_masked_fasta_repeats/main.nf: Updates the BGZIPTABIX invocation to supply the new optional parameter tuple.
  • modules/sanger-tol/bgziptabix/main.nf: Implements the optional cut/header-skip behaviour and output extension override before bgzip/tabix.
  • modules/sanger-tol/bgziptabix/meta.yml: Documents the new optional inputs for column extraction, header skipping, and output extension.
  • modules/sanger-tol/bgziptabix/tests/main.nf.test: Updates all tests for the new input signature and adds a TSV→BED test.
  • modules/sanger-tol/bgziptabix/tests/main.nf.test.snap: Updates snapshots to reflect the new test and outputs.


@muffato muffato mentioned this pull request Mar 4, 2026
18 tasks
@muffato muffato force-pushed the tsv_cut_bgzip_tabix branch from 1d12dcf to d60fafc on March 4, 2026 14:59

Labels

None yet

Projects

None yet


2 participants