Fix: Support for --aligner cellrangerarc #441

Merged
14 commits merged into nf-core:dev on Mar 31, 2025

Conversation

@matbonfanti matbonfanti commented Mar 5, 2025

This PR introduces two changes to ensure the pipeline functions correctly when using --aligner cellrangerarc:

  • A dedicated parsing section structures the samplesheet channel to be compatible with the cellrangerarc module.
  • The raw and filtered gene expression matrices are now extracted from the cellrangerarc module output for further processing in the pipeline.

See issues #389 and #374

Testing

I have tested these changes locally on FASTQ files subsampled from a 10x Genomics dataset, and the pipeline runs successfully. For reference, here is the samplesheet used in testing:

sample,fastq_1,fastq_2,fastq_barcode,sample_type
10k_PBMC,/scratch/matteo.bonfanti/test_scrnaseq/fastqs/10k_PBMC_Multiome_nextgem_Chromium_X_atac_S2_L001_R1_001.fastq.gz,/scratch/matteo.bonfanti/test_scrnaseq/fastqs/10k_PBMC_Multiome_nextgem_Chromium_X_atac_S2_L001_R2_001.fastq.gz,/scratch/matteo.bonfanti/test_scrnaseq/fastqs/10k_PBMC_Multiome_nextgem_Chromium_X_atac_S2_L001_R3_001.fastq.gz,atac
10k_PBMC,/scratch/matteo.bonfanti/test_scrnaseq/fastqs/10k_PBMC_Multiome_nextgem_Chromium_X_atac_S2_L002_R1_001.fastq.gz,/scratch/matteo.bonfanti/test_scrnaseq/fastqs/10k_PBMC_Multiome_nextgem_Chromium_X_atac_S2_L002_R2_001.fastq.gz,/scratch/matteo.bonfanti/test_scrnaseq/fastqs/10k_PBMC_Multiome_nextgem_Chromium_X_atac_S2_L002_R3_001.fastq.gz,atac
10k_PBMC,/scratch/matteo.bonfanti/test_scrnaseq/fastqs/10k_PBMC_Multiome_nextgem_Chromium_X_gex_S2_L001_R1_001.fastq.gz,/scratch/matteo.bonfanti/test_scrnaseq/fastqs/10k_PBMC_Multiome_nextgem_Chromium_X_gex_S2_L001_R2_001.fastq.gz,,gex
10k_PBMC,/scratch/matteo.bonfanti/test_scrnaseq/fastqs/10k_PBMC_Multiome_nextgem_Chromium_X_gex_S2_L002_R1_001.fastq.gz,/scratch/matteo.bonfanti/test_scrnaseq/fastqs/10k_PBMC_Multiome_nextgem_Chromium_X_gex_S2_L002_R2_001.fastq.gz,,gex
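
As a rough illustration of how a samplesheet like the one above is read, the following plain-Groovy sketch groups rows by `sample` and collects the `sample_type` values per sample. It is not the pipeline's code: the helper names `sampleTypesBySample` and `hasBothModalities` are hypothetical, the CSV parsing is deliberately naive (no quoted fields), and the file names in the usage lines are shortened placeholders rather than the paths used in testing.

```groovy
// Hypothetical helper: parse a samplesheet given as CSV text and return
// a map of sample name -> list of sample_type values, in row order.
Map sampleTypesBySample(String csv) {
    def lines = csv.readLines().findAll { it.trim() }
    // limit -1 keeps trailing empty fields such as a missing fastq_barcode
    def header = lines[0].split(',', -1) as List
    def rows = lines.drop(1).collect { line ->
        def values = line.split(',', -1) as List
        [header, values].transpose().collectEntries { pair -> [(pair[0]): pair[1]] }
    }
    return rows.groupBy { it.sample }.collectEntries { sample, sampleRows ->
        [(sample): sampleRows.collect { it.sample_type }]
    }
}

// Hypothetical check: a multiome sample should list both modalities.
boolean hasBothModalities(List types) {
    return (types as Set) == (['gex', 'atac'] as Set)
}

// Usage with placeholder file names (not the real test files)
def sheet = '''\
sample,fastq_1,fastq_2,fastq_barcode,sample_type
10k_PBMC,atac_S2_L001_R1_001.fastq.gz,atac_S2_L001_R2_001.fastq.gz,atac_S2_L001_R3_001.fastq.gz,atac
10k_PBMC,gex_S2_L001_R1_001.fastq.gz,gex_S2_L001_R2_001.fastq.gz,,gex
'''
def types = sampleTypesBySample(sheet)
println types
println hasBothModalities(types['10k_PBMC'])
```

Only the gex rows leave `fastq_barcode` empty, which is why the split keeps trailing empty fields.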

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/scrnaseq branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@matbonfanti matbonfanti self-assigned this Mar 5, 2025
@grst
Member

grst commented Mar 6, 2025

Hi @matbonfanti,

thanks for working on this!
Would you by any chance also have time to implement a test case for cellrangerarc? It is currently not covered by CI at all, which is one of the reasons the bug you are fixing exists in the first place. See also #290.

@matbonfanti
Author

Hi @grst, that was indeed the plan!

I have seen that the test-datasets repo already contains an ATAC-seq dataset (https://github.com/nf-core/test-datasets/tree/modules/data/genomics/homo_sapiens/10xgenomics/cellranger-atac), which I think would be a good test for starters. In the long term I could make a new test using multiome data (ATAC + GEX), which would probably be more appropriate, but it will definitely take much more time to implement.

If you agree, I will start by including the ATAC-only test in this PR, so that ATAC alignment is fixed soon in the dev branch. Then maybe I can make a new PR for the other test dataset.

@apeltzer apeltzer added this to the 4.1.0 milestone Mar 10, 2025
@grst
Member

grst commented Mar 25, 2025

Hi @matbonfanti, is this ready, i.e. did you add the ATAC test case?

@matbonfanti
Author

Hi, I was planning to use an ATAC dataset as the test case, but unfortunately it turned out that Cell Ranger ARC needs both modalities, ATAC and GEX. I need to create a new test dataset by subsampling a 10x multiome dataset; I was planning to start today at the hackathon.

@matbonfanti
Author

Bottom line: no, it is not ready; the test is still missing.

@grst
Member

grst commented Mar 25, 2025

ok, no worries

@matbonfanti
Author

Hi,

I have created the dataset for cellranger-arc and run the pipeline with it. It is ready to be added to the test dataset repository (nf-core/test-datasets#1562).
Now waiting for the review :-)

@matbonfanti
Author

@grst I have added the test file and added the test to the CI. As you can see, it worked.

I think the code is ready for review. I am still missing the changelog update; I will do it ASAP.

@matbonfanti matbonfanti requested a review from grst March 26, 2025 16:29
@matbonfanti matbonfanti moved this from To do to Ready for review in Hackathon March 2025 Mar 26, 2025
Member

@grst grst left a comment

LGTM, thank you!

Comment on lines +217 to +250
def cellrangerarcStructure(input) {
    def (metas, fastqs) = input[1..2]

    // Check that multiple runs of the same sample are of the same datatype i.e. single-end / paired-end
    def endedness_ok = metas.collect{ meta -> meta.single_end }.unique().size == 1
    if (!endedness_ok) {
        error("Please check input samplesheet -> Multiple runs of a sample must be of the same datatype i.e. single-end or paired-end: ${metas[0].id}")
    }

    // Validate that the property "sample_type" is present and has valid values
    def valid_sample_types = ["gex", "atac"]
    def sample_type_ok = metas.collect { meta -> meta.sample_type }.unique().every { it in valid_sample_types }
    if (!sample_type_ok) {
        error("Please check input samplesheet -> The property 'sample_type' is required and can only be 'gex' or 'atac'.")
    }

    // Define a new common meta for all the fastqs in this channel instance
    def sampleMeta = metas[0].clone()
    sampleMeta.remove("sample_type")
    sampleMeta.remove("feature_type")

    // Create a list with all the entries of meta.sample_type
    def sampletypes = metas.collect { meta -> meta.sample_type }

    // Create a list with all the base name of the fastq files
    def subsamples = fastqs.collect { fastq ->
        def match = (fastq[0].baseName =~ /^(.*?)_S\d+_L\d+_R\d+_\d+\.fastq(\.gz)?$/)
        if (!match) {
            error("Filename does not follow the expected FASTQ filename convention (SampleName_S1_L001_R1_001.fastq.gz): ${fastq[0]}")
        }
        return match[0][1]
    }

    return [ sampleMeta, sampletypes, subsamples, fastqs.flatten() ]
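
The filename check in the excerpt above can be tried in isolation. The sketch below reuses the same regular expression on plain strings; the helper name `fastqPrefix` is hypothetical, and it returns `null` instead of calling Nextflow's `error`. It assumes, as documented for Nextflow paths, that `baseName` drops only the last extension, so a `.fastq.gz` file is matched as `*.fastq`.

```groovy
// Hypothetical helper: return the sample prefix of a FASTQ base name,
// or null when the name does not follow SampleName_S1_L001_R1_001.fastq(.gz).
String fastqPrefix(String baseName) {
    // Same pattern as in the diff excerpt; group 1 is the lazy sample prefix
    def match = (baseName =~ /^(.*?)_S\d+_L\d+_R\d+_\d+\.fastq(\.gz)?$/)
    return match ? match[0][1] : null
}

// Base name of a gzipped file after the ".gz" extension has been dropped
println fastqPrefix('10k_PBMC_Multiome_nextgem_Chromium_X_atac_S2_L001_R1_001.fastq')

// A name without the _S*_L*_R*_* suffix is rejected
println fastqPrefix('reads_1.fastq')
```

Because the pattern requires a literal `.fastq` in the string, the base name of an uncompressed `.fastq` file (which would lose its only extension) would not match under the assumption above.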
Member

Ideally, this validation would live in the JSON schema, but it currently can't because we only have one aligner-agnostic schema.
Created #461 to follow up, as this is beyond the scope of this PR.

Author

I agree, that would be much cleaner... If you need help writing and testing the schema for cellranger-arc, I would be happy to contribute!

Member

If you want to give #461 a shot, that would be fantastic. I don't think I'd have time soonish.

@github-project-automation github-project-automation bot moved this from Ready for review to In progress in Hackathon March 2025 Mar 31, 2025
@grst grst merged commit 43adb18 into nf-core:dev Mar 31, 2025
16 checks passed
@github-project-automation github-project-automation bot moved this from In progress to Done in Hackathon March 2025 Mar 31, 2025
3 participants