-
Notifications
You must be signed in to change notification settings - Fork 188
Fix: Support for --aligner cellrangerarc
#441
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hi @matbonfanti, thanks for working on this! |
Hi @grst, that was indeed the plan! I have seen that on the test-dataset repo there is already a dataset for atac-seq (https://github.com/nf-core/test-datasets/tree/modules/data/genomics/homo_sapiens/10xgenomics/cellranger-atac) which I think would be a good test, for starters. In the long term I could make a new test using multiome data (atac+gex) that would be probably more appropriate, but it will definitely take much more time to implement. If you agree, I will start including the atac-only test in this PR, so that atac alignment will be fixed soon in the dev branch. Then maybe I can make a new PR for the other test dataset. |
…crnaseq into fix_cellrangerarc_input_ch
Hi @matbonfanti, is this ready, i.e. did you add the ATAC testcase? |
hi, I was planning to use a ATAC dataset as test case, but unfortunately It turned out that Cellranger-arc needs boh modalities, ATAC and gex... I need to create a new test dataset subsampling a 10x multiome dataset, I was planning to start today at the hackathon. |
Bottom line: no, It Is not ready, the test Is still missing |
ok, no worries |
hi, I have created the dataset for cellranger-arc, I have run the pipeline with it and it is ready to be added to the dataset test repository (nf-core/test-datasets#1562). |
@grst I have added the test file and added the test to the CI, as you can see it worked. I think the code is ready for review. I am still missing the changelog update. I will do It ASAP. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
def cellrangerarcStructure(input) { | ||
def (metas, fastqs) = input[1..2] | ||
|
||
// Check that multiple runs of the same sample are of the same datatype i.e. single-end / paired-end | ||
def endedness_ok = metas.collect{ meta -> meta.single_end }.unique().size == 1 | ||
if (!endedness_ok) { | ||
error("Please check input samplesheet -> Multiple runs of a sample must be of the same datatype i.e. single-end or paired-end: ${metas[0].id}") | ||
} | ||
|
||
// Validate that the property "sample_type" is present and has valid values | ||
def valid_sample_types = ["gex", "atac"] | ||
def sample_type_ok = metas.collect { meta -> meta.sample_type }.unique().every { it in valid_sample_types } | ||
if (!sample_type_ok) { | ||
error("Please check input samplesheet -> The property 'sample_type' is required and can only be 'gex' or 'atac'.") | ||
} | ||
|
||
// Define a new common meta for all the fastqs in this channel instance | ||
def sampleMeta = metas[0].clone() | ||
sampleMeta.remove("sample_type") | ||
sampleMeta.remove("feature_type") | ||
|
||
// Create a list with all the entries of meta.sample_type | ||
def sampletypes = metas.collect { meta -> meta.sample_type } | ||
|
||
// Create a list with all the base name of the fastq files | ||
def subsamples = fastqs.collect { fastq -> | ||
def match = (fastq[0].baseName =~ /^(.*?)_S\d+_L\d+_R\d+_\d+\.fastq(\.gz)?$/) | ||
if (!match) { | ||
error("Filename does not follow the expected FASTQ filename convention (SampleName_S1_L001_R1_001.fastq.gz): ${fastq[0]}") | ||
} | ||
return match[0][1] | ||
} | ||
|
||
return [ sampleMeta, sampletypes, subsamples, fastqs.flatten() ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, this could go to the json-schema, but it currently can't because we only have one aligner-agnostic schema.
Created #461 to follow up as this is beyond the scope of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, that would be much cleaner... If you need help writing and testing the schema for cellranger-arc, I would be happy to contribute!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to give #461 a shot, that would be fantastic. I don't think I'd have time soonish.
This PR introduces two changes to ensure the pipeline functions correctly when using
--aligner cellrangerarc
:cellrangerarc
module.cellrangerarc
module output for further processing in the pipeline.See issues #389 and #374
Testing
I have tested these changes locally subsampling FASTQ files from 10x Genomics, and the pipeline runs successfully. For reference, here is the samplesheet used in testing:
PR checklist
nf-core pipelines lint
).nextflow run . -profile test,docker --outdir <OUTDIR>
).nextflow run . -profile debug,test,docker --outdir <OUTDIR>
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).