Added VDJ concatenation modules and added MuData object #435
base: dev
Conversation
Hi @saraterzo, one first thought I had: maybe it would make sense to add these modules to nf-core right away instead of adding them locally? That way they would already be tested as standalone modules and we would know they work fine before including them in the pipeline. Or is there a specific reason for keeping them as local modules?
Hi @fmalmeida
Hi @saraterzo, thank you for working on this! I'll try to find time this week to give it a proper review.
Sorry for the delay, I finally had time to look at it.
I found mostly minor things. Since the Python parts of the pipeline are growing, I am also adding the ruff linter/formatter in #464. Once that's in, please also apply it to your Python scripts.
for run, vdj in zip(input_run_id,vdj_files):
    # Read folders with the filtered contigue annotation and store datasets in a dictionary
    print("\n===== READING CONTIGUE ANNOTATION MATRIX =====")
    print("\nProcessing filtered contigue table in folder ... ", end ='')
print("\nProcessing filtered contigue table in folder ... ", end ='') | |
print("\nProcessing filtered contig table in folder ... ", end ='') |
if len(adata_vdj_list) == 1:
    adata_vdj_concatenated = adata_vdj_list[0]
    print("Only one non-empty file found. Saving the file as is without concatenation.")
else:
Is it necessary to special-case this? I.e., wouldn't ad.concat just work fine with a single file?
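For illustration, a minimal sketch with toy data (the variable names mirror the snippet above but are otherwise assumptions): ad.concat accepts a single-element list just as well as a longer one.

import anndata as ad
import numpy as np

# Hypothetical single-element list standing in for one non-empty VDJ file
adata_vdj_list = [ad.AnnData(X=np.ones((3, 2), dtype="float32"))]

# Concatenation works regardless of how many objects the list holds
adata_vdj_concatenated = ad.concat(adata_vdj_list, join="outer")
print(adata_vdj_concatenated.shape)  # (3, 2), identical to the single input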
import anndata as ad  # store annotated matrix as anndata object

warnings.filterwarnings("ignore")
It would be better to filter only specific (expected) warnings, e.g. by category or message.
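For example, a sketch (not part of the PR; the category, module, and message pattern are illustrative assumptions):

import warnings

# Silence only an expected warning class coming from a specific module
warnings.filterwarnings("ignore", category=FutureWarning, module="anndata")

# Or match a known message pattern instead of suppressing everything
warnings.filterwarnings("ignore", message=".*Observation names are not unique.*")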
from mudata import MuData

warnings.filterwarnings("ignore")
It would be better to filter only specific (expected) warnings, e.g. by category or message.
modalities["gex"] = adata[:, adata.var["feature_types"] == "Gene Expression"] | ||
# Add 'pro' modality if defined | ||
if adata[:, adata.var["feature_types"] == "Antibody Capture"].shape[1] > 0: | ||
modalities["pro"] = adata[:, adata.var["feature_types"] == "Antibody Capture"] |
modalities["pro"] = adata[:, adata.var["feature_types"] == "Antibody Capture"] | |
modalities["protein"] = adata[:, adata.var["feature_types"] == "Antibody Capture"] |
Maybe more explicit?
def desired_files = outs.findAll { it.name == "filtered_contig_annotations.csv" }
if (desired_files.size() > 0) {
    [ meta, desired_files ]
}
Couldn't you also use the parse_demultiplexed_output_channels function for this? Would your code still work for VDJ in combination with demultiplexing?
def meta = []
def files = []

list.collate(2).each { pair ->
meta << pair[0]
files << pair[1]
}
return [meta, files.flatten()]
please indent
}

ch_vdj_files_collect = ch_vdj_files.collect()
ch_transformed_channel = ch_vdj_files_collect.map { list ->
Suggested change:
-    ch_transformed_channel = ch_vdj_files_collect.map { list ->
+    ch_vdj = ch_vdj_files_collect.map { list ->
//{assert workflow.trace.tasks().size() == 59},

// How many results were produced?
{assert path("${outputDir}/results_cellrangermulti").list().size() == 4},
{assert path("${outputDir}/results_cellrangermulti/cellrangermulti").list().size() == 5},
{assert path("${outputDir}/results_cellrangermulti/cellrangermulti/mtx_conversions").list().size() == 16},
{assert path("${outputDir}/results_cellrangermulti/cellrangermulti/count").list().size() == 4},
{assert path("${outputDir}/results_cellrangermulti/fastqc").list().size() == 48},
{assert path("${outputDir}/results_cellrangermulti/multiqc").list().size() == 3},
//{assert path("${outputDir}/results_cellrangermulti").list().size() == 6},
//{assert path("${outputDir}/results_cellrangermulti/cellrangermulti").list().size() == 5},
//{assert path("${outputDir}/results_cellrangermulti/cellrangermulti/mtx_conversions").list().size() == 16},
//{assert path("${outputDir}/results_cellrangermulti/cellrangermulti/count").list().size() == 4},
//{assert path("${outputDir}/results_cellrangermulti/fastqc").list().size() == 48},
//{assert path("${outputDir}/results_cellrangermulti/multiqc").list().size() == 3},
Let's not forget to re-enable the testcase before merging
ch_vdj
)
ch_versions = ch_versions.mix(CONVERT_MUDATA.out.versions)
} else {'nothing to convert to MuData'}
Suggested change:
-    } else {'nothing to convert to MuData'}
+    }
Added a VDJ concatenation module that concatenates "filtered_contig_annotations" files using the scirpy package.
Added a MuData module that builds MuData objects to handle the VDJ and CITE-seq modalities. Specifically, the MuData object is built only from the filtered count matrices (not raw) of the GEX and CITE-seq modalities.
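As a rough sketch of the idea (toy data; the variable names, modality keys, and output filename are assumptions, not the module's actual code), the filtered count matrix can be split by feature type and wrapped into a single MuData object:

import anndata as ad
import numpy as np
import pandas as pd
from mudata import MuData

# Hypothetical filtered count matrix with mixed feature types
adata = ad.AnnData(
    X=np.ones((5, 4), dtype="float32"),
    var=pd.DataFrame(
        {"feature_types": ["Gene Expression", "Gene Expression", "Antibody Capture", "Antibody Capture"]},
        index=["geneA", "geneB", "CD3", "CD19"],
    ),
)

# Split into modalities, copying the views so each modality owns its data
modalities = {"gex": adata[:, adata.var["feature_types"] == "Gene Expression"].copy()}
if (adata.var["feature_types"] == "Antibody Capture").sum() > 0:
    modalities["protein"] = adata[:, adata.var["feature_types"] == "Antibody Capture"].copy()

mdata = MuData(modalities)
mdata.write("sample_filtered.h5mu")  # hypothetical output name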
- Make sure your code lints (nf-core pipelines lint).
- Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
- Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
- docs/usage.md is updated.
- docs/output.md is updated.
- CHANGELOG.md is updated.
- README.md is updated (including new tool citations and authors/contributors).