Skip to content

Added VDJ concatenation modules and added MuData object #435

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 30 commits into
base: dev
Choose a base branch
from

Conversation

saraterzo
Copy link

@saraterzo saraterzo commented Feb 27, 2025

Added VDJ concatenation module to concatenate "filtered_contig_annotation" files from scirpy package.
Added MuData module to create MuData objects to handle VDJ, and CITE-seq modalities. Specifically MuData object is built only for filtered count matrices (not raw) from GEX and CITE-seq modalities.

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/scrnaseq branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@saraterzo saraterzo marked this pull request as draft February 27, 2025 14:44
@apeltzer apeltzer added this to the 4.1.0 milestone Mar 10, 2025
@saraterzo saraterzo marked this pull request as ready for review March 11, 2025 16:06
@saraterzo saraterzo mentioned this pull request Mar 27, 2025
@fmalmeida
Copy link
Contributor

fmalmeida commented Mar 27, 2025

Hi @saraterzo ,
Thanks for working on this and opening the PR. Looks super neat.

One first thought I had is:

Maybe would make sense to already add these modules to nf-core instead of adding them locally?

With that we would gain that they would already be tested as standalone modules and would know they are working fine for including in the pipeline.

Or is there any specific reason for doing as a local module?

@saraterzo
Copy link
Author

saraterzo commented Mar 28, 2025

Hi @fmalmeida
Thanks for your review!
Regarding local or nf-core modules, I decided to develop these two modules as local ones because they are specific to this project, not easily reusable in other workflows and also include a custom Python script tailored to the project's needs.

@grst
Copy link
Member

grst commented Mar 31, 2025

Hi @saraterzo,

thank you for working on this! I'll try to find time this week to give it a proper review.

Copy link
Member

@grst grst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, finally had the time to look at it.

Found mostly minor things. Since the python parts of the pipeline are increasing, I am also adding ruff linter/formatter in #464. Once that's in, please also apply it to your Python scripts.

for run, vdj in zip(input_run_id,vdj_files):
# Read folders with the filtered contigue annotation and store datasets in a dictionary
print("\n===== READING CONTIGUE ANNOTATION MATRIX =====")
print("\nProcessing filtered contigue table in folder ... ", end ='')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
print("\nProcessing filtered contigue table in folder ... ", end ='')
print("\nProcessing filtered contig table in folder ... ", end ='')

Comment on lines +103 to +106
if len(adata_vdj_list) == 1:
adata_vdj_concatenated = adata_vdj_list[0]
print("Only one non-empty file found. Saving the file as is without concatenation.")
else:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it necessary to special-case this? i.e. wouldn't ad.concat just work fine with a single file?

import anndata as ad # store annotated matrix as anndata object


warnings.filterwarnings("ignore")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to filter only specific (expected) warnings, e.g. by category or message.

from mudata import MuData


warnings.filterwarnings("ignore")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to filter only specific (expected) warnings, e.g. by category or message.

modalities["gex"] = adata[:, adata.var["feature_types"] == "Gene Expression"]
# Add 'pro' modality if defined
if adata[:, adata.var["feature_types"] == "Antibody Capture"].shape[1] > 0:
modalities["pro"] = adata[:, adata.var["feature_types"] == "Antibody Capture"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
modalities["pro"] = adata[:, adata.var["feature_types"] == "Antibody Capture"]
modalities["protein"] = adata[:, adata.var["feature_types"] == "Antibody Capture"]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe more explicit?

Comment on lines +208 to +211
def desired_files = outs.findAll { it.name == "filtered_contig_annotations.csv" }
if (desired_files.size() > 0) {
[ meta, desired_files ]
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def desired_files = outs.findAll { it.name == "filtered_contig_annotations.csv" }
if (desired_files.size() > 0) {
[ meta, desired_files ]
}
def desired_files = outs.findAll { it.name == "filtered_contig_annotations.csv" }
if (desired_files.size() > 0) {
[ meta, desired_files ]
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't you also use the parse_demultiplexed_output_channels function for this? Would your code still work for VDJ in combination with demultiplexing?

Comment on lines +218 to +225
def meta = []
def files = []

list.collate(2).each { pair ->
meta << pair[0]
files << pair[1]
}
return [meta, files.flatten()]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please indent

}

ch_vdj_files_collect = ch_vdj_files.collect()
ch_transformed_channel = ch_vdj_files_collect.map { list ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ch_transformed_channel = ch_vdj_files_collect.map { list ->
ch_vdj = ch_vdj_files_collect.map { list ->

Comment on lines +37 to +45
//{assert workflow.trace.tasks().size() == 59},

// How many results were produced?
{assert path("${outputDir}/results_cellrangermulti").list().size() == 4},
{assert path("${outputDir}/results_cellrangermulti/cellrangermulti").list().size() == 5},
{assert path("${outputDir}/results_cellrangermulti/cellrangermulti/mtx_conversions").list().size() == 16},
{assert path("${outputDir}/results_cellrangermulti/cellrangermulti/count").list().size() == 4},
{assert path("${outputDir}/results_cellrangermulti/fastqc").list().size() == 48},
{assert path("${outputDir}/results_cellrangermulti/multiqc").list().size() == 3},
//{assert path("${outputDir}/results_cellrangermulti").list().size() == 6},
//{assert path("${outputDir}/results_cellrangermulti/cellrangermulti").list().size() == 5},
//{assert path("${outputDir}/results_cellrangermulti/cellrangermulti/mtx_conversions").list().size() == 16},
//{assert path("${outputDir}/results_cellrangermulti/cellrangermulti/count").list().size() == 4},
//{assert path("${outputDir}/results_cellrangermulti/fastqc").list().size() == 48},
//{assert path("${outputDir}/results_cellrangermulti/multiqc").list().size() == 3},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not forget to re-enable the testcase before merging

ch_vdj
)
ch_versions = ch_versions.mix(CONVERT_MUDATA.out.versions)
} else {'nothing to convert to MuData'}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
} else {'nothing to convert to MuData'}
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants