Added VDJ concatenation modules and added MuData object #435

saraterzo · 2025-02-27T14:25:28Z

Added a VDJ concatenation module that concatenates "filtered_contig_annotations" files using the scirpy package.
Added a MuData module that creates MuData objects to handle the VDJ and CITE-seq modalities. Specifically, the MuData object is built only from the filtered (not raw) count matrices of the GEX and CITE-seq modalities.


fmalmeida · 2025-03-27T15:20:21Z

Hi @saraterzo ,
Thanks for working on this and opening the PR. Looks super neat.

One first thought I had:

Would it make sense to add these modules to nf-core directly instead of keeping them local?

That way they would already be tested as standalone modules, and we would know they work before including them in the pipeline.

Or is there a specific reason for making this a local module?

subworkflows/local/h5ad_conversion.nf

tests/main_pipeline_cellrangermulti.nf.test

nf-test.config

workflows/scrnaseq.nf

saraterzo · 2025-03-28T10:52:16Z

Hi @fmalmeida
Thanks for your review!
Regarding local vs. nf-core modules, I decided to develop these two modules as local ones because they are specific to this project, are not easily reusable in other workflows, and include a custom Python script tailored to the project's needs.

grst · 2025-03-31T11:32:44Z

Hi @saraterzo,

thank you for working on this! I'll try to find time this week to give it a proper review.

grst

Sorry for the delay, finally had the time to look at it.

Found mostly minor things. Since the Python parts of the pipeline are growing, I am also adding the ruff linter/formatter in #464. Once that is merged, please also apply it to your Python scripts.

grst · 2025-04-24T06:19:03Z

modules/local/concatenate_vdj/resources/usr/bin/concatenate_vdj.py

+        for run, vdj in zip(input_run_id,vdj_files):
+            # Read folders with the filtered contigue annotation and store datasets in a dictionary
+            print("\n===== READING CONTIGUE ANNOTATION MATRIX =====")
+            print("\nProcessing filtered contigue table in folder ... ", end ='')


Suggested change

print("\nProcessing filtered contigue table in folder ... ", end ='')

print("\nProcessing filtered contig table in folder ... ", end ='')

grst · 2025-04-24T06:21:49Z

modules/local/concatenate_vdj/resources/usr/bin/concatenate_vdj.py

+        if len(adata_vdj_list) == 1:
+            adata_vdj_concatenated = adata_vdj_list[0]
+            print("Only one non-empty file found. Saving the file as is without concatenation.")
+        else:


Is it necessary to special-case this? That is, wouldn't ad.concat work just fine with a single file?
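To illustrate the general point, here is a stdlib-only sketch of concatenating several filtered_contig_annotations.csv tables in a single loop. It is not the module's scirpy/AnnData code. The helper name concat_contig_tables and the run-ID barcode prefix are hypothetical, and the two-column table is a toy subset of the real file. A single general code path already covers the one-file case, which is the behaviour being asked about for ad.concat.

```python
import csv
import io


def concat_contig_tables(tables):
    """Concatenate contig tables given as (run_id, csv_text) pairs.

    Hypothetical helper: each barcode is prefixed with its run ID so rows
    from different runs stay distinguishable after concatenation.
    """
    rows = []
    for run_id, text in tables:
        for row in csv.DictReader(io.StringIO(text)):
            row["barcode"] = f"{run_id}_{row['barcode']}"
            rows.append(row)
    # The same loop handles zero, one, or many tables; no special case needed.
    return rows


single = [("run1", "barcode,chain\nAAAC-1,TRA\nAAAC-1,TRB\n")]
print(len(concat_contig_tables(single)))  # prints 2
```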

grst · 2025-04-24T06:24:20Z

modules/local/concatenate_vdj/resources/usr/bin/concatenate_vdj.py

+import anndata as ad                # store annotated matrix as anndata object
+
+
+warnings.filterwarnings("ignore")


It would be better to filter only specific (expected) warnings, e.g. by category or message.

grst · 2025-04-24T06:24:22Z

modules/local/convert_mudata/resources/usr/bin/convert_mudata.py

+from mudata import MuData
+
+
+warnings.filterwarnings("ignore")


It would be better to filter only specific (expected) warnings, e.g. by category or message.

grst · 2025-04-24T06:47:40Z

modules/local/convert_mudata/resources/usr/bin/convert_mudata.py

+            modalities["gex"] = adata[:, adata.var["feature_types"] == "Gene Expression"]
+        # Add 'pro' modality if defined
+        if adata[:, adata.var["feature_types"] == "Antibody Capture"].shape[1] > 0:
+            modalities["pro"] = adata[:, adata.var["feature_types"] == "Antibody Capture"]


Suggested change

modalities["pro"] = adata[:, adata.var["feature_types"] == "Antibody Capture"]

modalities["protein"] = adata[:, adata.var["feature_types"] == "Antibody Capture"]

Maybe more explicit?
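The splitting logic quoted above can be sketched without AnnData. It maps each Cell Ranger feature_types value to an explicit modality name and keeps only modalities with at least one feature. This is a simplified stand-in that assumes the features arrive as a plain list of type strings. The names split_modalities and MODALITY_NAMES are hypothetical, and "protein" is used as the reviewer suggests.

```python
# Hypothetical mapping from Cell Ranger feature_types to modality names.
MODALITY_NAMES = {
    "Gene Expression": "gex",
    "Antibody Capture": "protein",
}


def split_modalities(feature_types):
    """Return {modality_name: [column indices]} for non-empty modalities."""
    modalities = {}
    for idx, ftype in enumerate(feature_types):
        name = MODALITY_NAMES.get(ftype)
        if name is None:
            continue  # feature type not handled here (e.g. CRISPR Guide Capture)
        modalities.setdefault(name, []).append(idx)
    return modalities


print(split_modalities(["Gene Expression", "Gene Expression", "Antibody Capture"]))
# {'gex': [0, 1], 'protein': [2]}
```

Modalities with no features never appear in the result, which replaces the explicit shape[1] > 0 check.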

grst · 2025-04-24T06:54:23Z

subworkflows/local/align_cellrangermulti.nf

+            def desired_files = outs.findAll { it.name == "filtered_contig_annotations.csv" }
+            if (desired_files.size() > 0) {
+                [ meta, desired_files ]
+            }


Suggested change

def desired_files = outs.findAll { it.name == "filtered_contig_annotations.csv" }
if (desired_files.size() > 0) {
    [ meta, desired_files ]
}

def desired_files = outs.findAll { it.name == "filtered_contig_annotations.csv" }
if (desired_files.size() > 0) {
    [ meta, desired_files ]
}

Couldn't you also use the parse_demultiplexed_output_channels function for this? Would your code still work for VDJ in combination with demultiplexing?
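For readers less familiar with Groovy, the filtering step quoted above behaves roughly like the Python sketch below. For each (meta, outs) pair it keeps only files named filtered_contig_annotations.csv, and it keeps the pair only when at least one such file exists. The helper name select_vdj_outputs and the example paths are hypothetical. The sketch does not answer the demultiplexing question, which depends on how parse_demultiplexed_output_channels structures its outputs.

```python
from pathlib import PurePath


def select_vdj_outputs(samples):
    """Keep (meta, files) pairs that contain filtered_contig_annotations.csv.

    Hypothetical helper mirroring the Groovy findAll/size() check.
    """
    selected = []
    for meta, outs in samples:
        desired = [f for f in outs if PurePath(f).name == "filtered_contig_annotations.csv"]
        if desired:  # mirrors `desired_files.size() > 0`
            selected.append((meta, desired))
    return selected
```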

grst · 2025-04-24T06:54:53Z

subworkflows/local/align_cellrangermulti.nf

+        def meta = []
+        def files = []
+
+        list.collate(2).each { pair ->
+            meta << pair[0]
+            files << pair[1]
+        }
+        return [meta, files.flatten()]


Please indent.
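The collate(2) loop above splits the list produced by collect(), which alternates meta maps and file lists, into one list of metas and one flattened list of files. The Python sketch below mirrors that regrouping. It assumes the input has even length and that each file entry is either a single path or a flat list of paths. The helper name regroup is hypothetical.

```python
def regroup(flat):
    """Split [meta1, files1, meta2, files2, ...] into ([metas], [all files])."""
    metas, files = [], []
    for i in range(0, len(flat), 2):  # like Groovy's collate(2)
        meta, sample_files = flat[i], flat[i + 1]
        metas.append(meta)
        # Groovy's flatten() is recursive; one level suffices here because
        # each entry is assumed to be a flat list of paths or a single path.
        if isinstance(sample_files, (list, tuple)):
            files.extend(sample_files)
        else:
            files.append(sample_files)
    return metas, files
```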

grst · 2025-04-24T06:55:34Z

subworkflows/local/align_cellrangermulti.nf

+        }
+
+        ch_vdj_files_collect =  ch_vdj_files.collect()
+        ch_transformed_channel = ch_vdj_files_collect.map { list ->


Suggested change

ch_transformed_channel = ch_vdj_files_collect.map { list ->

ch_vdj = ch_vdj_files_collect.map { list ->

grst · 2025-04-24T06:59:12Z

tests/main_pipeline_cellrangermulti.nf.test

+                //{assert workflow.trace.tasks().size() == 59},

                // How many results were produced?
-                {assert path("${outputDir}/results_cellrangermulti").list().size() == 4},
-                {assert path("${outputDir}/results_cellrangermulti/cellrangermulti").list().size() == 5},
-                {assert path("${outputDir}/results_cellrangermulti/cellrangermulti/mtx_conversions").list().size() == 16},
-                {assert path("${outputDir}/results_cellrangermulti/cellrangermulti/count").list().size() == 4},
-                {assert path("${outputDir}/results_cellrangermulti/fastqc").list().size() == 48},
-                {assert path("${outputDir}/results_cellrangermulti/multiqc").list().size() == 3},
+                //{assert path("${outputDir}/results_cellrangermulti").list().size() == 6},
+                //{assert path("${outputDir}/results_cellrangermulti/cellrangermulti").list().size() == 5},
+                //{assert path("${outputDir}/results_cellrangermulti/cellrangermulti/mtx_conversions").list().size() == 16},
+                //{assert path("${outputDir}/results_cellrangermulti/cellrangermulti/count").list().size() == 4},
+                //{assert path("${outputDir}/results_cellrangermulti/fastqc").list().size() == 48},
+                //{assert path("${outputDir}/results_cellrangermulti/multiqc").list().size() == 3},


Let's not forget to re-enable the test case before merging.

grst · 2025-04-24T07:33:51Z

workflows/scrnaseq.nf

+            ch_vdj
+        )
+        ch_versions = ch_versions.mix(CONVERT_MUDATA.out.versions)
+    } else {'nothing to convert to MuData'}


Suggested change

} else {'nothing to convert to MuData'}

}

Added MuData object

5080be6

saraterzo marked this pull request as draft February 27, 2025 14:44

SaraTerzol and others added 5 commits February 28, 2025 10:25

Added config parameter to allow module structure

7852c46

Merge branch 'dev' into MuData_implementation

4741d10

Change script's name

9020e9f

Modify module to handle empty files

2a0331b

Merge remote-tracking branch 'refs/remotes/origin/MuData_implementation' into MuData_implementation

a19b49e

apeltzer added this to the 4.1.0 milestone Mar 10, 2025

SaraTerzol and others added 4 commits March 11, 2025 16:07

Modify the channel to handle the absence of the VDJ file

56cd8cc

Merge branch 'dev' into MuData_implementation

7e2e38a

Added version to modules

8e903af

Added version to modules

2ceb3e9

saraterzo marked this pull request as ready for review March 11, 2025 16:06

SaraTerzol and others added 13 commits March 11, 2025 17:28

Update output documentation

654e65a

Added MuData implementation description to output documentation

6a03158

Added option for handling VDJ missing

755a69d

Modify test to handle changes in output data

25519c4

Added the option to create a MuData object only when Cell Ranger Multi is used

71695ac

Change permission

096e080

Trailing whitespace

5c2ea4b

Merge branch 'dev' into MuData_implementation

f53418d

Merge branch 'dev' into MuData_implementation

2302b58

Adjust indentation

c783c0f

Adjust indentation

a303817

Remove test

3b7d9a8

Trailing whitespace

e311262

saraterzo mentioned this pull request Mar 27, 2025

MuData conversion #292

Open

fmalmeida reviewed Mar 27, 2025


subworkflows/local/h5ad_conversion.nf (outdated, resolved)

tests/main_pipeline_cellrangermulti.nf.test (outdated, resolved)

nf-test.config (outdated, resolved)

workflows/scrnaseq.nf (resolved)

SaraTerzol and others added 3 commits March 28, 2025 13:53

Changed output channel

c0ad49c

Merge branch 'dev' into MuData_implementation

1d773cc

Remove checks on number of results and tasks executed

3ccfae8

SaraTerzol added 3 commits March 31, 2025 14:26

Added check on results and task

f334e76

Added nf-test.config

5902d3d

Remove check on tasks and results

af4576b

grst requested changes Apr 24, 2025


Merge branch 'dev' into MuData_implementation

cb74297

grst mentioned this pull request May 9, 2025

Missing feature types when converting mtx matrix to h5ad #468

Open
