
Commit aabc5a5

Merge remote-tracking branch 'origin/main' into add-create-cell-masks-component
# Conflicts:
#   CHANGELOG.md
#   src/authors/luke_zappia.yaml
2 parents 19b36ec + 67023c3 commit aabc5a5

112 files changed

Lines changed: 2851 additions & 229 deletions


.github/workflows/integration-test.yml

Lines changed: 1 addition & 1 deletion
@@ -73,7 +73,7 @@ jobs:

       - uses: viash-io/viash-actions/setup@v6

-      - uses: nf-core/setup-nextflow@v2.1.4
+      - uses: nf-core/setup-nextflow@v3.0.0

       # use cache
       - name: Cache resources data

.github/workflows/release-build.yml

Lines changed: 1 addition & 1 deletion
@@ -62,7 +62,7 @@ jobs:

       - uses: viash-io/viash-actions/setup@v6

-      - uses: nf-core/setup-nextflow@v2.1.4
+      - uses: nf-core/setup-nextflow@v3.0.0

       # use cache
       - name: Cache resources data

.github/workflows/viash-test.yml

Lines changed: 1 addition & 0 deletions
@@ -21,6 +21,7 @@ jobs:
         uses: actions/setup-python@v6
       - uses: r-lib/actions/setup-r@v2
         with:
+          r-version: 4.5.3
           use-public-rspm: true
       - run: python -m pip install pre-commit
         shell: bash

CHANGELOG.md

Lines changed: 32 additions & 0 deletions
@@ -8,16 +8,28 @@

 * `workflows/rna/rna_multisample`, `workflows/multiomics/process_batches`, `feature_annotation/highly_variable_features_scanpy`: add an option to exclude features before running highly variable gene calculation based on a user-defined list of feature names (PR #1121).

+* `annotate/consensus_vote`: new component computing a (weighted) majority vote across cell type labels from multiple annotation methods (PR #1151).
+*
 * `filter/filter_with_quantile`: added a component to filter numerical .obs or .var columns based on quantile thresholds, with optional subsetting (PR #1146).

 * `dimred/pca`: added possibility to do chunked processing using arguments `chunks` and `chunk_size`. Also added a `seed` argument in order to better control the variability between executions (PR #1157).

+* `workflows/multiomics/process_singlesample`: New workflow for processing RNA, protein and GDO modalities of individual samples (PR #1147).
+
+* `transform/clear_slots`: New component that can be used to remove all items from slots of a MuData object (PR #1171).
+
+* `workflows/multiomics/process_singlesample`, `workflows/multiomics/process_samples`, `workflows/multiomics/process_batches`: add `--intersect_obs` option to remove observations that are not present in all processed modalities, so each modality shares the same set of cells (PR #1173, 1175).
+
+* `labels_transfer/cellmapper`: New component that transfers labels from a reference to a query with a shared embedding using CellMapper (PR #1169, PR #1177)
+
 * `filter/create_cell_masks`: added a component to create boolean cell masks from a set of user-provided filters (PR #1165).

 ## MAJOR CHANGES

 * `qc/calculate_qc_metrics`: major improvements to memory consumption and runtimes (PR #1140).

+* `annotate/popv`: bump version to 0.6.1 (PR #1167).
+
 ## MINOR CHANGES

 * `dataflow/split_modalities`: improve memory consumption by only reading one modality at the same time (PR #1152).
@@ -28,10 +40,30 @@

 * Bump viash to 0.9.7 (PR #1145)

+* `annotate/celltypist` and `workflows/annotation/celltypist`: set `--input_layer` default to `log_normalized` and `--reference_var_input` default to `filter_with_hvg` to align with upstream workflow defaults (PR #1155).
+
+* `annotate/singler`: set `--input_layer` default to `log_normalized` and `--reference_var_input` default to `filter_with_hvg` to align with upstream workflow defaults (PR #1155).
+
+* `workflows/annotation/scanvi_scarches`: set `--input_obs_batch_label` and `--reference_obs_batch_label` defaults to `sample_id` and `--reference_var_hvg` default to `filter_with_hvg` to align with upstream workflow defaults (PR #1155).
+
+* `cluster/leiden`: added `flavor`, `n_iterations` and `seed` arguments (PR #1132)
+
+* `cluster/leiden`: avoid creating unnecessary copies of the output data (PR #1132).
+
+* `workflows/multiomics/process_samples`: refactored to use a shared `process_singlesample_base` subworkflow, which is also used by the new `process_singlesample` workflow to avoid code duplication (PR #1147).
+
+* Bump anndata to `0.12.11` (PR #1174).
+
+* Add missing `example` fields to several component and workflow configurations (PR #1067).
+
+* Testing: bump `viashpy` to 0.10.0 (PR #1178).
+
 ## BUG FIXES

 * `dataflow/split_h5mu`: pin scipy version to 1.16.3 to avoid regression that corrupts large sparse matrix indexing (PR #1153).

+* `convert/from_h5ad_h5mu`: store and reset var index names to avoid issues with a change in mudata (PR #1184).
+
 # openpipelines 4.0.4

 ## BUG FIXES
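
The `--intersect_obs` changelog entry above describes removing observations that are not present in all processed modalities. A minimal sketch of that idea in plain Python, assuming each modality is represented only by a list of cell names. The function name `intersect_obs` and the dictionary layout are hypothetical and are not taken from the workflow code:

```python
def intersect_obs(modalities):
    """Keep only observations shared by every modality.

    `modalities` maps a modality name to its list of observation (cell)
    names. This layout is an assumption made for illustration only.
    """
    if not modalities:
        return {}
    # Cells that occur in every modality.
    shared = set.intersection(*(set(names) for names in modalities.values()))
    # Preserve the original per-modality ordering of the remaining cells.
    return {
        mod: [name for name in names if name in shared]
        for mod, names in modalities.items()
    }


example = {
    "rna": ["cell_1", "cell_2", "cell_3"],
    "prot": ["cell_2", "cell_3", "cell_4"],
}
filtered = intersect_obs(example)
```

After filtering, every modality in `filtered` holds the same set of cells, which is the property the changelog entry describes.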

resources_test_scripts/annotation_test_data.sh

Lines changed: 4 additions & 0 deletions
@@ -65,6 +65,10 @@ disease = np.random.choice(["healthy", "diseased"], size=n_cells, p=[0.5, 0.5])
 sub_ref_adata_final.obs["treatment"] = treatment
 sub_ref_adata_final.obs["disease"] = disease

+# Strip raw slot - not needed for annotation and causes compatibility issues between AnnData/MuData versions
+sub_ref_adata_final = sub_ref_adata_final.copy()
+sub_ref_adata_final.raw = None
+
 # Write out data
 sub_ref_adata_final.write("${OUT}/TS_Blood_filtered.h5ad", compression='gzip')
 HEREDOC

src/annotate/celltypist/config.vsh.yaml

Lines changed: 6 additions & 3 deletions
@@ -26,6 +26,7 @@ argument_groups:
         required: false
       - name: "--input_layer"
         type: string
+        default: log_normalized
         description: The layer in the input data containing counts that are lognormalized to 10000, .X is not to be used.
       - name: "--input_var_gene_names"
         type: string
@@ -62,9 +63,10 @@ argument_groups:
         default: "cell_ontology_class"
       - name: "--reference_var_input"
         type: string
+        default: "filter_with_hvg"
         required: false
         description: |
-          .var column containing highly variable genes. By default, do not subset genes.
+          .var column containing highly variable genes. If not provided, genes will not be subset.
       - name: "--reference_var_gene_names"
         type: string
         required: false
@@ -147,13 +149,14 @@ engines:
   - type: docker
     image: nvcr.io/nvidia/pytorch:25.11-py3
     setup:
-      - type: python
-        __merge__: [ /src/base/requirements/scanpy.yaml, .]
       - type: python
         packages:
           - celltypist==1.7.1
       - type: python
         __merge__: [ /src/base/requirements/anndata_mudata.yaml, .]
+    test_setup:
+      - type: python
+        __merge__: [ /src/base/requirements/scanpy.yaml, .]
     __merge__: [ /src/base/requirements/python_test_setup.yaml, .]
 runners:
   - type: executable

src/annotate/celltypist/test.py

Lines changed: 34 additions & 16 deletions
@@ -21,34 +21,47 @@
 model_file = (
     f"{meta['resources_dir']}/annotation_test_data/celltypist_model_Immune_All_Low.pkl"
 )
-celltypist_input_file = (
-    f"{meta['resources_dir']}/annotation_test_data/demo_2000_cells.h5mu"
-)
-# input_file = f"{meta['resources_dir']}/pbmc_1k_protein_v3/pbmc_1k_protein_v3_mms.h5mu"
+input_file_1 = f"{meta['resources_dir']}/pbmc_1k_protein_v3/pbmc_1k_protein_v3_mms.h5mu"
+input_file_2 = f"{meta['resources_dir']}/annotation_test_data/demo_2000_cells.h5mu"
+reference_file = f"{meta['resources_dir']}/annotation_test_data/TS_Blood_filtered.h5mu"


 def log_normalize(adata):
-    sc.pp.normalize_total(adata, target_sum=1e4)
-    sc.pp.log1p(adata)
+    adata_norm = sc.pp.normalize_total(adata, target_sum=1e4, copy=True)
+    adata_lognorm = sc.pp.log1p(adata_norm, copy=True)
+    adata.layers["log_normalized"] = adata_lognorm.X
+    return adata
+
+
+def calculate_hvg(adata, n_top_genes=1000):
+    adata_hvg = sc.pp.highly_variable_genes(adata, n_top_genes=n_top_genes, copy=True)
+    adata.var["filter_with_hvg"] = adata_hvg.var["highly_variable"]
     return adata


 @pytest.fixture
 def reference_mdata():
-    mdata = mu.read_h5mu(
-        f"{meta['resources_dir']}/annotation_test_data/TS_Blood_filtered.h5mu"
-    )
+    mdata = mu.read_h5mu(reference_file)
+    adata = mdata.mod["rna"]  # already has layer "log_normalized" with 10k target sum
+    adata.var["filter_with_hvg"] = adata.var[
+        "highly_variable"
+    ]  # already has highly variable genes calculated
+    return mdata
+
+
+@pytest.fixture
+def input_mdata():
+    mdata = mu.read_h5mu(input_file_1)
     adata = mdata.mod["rna"].copy()
+    adata.layers["counts"] = adata.X.copy()  # store raw counts in a layer
     adata_lognorm = log_normalize(adata)
     mdata.mod["rna"] = adata_lognorm
     return mdata


 @pytest.fixture
-def input_mdata():
-    mdata = mu.read_h5mu(
-        f"{meta['resources_dir']}/pbmc_1k_protein_v3/pbmc_1k_protein_v3_mms.h5mu"
-    )
+def model_input_mdata():
+    mdata = mu.read_h5mu(input_file_2)
     adata = mdata.mod["rna"].copy()
     adata_lognorm = log_normalize(adata)
     mdata.mod["rna"] = adata_lognorm
@@ -155,15 +168,20 @@ def test_set_params(
     )


-def test_with_model(run_component, random_h5mu_path):
+def test_with_model(
+    run_component, random_h5mu_path, write_mudata_to_file, model_input_mdata
+):
     output_file = random_h5mu_path()
+    input_file = write_mudata_to_file(model_input_mdata)

     run_component(
         [
             "--input",
-            celltypist_input_file,
+            input_file,
             "--model",
             model_file,
+            "--reference_layer",
+            "",
             "--reference_obs_targets",
             "cell_type",
             "--output",
@@ -208,7 +226,7 @@ def test_fail_invalid_input_expression(
             "--input",
             input_file,
             "--input_layer",
-            "log_normalized",
+            "counts",
             "--reference",
             reference_file,
             "--reference_layer",
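
The `log_normalize` helper in this test now writes its result to a `log_normalized` layer, which matches the new `--input_layer` default. As a rough illustration of what "counts lognormalized to 10000" means, the plain-Python sketch below scales each cell to a total of 10,000 and then applies `log1p`. It does not call scanpy; the function name `log_normalize_rows` and the choice to leave all-zero cells at zero are assumptions made for illustration:

```python
import math


def log_normalize_rows(counts, target_sum=1e4):
    """Scale each row (cell) to `target_sum` total counts, then apply log1p.

    `counts` is a list of per-cell lists of raw counts. This is a
    hypothetical stand-in for a count matrix, not the AnnData structure
    used in the test.
    """
    result = []
    for cell in counts:
        total = sum(cell)
        # Assumption: cells with no counts are left as all zeros.
        scale = target_sum / total if total > 0 else 0.0
        result.append([math.log1p(value * scale) for value in cell])
    return result


raw = [[1, 1, 2], [0, 0, 0]]
normalized = log_normalize_rows(raw)
```

Undoing the `log1p` on a normalized row and summing recovers the target sum, which is a quick way to check whether a layer was normalized to 10,000.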
Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
+name: consensus_vote
+namespace: annotate
+scope: "public"
+description: |
+  Combines cell type predictions from multiple annotation methods into a single consensus prediction using a weighted majority vote.
+  For each cell, each method votes for its predicted cell type, optionally weighted by the probability score and/or a per-method weight.
+  The consensus prediction is the cell type with the highest total weighted vote.
+  Note that this method does not leverage pre-existing ontology or perform any reconciliation of cell type labels across methods, so the same cell type may be represented by different labels in different methods and will be treated as distinct cell types in the vote.
+authors:
+  - __merge__: /src/authors/dorien_roosen.yaml
+    roles: [ author ]
+
+argument_groups:
+  - name: Inputs
+    description: Input dataset arguments.
+    arguments:
+      - name: "--input"
+        type: file
+        description: Input h5mu file containing cell type predictions in .obs.
+        direction: input
+        required: true
+        example: input.h5mu
+      - name: "--modality"
+        description: Which modality to process.
+        type: string
+        default: "rna"
+        required: false
+      - name: "--input_obs_predictions"
+        type: string
+        description: |
+          One or more .obs column names containing cell type predictions (labels) from
+          different annotation methods.
+        required: true
+        multiple: true
+        example: ["scanvi_pred", "celltypist_pred"]
+      - name: "--input_obs_probabilities"
+        type: string
+        description: |
+          One or more .obs column names containing prediction probability scores,
+          one per method in --input_obs_predictions. When provided, each method's
+          vote is scaled by the probability score for that cell (in addition to
+          any per-method --weights). Must be the same length as --input_obs_predictions.
+        required: false
+        multiple: true
+        example: ["scanvi_prob", "celltypist_prob", "singler_prob"]
+      - name: "--tie_label"
+        type: string
+        description: |
+          Label to assign when two or more cell types receive equal votes.
+          If not provided, tied cells are assigned None (missing value).
+        required: false
+        example: "Unknown"
+      - name: "--weights"
+        type: double
+        description: |
+          Per-method weights for the consensus vote. Must be the same length as
+          --input_obs_predictions when provided. Weights are normalized to sum to 1
+          before use. If not provided, all methods are weighted equally.
+        required: false
+        multiple: true
+        example: [1.0, 2.0]
+
+  - name: Outputs
+    description: Output arguments.
+    arguments:
+      - name: "--output"
+        alternatives: [-o]
+        type: file
+        description: Output h5mu file.
+        direction: output
+        example: output.h5mu
+      - name: "--output_obs_predictions"
+        type: string
+        default: consensus_pred
+        required: false
+        description: |
+          In which `.obs` slot to store the consensus predicted cell type.
+      - name: "--output_obs_score"
+        type: string
+        default: consensus_score
+        required: false
+        description: |
+          In which `.obs` slot to store the consensus score, defined as the fraction
+          of total weight assigned to the winning cell type.
+    __merge__: [., /src/base/h5_compression_argument.yaml]
+
+resources:
+  - type: python_script
+    path: script.py
+  - path: /src/utils/setup_logger.py
+  - path: /src/utils/compress_h5mu.py
+
+test_resources:
+  - type: python_script
+    path: test.py
+
+engines:
+  - type: docker
+    image: python:3.13-slim
+    setup:
+      - type: apt
+        packages:
+          - procps
+      - type: python
+        __merge__: [ /src/base/requirements/anndata_mudata.yaml, .]
+    __merge__: [ /src/base/requirements/python_test_setup.yaml, .]
+runners:
+  - type: executable
+  - type: nextflow
+    directives:
+      label: [lowcpu, lowmem, lowdisk]
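
The component description above outlines a weighted majority vote: weights are normalized to sum to 1, each method's vote is optionally scaled by its probability score, and ties receive the tie label or a missing value. The following is a minimal per-cell sketch of that logic in plain Python. It is not the component's `script.py`, which is not shown in this commit view. The function name `consensus_vote`, its signature, and the choice to compute the score as the winning vote divided by the total vote are assumptions made for illustration:

```python
import math
from collections import defaultdict


def consensus_vote(predictions, probabilities=None, weights=None, tie_label=None):
    """Weighted majority vote for a single cell (illustrative sketch).

    `predictions` holds one label per annotation method. `probabilities`
    and `weights`, when given, must have the same length as `predictions`.
    """
    n_methods = len(predictions)
    if weights is None:
        weights = [1.0] * n_methods  # all methods weighted equally
    if probabilities is None:
        probabilities = [1.0] * n_methods  # unscaled votes
    if len(weights) != n_methods or len(probabilities) != n_methods:
        raise ValueError("weights and probabilities must match predictions in length")

    # Normalize the per-method weights so they sum to 1.
    weight_total = sum(weights)
    weights = [w / weight_total for w in weights]

    # Accumulate the weighted vote per label.
    votes = defaultdict(float)
    for label, prob, weight in zip(predictions, probabilities, weights):
        votes[label] += weight * prob

    best = max(votes.values())
    total = sum(votes.values())
    # Assumption: the score is the winning vote as a fraction of all votes.
    score = best / total if total > 0 else 0.0

    winners = [label for label, vote in votes.items() if math.isclose(vote, best)]
    if len(winners) > 1:
        return tie_label, score  # tie: use the tie label (None if not given)
    return winners[0], score


label, score = consensus_vote(["T cell", "T cell", "B cell"])
```

Because labels are compared as plain strings, "T cell" and "T-cell" would be counted as different cell types, which mirrors the caveat in the description about the lack of label reconciliation across methods.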
