---
title: "Onclass"
namespace: "Annotate"
description: "OnClass is a Python package for single-cell cell type annotation."
type: "module"
toc: false
---
::: {.column-margin}
### Info
ID: `onclass`
Namespace: `annotate`
### Links
[Source](https://github.com/openpipelines-bio/openpipeline/blob/2.1.1/src/annotate/onclass/config.vsh.yaml){.btn-action .btn-sm .btn-info .btn role="button"}
:::
OnClass uses the Cell Ontology to capture the similarity between cell types.
These similarities enable OnClass to annotate cell types that never appear in the training data.
## Example commands
You can run the pipeline using `nextflow run`.
### View help
You can use `--help` as a parameter to get an overview of the possible parameters.
```bash
nextflow run openpipelines-bio/openpipeline \
-r 2.1.1 -latest \
-main-script target/nextflow/annotate/onclass/main.nf \
--help
```
### Run command
<details>
<summary>Example of `params.yaml`</summary>
```yaml
# Inputs
input: # please fill in - example: "input.h5mu"
modality: "rna"
# input_layer: "foo"
# input_var_gene_names: "foo"
input_reference_gene_overlap: 100
# Ontology
cl_nlp_emb_file: # please fill in - example: "path/to/file"
cl_ontology_file: # please fill in - example: "path/to/file"
cl_obo_file: # please fill in - example: "path/to/file"
# Reference
# reference: "reference.h5mu"
# reference_layer: "foo"
reference_obs_target: # please fill in - example: "cell_ontology_class"
# reference_var_gene_names: "foo"
# reference_var_input: "foo"
unknown_celltype: "Unknown"
# Outputs
# output: "$id.$key.output.h5mu"
# output_compression: "gzip"
output_obs_predictions: "onclass_pred"
output_obs_probability: "onclass_prob"
# Model arguments
# model: "foo"
max_iter: 30
# Nextflow input-output arguments
publish_dir: # please fill in - example: "output/"
# param_list: "my_params.yaml"
# Arguments
```
</details>
```bash
nextflow run openpipelines-bio/openpipeline \
-r 2.1.1 -latest \
-profile docker \
-main-script target/nextflow/annotate/onclass/main.nf \
-params-file params.yaml
```
:::{.callout-note}
Replace `-profile docker` with `-profile podman` or `-profile singularity` depending on the desired backend.
:::
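
Before launching a run, it can help to confirm that every parameter marked "please fill in" in the example `params.yaml` has a value. The sketch below is a hypothetical pre-flight check and not part of the pipeline: the helper name `missing_params` is an assumption, the parameter names are taken from this page, and it assumes the parameters have already been loaded into a plain dictionary (for instance with a YAML parser).

```python
# Hypothetical pre-flight check; not part of openpipeline itself.
# Parameter names are the ones marked "please fill in" on this page.
REQUIRED = (
    "input",
    "cl_nlp_emb_file",
    "cl_ontology_file",
    "cl_obo_file",
    "reference_obs_target",
    "publish_dir",
)


def missing_params(params: dict) -> list[str]:
    """Return the names of required parameters that are absent or empty."""
    missing = [name for name in REQUIRED if not params.get(name)]
    # --reference is only needed when no pretrained --model is given.
    if not params.get("model") and not params.get("reference"):
        missing.append("reference or model")
    return missing


example = {
    "input": "input.h5mu",
    "cl_nlp_emb_file": "path/to/file",
    "cl_ontology_file": "path/to/file",
    "cl_obo_file": "path/to/file",
    "reference_obs_target": "cell_ontology_class",
    "publish_dir": "output/",
}
print(missing_params(example))  # ['reference or model']
```

An empty list means all required values are present; the example reports one problem because it supplies neither `reference` nor `model`.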
## Argument groups
### Inputs
Input dataset (query) arguments
|Name |Description |Attributes |
|:----------|:--------------------------------------------------|:--------------------|
|`--input` |The input (query) data to be labeled. Should be a .h5mu file. |`file`, required, example: `"input.h5mu"` |
|`--modality` |Which modality to process. |`string`, default: `"rna"` |
|`--input_layer` |The layer in the input data to be used for cell type annotation if .X is not to be used. |`string` |
|`--input_var_gene_names` |The name of the `.var` column in the input data containing gene names. If not provided, the `.var` index is used. |`string` |
|`--input_reference_gene_overlap` |The minimum number of genes present in both the reference and query datasets. |`integer`, default: `100` |
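
The `--input_reference_gene_overlap` threshold can be pictured as a set intersection on gene names. The following is a minimal sketch of that idea, not the component's actual code: `check_gene_overlap` is a hypothetical helper, and raising an error when the overlap is too small is an assumed behaviour.

```python
def check_gene_overlap(query_genes, reference_genes, min_overlap=100):
    """Return the sorted shared genes, or raise if there are too few.

    Hypothetical helper: raising ValueError is an assumed behaviour.
    """
    shared = sorted(set(query_genes) & set(reference_genes))
    if len(shared) < min_overlap:
        raise ValueError(
            f"Only {len(shared)} genes overlap; at least {min_overlap} are required."
        )
    return shared


query = ["CD3E", "CD4", "MS4A1", "NKG7"]
reference = ["CD4", "CD8A", "MS4A1"]
print(check_gene_overlap(query, reference, min_overlap=2))  # ['CD4', 'MS4A1']
```

With the default threshold of 100, the same toy gene lists would raise a `ValueError`, since only two genes are shared.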
### Ontology
Ontology input files
|Name |Description |Attributes |
|:----------|:--------------------------------------------------|:--------------------|
|`--cl_nlp_emb_file` |The .nlp.emb file with the cell type embeddings. |`file`, required |
|`--cl_ontology_file` |The .ontology file with the cell type ontology. |`file`, required |
|`--cl_obo_file` |The .obo file with the cell type ontology. |`file`, required |
### Reference
Arguments related to the reference dataset.
|Name |Description |Attributes |
|:----------|:--------------------------------------------------|:--------------------|
|`--reference` |The reference data to train the OnClass model on. Only required if a pretrained `--model` is not provided. |`file`, example: `"reference.h5mu"` |
|`--reference_layer` |The layer in the reference data to be used for cell type annotation if .X is not to be used. |`string` |
|`--reference_obs_target` |The name of the adata obs column in the reference data containing cell type annotations. |`string`, required, example: `"cell_ontology_class"` |
|`--reference_var_gene_names` |The name of the `.var` column in the reference data containing gene names. If not provided, the `.var` index is used. |`string` |
|`--reference_var_input` |The `.var` column indicating highly variable genes. By default, genes are not subset. |`string` |
|`--unknown_celltype` |Label for unknown cell types. |`string`, default: `"Unknown"` |
### Outputs
Output arguments.
|Name |Description |Attributes |
|:----------|:--------------------------------------------------|:--------------------|
|`--output` |Output h5mu file. |`file`, example: `"output.h5mu"` |
|`--output_compression` |Compression format for the output file. |`string`, example: `"gzip"` |
|`--output_obs_predictions` |The `.obs` column in which to store the predicted labels. |`string`, default: `"onclass_pred"` |
|`--output_obs_probability` |The `.obs` column in which to store the probability of the predictions. |`string`, default: `"onclass_prob"` |
### Model arguments
Model arguments
|Name |Description |Attributes |
|:----------|:--------------------------------------------------|:--------------------|
|`--model` |Path to a pretrained model, without a file extension. If not provided, the model is trained on the reference data and `--reference` must be provided. The path should be the common prefix of a `.npz` or `.pkl` file, a `.data` file, a `.meta` file and a `.index` file, e.g. `/path/to/model/pretrained_model_target1`, as saved by OnClass. |`string` |
|`--max_iter` |Maximum number of iterations for training the model. |`integer`, default: `30` |
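
The file layout described for `--model` can be checked before a run. The sketch below is a hypothetical helper (`missing_model_files` is not part of OnClass or openpipeline). It assumes that each expected file sits next to the given path and that its name starts with the model name followed by the extension, which is a looser match than an exact file name.

```python
from pathlib import Path
import tempfile


def missing_model_files(model_prefix: str) -> list[str]:
    """List the expected model files that are absent for a path prefix.

    Assumption: a file counts as present when its name starts with
    "<model name><extension>" inside the prefix's parent directory.
    """
    prefix = Path(model_prefix)
    names = [p.name for p in prefix.parent.iterdir()] if prefix.parent.is_dir() else []

    def has(ext: str) -> bool:
        return any(name.startswith(prefix.name + ext) for name in names)

    missing = []
    # Either a .npz or a .pkl file is acceptable.
    if not (has(".npz") or has(".pkl")):
        missing.append(".npz or .pkl")
    missing.extend(ext for ext in (".data", ".meta", ".index") if not has(ext))
    return missing


# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    model_path = str(Path(tmp) / "pretrained_model_target1")
    for ext in (".npz", ".data", ".meta"):
        Path(model_path + ext).touch()
    result = missing_model_files(model_path)
    print(result)  # ['.index']
```

An empty result suggests the prefix can be passed as `--model`; otherwise, omit `--model` and supply `--reference` so the model is trained from the reference data.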
## Authors
* Jakub Majercik [{{< fa brands github >}}](https://github.com/jakubmajercik) [{{< fa brands linkedin >}}](https://linkedin.com/in/jakubmajercik) (author)