Commit 09f7f08

pablo-gar and ebezzi authored
[docs] Add metrics article for LTS 2024-12-15 (#1233)
* add article
* change heading
* Editorial

Co-authored-by: Emanuele Bezzi <[email protected]>

* Linter + fixes
* Newline
* Figure numbers

---------

Co-authored-by: Emanuele Bezzi <[email protected]>
1 parent 3a8c79a commit 09f7f08

7 files changed, +350 -4 lines changed

docs/_static/css/custom.css

+23-4
@@ -529,10 +529,29 @@ body {
529529
height: 16px !important;
530530
}
531531

532-
table.custom-table,
532+
table.custom-table {
533+
border-collapse: collapse;
534+
min-width: 400px;
535+
}
536+
533537
table.custom-table td,
534538
table.custom-table th {
539+
padding: 10px;
540+
border: 1px solid #dddddd;
535541
vertical-align: middle;
536-
border: 1px solid #ddd;
537-
padding: 5px;
538-
}
542+
}
543+
544+
table.custom-table thead tr{
545+
background-color: #3170f4;
546+
color: #ffffff;
547+
text-align: center;
548+
}
549+
550+
table.custom-table tbody td {
551+
text-align: left;
552+
}
553+
554+
table.custom-table tbody tr:nth-of-type(even) {
555+
background-color: #f6f8fA;
556+
}
557+
@@ -0,0 +1,327 @@
1+
# Benchmarks of single-cell Census models
2+
3+
*Published:* *July 10th, 2024*
4+
5+
*By:* *[Emanuele Bezzi](mailto:[email protected]), [Pablo Garcia-Nieto](mailto:[email protected])*
6+
7+
In 2023, the Census team released a series of cell embeddings (available at the [Census Model page](https://cellxgene.cziscience.com/census-models)) compatible with the [Census LTS version `census_version="2023-12-15"`](https://chanzuckerberg.github.io/cellxgene-census/cellxgene_census_docsite_data_release_info.html#lts-2023-12-15), which users can access and download for any slice of Census data.
8+
9+
These embeddings were generated via different large-scale models; in this article we present the results of light benchmarking of them. We hope that these benchmarks give users an initial picture of 1) the strength of the biological signal captured by these embeddings and 2) the level of batch correction they exert.
10+
11+
We advise users to treat these benchmarks as first-pass information, and we recommend further benchmarking for a more comprehensive view of the embeddings and for task-oriented applications.
12+
13+
The benchmarks were run on the following embeddings:
14+
15+
- scVI latent spaces from a model trained on all Census data.
16+
- Fine-tuned Geneformer.
17+
- Zero-shot scGPT.
18+
- Zero-shot Universal Cell Embeddings (UCE).
19+
20+
For more details on each model please see the [Census Model page](https://cellxgene.cziscience.com/census-models).
21+
22+
## Accessing the embeddings included in the benchmark
23+
24+
Please see the [Census Model page](https://cellxgene.cziscience.com/census-models) for full details. In short, you can list the embeddings available for the Census LTS version `census_version="2023-12-15"` using the Census API as follows.
25+
26+
```python
27+
import cellxgene_census.experimental
28+
cellxgene_census.experimental.get_all_available_embeddings(census_version="2023-12-15")
29+
```
30+
31+
All human embeddings except the NMF factors were included in the benchmarks below. To access the embeddings for any slice of data, use the `obs_embeddings` parameter of the Census API's `get_anndata()` function, for example:
32+
33+
```python
34+
import cellxgene_census
35+
census = cellxgene_census.open_soma(census_version="2023-12-15")
36+
adata = cellxgene_census.get_anndata(
37+
    census,
38+
    organism="homo_sapiens",
39+
    measurement_name="RNA",
40+
    obs_value_filter="tissue_general == 'central nervous system'",
41+
    obs_embeddings=["scvi"],
42+
)
43+
```
44+
45+
## Benchmarks of Census Embeddings
46+
47+
### About the benchmarks
48+
49+
We executed a series of benchmarks falling into two general types: one to assess the level of biological signal contained in the embeddings, and the second to measure the level of correction for batch effects. In general, the utility of the embeddings increases as a function of these two sets of benchmarks.
50+
51+
For each type, the benchmarks can be further subdivided by their "mode": one set of metrics assesses the embedding space, and the other assesses the capacity of the embeddings to predict labels.
52+
53+
The table below shows a breakdown of the benchmarks we used in this report.
54+
55+
<table class="custom-table">
56+
<thead>
57+
<tr>
58+
<th>Type</th>
59+
<th>Mode</th>
60+
<th>Metric</th>
61+
<th>Description</th>
62+
</tr>
63+
</thead>
64+
<tbody>
65+
<tr>
66+
<td rowspan="6">Bio-conservation</td>
67+
<td rowspan="3">Embedding<br>Space</td>
68+
<td><code>leiden_nmi</code></td>
69+
<td>Normalized Mutual Information of biological labels and Leiden clusters. Described in <a href="https://scib-metrics.readthedocs.io/en/stable/generated/scib_metrics.nmi_ari_cluster_labels_leiden.html">Luecken et al.</a> and implemented in <a href="https://scib-metrics.readthedocs.io/en/stable/generated/scib_metrics.nmi_ari_cluster_labels_leiden.html">scib-metrics.</a></td>
70+
</tr>
71+
<tr>
72+
<td><code>leiden_ari</code></td>
73+
<td>Adjusted Rand Index of biological labels and Leiden clusters. Described in <a href="https://scib-metrics.readthedocs.io/en/stable/generated/scib_metrics.nmi_ari_cluster_labels_leiden.html">Luecken et al.</a> and implemented in <a href="https://scib-metrics.readthedocs.io/en/stable/generated/scib_metrics.nmi_ari_cluster_labels_leiden.html">scib-metrics.</a></td>
74+
</tr>
75+
<tr>
76+
<td><code>silhouette_label</code></td>
77+
<td>Silhouette score with respect to biological labels. Described in <a href="https://scib-metrics.readthedocs.io/en/stable/generated/scib_metrics.nmi_ari_cluster_labels_leiden.html">Luecken et al.</a> and implemented in <a href="https://scib-metrics.readthedocs.io/en/stable/generated/scib_metrics.silhouette_label.html">scib-metrics.</a></td>
78+
</tr>
79+
<tr>
80+
<td rowspan="3">Label<br>Classifier</td>
81+
<td><code>classifier_svm</code></td>
82+
<td>Accuracy of biological label prediction using an SVM (60/40 train/test split). Implemented <a href="https://github.com/chanzuckerberg/cellxgene-census/blob/f44637ba33567400820407f4f7b9984e52966156/tools/models/metrics/run-scib.py#L36">here</a>.</td>
83+
</tr>
84+
<tr>
85+
<td><code>classifier_forest</code></td>
86+
<td>Accuracy of biological label prediction using a Random Forest classifier (60/40 train/test split). Implemented <a href="https://github.com/chanzuckerberg/cellxgene-census/blob/f44637ba33567400820407f4f7b9984e52966156/tools/models/metrics/run-scib.py#L39">here</a>.</td>
87+
</tr>
88+
<tr>
89+
<td><code>classifier_lr</code></td>
90+
<td>Accuracy of biological label prediction using a logistic regression classifier (60/40 train/test split). Implemented <a href="https://github.com/chanzuckerberg/cellxgene-census/blob/f44637ba33567400820407f4f7b9984e52966156/tools/models/metrics/run-scib.py#L39">here</a>.</td>
91+
</tr>
92+
<tr>
93+
<td rowspan="5">Batch-correction</td>
94+
<td rowspan="2">Embedding<br>Space</td>
95+
<td><code>silhouette_batch</code></td>
96+
<td>1 - silhouette score with respect to batch labels. Described in <a href="https://scib-metrics.readthedocs.io/en/stable/generated/scib_metrics.nmi_ari_cluster_labels_leiden.html">Luecken et al.</a> and implemented in <a href="https://scib-metrics.readthedocs.io/en/stable/generated/scib_metrics.nmi_ari_cluster_labels_leiden.html">scib-metrics.</a></td>
97+
</tr>
98+
<tr>
99+
<td><code>entropy</code></td>
100+
<td>Average of neighborhood entropy of batch labels per cell. Implemented <a href="https://github.com/chanzuckerberg/cellxgene-census/blob/f44637ba33567400820407f4f7b9984e52966156/tools/models/metrics/run-scib.py#L86">here</a>.</td>
101+
</tr>
102+
<tr>
103+
<td rowspan="3">Label<br>Classifier</td>
104+
<td><code>classifier_svm</code></td>
105+
<td>1 - accuracy of batch label prediction using an SVM (60/40 train/test split). Implemented <a href="https://github.com/chanzuckerberg/cellxgene-census/blob/f44637ba33567400820407f4f7b9984e52966156/tools/models/metrics/run-scib.py#L45">here</a>.</td>
106+
</tr>
107+
<tr>
108+
<td><code>classifier_forest</code></td>
109+
<td>1 - accuracy of batch label prediction using a Random Forest classifier (60/40 train/test split). Implemented <a href="https://github.com/chanzuckerberg/cellxgene-census/blob/f44637ba33567400820407f4f7b9984e52966156/tools/models/metrics/run-scib.py#L48">here</a>.</td>
110+
</tr>
111+
<tr>
112+
<td><code>classifier_lr</code></td>
113+
<td>1 - accuracy of batch label prediction using a logistic regression classifier (60/40 train/test split). Implemented <a href="https://github.com/chanzuckerberg/cellxgene-census/blob/f44637ba33567400820407f4f7b9984e52966156/tools/models/metrics/run-scib.py#L42">here</a>.</td>
114+
</tr>
115+
</tbody>
116+
</table>
117+
118+
**Table 1:** List of benchmarks.
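
To make the `leiden_nmi` and `leiden_ari` rows of Table 1 more concrete, the following is a minimal, dependency-free sketch of how Normalized Mutual Information and the Adjusted Rand Index compare biological labels with cluster assignments. It is not the scib-metrics implementation used for this report: the function names are hypothetical, the cluster assignments are assumed to be already computed (for example by Leiden clustering), and the arithmetic-mean normalization of NMI is an assumption.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same cells."""
    contingency = Counter(zip(labels_true, labels_pred))
    pairs_same_both = sum(comb(c, 2) for c in contingency.values())
    pairs_same_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    pairs_same_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: treat as trivially identical
    expected = pairs_same_true * pairs_same_pred / total_pairs
    max_index = (pairs_same_true + pairs_same_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both labelings are trivial
    return (pairs_same_both - expected) / (max_index - expected)


def normalized_mutual_info(labels_true, labels_pred):
    """Mutual information divided by the mean of the two label entropies."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)
    mutual_info = sum(
        (c / n) * log(c * n / (true_counts[t] * pred_counts[p]))
        for (t, p), c in joint.items()
    )
    entropy_true = -sum((c / n) * log(c / n) for c in true_counts.values())
    entropy_pred = -sum((c / n) * log(c / n) for c in pred_counts.values())
    mean_entropy = (entropy_true + entropy_pred) / 2
    if mean_entropy == 0:
        return 1.0  # both labelings have a single label
    return mutual_info / mean_entropy
```

Both scores depend only on how cells are grouped, not on the label names, so clusters `[0, 0, 1, 1]` match labels `["T cell", "T cell", "B cell", "B cell"]` perfectly.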
119+
120+
### Benchmark results
121+
122+
As a reminder, the benchmarks were run on the following embeddings:
123+
124+
- scVI latent spaces from a model trained on all Census data.
125+
- Fine-tuned Geneformer.
126+
- Zero-shot scGPT.
127+
- Zero-shot Universal Cell Embeddings (UCE).
128+
129+
#### Summary
130+
131+
The figure below shows averages of all the metrics presented in the following sections.
132+
133+
```{figure} ./20240710_metrics_0_summary.png
134+
:alt: Summary of single-cell Census benchmarks
135+
:align: center
136+
:figwidth: 90%
137+
138+
**Figure 1. Summary of all benchmarks.** Numerical averages across the metric types and modes from all bio- and batch-labels across the tissues in this report.
139+
```
140+
141+
#### Bio-conservation
142+
143+
The bio-conservation metrics were run on the following biological labels for Census cells from adipose tissue and spinal cord:
144+
145+
- Cell subclass: a higher-level grouping of cell types, with a maximum of 73 unique labels, as defined on the CELLxGENE collection page.
146+
- Cell class: an even higher-level grouping of cell types, with a maximum of 22 unique labels, also defined on the CELLxGENE collection page.
147+
148+
```{figure} ./20240710_metrics_1_bio_emb.png
149+
:alt: Bio-conservation single-cell Census benchmark
150+
:align: center
151+
:figwidth: 90%
152+
153+
**Figure 2. Bio-conservation metrics on the embedding space.** Higher values signify better performance; the maximum value for all metrics is 1.
154+
```
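
As an illustration of the silhouette metric shown in Figure 2, here is a minimal pure-Python sketch that computes the mean silhouette width of cells with respect to their biological labels. It is not the scib-metrics code used in this report; the function name mirrors the metric name only for readability, Euclidean distance is assumed, and the rescaling from [-1, 1] to [0, 1] is an assumption chosen to be consistent with the stated maximum of 1.

```python
from math import dist


def silhouette_label(points, labels, rescale=True):
    """Mean silhouette width over all cells, optionally rescaled to [0, 1]."""
    clusters = {}
    for i, label in enumerate(labels):
        clusters.setdefault(label, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two distinct labels")

    widths = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            widths.append(0.0)  # common convention for single-cell groups
            continue
        # a: mean distance to cells sharing the label
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the closest group with a different label
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)

    score = sum(widths) / len(widths)
    return (score + 1) / 2 if rescale else score
```

With this rescaling, values near 1 mean that cells sit close to others with the same label and far from other labels, and values near 0.5 mean that the label groups overlap.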
155+
156+
```{figure} ./20240710_metrics_2_bio_classifier.png
157+
:alt: Bio-conservation single-cell Census benchmark
158+
:align: center
159+
:figwidth: 90%
160+
161+
**Figure 3. Bio-conservation metrics based on label classifiers.** Values represent label prediction accuracy. Higher values signify better performance; the maximum value for all metrics is 1.
162+
```
163+
164+
#### Batch-correction
165+
166+
The batch-correction metrics were run on the following batch labels for Census cells from adipose tissue and spinal cord:
167+
168+
- Assay: the sequencing technology.
169+
- Dataset: the dataset from which the cell originated.
170+
- Suspension type: cell vs nucleus.
171+
- Batch: the concatenation of values for all of the above.
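
The composite "Batch" label can be sketched as a simple string join over the other three labels. The helper name and the `|` separator below are assumptions made for illustration; the key names follow the `obs` columns requested in the source data snippet of this article.

```python
def make_batch_label(cell, keys=("assay", "dataset_id", "suspension_type"), sep="|"):
    """Concatenate the per-cell batch covariates into one composite label."""
    return sep.join(str(cell[key]) for key in keys)


# Two hypothetical cells that share an assay but come from different datasets.
cells = [
    {"assay": "10x 3' v3", "dataset_id": "dataset-A", "suspension_type": "nucleus"},
    {"assay": "10x 3' v3", "dataset_id": "dataset-B", "suspension_type": "nucleus"},
]
batch_labels = [make_batch_label(cell) for cell in cells]
```

Because the composite label changes whenever any of its parts changes, it has at least as many unique values as the most fine-grained of the three individual labels.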
172+
173+
```{figure} ./20240710_metrics_3_batch_emb.png
174+
:alt: Batch-correction single-cell Census benchmark
175+
:align: center
176+
:figwidth: 90%
177+
178+
**Figure 4. Batch-correction metrics on the embedding space.** Higher values signify better performance. The maximum value for `silhouette_batch` is 1; `entropy` values should only be compared within a tissue/label combination, not across combinations.
179+
```
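
The `entropy` metric in Figure 4 can be pictured with the sketch below: for each cell, find its nearest neighbors in the embedding space, compute the Shannon entropy of their batch labels, and average over all cells. This is a simplified stand-in rather than the linked implementation; the function name, the brute-force neighbor search, the value of `k`, and the natural logarithm are all assumptions.

```python
from collections import Counter
from math import dist, log


def neighborhood_entropy(points, batch_labels, k=3):
    """Average Shannon entropy of batch labels among each cell's k nearest neighbors."""
    n = len(points)
    entropies = []
    for i in range(n):
        others = (j for j in range(n) if j != i)
        neighbors = sorted(others, key=lambda j: dist(points[i], points[j]))[:k]
        counts = Counter(batch_labels[j] for j in neighbors)
        total = len(neighbors)
        entropies.append(-sum(c / total * log(c / total) for c in counts.values()))
    return sum(entropies) / n
```

The entropy of a neighborhood cannot exceed the logarithm of the number of distinct batch labels, so its attainable range changes with the label set. This is one reason why values are only comparable within the same tissue/label combination.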
180+
181+
```{figure} ./20240710_metrics_4_batch_classifier.png
182+
:alt: Batch-correction single-cell Census benchmark
183+
:align: center
184+
:figwidth: 90%
185+
186+
**Figure 5. Batch-correction metrics based on label classifiers.** Values represent **1 - label prediction accuracy**. In theory, higher values signify better performance, since they indicate that batch labels cannot be predicted accurately. However, foundation models may be designed to learn *all* information, including technical variation; please refer to the original publications of the models to learn more about them.
187+
```
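
The classifier-based scores in Figures 3 and 5 share one recipe: split the cells 60/40 into train and test sets, fit a classifier on the embedding, and report accuracy for biological labels or 1 - accuracy for batch labels. The sketch below illustrates that recipe only. To stay dependency-free it swaps in a nearest-centroid classifier instead of the SVM, Random Forest, and logistic regression classifiers used in the report, and all function names are hypothetical.

```python
import random
from math import dist


def split_60_40(n_cells, seed=0):
    """Shuffle cell indices and split them 60% train / 40% test."""
    indices = list(range(n_cells))
    random.Random(seed).shuffle(indices)
    cut = round(0.6 * n_cells)
    return indices[:cut], indices[cut:]


def classifier_accuracy(points, labels, seed=0):
    """Accuracy of a nearest-centroid classifier on the held-out 40%."""
    train, test = split_60_40(len(points), seed)
    # Fit: one centroid per label, computed from the training cells only.
    grouped = {}
    for i in train:
        grouped.setdefault(labels[i], []).append(points[i])
    centroids = {
        label: tuple(sum(axis) / len(members) for axis in zip(*members))
        for label, members in grouped.items()
    }
    # Predict: assign each test cell to the closest centroid.
    correct = 0
    for i in test:
        predicted = min(centroids, key=lambda label: dist(points[i], centroids[label]))
        correct += predicted == labels[i]
    return correct / len(test)


def batch_correction_score(points, batch_labels, seed=0):
    """1 - accuracy: higher means batch labels are harder to predict."""
    return 1.0 - classifier_accuracy(points, batch_labels, seed)
```

The same embedding can therefore score well on one figure and poorly on the other: if a label is easy to predict, that is desirable when the label is biological and undesirable when it is a batch covariate.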
188+
189+
## Source data
190+
191+
All data was obtained from the Census API. To fetch the data used in this report, you can run the following Python code. To obtain the cell subclass and cell class labels, please refer to the [CELLxGENE Ontology Guide API](https://github.com/chanzuckerberg/cellxgene-ontology-guide/tree/main).
192+
193+
```python
194+
import cellxgene_census
195+
196+
val_filters = {
197+
"adipose": "tissue_general == 'adipose tissue' and is_primary_data == True",
198+
"spinal": "tissue_general == 'spinal cord' and is_primary_data == True",
199+
}
200+
201+
embedding_names = ["geneformer", "scgpt", "scvi", "uce"]
202+
203+
column_names = {
204+
"obs": ["cell_type_ontology_term_id", "cell_type", "assay", "suspension_type", "dataset_id", "soma_joinid"]
205+
}
206+
207+
census = cellxgene_census.open_soma(census_version="2023-12-15")
208+
209+
adatas = []
210+
for tissue in val_filters:
211+
    adatas.append(
212+
        cellxgene_census.get_anndata(
213+
            census,
214+
            organism="homo_sapiens",
215+
            measurement_name="RNA",
216+
            obs_value_filter=val_filters[tissue],
217+
            obs_embeddings=embedding_names,
218+
            column_names=column_names,
219+
        )
220+
    )
221+
```
222+
223+
### Batch label counts
224+
225+
The following shows the batch label counts per tissue:
226+
227+
#### Adipose tissue
228+
229+
<table class="custom-table">
230+
<thead>
231+
<tr>
232+
<th>Type</th>
233+
<th>Label</th>
234+
<th>Count</th>
235+
</tr>
236+
</thead>
237+
<tbody>
238+
<tr>
239+
<td rowspan="4">Assay</td>
240+
<td>10x 3' v3</td>
241+
<td>91947</td>
242+
</tr>
243+
<tr>
244+
<td>10x 5' transcription profiling</td>
245+
<td>2121</td>
246+
</tr>
247+
<tr>
248+
<td>microwell-seq</td>
249+
<td>1372</td>
250+
</tr>
251+
<tr>
252+
<td>Smart-seq2</td>
253+
<td>651</td>
254+
</tr>
255+
<tr>
256+
<td rowspan="2">Suspension Type</td>
257+
<td>nucleus</td>
258+
<td>72335</td>
259+
</tr>
260+
<tr>
261+
<td>cell</td>
262+
<td>23756</td>
263+
</tr>
264+
<tr>
265+
<td rowspan="4">Dataset</td>
266+
<td>9d8e5dca-03a3-457d-b7fb-844c75735c83</td>
267+
<td>72335</td>
268+
</tr>
269+
<tr>
270+
<td>53d208b0-2cfd-4366-9866-c3c6114081bc</td>
271+
<td>20263</td>
272+
</tr>
273+
<tr>
274+
<td>5af90777-6760-4003-9dba-8f945fec6fdf</td>
275+
<td>2121</td>
276+
</tr>
277+
<tr>
278+
<td>2adb1f8a-a6b1-4909-8ee8-484814e2d4bf</td>
279+
<td>1372</td>
280+
</tr>
281+
</tbody>
282+
</table>
283+
284+
#### Spinal cord
285+
286+
<table class="custom-table">
287+
<thead>
288+
<tr>
289+
<th>Type</th>
290+
<th>Label</th>
291+
<th>Count</th>
292+
</tr>
293+
</thead>
294+
<tbody>
295+
<tr>
296+
<td rowspan="2">Assay</td>
297+
<td>10x 3' v3</td>
298+
<td>43840</td>
299+
</tr>
300+
<tr>
301+
<td>microwell-seq</td>
302+
<td>5916</td>
303+
</tr>
304+
<tr>
305+
<td rowspan="2">Suspension Type</td>
306+
<td>nucleus</td>
307+
<td>43840</td>
308+
</tr>
309+
<tr>
310+
<td>cell</td>
311+
<td>5916</td>
312+
</tr>
313+
<tr>
314+
<td rowspan="3">Dataset</td>
315+
<td>090da8ea-46e8-40df-bffc-1f78e1538d27</td>
316+
<td>24190</td>
317+
</tr>
318+
<tr>
319+
<td>c05e6940-729c-47bd-a2a6-6ce3730c4919</td>
320+
<td>19650</td>
321+
</tr>
322+
<tr>
323+
<td>2adb1f8a-a6b1-4909-8ee8-484814e2d4bf</td>
324+
<td>5916</td>
325+
</tr>
326+
</tbody>
327+
</table>
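
Each cell carries exactly one label per batch type, so the counts within each type should add up to the same number of cells. The sketch below runs that sanity check on the spinal cord counts from the table above; the dictionary layout and helper name are assumptions made for illustration.

```python
# Counts copied from the spinal cord table above.
spinal_cord_counts = {
    "assay": {"10x 3' v3": 43840, "microwell-seq": 5916},
    "suspension_type": {"nucleus": 43840, "cell": 5916},
    "dataset_id": {
        "090da8ea-46e8-40df-bffc-1f78e1538d27": 24190,
        "c05e6940-729c-47bd-a2a6-6ce3730c4919": 19650,
        "2adb1f8a-a6b1-4909-8ee8-484814e2d4bf": 5916,
    },
}


def totals_by_type(counts):
    """Sum the label counts within each batch type."""
    return {batch_type: sum(labels.values()) for batch_type, labels in counts.items()}


totals = totals_by_type(spinal_cord_counts)
# The table is internally consistent if every type sums to the same total.
is_consistent = len(set(totals.values())) == 1
```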