day1/day1-1c_visium_HD_segmented.qmd
Lines changed: 67 additions & 8 deletions
Original file line number
Diff line number
Diff line change
@@ -23,6 +23,7 @@ By the end of this exercise, you will be able to:
23
23
#| warning: false
24
24
#| output: false
25
25
26
+
library(scuttle)
26
27
library(sf)
27
28
library(arrow)
28
29
library(dplyr)
@@ -33,7 +34,8 @@ library(SpatialExperiment)
33
34
library(SpatialFeatureExperiment)
34
35
library(Voyager)
35
36
library(VisiumIO)
36
-
37
+
library(ggspavis)
38
+
library(HDF5Array)
37
39
```
38
40
39
41
Finally, we will work with the **Visium HD segmented output**. This data corresponds to the same sample and region of interest as the binned data from `Exercise 1A`, but instead of being binned, it contains the output of cell segmentation. Since Space Ranger version 4.0, the output of cell segmentation is available as an optional output of the pipeline, and it contains the coordinates of the segmented cells and their boundaries.
@@ -43,7 +45,7 @@ Finally, we will work with the **Visium HD segmented output**. This data corresp
43
45
```{r day1-1c-visium_HD_segmented-3}
44
46
options(timeout = 2000)
45
47
46
-
if (!dir.exists("data/Visium_HD_Human_Colon_Cancer_segmented_outputs/")) {
@@ -64,14 +66,21 @@ if (!dir.exists("data/Visium_HD_Human_Colon_Cancer_segmented_outputs/")) {
64
66
65
67
::: callout-important
66
68
## Exercise 1
67
-
Which class do you think should be used to store the Visium HD segmented data? `SpatialExperiment` as we used in Exercise 1A or `SpatialFeatureExperiment` as we used in Exercise 1B?
69
+
- Which class do you think should be used to store the Visium HD segmented data? `SpatialExperiment` as we used in Exercise 1A or `SpatialFeatureExperiment` as we used in Exercise 1B?
70
+
- What does the data for each segmented cell represent? What is the binning strategy? What are the potential limitations?
71
+
68
72
:::
69
73
70
74
::: {.callout-tip collapse="true"}
71
75
## Answer
72
-
Since segmented data contains cell boundaries, the coordinates of the segmented cells are not limited to the spot coordinates as in the binned data. Therefore, we will use `SpatialFeatureExperiment` to work with the segmented data, as it allows us to store and visualize spatial features such as cell boundaries, as we did for the Xenium data in Exercise 1B.
76
+
- Since segmented data contains cell boundaries, the coordinates of the segmented cells are not limited to the spot coordinates as in the binned data. Therefore, we will use `SpatialFeatureExperiment` to work with the segmented data, as it allows us to store and visualize spatial features such as cell boundaries, as we did for the Xenium data in Exercise 1B.
77
+
`SpatialExperiment` is more suitable for binned data where the coordinates correspond to the spot centroids on a regular grid and there are no additional spatial features to store.
78
+
79
+
- Visium HD still captures the data in 2 x 2 µm bins. These bins are grouped into larger units based on the cell they correspond to (see figure below). The aggregated bins mimic single-cell data, since UMI counts are reported per segmented cell.
80
+
81
+

73
82
74
-
`SpatialExperiment`is more suitable for binned data where the coordinates correspond to the spot centroids and there are no additional spatial features to store.
83
+
One limitation is that a bin can overlap more than one cell, in which case the assignment of the transcripts it captured is ambiguous. This approach also depends on the quality of the segmentation mask.
When zooming in, we can clearly see that 2 µm bins make up each cell in the dataset.
241
+
:::
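
The idea that each segmented cell is assembled from 2 µm bins can be illustrated with a toy aggregation. This is a minimal sketch in base R, not Space Ranger's actual procedure: the count matrix, the gene names, and the bin-to-cell assignment `cell_id` are all made up for illustration.

```r
# Toy counts: 6 bins (rows) x 3 genes (columns). All values are hypothetical.
bin_counts <- matrix(
  c(1, 0, 2,
    0, 1, 0,
    3, 0, 0,
    0, 0, 1,
    2, 2, 0,
    0, 1, 1),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("bin", 1:6), c("geneA", "geneB", "geneC"))
)

# Hypothetical bin-to-cell assignment; NA marks a bin outside any segmented cell.
cell_id <- c("cell1", "cell1", "cell2", NA, "cell2", "cell1")

# Drop unassigned bins, then sum the UMI counts of all bins belonging to each cell.
keep <- !is.na(cell_id)
cell_counts <- rowsum(bin_counts[keep, , drop = FALSE], group = cell_id[keep])
cell_counts
```

In this sketch every bin belongs to at most one cell. A bin that straddles two cells would have to be assigned to one of them or dropped, which is exactly the ambiguity mentioned above.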
242
+
243
+
244
+
Save the object
245
+
```{r day1-1c-visium_HD_segmented-12}
202
246
saveHDF5SummarizedExperiment(sfe_seg,
203
247
dir = "results/day1", prefix = "01.1_sfe_visium_", replace = TRUE,
204
248
chunkdim = NULL, level = NULL, as.sparse = NA,
205
249
verbose = NA
206
250
)
207
251
```
208
252
209
-
For the rest of the exercises, we will use this Visium HD segmented object, but feel free to replace by the binned object or the Xenium object to compare the results.
253
+
<!--
254
+
TO DO: Even using this, some out-of-memory components break when re-importing because the original file is unavailable or is tied to some absolute path. E.g.,
255
+
256
+
Error in (function (cond) :
257
+
error in evaluating the argument 'x' in selecting a method for function 'colData': external pointer is not valid
258
+
259
+
Another (better) way to save the object would be to use the alabaster.sfe package, but I failed to install it on my Mac...
For the rest of the exercises, we will use this Visium HD segmented object, but feel free to replace it with the binned object (`Exercise 1A`) or the Xenium object (`Exercise 1B`) to compare the results.
day1/day1-2a_spotsweeper_qc_segmented.qmd
Lines changed: 44 additions & 40 deletions
@@ -9,16 +9,17 @@ execute:
9
9
---
10
10
11
11
12
-
## Quality control at bin-level
12
+
## Quality control at cell-level
13
13
14
-
In this second exercise, we will focus on the critical step of quality control (QC) for sequencing-based spatial transcriptomics data. The Visium HD object contains 16 um bins, so each observation is a spatial bin rather than an individual cell. QC helps flag low-quality bins or technical artifacts before downstream analysis.
14
+
In this second exercise, we will focus on the critical step of quality control (QC) for sequencing-based spatial transcriptomics data. We will use the segmented Visium HD dataset from `Exercise 1C`, so each observation is an individual cell within the selected region of interest. QC helps flag low-quality cells or technical artifacts before downstream analysis.
15
15
16
+
*NOTE:* A similar QC can be done per bin or aggregated bin if a binned analysis of Visium HD is performed.
16
17
17
18
## Learning objectives
18
19
19
20
By the end of this exercise, you will be able to:
20
21
21
-
- Calculate per-bin QC metrics.
22
+
- Calculate per-cell QC metrics.
22
23
- Identify local outliers based on various QC metrics.
23
24
- Detect spatial artifacts using `SpotSweeper`.
24
25
- Visualize QC metrics and detected artifacts.
@@ -31,41 +32,45 @@ By the end of this exercise, you will be able to:
31
32
#| output: false
32
33
33
34
library(SpatialExperiment)
35
+
library(SpatialFeatureExperiment)
36
+
library(Voyager)
37
+
library(HDF5Array)
38
+
34
39
library(SpotSweeper)
35
40
library(scuttle)
41
+
library(scrapper)
36
42
library(scater)
37
-
library(ggside)
38
-
library(ggplot2)
39
43
library(escheR)
40
-
library(HDF5Array)
44
+
45
+
library(ggplot2)
46
+
library(ggside)
41
47
library(ggspavis)
42
48
library(patchwork)
43
49
```
44
50
45
51
## Calculate QC metrics
46
52
47
-
We will start by calculating QC metrics that are also commonly used in scRNA-seq data analysis: total UMI counts, number of detected genes, and percentage of UMIs originating from mitochondrial genes. Here these metrics are calculated per Visium HD bin and used to identify low-quality bins.
53
+
We will start by calculating QC metrics that are also commonly used in scRNA-seq data analysis: total UMI counts, number of detected genes, and percentage of UMIs originating from genes of the mitochondrial genome. Here these metrics are calculated per segmented cell and used to identify low-quality cells.
48
54
49
-
We will start by loading our `SpatialExperiment` object, which was saved in the previous exercise, and prepare it for quality control analysis.
55
+
We will start by loading our `SpatialFeatureExperiment` object, which was saved in the previous exercise, and prepare it for quality control analysis.
<!--- NOTE: HDF5-backed saving/loading via saveHDF5SummarizedExperiment() is reliable for assays, but it can leave sf geometries with invalid external pointers after reload (GEOS/s2), which then crash functions like localOutliers() with “external pointer is not valid”.
61
-
Need a solution or don't delete previous object --->
67
+
Need a solution or don't delete previous object
68
+
--->
62
69
63
-
Now, we will calculate per-bin QC metrics using the `scuttle` package. The function `addPerCellQCMetrics` adds QC metrics, including mitochondrial gene percentage, to the `colData` of the `SpatialExperiment` object. The function name says "cell" because it comes from single-cell workflows, but here it operates on spatial bins.
64
-
65
-
Run the following code to compute these metrics and inspect the results.
70
+
Now, we will calculate per-cell QC metrics using the `scrapper` package. The function `quickRnaQc.se` adds QC metrics, including the proportion of mitochondrial counts, to the `colData` of the `SpatialFeatureExperiment` object. Run the following code to compute these metrics and inspect the results:
@@ -82,15 +87,14 @@ Check which metadata has been added to `colData`. Do you recognize the different
82
87
colData(sfe)
83
88
```
84
89
85
-
- The `sum` column contains the total number of unique molecular identifiers (UMIs) for each bin
86
-
- The `detected` column contains the number of unique genes detected per bin
87
-
- The `subsets_Mito_percent` column contains the percentage of transcripts mapping to mitochondrial genes per bin
90
+
- The `sum` column contains the total number of unique molecular identifiers (UMIs) for each cell
91
+
- The `detected` column contains the number of unique genes detected per cell
92
+
- The `subset.proportion.Mito` column contains the proportion of UMIs mapping to mitochondrial genes per cell
88
93
:::
89
94
90
-
91
95
### Global outlier detection
92
96
93
-
In order to detect low-quality bins and filter them out, QC methods adapted from scRNA-seq analysis can be applied. A simple option is to apply a fixed upper or lower threshold to a metric across all bins and remove the bins that do not pass the filtering criteria.
97
+
In order to detect low-quality cells and filter them out, QC methods adapted from scRNA-seq analysis can be applied. A simple option is to apply a fixed upper or lower threshold to a metric across all cells and remove the cells that do not pass the filtering criteria.
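
A minimal sketch of fixed-threshold filtering on a toy per-cell QC table is shown here. The column names mirror the metrics discussed above, but the values and the cutoffs `min_umi` and `max_mito` are hypothetical and are not recommendations for this dataset.

```r
# Toy per-cell QC table (made-up values).
qc <- data.frame(
  sum      = c(5, 120, 300, 8, 950),           # total UMI counts per cell
  detected = c(4, 90, 210, 6, 600),            # number of detected genes per cell
  mito     = c(0.02, 0.05, 0.40, 0.01, 0.08)   # mitochondrial proportion per cell
)

# Hypothetical global thresholds, applied identically to every cell.
min_umi  <- 10
max_mito <- 0.25

# A cell is discarded if it fails any of the thresholds.
qc$discard <- qc$sum < min_umi | qc$mito > max_mito
table(qc$discard)
```

Because the same cutoff is used everywhere in the tissue, regions that are biologically low in UMI counts can be removed wholesale, which is why the thresholds need to be chosen with care.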
94
98
95
99
To set up these thresholds we can look for information from 10x Genomics and the community, for example https://github.com/10XGenomics/HumanColonCancer_VisiumHD/issues/28. Based on this discussion the authors of the OSTA book wrote (see "Remove bins overlaying empty tissue" section in https://lmweber.org/OSTA/pages/seq-workflow-visium-hd-bin.html#dependencies):
To avoid discarding whole layers of tissue, another approach to identify low-quality bins is to use local outlier detection methods that take into account the spatial context of each bin.
192
+
To avoid discarding whole layers of tissue, another approach to identify low-quality cells is to use local outlier detection methods that take into account the spatial context of each cell.
189
193
190
194
The `SpotSweeper` package provides functions that work with common QC metrics (such as the ones we have seen in the previous section: library size, number of detected genes, and mitochondrial percentage) and detect local outliers based on these metrics.
191
195
192
-
We will use `localOutliers` from `SpotSweeper`, which helps identify bins that deviate significantly from their local neighborhood.
196
+
We will use `localOutliers` from `SpotSweeper`, which helps identify cells that deviate significantly from their local neighborhood.
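
To build intuition for what "deviating from the local neighborhood" means, here is a simplified base R illustration on simulated data. It is only a sketch of the general idea (a robust z-score against the nearest neighbors) and is not `SpotSweeper`'s implementation; the number of neighbors `k`, the cutoff of 3, and all simulated values are assumptions.

```r
set.seed(1)

# Simulated cell centroids and log library sizes (all values are made up).
n <- 200
xy <- cbind(x = runif(n), y = runif(n))
log_sum <- rnorm(n, mean = 5, sd = 0.2)
log_sum[1] <- 2  # plant one cell with a much lower library size

# Pairwise distances between cells.
k <- 15
d <- as.matrix(dist(xy))

# For each cell, compare its value with the median of its k nearest neighbours,
# scaled by the neighbours' median absolute deviation (a robust z-score).
local_z <- vapply(seq_len(n), function(i) {
  nb <- order(d[i, ])[2:(k + 1)]  # position 1 is the cell itself (distance 0)
  (log_sum[i] - median(log_sum[nb])) / mad(log_sum[nb])
}, numeric(1))

# Flag cells that are much lower than their surroundings.
is_low_outlier <- local_z < -3
which(is_low_outlier)
```

The planted cell is flagged because it is far below its neighbors, not because it falls under a tissue-wide cutoff.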
193
197
194
198
195
199
```{r}
196
200
# Identify local outliers based on library size ("sum" of counts).
197
-
# Bins with unusually low library size compared to their neighbors will be flagged.
201
+
# Cells with unusually low library size compared to their neighbors will be flagged.
198
202
sfe <- localOutliers(sfe,
199
203
metric = "sum",
200
204
direction = "lower",
201
205
log = TRUE
202
206
)
203
207
204
208
# Identify local outliers based on the number of unique genes detected ("detected").
205
-
# Bins with an unusually low number of detected genes will be flagged.
209
+
# Cells with an unusually low number of detected genes will be flagged.
206
210
sfe <- localOutliers(sfe,
207
211
metric = "detected",
208
212
direction = "lower",
209
213
log = TRUE
210
214
)
211
215
212
216
# Identify local outliers based on the mitochondrial gene percentage.
213
-
# Bins with an unusually high mitochondrial percentage will be flagged.
217
+
# Cells with an unusually high mitochondrial percentage will be flagged.
214
218
sfe <- localOutliers(sfe,
215
219
metric = "subsets_Mito_percent",
216
220
direction = "higher",
217
221
log = FALSE
218
222
)
219
223
220
224
# Combine all individual outlier flags into a single "local_outliers" column.
221
-
# A bin is considered a local outlier if it is flagged by any of the above metrics.
225
+
# A cell is considered a local outlier if it is flagged by any of the above metrics.
How many bins were identified as local outliers based on the combined criteria?
233
+
How many cells were identified as local outliers based on the combined criteria?
230
234
231
235
As we have seen before, visualizing the QC metrics is crucial for understanding the quality of your spatial transcriptomics data and for deciding on appropriate filtering strategies, so look at their spatial localization using the `plotObsQC` function.
232
236
@@ -237,7 +241,7 @@ What do you think?
237
241
:::{.callout-tip collapse="true"}
238
242
### Answer
239
243
```{r}
240
-
# Count the number of bins flagged as local outliers.
244
+
# Count the number of cells flagged as local outliers.
Visualize the bins that will be discarded based on the combined criteria. How many bins will be removed?
330
+
Visualize the cells that will be discarded based on the combined criteria. How many cells will be removed?
327
331
:::
328
332
329
333
:::{.callout-tip collapse="true"}
330
334
### Answer
331
335
```{r}
332
-
# check how many bins will be discarded
336
+
# check how many cells will be discarded
333
337
table(sfe$discard)
334
338
335
-
# have a final look at the bins that we will discard
339
+
# have a final look at the cells that we will discard
336
340
plotSpatialFeature(sfe, colGeometryName = "cellseg", features = "discard")
337
341
```
338
342
:::
@@ -388,10 +392,10 @@ plotQCmetrics(sfe,
388
392
:::
389
393
390
394
## Filtering
391
-
We decide to filter out the low-quality bins from the `SpatialExperiment` object based on the combined criteria of global thresholds and local outliers. Additionally, we will remove any features (genes) that have zero counts across all remaining bins.
395
+
We will filter out the low-quality cells from the `SpatialFeatureExperiment` object based on the combined criteria of global thresholds and local outliers. Additionally, we will remove any features (genes) that have zero counts across all remaining cells.
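
On a plain count matrix, these two filtering steps can be sketched as follows. The matrix and the `discard` flags are toy values chosen for illustration; the real object is subset with the same `[` syntax, with genes in rows and cells in columns.

```r
# Toy count matrix: 3 genes (rows) x 4 cells (columns), filled column by column.
counts <- matrix(
  c(1, 0, 0,
    0, 3, 0,
    2, 0, 0,
    4, 0, 1),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("cell", 1:4))
)

# Hypothetical per-cell flag combining global thresholds and local outliers.
discard <- c(FALSE, TRUE, FALSE, FALSE)

# Step 1: drop the flagged cells (columns).
filtered <- counts[, !discard, drop = FALSE]

# Step 2: drop genes (rows) with zero counts across the remaining cells.
filtered <- filtered[rowSums(filtered) > 0, , drop = FALSE]
dim(filtered)
```

The order matters: `geneB` is only detected in the discarded cell, so it becomes an all-zero row after step 1 and is removed in step 2.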
392
396
393
397
```{r}
394
-
# remove the bins considered of low quality based on both global and local metrics
398
+
# remove the cells considered of low quality based on both global and local metrics