Skip to content

Commit 0ca11e8

Browse files
committed
A few more chnages to day1-1c and started to work on day1-2
1 parent ea0dce3 commit 0ca11e8

3 files changed

Lines changed: 111 additions & 48 deletions

File tree

31 KB
Loading

day1/day1-1c_visium_HD_segmented.qmd

Lines changed: 67 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -23,6 +23,7 @@ By the end of this exercise, you will be able to:
2323
#| warning: false
2424
#| output: false
2525
26+
library(scuttle)
2627
library(sf)
2728
library(arrow)
2829
library(dplyr)
@@ -33,7 +34,8 @@ library(SpatialExperiment)
3334
library(SpatialFeatureExperiment)
3435
library(Voyager)
3536
library(VisiumIO)
36-
37+
library(ggspavis)
38+
library(HDF5Array)
3739
```
3840

3941
Finally, we will work with the **Visium HD segmented output**. This data corresponds to the same sample and region of interest as the binned data from `Exercise 1A`, but instead of being binned, it contains the output of cell segmentation. Since Space Ranger version 4.0, the output of cell segmentation is available as an optional output of the pipeline, and it contains the coordinates of the segmented cells and their boundaries.
@@ -43,7 +45,7 @@ Finally, we will work with the **Visium HD segmented output**. This data corresp
4345
```{r day1-1c-visium_HD_segmented-3}
4446
options(timeout = 2000)
4547
46-
if (!dir.exists("data/Visium_HD_Human_Colon_Cancer_segmented_outputs/")) {
48+
if (!dir.exists("data/segmented_outputs/")) {
4749
dir.create("data", showWarnings = FALSE)
4850
download.file(
4951
url = "https://cf.10xgenomics.com/samples/spatial-exp/4.0.1/Visium_HD_Human_Colon_Cancer/Visium_HD_Human_Colon_Cancer_segmented_outputs.tar.gz",
@@ -64,14 +66,21 @@ if (!dir.exists("data/Visium_HD_Human_Colon_Cancer_segmented_outputs/")) {
6466

6567
::: callout-important
6668
## Exercise 1
67-
Which class do you think should be used to store the Visium HD segmented data? `SpatialExperiment` as we used in Exercise 1A or `SpatialFeatureExperiment` as we used in Exercise 1B?
69+
- Which class do you think should be used to store the Visium HD segmented data? `SpatialExperiment` as we used in Exercise 1A or `SpatialFeatureExperiment` as we used in Exercise 1B?
70+
- What does the data for each segmented cell represent? What is the binning strategy? What are the potential limitations?
71+
6872
:::
6973

7074
::: {.callout-tip collapse="true"}
7175
## Answer
72-
Since segmented data contains cell boundaries, the coordinates of the segmented cells are not limited to the spot coordinates as in the binned data. Therefore, we will use `SpatialFeatureExperiment` to work with the segmented data, as it allows us to store and visualize spatial features such as cell boundaries, as we did for the Xenium data in Exercise 1B.
76+
- Since segmented data contains cell boundaries, the coordinates of the segmented cells are not limited to the spot coordinates as in the binned data. Therefore, we will use `SpatialFeatureExperiment` to work with the segmented data, as it allows us to store and visualize spatial features such as cell boundaries, as we did for the Xenium data in Exercise 1B.
77+
`SpatialExperiment` is more suitable for binned data where the coordinates correspond to the spot centroids on a regular grid and there are no additional spatial features to store.
78+
79+
- The Visium HD still captures the data on 2x2 µm bins. These are partitioned into larger bins based on the cell they correspond to (see Figure below). The partitioning of aggregated bins mimics single-cell data since the UMI counts are reported per segmented cell basis.
80+
81+
![Binning approach in the Visium HD segmented output data (segmented nuclei here)](../assets/images/VisiumHD_nuclei_segmentation_binning_figure1.png)
7382

74-
`SpatialExperiment` is more suitable for binned data where the coordinates correspond to the spot centroids and there are no additional spatial features to store.
83+
One limitation is that a bin can overlap than one cell, so the assignment of the transcripts captured would be ambiguous. Of course this approach relies also on the quality of the segmentation mask.
7584
:::
7685

7786

@@ -123,7 +132,7 @@ spe <- loadHDF5SummarizedExperiment(dir="results/day1/", prefix="01.1_spe_")
123132
```
124133

125134
::: callout-important
126-
## Exercise 1
135+
## Exercise 2
127136

128137
- 1. Compare the number of features and spots in the segmented and binned objects. What do you observe?
129138
- 2. Compare the `colData` of the two objects. What are the differences in the metadata columns? What do they represent?
@@ -196,15 +205,65 @@ plotSpatialFeature(sfe_seg,
196205
features="PIGR",
197206
exprs_values = "counts")
198207
```
208+
:::
209+
210+
::: callout-important
211+
## Exercise 3
212+
Plot the cell segmentation mask of very small region of interest on the slide, for example:
199213

200-
Save the object
201214
```{r day1-1c-visium_HD_segmented-10}
215+
roi_zoomed <- c(
216+
xmin = 54500,
217+
xmax = 55000,
218+
ymin = 12500,
219+
ymax = 13000
220+
)
221+
```
222+
What do you observe?
223+
:::
224+
225+
::: {.callout-tip collapse="true"}
226+
## Answer
227+
228+
```{r day1-1c-visium_HD_segmented-11}
229+
sfe_zoomed <- sfe_seg[, spatialCoords(sfe_seg)[, 1] >= roi_zoomed[["xmin"]] &
230+
spatialCoords(sfe_seg)[, 1] <= roi_zoomed[["xmax"]] &
231+
spatialCoords(sfe_seg)[, 2] >= roi_zoomed[["ymin"]] &
232+
spatialCoords(sfe_seg)[, 2] <= roi_zoomed[["ymax"]]]
233+
234+
plot(st_geometry(colGeometries(sfe_zoomed)[["cellSeg"]]),
235+
col=rep(colors(), 2),
236+
lty=0)
237+
rm(sfe_zoomed)
238+
```
239+
240+
When zooming in, we can clearly see that 2 um bins make up each cell in the dataset.
241+
:::
242+
243+
244+
Save the object
245+
```{r day1-1c-visium_HD_segmented-12}
202246
saveHDF5SummarizedExperiment(sfe_seg,
203247
dir = "results/day1", prefix = "01.1_sfe_visium_", replace = TRUE,
204248
chunkdim = NULL, level = NULL, as.sparse = NA,
205249
verbose = NA
206250
)
207251
```
208252

209-
For the rest of the exercises, we will use this Visium HD segmented object, but feel free to replace by the binned object or the Xenium object to compare the results.
253+
<!--
254+
TO DO: Even using this, some out-of-memory components break when re-importing because the original file is unavailable or is tied to some absolute path. E.g.,
255+
256+
Error in (function (cond) :
257+
error in evaluating the argument 'x' in selecting a method for function 'colData': external pointer is not valid
258+
259+
Another (better) way to save the object would be to use the alabaster.sfe package, but I failed to install it on my Mac...
260+
261+
```{r}
262+
fsave <- file.path("results/day1", "01.1_sfe_visium")
263+
alabaster.sfe::saveObject(sfe_seg, fsave)
264+
dir_tree(fsave)
265+
```
266+
-->
267+
268+
For the rest of the exercises, we will use this Visium HD segmented object, but feel free to replace by the binned object (`Exercise 1A`) or the Xenium object (`Exercise 1B`) to compare the results.
210269

day1/day1-2a_spotsweeper_qc_segmented.qmd

Lines changed: 44 additions & 40 deletions
Original file line numberDiff line numberDiff line change
@@ -9,16 +9,17 @@ execute:
99
---
1010

1111

12-
## Quality control at bin-level
12+
## Quality control at cell-level
1313

14-
In this second exercise, we will focus on the critical step of quality control (QC) for sequencing-based spatial transcriptomics data. The Visium HD object contains 16 um bins, so each observation is a spatial bin rather than an individual cell. QC helps flag low-quality bins or technical artifacts before downstream analysis.
14+
In this second exercise, we will focus on the critical step of quality control (QC) for sequencing-based spatial transcriptomics data. We will focus on the segmented Visium HD dataset used in `Exercise 1C`, so each observation is an individual cell with the selected region of interest. QC helps flag low-quality cells or technical artifacts before downstream analysis.
1515

16+
*NOTE:* A similar QC can be done per bin or aggregated bin if a binned analysis of Visium HD is performed.
1617

1718
## Learning objectives
1819

1920
By the end of this exercise, you will be able to:
2021

21-
- Calculate per-bin QC metrics.
22+
- Calculate per-cell QC metrics.
2223
- Identify local outliers based on various QC metrics.
2324
- Detect spatial artifacts using `SpotSweeper`.
2425
- Visualize QC metrics and detected artifacts.
@@ -31,41 +32,45 @@ By the end of this exercise, you will be able to:
3132
#| output: false
3233
3334
library(SpatialExperiment)
35+
library(SpatialFeatureExperiment)
36+
library(Voyager)
37+
library(HDF5Array)
38+
3439
library(SpotSweeper)
3540
library(scuttle)
41+
library(scrapper)
3642
library(scater)
37-
library(ggside)
38-
library(ggplot2)
3943
library(escheR)
40-
library(HDF5Array)
44+
45+
library(ggplot2)
46+
library(ggside)
4147
library(ggspavis)
4248
library(patchwork)
4349
```
4450

4551
## Calculate QC metrics
4652

47-
We will start by calculating QC metrics that are also commonly used in scRNA-seq data analysis: total UMI counts, number of detected genes, and percentage of UMIs originating from mitochondrial genes. Here these metrics are calculated per Visium HD bin and used to identify low-quality bins.
53+
We will start by calculating QC metrics that are also commonly used in scRNA-seq data analysis: total UMI counts, number of detected genes, and percentage of UMIs originating from genes of the mitochondrial genome. Here these metrics are calculated per segmented cell and used to identify low-quality cells.
4854

49-
We will start by loading our `SpatialExperiment` object, which was saved in the previous exercise, and prepare it for quality control analysis.
55+
We will start by loading our `SpatialFeatureExperiment` object, which was saved in the previous exercise, and prepare it for quality control analysis.
5056

5157

5258
```{r}
53-
# Load the SpatialExperiment object
54-
sfe <- loadHDF5SummarizedExperiment(dir="results/day1/", prefix="01.1_sfe_seg_")
59+
# Load the SpatialExperiment object if not in the R session already
60+
sfe <- loadHDF5SummarizedExperiment(dir="results/day1/", prefix="01.1_sfe_visium_")
5561
5662
# Identify mitochondrial genes (genes with symbol starting with "MT-")
5763
is.mito <- rownames(sfe)[grepl("^MT-", rownames(sfe))]
5864
```
5965

6066
<!--- NOTE: HDF5-backed saving/loading via saveHDF5SummarizedExperiment() is reliable for assays, but it can leave sf geometries with invalid external pointers after reload (GEOS/s2), which then crash functions like localOutliers() with “external pointer is not valid”.
61-
Need a solution or don't delete previous object --->
67+
Need a solution or don't delete previous object
68+
--->
6269

63-
Now, we will calculate per-bin QC metrics using the `scuttle` package. The function `addPerCellQCMetrics` adds QC metrics, including mitochondrial gene percentage, to the `colData` of the `SpatialExperiment` object. The function name says "cell" because it comes from single-cell workflows, but here it operates on spatial bins.
64-
65-
Run the following code to compute these metrics and inspect the results.
70+
Now, we will calculate per-cell QC metrics using the `scrapper` package. The function `quickRnaQc.se` adds QC metrics, including mitochondrial gene percentage, to the `colData` of the `SpatialFeatureExperiment` object. Run the following code to compute these metrics and inspect the results:
6671

6772
```{r}
68-
sfe <- addPerCellQCMetrics(sfe, subsets = list(Mito = is.mito))
73+
sfe <- quickRnaQc.se(sfe, subsets = list(Mito = is.mito))
6974
```
7075

7176
::: callout-important
@@ -82,15 +87,14 @@ Check which metadata has been added to `colData`. Do you recognize the different
8287
colData(sfe)
8388
```
8489

85-
- The `sum` column contains the total number of unique molecular identifiers (UMIs) for each bin
86-
- The `detected` column contains the number of unique genes detected per bin
87-
- The `subsets_Mito_percent` column contains the percentage of transcripts mapping to mitochondrial genes per bin
90+
- The `sum` column contains the total number of unique molecular identifiers (UMIs) for each cell
91+
- The `detected` column contains the number of unique genes detected per cell
92+
- The `subset.proportion.Mito` column contains the percentage of transcripts mapping to mitochondrial genes per cell
8893
:::
8994

90-
9195
### Global outlier detection
9296

93-
In order to detect low-quality bins and filter them out, QC methods adapted from scRNA-seq analysis can be applied. A simple option is to apply a fixed upper or lower threshold to a metric across all bins and remove the bins that do not pass the filtering criteria.
97+
In order to detect low-quality cells and filter them out, QC methods adapted from scRNA-seq analysis can be applied. A simple option is to apply a fixed upper or lower threshold to a metric across all cells and remove the cells that do not pass the filtering criteria.
9498

9599
To set up these thresholds we can look for information from 10x Genomics and the community, for example https://github.com/10XGenomics/HumanColonCancer_VisiumHD/issues/28. Based on this discussion the authors of the OSTA book wrote (see "Remove bins overlaying empty tissue" section in https://lmweber.org/OSTA/pages/seq-workflow-visium-hd-bin.html#dependencies):
96100

@@ -127,15 +131,15 @@ p2 <- ggcells(sfe, aes(x=detected, y=subsets_Mito_percent)) +
127131
p1 + p2
128132
```
129133

130-
And further check how many bins would be removed with the given thresholds:
134+
And further check how many cells would be removed with the given thresholds:
131135

132136
```{r}
133-
# Flag bins based on fixed thresholds
137+
# Flag cells based on fixed thresholds
134138
sfe$qc_lib_size <- sfe$sum < 400
135139
sfe$qc_detected <- sfe$detected < 300
136140
sfe$qc_mito_prop <- sfe$subsets_Mito_percent > 20
137141
138-
# Tabulate number of bins flagged by each
142+
# Tabulate number of cells flagged by each
139143
qc <- grep("^qc", names(colData(sfe)))
140144
sapply(colData(sfe)[qc], table)
141145
```
@@ -144,7 +148,7 @@ sapply(colData(sfe)[qc], table)
144148
## Exercise 2
145149
What do you think about this initial filtering?
146150

147-
Have a look at the mitochondrial percentage values on the tissue section using `plotVisium()`. Which bins are more likely to be excluded?
151+
Have a look at the mitochondrial percentage values on the tissue section using `plotVisium()`. Which cells are more likely to be excluded?
148152
:::
149153

150154
:::{.callout-tip collapse="true"}
@@ -165,7 +169,7 @@ plot_layout(nrow=1) &
165169
Very importantly, we want to have a look into the spatial pattern of using a fixed threshold for QC metrics :
166170

167171
```{r}
168-
# check spatial pattern of discarded bins
172+
# check spatial pattern of discarded cells
169173
p1 <- plotObsQC(sfe,
170174
plot_type="spot", annotate="qc_lib_size") +
171175
ggtitle("Library size (< 400 UMI)")
@@ -185,48 +189,48 @@ gridExtra::grid.arrange(p1, p2, p3, ncol=2)
185189

186190
### Local Outlier Detection
187191

188-
To avoid discarding whole layers of tissue, another approach to identify low-quality bins is to use local outlier detection methods that take into account the spatial context of each bin.
192+
To avoid discarding whole layers of tissue, another approach to identify low-quality cells is to use local outlier detection methods that take into account the spatial context of each cell.
189193

190194
`SpotSweeper` package provides functions to identify common QC metrics (such as the ones we have seen in the previous section: library size, number of detected genes, and mitochondrial percentage) and detect local outliers based on these metrics.
191195

192-
We will use `localOutliers` from `SpotSweeper`, which helps identify bins that deviate significantly from their local neighborhood.
196+
We will use `localOutliers` from `SpotSweeper`, which helps identify cells that deviate significantly from their local neighborhood.
193197

194198

195199
```{r}
196200
# Identify local outliers based on library size ("sum" of counts).
197-
# Bins with unusually low library size compared to their neighbors will be flagged.
201+
# cells with unusually low library size compared to their neighbors will be flagged.
198202
sfe <- localOutliers(sfe,
199203
metric = "sum",
200204
direction = "lower",
201205
log = TRUE
202206
)
203207
204208
# Identify local outliers based on the number of unique genes detected ("detected").
205-
# Bins with an unusually low number of detected genes will be flagged.
209+
# cells with an unusually low number of detected genes will be flagged.
206210
sfe <- localOutliers(sfe,
207211
metric = "detected",
208212
direction = "lower",
209213
log = TRUE
210214
)
211215
212216
# Identify local outliers based on the mitochondrial gene percentage.
213-
# Bins with an unusually high mitochondrial percentage will be flagged.
217+
# cells with an unusually high mitochondrial percentage will be flagged.
214218
sfe <- localOutliers(sfe,
215219
metric = "subsets_Mito_percent",
216220
direction = "higher",
217221
log = FALSE
218222
)
219223
220224
# Combine all individual outlier flags into a single "local_outliers" column.
221-
# A bin is considered a local outlier if it is flagged by any of the above metrics.
225+
# A cell is considered a local outlier if it is flagged by any of the above metrics.
222226
sfe$local_outliers <- as.logical(sfe$sum_outliers) |
223227
as.logical(sfe$detected_outliers) |
224228
as.logical(sfe$subsets_Mito_percent_outliers)
225229
```
226230

227231
::: callout-important
228232
## Exercise 3
229-
How many bins were identified as local outliers based on the combined criteria?
233+
How many cells were identified as local outliers based on the combined criteria?
230234

231235
As we have seen before, visualizing the QC metrics is crucial for understanding the quality of your spatial transcriptomics data and for deciding on appropriate filtering strategies, so look at their spatial localization using the `plotObsQC` function.
232236

@@ -237,7 +241,7 @@ What do you think?
237241
:::{.callout-tip collapse="true"}
238242
### Answer
239243
```{r}
240-
# Count the number of bins flagged as local outliers.
244+
# Count the number of cells flagged as local outliers.
241245
table(sfe$local_outliers)
242246
243247
plotObsQC(sfe,
@@ -283,7 +287,7 @@ plotQCmetrics(sfe,
283287
ggtitle("Local outliers for number of UMIs")
284288
```
285289

286-
Very few bins seem to be flagged as outliers!
290+
Very few cells seem to be flagged as outliers!
287291
:::
288292

289293

@@ -323,16 +327,16 @@ sfe$discard <- sfe$global_outliers | sfe$local_outliers
323327

324328
::: callout-important
325329
## Exercise 5
326-
Visualize the bins that will be discarded based on the combined criteria. How many bins will be removed?
330+
Visualize the cells that will be discarded based on the combined criteria. How many cells will be removed?
327331
:::
328332

329333
:::{.callout-tip collapse="true"}
330334
### Answer
331335
```{r}
332-
# check how many bins will be discarded
336+
# check how many cells will be discarded
333337
table(sfe$discard)
334338
335-
# have a final look at the bins that we will discard
339+
# have a final look at the cells that we will discard
336340
plotSpatialFeature(sfe, colGeometryName = "cellseg", features = "discard")
337341
```
338342
:::
@@ -388,10 +392,10 @@ plotQCmetrics(sfe,
388392
:::
389393

390394
## Filtering
391-
We decide to filter out the low-quality bins from the `SpatialExperiment` object based on the combined criteria of global thresholds and local outliers. Additionally, we will remove any features (genes) that have zero counts across all remaining bins.
395+
We decide to filter out the low-quality cells from the `SpatialExperiment` object based on the combined criteria of global thresholds and local outliers. Additionally, we will remove any features (genes) that have zero counts across all remaining cells.
392396

393397
```{r}
394-
# remove the bins considered of low quality based on both global and local metrics
398+
# remove the cells considered of low quality based on both global and local metrics
395399
sfe <- sfe[, !sfe$discard]
396400
397401
# remove features with all 0 counts
@@ -421,7 +425,7 @@ gc()
421425
**Key Takeaways:**
422426

423427
- `scuttle::addPerCellQCMetrics` provides essential per-bin QC metrics.
424-
- `SpotSweeper::localOutliers` helps identify individual problematic bins.
428+
- `SpotSweeper::localOutliers` helps identify individual problematic cells.
425429
- `SpotSweeper::findArtifacts` is powerful for detecting spatially coherent regions of poor quality.
426430
- Visualizing QC results is critical for data interpretation and downstream analysis decisions.
427431
:::

0 commit comments

Comments
 (0)