-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME.Rmd
302 lines (250 loc) · 10.1 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%",
echo = TRUE,
eval = FALSE
)
```
# PathPinpointR
PathPinpointR identifies the position of a sample upon a trajectory.
##### *Assumptions:*
- Sample is found upon the chosen trajectory.
- Sample is from a distinct part of the trajectory. A sample with cells that are evenly distributed across the trajectory will have a predicted location at the centre of the trajectory.
# Example Workflow
This vignette will take you through the basics running PPR.
The data used here is an integrated data-set of blastocyst data.
## Installation
#### Install required packages
Run the following code to load all packages neccecary for PPR & this vignette.
```{r}
required_packages <- c("SingleCellExperiment", "Biobase", "fastglm", "ggplot2",
"monocle", "plyr", "RColorBrewer", "ggrepel", "ggridges",
"gridExtra", "devtools", "mixtools", "Seurat",
"parallel", "RColorBrewer")
## for packages "fastglm", "ggplot2", "plyr", "RColorBrewer",
# "ggrepel", "ggridges", "gridExtra", "mixtools"
new_packages <- required_packages[!(required_packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages)
## for packages "SingleCellExperiment", "Biobase", "slingshot".
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
new_packages <- required_packages[!(required_packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) BiocManager::install(new_packages)
# for package "GeneSwitches"
devtools::install_github("SGDDNB/GeneSwitches")
```
#### install PathPinpointR
You can install the development version of PathPinpointR using:
```{r}
devtools::install_github("moi-taiga/PathPinpointR")
```
### Load the required packages
```{r}
library(PathPinpointR)
library(Seurat)
library(ggplot2)
library(SingleCellExperiment)
library(slingshot)
library(RColorBrewer)
library(GeneSwitches)
```
## Load the reference data
The reference dataset is a Seurat object of a blastocyst dataset.
The data is an integrated dataset of 8 studies.
The data has been downsampled and filtered to only inlcude ebiblast cells.
```{r}
# download the example data
get_example_data()
# Load the reference data to the environment
reference_seu <- readRDS("./reference.rds")
# Load the sample data to the environment
samples_seu <- lapply(c("./LW120.rds", "./LW122.rds"), readRDS)
# name the samples in the list
names(samples_seu) <- c("LW120", "LW122")
```
#### View the reference UMAP plot
Plot the reference data, colored by the day of development.
```{r}
DimPlot(object = reference_seu,
reduction = "umap",
group.by = "time",
label = TRUE) +
ggtitle("Reference")
```
```{r, eval = TRUE, echo = FALSE}
knitr::include_graphics("./man/figures/README-reference-dim-plot.png")
```
The plot shows the development of the epiblast cells.
Some labels are a mix of samples from different days.
## Convert to SingleCellExperiment objects.
Prior to running slingshot and GeneSwitches,
we need to convert the Seurat objects to SingleCellExperiment objects.
```{r}
reference_sce <- SingleCellExperiment(assays = list(expdata = reference_seu@assays$RNA$counts))
colData(reference_sce) <- DataFrame([email protected])
reducedDims(reference_sce)$UMAP <- reference_seu@[email protected]
# create an empty list to to store the sce sample objects
samples_sce <- list()
# Iterate through each Seurat object in the samples list
for (i in seq_along(samples_seu)){
# convert each sample to a SingleCellExperiment object & store in the list
samples_sce[[i]] <- SingleCellExperiment(assays = list(expdata = samples_seu[[i]]@assays$RNA$counts))
}
# carry over the sample names from the Seurat objects.
names(samples_sce) <- names(samples_seu)
```
## Run slingshot
Run slingshot on the reference data to produce pseudotime for each cell.
```{r}
reference_sce <- slingshot(reference_sce,
clusterLabels = "seurat_clusters",
start.clus = "2",
end.clus = "1",
reducedDim = "UMAP")
#Rename the Pseudotime column to work with GeneSwitches
colData(reference_sce)$Pseudotime <- reference_sce$slingPseudotime_1
```
#### Plot the slingshot trajectory.
The plot shows the trajectory of the blastocyst data,
with cells colored by pseudotime.
```{r}
# Generate colors
colors <- colorRampPalette(brewer.pal(11, "Spectral")[-6])(100)
plotcol <- colors[cut(reference_sce$slingPseudotime_1, breaks = 100)]
# Plot the data
plot(reducedDims(reference_sce)$UMAP, col = plotcol, pch = 16, asp = 1)
lines(SlingshotDataSet(reference_sce), lwd = 2, col = "black")
```
```{r, eval = TRUE, echo = FALSE}
knitr::include_graphics("./man/figures/README-slingshot.png")
```
## Binarize the Gene Expression Data
Using the package [GeneSwitches](https://github.com/SGDDNB/GeneSwitches),
binarize the gene expression data of the reference and query data-sets,
with a cutoff of 1.
```{r, eval=FALSE}
# binarize the expression data of the reference
reference_sce <- binarize_exp(reference_sce,
fix_cutoff = TRUE,
binarize_cutoff = 1,
ncores = 1)
# Find the switching point of each gene in the reference data
reference_sce <- find_switch_logistic_fastglm(reference_sce,
downsample = TRUE,
show_warning = FALSE)
```
Note: both binatize_exp() and find_switch_logistic_fastglm(),
are time consuming processes and may take tens of minutes to run.
## Visualise the switching genes
generate a list of switching genes, and visualise them on a pseudo timeline.
```{r}
switching_genes <- filter_switchgenes(reference_sce,
allgenes = TRUE,
r2cutoff = 0)
# Plot the timeline using plot_timeline_ggplot
plot_timeline_ggplot(switching_genes,
timedata = colData(reference_sce)$Pseudotime,
txtsize = 3)
```
```{r, eval = TRUE, echo = FALSE}
knitr::include_graphics("./man/figures/README-timeline-plot1.png")
```
## Select a number of switching genes
Using the PPR function precision():
```{r}
precision(reference_sce)
```
```{r, eval = TRUE, echo = FALSE}
knitr::include_graphics("./man/figures/README-precision1_plot.png")
```
Narrow down the search to find the optimum number of switching genes.
```{r}
precision(reference_sce, n_sg_range = seq(50, 150, 1))
```
```{r, eval = TRUE, echo = FALSE}
knitr::include_graphics("./man/figures/README-precision2_plot.png")
```
## Filter and re-visualise the switching genes.
```{r}
switching_genes <- filter_switchgenes(reference_sce,
allgenes = TRUE,
r2cutoff = 0,
topnum = 114)
```
Note: The number of switching genes significantly affects the accuracy of PPR.\
too many will reduce the accuracy by including uninformative genes/noise. \
too few will reduce the accuracy by excluding informative genes. \
The using precision(), 114 is found to be the optimum for this data. \
```{r}
# Plot the timeline using plot_timeline_ggplot
plot_timeline_ggplot(switching_genes,
timedata = colData(reference_sce)$Pseudotime,
txtsize = 3)
```
```{r, eval = TRUE, echo = FALSE}
knitr::include_graphics("./man/figures/README-timeline-plot2.png")
```
## Measure accuracy
We can calculate the accuracy of PPR in the given trajectory by comparing the
predicted position of the reference cells to their pseudotimes defined by slingshot.
the accuracy can be visualised using `acuracy_test`.
```{r}
# reduce the reference data to only include the switching genes.
reference_reduced_sce <- reduce_counts_matrix(reference_sce, switching_genes)
# predict the position of cells in the reference trajectory.
reference_ppr <- predict_position(reference_reduced_sce, switching_genes)
# plot the accuracy of the prediction
accuracy_test(reference_ppr, reference_sce, plot = TRUE)
```
```{r, eval = TRUE, echo = FALSE}
knitr::include_graphics("./man/figures/README-accuracy_plot.png")
```
## Binarize the sample data
Binarize the gene expression data of the samples.
```{r}
# First reduce the sample data to only include the switching genes.
samples_sce <- lapply(samples_sce, reduce_counts_matrix, switching_genes)
# binarize the expression data of the samples
samples_binarized <- lapply(samples_sce,
binarize_exp,
fix_cutoff = TRUE,
binarize_cutoff = 1,
ncores = 1)
```
## Predict Position
Produce an estimate for the position of each cell in each sample.
The prediction is stored as a PPR_OBJECT.
```{r}
# Iterate through each Seurat object in the predicting their positons,
# on the reference trajectory, using PathPinpointR.
samples_ppr <- lapply(samples_binarized, predict_position, switching_genes)
```
## Plotting the predicted position of each sample:
plot the predicted position of each sample on the reference trajectory.
```{r}
# show the predicted position of the first sample
# include the position of cells in the reference data, by a given label.
ppr_plot() +
reference_idents(reference_sce, "time") +
sample_prediction(samples_ppr[[1]], label = "Sample 1", col = "red")
# show the predicted position of the second sample
sample_prediction(samples_ppr[[2]], label = "Sample 2", col = "blue")
# show the points at which selected genes switch.
switching_times(c("TKTL1", "CYYR1", "KHDC1L"), switching_genes)
```
```{r eval = TRUE, echo = FALSE}
knitr::include_graphics("./man/figures/README-sample1.png")
```
A simpler plot of your results can be generated with the function ppr_vioplot().
```{r}
ppr_vioplot(samples_ppr, reference_sce, ident = "time")
```
```{r eval = TRUE, echo = FALSE}
knitr::include_graphics("./man/figures/README-PPR-vioplot.png")
```