---
title: "Dive into Potential Gran. Clusters"
author: "D. Ford Hannum"
date: "6/5/2020"
output:
  html_document:
    toc: true
    toc_depth: 3
    number_sections: false
    theme: united
    highlight: tango
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
library(Seurat)
library(ggplot2)
library(MAST)
```
```{r loading data and giving labels}
wbm <- readRDS('./data/wbm_clustered_filtered_named.rds')
#DimPlot(wbm, reduction = 'umap')
new_cluster_ids <- c(0,1,2,'B-cell','MK',5,6,'Monocyte','Macrophage',
'Erythrocyte','B-cell Prog.','T-cell/NK', 'MEP')
names(new_cluster_ids) <- levels(wbm)
#new_cluster_ids
wbm <- RenameIdents(wbm, new_cluster_ids)
```
# Introduction
After thorough labeling with SingleR and marker gene expression, we are relatively confident in most of our cluster labels. There is still some uncertainty around clusters 0, 1, 2, 5 and 6, which were all labeled as granulocyte clusters but for which we had no good marker genes showing widespread expression across the cluster. We came up with a few potential reasons for this:

1. *We did not have good marker genes for granulocytes.*
2. *These clusters are majority mutant cells, so their transcriptome could differ from that of canonical granulocytes; maybe we should focus on control-cell expression in these clusters.*
3. *Granulocyte is an umbrella term for multiple cell types: eosinophils, basophils, neutrophils and mast cells. Perhaps we would do better using marker genes for these subtypes to label the clusters.*

Another possible way to distinguish these clusters for naming:

* *Look at the marker genes that distinguish an individual cluster from all other clusters, and at the marker genes that distinguish it from the other potential "granulocyte" clusters (i.e. 0 vs [1,2,5,6]).*

My first step is to identify the genes that define these clusters, to see whether any are associated with a particular cell type.
Below is the UMAP projection with the cluster labels we have high confidence in.
```{r umap with starting labels}
DimPlot(wbm, reduction = 'umap', label = T, repel = T) + NoLegend()
```
# Cell Cluster Markers
For each cluster we find the genes that distinguish it from all other clusters, and the genes that distinguish it from the other potential "granulocyte" (PG) clusters.

The same analysis is then repeated using only control cells (excluding Mpl cells). So for each cluster we have four readouts:

1. Cluster **X** vs **all other** clusters, **all** cells
2. Cluster **X** vs **PG** clusters, **all** cells
3. Cluster **X** vs **all other** clusters, **control** cells
4. Cluster **X** vs **PG** clusters, **control** cells

Results:

* *avg_logFC* : log fold-change of the average expression between the two groups. Positive values indicate that the gene is more highly expressed in the first group
* *pct.1* : the proportion of cells (0 to 1) in which the gene is detected in the first group
* *pct.2* : the proportion of cells (0 to 1) in which the gene is detected in the second group
* *p_val_adj* : adjusted p-value, based on Bonferroni correction using all genes in the dataset
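
To make these column definitions concrete, here is a minimal sketch in base R on a made-up expression matrix. The expression values, the gene names `geneA` to `geneC`, and the raw p-values are all invented for illustration. It assumes the Seurat v3 convention, in which `avg_logFC` is a natural-log ratio of mean expression with a pseudocount of 1; it is not the `FindMarkers` implementation itself.

```r
# Toy log-normalized expression: 3 genes x 6 cells (values invented for illustration)
expr <- matrix(c(2.0, 1.5, 0.0, 0.0, 0.5, 0.0,
                 0.0, 0.0, 0.0, 1.0, 1.2, 0.8,
                 1.0, 1.0, 1.0, 1.0, 1.0, 1.0),
               nrow = 3, byrow = TRUE,
               dimnames = list(c('geneA', 'geneB', 'geneC'), paste0('cell', 1:6)))
group1 <- c('cell1', 'cell2', 'cell3')   # stands in for cluster X
group2 <- c('cell4', 'cell5', 'cell6')   # stands in for the comparison group

# Proportion of cells in a group with non-zero expression of each gene
pct_detected <- function(m, cells) rowMeans(m[, cells, drop = FALSE] > 0)

# Assumed Seurat v3 style average: natural log of mean expression plus 1
mean_log <- function(m, cells) log(rowMeans(expm1(m[, cells, drop = FALSE])) + 1)

toy <- data.frame(
  avg_logFC = mean_log(expr, group1) - mean_log(expr, group2),
  pct.1 = pct_detected(expr, group1),
  pct.2 = pct_detected(expr, group2)
)

# Bonferroni: multiply each raw p-value by the number of genes, capped at 1
raw_p <- c(geneA = 0.001, geneB = 0.02, geneC = 0.9)   # hypothetical raw p-values
toy$p_val_adj <- p.adjust(raw_p, method = 'bonferroni', n = nrow(expr))
toy
```

In this toy table `geneA` has a positive `avg_logFC` (higher in the first group), `geneB` a negative one, and `geneC` is unchanged between the groups.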
```{r getting all the DE without focusing on controls, message = F}
# potential granulocyte (PG) clusters
PGC <- c(0,1,2,5,6)

list_vs_all <- list()
list_vs_PGC <- list()

cnt <- 1

for (i in PGC){
  #print(i)
  # the other PG clusters
  not_i <- PGC[PGC != i]

  # cluster i vs all other clusters
  MvA <- FindMarkers(wbm, ident.1 = i,
                     min.pct = 0.5, logfc.threshold = log(2))
  MvA <- MvA[MvA$p_val_adj < 0.05,]
  MvA <- MvA[order(MvA$avg_logFC, decreasing = TRUE),]
  list_vs_all[[cnt]] <- MvA

  # cluster i vs the other PG clusters
  MvPGC <- FindMarkers(wbm, ident.1 = i, ident.2 = not_i,
                       min.pct = 0.5, logfc.threshold = log(2))
  MvPGC <- MvPGC[MvPGC$p_val_adj < 0.05,]
  MvPGC <- MvPGC[order(MvPGC$avg_logFC, decreasing = TRUE),]
  list_vs_PGC[[cnt]] <- MvPGC

  cnt <- cnt + 1
}

# write.table(list_vs_PGC[[5]], './data/test.table.txt', quote = F, row.names = F,
#             sep = '\t')
```
```{r setting it up to find the differential expression}
PGC <- c(0,1,2,5,6)

# names of the control cells
cntrl_cells <- rownames(wbm@meta.data[wbm@meta.data$condition == 'control',])

cnt_list_vs_all <- list()
cnt_list_vs_PGC <- list()

cnt <- 1

# Seurat object restricted to control cells
cwbm <- subset(wbm, cells = cntrl_cells)

for (i in PGC){
  #print(i)
  # the other PG clusters
  not_i <- PGC[PGC != i]

  # cluster i vs all other clusters, control cells only
  MvA <- FindMarkers(cwbm, ident.1 = i,
                     min.pct = 0.5, logfc.threshold = log(2))
  MvA <- MvA[MvA$p_val_adj < 0.05,]
  MvA <- MvA[order(MvA$avg_logFC, decreasing = TRUE),]
  cnt_list_vs_all[[cnt]] <- MvA

  # cluster i vs the other PG clusters, control cells only
  MvPGC <- FindMarkers(cwbm, ident.1 = i, ident.2 = not_i,
                       min.pct = 0.5, logfc.threshold = log(2))
  MvPGC <- MvPGC[MvPGC$p_val_adj < 0.05,]
  MvPGC <- MvPGC[order(MvPGC$avg_logFC, decreasing = TRUE),]
  cnt_list_vs_PGC[[cnt]] <- MvPGC

  cnt <- cnt + 1
}
```
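
Both loops apply the same filter-and-sort step to every `FindMarkers` result. A small helper could hold that logic in one place. The sketch below uses a hypothetical function name, `filter_markers`, and is exercised on an invented data frame with the same column names rather than on real output.

```r
# Hypothetical helper: keep significant genes, sorted by decreasing fold-change
filter_markers <- function(markers, alpha = 0.05) {
  keep <- markers[markers$p_val_adj < alpha, , drop = FALSE]
  keep[order(keep$avg_logFC, decreasing = TRUE), , drop = FALSE]
}

# Invented table mimicking the two columns used in the loops above
toy_markers <- data.frame(
  avg_logFC = c(0.9, 2.1, -1.3, 1.5),
  p_val_adj = c(0.20, 0.001, 0.01, 0.04),
  row.names = c('g1', 'g2', 'g3', 'g4')
)
filtered <- filter_markers(toy_markers)
filtered

# Comparison group for one cluster, e.g. 0 vs [1, 2, 5, 6]
PGC <- c(0, 1, 2, 5, 6)
others <- setdiff(PGC, 0)
others
```

Here `g1` is dropped because its adjusted p-value is above the cutoff, and the remaining rows come back ordered `g2`, `g4`, `g3`.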
## Cluster 0
Focusing on the top 10 genes that distinguish cluster 0 from all other clusters:
```{r cluster 0}
head(list_vs_all[[1]],10)
#tail(list_vs_all[[1]],10)
```
Focusing on the top 10 genes from this cluster vs PG clusters
```{r clst 0 vs pgc}
head(list_vs_PGC[[1]],10)
```
Focusing on the top 10 genes from this cluster vs all clusters, control cells only
```{r cntrl clst 0 vs all}
head(cnt_list_vs_all[[1]],10)
```
Focusing on the top 10 genes from this cluster vs PG clusters, control cells only
```{r cntrl clst 0 vs pgc}
head(cnt_list_vs_PGC[[1]],10)
```
Genes of Interest:
* *Ltf* : its protein product is found in the secondary granules of neutrophils
* *Ngp* : neutrophilic granule protein
* *Lcn2* : neutrophil gelatinase-associated lipocalin
### **Conclusion**
This makes me believe that this is a **NEUTROPHIL CLUSTER**.
## Cluster 1
Cluster 1 vs all other clusters
```{r 1 vs all}
head(list_vs_all[[2]],10)
```
Cluster 1 vs other PG clusters
```{r 1 vs pbcs}
head(list_vs_PGC[[2]],10)
```
Cluster 1 vs all others, control only
```{r cntrl clst 1 vs all}
head(cnt_list_vs_all[[2]],10)
```
Cluster 1 vs other PG clusters, control only
```{r cntrl clst 1 vs pgc}
head(cnt_list_vs_PGC[[2]],10)
```
Genes of Interest:
* *Csf3r* : one of our granulocyte markers. It controls the production, differentiation, and function of granulocytes.
* *Il1b* : this cytokine is produced by activated macrophages as a proprotein.
* *Sell* : the gene product is required for binding and subsequent rolling of leucocytes on endothelial cells
* *Ccl6* : expressed in cells from neutrophil and macrophage lineages, and can be greatly induced under conditions suitable for myeloid cell differentiation. Highly expressed in bone marrow cultures that have been stimulated with the cytokine GM-CSF
* *Srgn* : encodes a protein best known as a hematopoietic cell granule proteoglycan. Plays a role in the formation of mast cell secretory granules and in localizing neutrophil elastase in the azurophil granules of neutrophils.
* *Wfdc1* : this gene is downregulated in many cancer types and may be involved in the inhibition of cell proliferation.
### **Conclusion**
Some very interesting genes (Ccl6, Srgn) are related to the neutrophil and mast cell lineages, and the cluster also shows increased expression of Csf3r, which we used as a granulocyte marker. I feel confident this is a granulocyte cluster, but I am not sure of a sub-classification.
## Cluster 2
Cluster 2 vs all others
```{r cluster 2 vs all others}
head(list_vs_all[[3]],10)
```
Cluster 2 vs other PG clusters
```{r cluster 2 vs other Pg clusters}
head(list_vs_PGC[[3]],10)
```
Cluster 2 vs all others, control only
```{r cntrl clst 2 vs all}
head(cnt_list_vs_all[[3]],10)
```
Cluster 2 vs other PG clusters, control only
```{r cntrl clst 3 vs pgc}
head(cnt_list_vs_PGC[[3]],10)
```
Genes of Interest:
* *Mmps* : not sure how they relate, but since there are two I added them. They are involved in the breakdown of extracellular matrix in normal physiological processes, such as embryonic development, reproduction, and tissue remodeling, as well as in disease processes, such as arthritis and metastasis.
* *Cd177* : a GPI-linked cell surface glycoprotein that plays a role in neutrophil activation and functions in neutrophil transmigration. Mutations in this gene are associated with myeloproliferative diseases. Over-expression is found in patients with polycythemia vera (PV).
### **Conclusion**
Cd177 is a very interesting gene here. This could potentially be another neutrophil cluster, but nothing is certain.
## Cluster 5
Cluster 5 vs all
```{r cluster 5 vs all}
head(list_vs_all[[4]],20)
```
Cluster 5 vs other PG clusters
```{r 5 vs other pgcs}
head(list_vs_PGC[[4]],10)
```
Cluster 5 vs all others, control only
```{r cntrl clst 5 vs all}
head(cnt_list_vs_all[[4]],10)
```
Cluster 5 vs other PG clusters, control only
```{r cntrl clst 5 vs pgc}
head(cnt_list_vs_PGC[[4]],10)
```
Genes of Interest:
* *Chil3* : has chemotactic activity for T-lymphocytes, bone marrow cells and eosinophils.
* *Mki67* : one of our proliferation markers.
* *Hmgn2* : the protein is associated with transcriptionally active chromatin.
* *Birc5* : encodes a negative regulatory protein that prevents apoptotic cell death.
* *Hmgb2* : the proteins of this family are chromatin-associated and demonstrate the ability to efficiently bend DNA and form DNA circles
* *Fcnb* : immature mouse granulocytic myeloid cells are characterized by production of Fcnb
* *Many Histone Genes* : there are many up-regulated genes associated with basic histone markers
* *Lcn2* : the protein encoded by this gene is a neutrophil gelatinase-associated lipocalin and plays a role in innate immunity by limiting...
### **Conclusion**
Some very interesting genes popped up in both lists that are related to histone modifications, chromatin accessibility, proliferation, etc. I am not sure what to make of these results.
## Cluster 6
Cluster 6 vs all
```{r cluster 6 vs all}
head(list_vs_all[[5]],10)
```
Cluster 6 vs other PG clusters
```{r cluster 6 vs pg clusters}
head(list_vs_PGC[[5]],10)
```
Cluster 6 vs all others, control only
```{r cntrl clst 6 vs all}
head(cnt_list_vs_all[[5]],10)
```
Cluster 6 vs other PG clusters, control only
```{r cntrl clst 6 vs pgc}
head(cnt_list_vs_PGC[[5]],10)
```
Genes of Interest:
* *Elane*: one of our granulocyte markers
* *Prtn3*: a paralog of Elane
* *Mpo*: a heme protein synthesized during myeloid differentiation that constitutes the major component of neutrophil azurophilic granules.
* *Ms4a3*: is specifically and transiently expressed by GMPs in the bone marrow. Ms4a3-based models specifically and efficiently fate map monocytes and granulocytes
* *Ctsg*: one of three serine proteases of the chymotrypsin family that are stored in the azurophil granules.
* *Fcnb*: immature mouse granulocytic myeloid cells are characterized by production of Fcnb, previously found in cluster 5
* *Nkg7*: natural killer cell granule protein
* *Serpinb1a*: this protein inhibits the neutrophil-derived proteinases neutrophil elastase, etc.
* *Ribosomal RNAs*: many ribosomal RNAs were present in these lists.
### **Conclusion**
Many very interesting genes. Once again, this seems like a neutrophil cluster.
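
Several genes recur across the lists above, for example *Fcnb* in clusters 5 and 6 and *Lcn2* in clusters 0 and 5. One way to surface such overlaps systematically is to tabulate how many clusters each gene was noted in. The sketch below uses a hand-written list containing only the genes of interest named in this document (not the full marker tables), and the variable names are illustrative.

```r
# Genes of interest as noted per cluster in this document
# (Mmps and histone/ribosomal groups are omitted because no specific genes are named)
genes_of_interest <- list(
  '0' = c('Ltf', 'Ngp', 'Lcn2'),
  '1' = c('Csf3r', 'Il1b', 'Sell', 'Ccl6', 'Srgn', 'Wfdc1'),
  '2' = c('Cd177'),
  '5' = c('Chil3', 'Mki67', 'Hmgn2', 'Birc5', 'Hmgb2', 'Fcnb', 'Lcn2'),
  '6' = c('Elane', 'Prtn3', 'Mpo', 'Ms4a3', 'Ctsg', 'Fcnb', 'Nkg7', 'Serpinb1a')
)

# Long format: one row per (gene, cluster) pair; columns are `values` and `ind`
long <- stack(genes_of_interest)

# Genes noted in more than one cluster
counts <- table(long$values)
shared <- names(counts[counts > 1])

# For each shared gene, the clusters it was noted in
shared_clusters <- lapply(split(as.character(long$ind), long$values)[shared], sort)
shared_clusters
```

The same pattern could be applied to the row names of the full marker tables, though with many more genes the overlap would need some filtering to stay readable.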
# Control Cells: Distribution and Cell Counts
```{r testing just on control cells}
DimPlot(wbm, reduction = 'umap', cells = cntrl_cells)
x <- as.data.frame(summary(as.factor(wbm@meta.data$seurat_clusters))[c(1,2,3,6,7)])
x$Cluster <- rownames(x)
colnames(x)[1] <- 'Cell Count'
x[c(2,1)]
```
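
The chunk above picks the PG clusters by position (`c(1,2,3,6,7)`), which depends on the order of the factor levels, and it counts all cells rather than only control cells. A sketch of an alternative that selects clusters by name and restricts the counts to control cells is shown here on an invented stand-in for the metadata table. The `condition` and `seurat_clusters` column names follow the code above; the rows and the `'mut'` label are made up.

```r
# Invented stand-in for the metadata table (one row per cell)
meta <- data.frame(
  seurat_clusters = factor(c(0, 0, 1, 2, 3, 5, 5, 6, 0, 1), levels = 0:6),
  condition = c('control', 'mut', 'control', 'control', 'mut',
                'control', 'mut', 'control', 'control', 'mut')
)
pg_clusters <- c('0', '1', '2', '5', '6')

# Count control cells per cluster, then select the PG clusters by name
ctrl_counts <- table(meta$seurat_clusters[meta$condition == 'control'])
pg_counts <- data.frame(Cluster = pg_clusters,
                        `Cell Count` = as.integer(ctrl_counts[pg_clusters]),
                        check.names = FALSE)
pg_counts
```

Indexing the table by cluster name keeps the result correct even if clusters are added or the level order changes.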