- **Reference**: Zheng et al. (2017). [Massively parallel digital transcriptional profiling of single cells](https://doi.org/10.1038/ncomms14049). *Nature Communications* 8, 14049.
05 Annotate ────── Wilcoxon DE → score against PBMC marker signatures
+│
+▼
+06 Figures ─────── Multi-panel publication figure + 3D UMAP
```
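Step 05 in the diagram scores differential-expression results against PBMC marker signatures. A minimal sketch of one way such scoring could work is shown here. It assumes a simple overlap score (the fraction of each signature's genes that appear among a cluster's top DE genes), which is not necessarily what the annotation script does. The signature table uses only marker genes named in this README; `score_cluster`, `annotate`, `min_score`, and the example gene list are hypothetical.

```python
# Hypothetical signature table: only marker genes mentioned in this README.
SIGNATURES = {
    "T cells": {"CD3D", "CD3E"},
    "FCGR3A+ Monocytes": {"FCGR3A", "MS4A7"},
    "Dendritic cells": {"FCER1A", "CST3"},
}


def score_cluster(top_genes, signatures=SIGNATURES):
    """Return {cell_type: fraction of signature genes found in top_genes}."""
    top = set(top_genes)
    return {
        name: len(top & genes) / len(genes)
        for name, genes in signatures.items()
    }


def annotate(top_genes, signatures=SIGNATURES, min_score=0.5):
    """Pick the best-scoring cell type, or 'Unassigned' below min_score."""
    scores = score_cluster(top_genes, signatures)
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_score else "Unassigned"


# Illustrative top DE genes for one cluster (not real pipeline output).
example_top_genes = ["FCGR3A", "MS4A7", "LST1", "CST3"]
label = annotate(example_top_genes)
print(label)
```

The `min_score` threshold keeps a cluster unlabeled when no signature matches well, which is usually safer than forcing the nearest label.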
+
## Pipeline
| Step | Script | What it does |
@@ -49,7 +73,9 @@ All scripts are in `scripts/`. Each reads the previous step's `.h5ad` output fro
| FCGR3A+ Monocytes | 180 | 6.8 | FCGR3A, MS4A7 |
| Dendritic cells | 38 | 1.4 | FCER1A, CST3 |

-These proportions are consistent with expected PBMC composition from a healthy donor. Clustering selected resolution 0.5 (6 clusters, silhouette 0.196).
+The dominance of CD4+ T cells (45%) is expected in healthy-donor PBMCs. The ratio of classical (CD14+) to nonclassical (FCGR3A+) monocytes is approximately 2.6:1, consistent with published literature. Dendritic cells are a rare population (1.4%) and are correctly resolved as a distinct cluster. CD8+ T cells and megakaryocytes are present in the dataset but were not resolved as separate clusters at resolution 0.5; they likely merge with the CD4+ T cell and monocyte clusters, respectively, because of shared marker expression (for example, CD3D/CD3E across T cell subtypes).
+
+Resolution 0.5 was selected for clustering (6 clusters, silhouette 0.196). Silhouette scores in single-cell data are typically low because cell states are continuous rather than discrete; the metric is used here for relative comparison between resolutions, not as an absolute quality measure.
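The relative comparison described above can be sketched as follows. This is a minimal pure-Python illustration, not the pipeline's clustering script: it computes the mean silhouette coefficient for each candidate labeling and keeps the resolution with the highest score. The 2D points, the resolution values, and the label lists are hypothetical stand-ins for an embedding and for cluster assignments at each resolution.

```python
from math import dist
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean(dist(p, q) for q in other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


def pick_resolution(points, labelings):
    """Return (best_resolution, {resolution: score}) by highest silhouette."""
    scored = {res: silhouette_score(points, labs) for res, labs in labelings.items()}
    return max(scored, key=scored.get), scored


# Hypothetical toy embedding: three well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
# Hypothetical cluster labels at two candidate resolutions.
labelings = {
    0.3: [0, 0, 0, 0, 1, 1],
    0.5: [0, 0, 1, 1, 2, 2],
}
best, scored = pick_resolution(points, labelings)
print(best, scored)
```

Because only the ranking between resolutions matters here, a low absolute score such as the 0.196 reported above can still identify the preferred setting.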
- **Modular scripts**: Each step is independent. Re-run any step without repeating upstream work.

+## Limitations and Future Work
+
+- **No doublet detection.** Scrublet or a similar tool should precede QC in a production pipeline. It is omitted here because PBMC 3k is a clean benchmark with negligible doublet rates.
+- **No batch correction.** This is a single-sample dataset. Multi-sample analyses would require Harmony, scVI, or BBKNN.
+- **`regress_out` is debatable.** It is used here following the original scanpy tutorial, but Luecken & Theis (2019) suggest regression may overcorrect for well-filtered cells. It is included for pedagogical alignment with the standard workflow.
+- **CD8+ T cells not resolved.** Resolving them would require a higher clustering resolution or subclustering of the T cell compartment.