**Panel A** - UMAP coloured by unsupervised Leiden clusters. **Panel B** - Same embedding coloured by assigned cell type. **Panel C** - Cell type proportions. **Panel D** - Z-scored expression of canonical marker genes per cell type (red = high, blue = low). **Panel E** - Summary statistics.

</details>

## Dataset

**10x Genomics PBMC 3k** - 2,700 peripheral blood mononuclear cells from a healthy donor, sequenced on the Chromium platform. This is the standard benchmark dataset used across [scanpy](https://scanpy-tutorials.readthedocs.io/en/latest/pbmc3k.html), [Seurat](https://satijalab.org/seurat/articles/pbmc3k_tutorial.html), and other single-cell frameworks.

- **Reference**: Zheng et al. (2017) [Massively parallel digital transcriptional profiling of single cells](https://doi.org/10.1038/ncomms14049). *Nature Communications* 8, 14049.

### Biological Interpretation

The dominance of CD4+ T cells (46%) is expected in healthy donor PBMCs. Dendritic cells are a rare population (1.4%) but were still resolved as a distinct cluster despite their low cell count. The monocyte population is predominantly classical (CD14+). Nonclassical (FCGR3A+) monocytes were not resolved as a separate cluster at resolution 0.5 and most likely merge with the classical monocyte cluster. This is consistent with the resolution sensitivity of FCGR3A+ monocyte separation observed in the literature.

Silhouette scores in single-cell data are typically low because cell states are often continuous rather than discrete. The metric is used here to compare resolutions against each other, not as an absolute measure of clustering quality.
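The following is a minimal, dependency-free sketch of the mean silhouette coefficient computed on a seeded random subsample. It illustrates what "relative comparison" looks like in practice. It assumes cells are given as coordinate tuples in some reduced space (for example PCA coordinates) and uses Euclidean distance. The names `silhouette` and `sampled_silhouette` are illustrative only and are not part of this pipeline. A library routine such as scikit-learn's `silhouette_score`, which accepts `sample_size` and `random_state`, would normally be used instead. This README does not state which implementation the pipeline calls.

```python
import math
import random
from collections import defaultdict


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    members = defaultdict(list)
    for point, label in zip(points, labels):
        members[label].append(point)
    if len(members) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = members[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other points in the same cluster
        # (distance to itself is 0, hence the n - 1 denominator)
        a = sum(math.dist(point, other) for other in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(math.dist(point, other) for other in pts) / len(pts)
            for lab, pts in members.items()
            if lab != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def sampled_silhouette(points, labels, sample_size, seed=42):
    """Silhouette on a seeded random subsample, so reruns give the same value."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(points)), min(sample_size, len(points)))
    return silhouette([points[i] for i in idx], [labels[i] for i in idx])
```

The sampled indices depend only on the seed and the number of cells. As a result, every candidate resolution of the same dataset is scored on the same subset of cells.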
## Design Decisions

- **Doublet detection** - Scrublet is run before QC filtering. It flagged 36 doublets (1.3%), 34 of which were removed after the other QC filters had been applied. Doublet removal is recommended by [Luecken & Theis (2019)](https://doi.org/10.15252/msb.20188746).
- **Automated annotation** - Clusters are scored against curated PBMC marker gene sets rather than labelled by manual inspection. The marker sets are themselves a subjective choice, but encoding them explicitly makes the annotation reproducible and auditable.
- **Multi-resolution clustering** - Leiden clustering is run at 5 resolutions, and each result is evaluated by silhouette score. The floor of ≥5 clusters reflects the minimum number of major PBMC lineages (T cells, B cells, monocytes, NK cells, DCs).
- **Trajectory inference** - PAGA provides a principled graph abstraction of connectivity between cell types. Diffusion pseudotime is rooted in CD14+ monocytes, chosen as the starting point of the monocyte-to-DC differentiation axis. Monocytes are differentiated cells rather than true progenitors, so the root is a modelling choice rather than a claim about lineage origin.
- **T cell subclustering** - Resolves the CD4+ and CD8+ populations, which share CD3D/CD3E expression and cannot be separated at the global clustering resolution.
- **Reproducible seeds** - `random_state=42` is used for UMAP, Leiden, Scrublet, and silhouette sampling.
- **Dual-format figures** - PNG (300 DPI) for the web and PDF (vector) for publication submission.
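The marker-scoring step can be sketched as follows. The sketch assumes that per-cluster mean expression values (for example, z-scored) have already been computed. The marker sets below are commonly used PBMC markers chosen for illustration only; they are an assumption, not the pipeline's curated sets. `annotate_clusters` is a hypothetical helper. Returning the per-type scores alongside the labels keeps each assignment auditable.

```python
from statistics import mean

# Illustrative marker sets only (hypothetical); the pipeline's curated
# gene sets are not reproduced here and may differ.
MARKERS = {
    "T cells": ["CD3D", "CD3E"],
    "B cells": ["MS4A1", "CD79A"],
    "CD14+ Monocytes": ["CD14", "LYZ"],
    "NK cells": ["GNLY", "NKG7"],
    "Dendritic cells": ["FCER1A", "CST3"],
}


def annotate_clusters(cluster_means, markers=MARKERS):
    """Label each cluster with the cell type whose markers score highest.

    cluster_means maps cluster id -> {gene: mean expression in that cluster}.
    Returns (labels, scores) so every assignment can be audited.
    """
    labels, scores = {}, {}
    for cluster, gene_means in cluster_means.items():
        # Score = mean expression of the marker genes present in the data;
        # cell types with no measured marker are skipped.
        type_scores = {
            cell_type: mean([gene_means[g] for g in genes if g in gene_means])
            for cell_type, genes in markers.items()
            if any(g in gene_means for g in genes)
        }
        scores[cluster] = type_scores
        labels[cluster] = (
            max(type_scores, key=type_scores.get) if type_scores else "Unassigned"
        )
    return labels, scores
```

If none of a cluster's measured genes appear in any marker set, the cluster is labelled `Unassigned` rather than being forced into a type. Ties go to the cell type listed first in `MARKERS`.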
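The resolution choice can be written as a small selection rule. This sketch assumes the rule is "highest silhouette among resolutions that give at least five clusters". That rule matches the ≥5 cluster floor in the multi-resolution clustering entry and the Limitations note that the pipeline optimises for silhouette score. However, the exact rule, the five resolution values, and the helper name `choose_resolution` are all assumptions. In a scanpy pipeline, the per-resolution cluster counts would come from calling `sc.tl.leiden` once per `resolution` value.

```python
def choose_resolution(results, min_clusters=5):
    """Return the resolution with the best silhouette among those yielding
    at least `min_clusters` clusters.

    results maps resolution -> (n_clusters, silhouette_score).
    Ties on silhouette are broken in favour of the lower resolution.
    """
    # Discard resolutions that fall below the biological cluster floor.
    eligible = {
        res: sil
        for res, (n_clusters, sil) in results.items()
        if n_clusters >= min_clusters
    }
    if not eligible:
        raise ValueError(f"no resolution yields >= {min_clusters} clusters")
    # Highest silhouette first; on a tie, prefer the smaller resolution.
    return max(eligible, key=lambda res: (eligible[res], -res))
```

Breaking ties toward the lower resolution is an illustrative design choice: when two resolutions score equally, it favours the coarser partition.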
## Limitations

- **Single-sample dataset.** Multi-sample analyses would require batch correction (e.g. Harmony, scVI, or BBKNN).
- **`regress_out` is debatable.** It is used here following the original scanpy tutorial, but Luecken & Theis (2019) suggest that regression may overcorrect well-filtered cells.
- **No pathway enrichment.** Gene set enrichment (e.g. via decoupler or GSEApy) would connect cell types to functional programmes. This is planned as a future addition.
- **FCGR3A+ monocytes not resolved.** At resolution 0.5, nonclassical monocytes merge with the CD14+ cluster. Higher resolution or targeted subclustering would likely separate them.
- **Megakaryocytes not resolved.** The PBMC 3k dataset contains a small platelet/megakaryocyte population (PPBP+, PF4+) that merges with other clusters at this resolution. The canonical scanpy tutorial resolves 8 cell types from this dataset; our pipeline resolves 5 at the global level plus 2 via T cell subclustering. The difference comes from resolution choice: we optimise for silhouette score rather than maximising cluster count.