This is a brief abstract of my exemplar, which includes a representative image.
This exemplar was developed at Imperial College London by Sara Patti in collaboration with Adrian D'Alessandro from Research Software Engineering and Jesus Urtasun from Research Computing & Data Science at the Early Career Researcher Institute.
After completing this exemplar, students will:
- Analyze spatial transcriptomic data (Xenium)
- Develop a reproducible pipeline
- Implement RSE best practices (e.g. testing, continuous integration)
- Biologists interested in developing bioinformatic pipelines
- RSEs interested in analyzing spatial transcriptomics data
- Required skills/knowledge (e.g. programming languages, libraries, theory, courses)
- System requirements (e.g. Python 3.11+, Anaconda, 50 GB disk space, etc.)
- Hardware or HPC requirements (if any)
e.g. Step-by-step guide:
- Start by (instruction).
- Visit the sections of this notebook in some particular order.
- Attempt exercises `1a`, `1b`, etc.
- Progress to advanced materials in the GitHub repository linked here.
- Compare with solutions available in the `solutions` folder.
Briefly describe how this project fits in your discipline, why you chose to work on it, and what other disciplines may find it useful.
Python, squidpy, MuSpAn
Overview of code organisation and structure.
.
├── notebooks
│ ├── ex1.ipynb
├── src
│ ├── file1.py
│ ├── file2.cpp
│ ├── ...
│ └── data
├── docs
└── test
Code is organised into logical components:
- `notebooks` for tutorials and exercises
- `src` for core code, potentially divided into further modules
- `data` within `src` for datasets
- `docs` for documentation
- `test` for testing scripts
Goal: Ensure clean, usable spatial gene expression data.
- Process the Xenium output with the 10x Genomics Xenium tools (e.g. Xenium Ranger; Space Ranger is the pipeline for Visium data)
- Filter low-quality cells
- Normalize gene counts
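The filtering and normalization steps above can be sketched in plain NumPy on a cells × genes count matrix. This is a minimal illustration, not the exemplar's pipeline: the function names `qc_mask` and `normalize_log1p` are hypothetical, and the thresholds (`min_counts=10`, `min_genes=3`, `target_sum=100`) are placeholder values rather than recommendations for Xenium panels. In a scanpy-based workflow the same steps map onto `sc.pp.filter_cells`, `sc.pp.normalize_total` and `sc.pp.log1p`.

```python
import numpy as np

def qc_mask(counts: np.ndarray, min_counts: int = 10, min_genes: int = 3) -> np.ndarray:
    """Boolean mask of cells (rows) passing simple QC thresholds (hypothetical values)."""
    total_counts = counts.sum(axis=1)          # transcripts detected per cell
    genes_detected = (counts > 0).sum(axis=1)  # distinct genes detected per cell
    return (total_counts >= min_counts) & (genes_detected >= min_genes)

def normalize_log1p(counts: np.ndarray, target_sum: float = 100.0) -> np.ndarray:
    """Scale each cell to `target_sum` total counts, then apply log(1 + x)."""
    totals = counts.sum(axis=1, keepdims=True)
    totals = np.where(totals == 0, 1, totals)  # guard against empty cells
    return np.log1p(counts / totals * target_sum)

# Toy matrix: 3 cells x 4 genes (made-up numbers for illustration)
counts = np.array([
    [5, 4, 3, 0],   # passes QC
    [1, 0, 0, 0],   # too few counts and genes: filtered out
    [2, 2, 2, 6],   # passes QC
], dtype=float)

keep = qc_mask(counts)
normalized = normalize_log1p(counts[keep])
```

Filtering before normalizing means near-empty cells never get inflated to the target total, which would otherwise exaggerate their few detected genes.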
Goal: Identify patterns and groups of similar gene expression profiles.
- PCA + UMAP/t-SNE
- Cluster by gene expression
- Identify cell types with marker genes
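A self-contained sketch of the dimensionality reduction, clustering and marker-gene steps is shown below. It deliberately swaps in simpler techniques than a typical scanpy workflow (`sc.tl.pca`, `sc.pp.neighbors`, `sc.tl.leiden`, `sc.tl.umap`, `sc.tl.rank_genes_groups`). PCA is computed by SVD, clustering uses k-means with a deterministic farthest-first initialisation instead of graph-based Leiden, and markers are ranked by a simple difference in mean expression instead of a statistical test. The gene names `GENE_A` to `GENE_D` and the toy data are hypothetical.

```python
import numpy as np

def pca_scores(X: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project cells onto the top principal components via SVD of the centred matrix."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def kmeans(X: np.ndarray, k: int, n_iter: int = 100) -> np.ndarray:
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centroids = [X[0]]
    for _ in range(1, k):
        # Next centroid: the cell farthest from every centroid chosen so far
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)  # assign each cell to its nearest centroid
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels

def top_markers(X, labels, gene_names, n_top=2):
    """Rank genes per cluster by mean expression inside minus outside the cluster."""
    markers = {}
    for cl in np.unique(labels):
        in_cl = labels == cl
        diff = X[in_cl].mean(axis=0) - X[~in_cl].mean(axis=0)
        markers[int(cl)] = [gene_names[i] for i in np.argsort(diff)[::-1][:n_top]]
    return markers

# Toy data: two groups of 5 cells expressing different (hypothetical) genes
rng = np.random.default_rng(0)
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
X = np.vstack([np.tile([5.0, 5, 0, 0], (5, 1)),
               np.tile([0.0, 0, 5, 5], (5, 1))]) + rng.normal(0, 0.1, size=(10, 4))

scores = pca_scores(X, n_components=2)
labels = kmeans(scores, k=2)
markers = top_markers(X, labels, genes)
```

Clustering on the PCA scores rather than on raw counts is the usual design: it removes much of the per-gene noise before distances are computed. In practice the marker lists would then be compared against known cell-type markers to assign identities.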
Goal: Map gene expression and clusters back to their spatial context.
- Overlay expression and clusters on tissue image
- Plot spatially enriched genes
- Map cell types or states in space
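Mapping clusters back to tissue amounts to plotting each cell at its centroid, coloured by cluster. The matplotlib sketch below assumes centroid coordinates in micrometres (the hypothetical `coords` array) and inverts the y-axis to follow image convention. squidpy offers `squidpy.pl.spatial_scatter` for the same purpose on an AnnData object, and it can also draw the tissue image underneath.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

def plot_clusters_in_space(coords: np.ndarray, labels: np.ndarray, ax=None):
    """Scatter cell centroids, one colour per cluster label."""
    if ax is None:
        _, ax = plt.subplots(figsize=(5, 5))
    for cl in np.unique(labels):
        mask = labels == cl
        ax.scatter(coords[mask, 0], coords[mask, 1], s=4, label=f"cluster {cl}")
    ax.set_aspect("equal")  # keep physical distances undistorted
    ax.invert_yaxis()       # image convention: y increases downwards
    ax.set_xlabel("x (µm)")
    ax.set_ylabel("y (µm)")
    ax.legend(markerscale=3, frameon=False)
    return ax

# Toy example: 20 cells at random positions, two clusters
rng = np.random.default_rng(1)
coords = rng.uniform(0, 100, size=(20, 2))
labels = np.array([0] * 10 + [1] * 10)
ax = plot_clusters_in_space(coords, labels)
```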
Goal: Discover meaningful biology.
- Spatially variable genes (SVGs)
- Differential expression (DE) between regions or conditions
- Pathway or Gene Ontology (GO) enrichment
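One common way to score spatially variable genes is Moran's I, a spatial autocorrelation statistic: values near +1 mean neighbouring cells have similar expression, and negative values mean neighbours differ. The NumPy sketch below assumes a binary k-nearest-neighbour weight matrix with a hypothetical `k=4` and builds it densely, so it only suits small toy data. squidpy provides `squidpy.gr.spatial_neighbors` and `squidpy.gr.spatial_autocorr(adata, mode="moran")` for real datasets.

```python
import numpy as np

def knn_weights(coords: np.ndarray, k: int = 4) -> np.ndarray:
    """Binary k-nearest-neighbour weight matrix (dense, O(n^2) memory: toy data only)."""
    d = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)          # a cell is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]    # indices of the k closest cells
    W = np.zeros(d.shape)
    W[np.repeat(np.arange(len(coords)), k), nn.ravel()] = 1.0
    return W

def morans_i(x: np.ndarray, W: np.ndarray) -> float:
    """Moran's I = (n / sum(W)) * (z^T W z) / (z^T z), with z = x - mean(x)."""
    z = x - x.mean()
    return float((len(x) / W.sum()) * (z @ W @ z) / (z @ z))

# Toy tissue: cells on a 10 x 10 grid
xs, ys = np.meshgrid(np.arange(10), np.arange(10))
coords = np.column_stack([xs.ravel(), ys.ravel()]).astype(float)
W = knn_weights(coords, k=4)

gradient = coords[:, 0]                          # expression rises smoothly left to right
checkerboard = (coords[:, 0] + coords[:, 1]) % 2  # alternates between adjacent cells

i_gradient = morans_i(gradient, W)
i_checkerboard = morans_i(checkerboard, W)
```

The smooth gradient should score clearly positive and the checkerboard negative. Ranking genes by I (with a permutation or analytical p-value, which squidpy reports) gives a list of candidate SVGs.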
List datasets used with:
- Licensing info
- Where they are included (in the repo or external links)
- Code testing and/or test examples
- Use of continuous integration (if any)
- Any other software development best practices
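As an example of the kind of test that could live in `test/`, the sketch below uses pytest-style test functions (plain `assert` plus `numpy.testing`) against a hypothetical normalization helper. In a repository the helper would be imported from `src/` rather than defined inline, and a continuous integration service would run `pytest` on every push.

```python
import numpy as np

def normalize_log1p(counts: np.ndarray, target_sum: float = 100.0) -> np.ndarray:
    """Hypothetical helper under test: scale each cell to `target_sum`, then log1p."""
    totals = counts.sum(axis=1, keepdims=True)
    totals = np.where(totals == 0, 1, totals)  # guard against empty cells
    return np.log1p(counts / totals * target_sum)

def test_normalized_cells_sum_to_target():
    counts = np.array([[1, 3, 0], [10, 0, 10]], dtype=float)
    normed = normalize_log1p(counts, target_sum=100.0)
    # Undoing log1p should recover target_sum for every non-empty cell
    np.testing.assert_allclose(np.expm1(normed).sum(axis=1), [100.0, 100.0])

def test_empty_cell_stays_zero():
    normed = normalize_log1p(np.zeros((1, 3)))
    assert np.all(normed == 0)
    assert not np.any(np.isnan(normed))
```

Tests like these check properties (totals preserved, no NaNs from empty cells) rather than hard-coded outputs, so they stay valid if the default `target_sum` changes.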
| Task | Time |
|---|---|
| Reading | 3 hours |
| Practising | 3 hours |
- Relevant sources, websites, images, AOB.
This project is licensed under the BSD-3-Clause license.