Reproducible Spatial Transcriptomics Pipeline with RSE Best Practices

This is a brief abstract of my exemplar, which includes a representative image.

This exemplar was developed at Imperial College London by Sara Patti in collaboration with Adrian D'Alessandro from Research Software Engineering and Jesus Urtasun from Research Computing & Data Science at the Early Career Researcher Institute.

Learning Outcomes 🎓

After completing this exemplar, students will:

Analyze spatial transcriptomic data (Xenium)
Develop a reproducible pipeline
Implement RSE best practices (e.g. testing, continuous integration)

Target Audience 🎯

Biologists interested in developing bioinformatics pipelines
Research software engineers (RSEs) interested in analyzing spatial transcriptomics data

Prerequisites ✅

Academic 📚

Required skills/knowledge (e.g. programming languages, libraries, theory, courses)

System 💻

System requirements (e.g. Python 3.11+, Anaconda, 50 GB disk space, etc.)
Hardware or HPC requirements (if any)

Getting Started 🚀

e.g. Step-by-step guide:

Start by (instruction).
Visit the sections of this notebook in some particular order.
Attempt exercises 1a, 1b, etc.
Progress to advanced materials in the GitHub repository linked here.
Compare with solutions available in the solutions folder.

Briefly describe how this project fits in your discipline, why you chose to work on it, and what other disciplines may find it useful.

Software Tools 🛠️

Python, squidpy, MuSpAn

Project Structure 🗂️

Overview of code organisation and structure.

.
├── notebooks
│ ├── ex1.ipynb
├── src
│ ├── file1.py
│ ├── file2.cpp
│ ├── ...
│ └── data
├── docs
└── test

Code is organised into logical components:

notebooks for tutorials and exercises
src for core code, potentially divided into further modules
data within src for datasets
docs for documentation
test for testing scripts

Roadmap 🗺️

Preprocessing & Quality Control

Goal: Ensure clean, usable spatial gene expression data.

Process Xenium output with Xenium Ranger or other Xenium tools (Space Ranger targets Visium data rather than Xenium)
Filter low-quality cells
Normalize gene counts
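
The filtering and normalization steps above can be sketched in plain NumPy. This is a minimal illustration rather than the pipeline's actual code: the function name `filter_and_normalize`, the thresholds, and the toy matrix are all assumptions. It mirrors the kind of work that scanpy's `pp.filter_cells`, `pp.normalize_total` and `pp.log1p` perform on an AnnData object.

```python
import numpy as np


def filter_and_normalize(counts, min_counts=10, target_sum=100.0):
    """Drop low-count cells, then library-size normalize and log-transform.

    counts: (n_cells, n_genes) array of raw transcript counts.
    min_counts must be positive so that every kept cell has a non-zero total.
    Returns the normalized matrix and the boolean mask of cells kept.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1)
    keep = totals >= min_counts  # QC: remove cells with too few transcripts
    kept = counts[keep]
    # Scale every remaining cell to the same total, then apply log(1 + x).
    scaled = kept / kept.sum(axis=1, keepdims=True) * target_sum
    return np.log1p(scaled), keep


# Toy matrix (hypothetical): 4 cells x 3 genes; the second cell has 2 transcripts.
raw = np.array([[30, 10, 10], [1, 0, 1], [5, 5, 40], [20, 20, 20]])
normalized, keep = filter_and_normalize(raw, min_counts=10, target_sum=100.0)
print(keep)
print(normalized.round(2))
```

Sensible thresholds depend on the gene panel and tissue, so the values here are placeholders to tune against the QC plots for a given dataset.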

Dimensionality Reduction & Clustering

Goal: Identify patterns and groups of similar gene expression profiles.

PCA + UMAP/t-SNE
Cluster by gene expression
Identify cell types with marker genes
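
To make the reduce-then-cluster idea concrete, here is a small NumPy sketch on synthetic data. It is an assumption-laden toy: PCA is computed with an SVD, and a tiny two-cluster k-means stands in for the graph-based clustering (such as Leiden) that single-cell toolkits typically use. The helper names `pca` and `kmeans_two_groups` are hypothetical.

```python
import numpy as np


def pca(X, n_components=2):
    """Project the rows of X onto the top principal components using SVD."""
    centered = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:n_components].T


def kmeans_two_groups(embedding, n_iter=20):
    """Tiny 2-cluster k-means, seeded with the extreme cells along PC1."""
    first_pc = embedding[:, 0]
    centroids = embedding[[first_pc.argmin(), first_pc.argmax()]]
    for _ in range(n_iter):
        # Distance from every cell to each of the two centroids.
        dists = np.linalg.norm(embedding[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        centroids = np.array([embedding[labels == k].mean(axis=0) for k in (0, 1)])
    return labels


# Synthetic data: two groups of 20 cells, each with its own 5 "marker" genes.
rng = np.random.default_rng(0)
high, low = 20.0, 1.0
group_a = rng.poisson([high] * 5 + [low] * 5, size=(20, 10))
group_b = rng.poisson([low] * 5 + [high] * 5, size=(20, 10))
X = np.log1p(np.vstack([group_a, group_b]))

embedding = pca(X, n_components=2)
labels = kmeans_two_groups(embedding)
print(labels)
```

Comparing the mean expression of each gene between the two labels would then point at the marker genes that distinguish the groups.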

Spatial Mapping & Visualization

Goal: Map gene expression and clusters back to their spatial context.

Overlay expression and clusters on tissue image
Plot spatially enriched genes
Map cell types or states in space
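
One plot-free way to think about mapping expression back to space is to average a per-cell value over a regular grid of tiles. The sketch below does this with NumPy; the coordinates, the gradient "gene", and the function name `bin_expression` are made up for illustration. The resulting array could be shown as a heat map (for example with matplotlib's `imshow`) or compared against the tissue image.

```python
import numpy as np


def bin_expression(x, y, values, n_bins=4):
    """Average a per-cell value inside each tile of an n_bins x n_bins grid."""
    # Sum of values per tile, then number of cells per tile on the same edges.
    sums, x_edges, y_edges = np.histogram2d(x, y, bins=n_bins, weights=values)
    cell_counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])
    with np.errstate(invalid="ignore", divide="ignore"):
        mean_map = sums / cell_counts  # tiles without cells become NaN
    return mean_map, cell_counts


# Hypothetical cell centroids in a unit square; expression rises along x.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 500)
y = rng.uniform(0, 1, 500)
expression = x.copy()

mean_map, cell_counts = bin_expression(x, y, expression, n_bins=4)
print(mean_map.round(2))
```

The first axis of `mean_map` follows x and the second follows y, so an image display usually needs a transpose to match the tissue orientation.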

Differential Expression & Functional Analysis

Goal: Discover meaningful biology.

Spatially variable genes (SVGs)
DE between regions or conditions
Pathway or GO enrichment
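
Spatially variable genes are often scored with a spatial autocorrelation statistic such as Moran's I. The sketch below is one simple NumPy version using binary k-nearest-neighbour weights; the neighbour count, the toy coordinates, and the function name `morans_i` are assumptions. It builds a dense distance matrix, so it only suits small examples. Squidpy exposes a comparable statistic through `gr.spatial_autocorr` with `mode="moran"`.

```python
import numpy as np


def morans_i(coords, values, k=6):
    """Moran's I with binary k-nearest-neighbour weights."""
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    n = len(z)
    # Pairwise distances; a cell is never its own neighbour.
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :k]
    weights = np.zeros((n, n))
    weights[np.arange(n)[:, None], neighbours] = 1.0
    # I = (N / W) * sum_ij w_ij z_i z_j / sum_i z_i^2
    return (n / weights.sum()) * (z @ weights @ z) / (z @ z)


# Toy data: one "gene" follows a spatial gradient, the other is a shuffled copy.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 1, size=(200, 2))
structured = coords[:, 0]
shuffled = rng.permutation(structured)

i_structured = morans_i(coords, structured)
i_shuffled = morans_i(coords, shuffled)
print(i_structured, i_shuffled)
```

Values near 1 indicate that neighbouring cells have similar expression, while values near 0 indicate no spatial structure, so the gradient gene should score much higher than its shuffled copy.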

Data 📊

List datasets used with:

Licensing info
Where they are included (in the repo or external links)

Best Practice Notes 📝

Code testing and/or test examples
Use of continuous integration (if any)
Any other software development best practices
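
As an illustration of what a unit test for a pipeline step might look like, here is a small sketch. The function `normalize_counts` is hypothetical and not taken from this repository. The tests use plain `assert` statements and the `test_` naming convention that pytest discovers automatically.

```python
import numpy as np


def normalize_counts(counts, target_sum=1.0):
    """Hypothetical pipeline step: scale each cell's counts to a fixed total."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    if np.any(totals == 0):
        raise ValueError("cells with zero total counts must be filtered first")
    return counts / totals * target_sum


def test_rows_sum_to_target():
    # Every cell should end up with exactly the requested total.
    result = normalize_counts([[1, 3], [2, 2]], target_sum=10.0)
    assert np.allclose(result.sum(axis=1), 10.0)


def test_empty_cell_is_rejected():
    # An all-zero cell should raise instead of silently producing NaNs.
    try:
        normalize_counts([[0, 0], [1, 1]])
    except ValueError:
        return
    raise AssertionError("expected ValueError for an all-zero cell")


if __name__ == "__main__":
    test_rows_sum_to_target()
    test_empty_cell_is_rejected()
    print("all tests passed")
```

With pytest installed, the second test would more commonly be written with `pytest.raises(ValueError)`. A continuous integration workflow can then run the test suite on every push so that regressions are caught early.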

Estimated Time ⏳

Task	Time
Reading	3 hours
Practising	3 hours

Additional Resources 🔗

Relevant sources, websites, images, AOB.

Licence 📄

This project is licensed under the BSD-3-Clause license.

ImperialCollegeLondon/ReCoDe-spatial-transcriptomics
