ImperialCollegeLondon/ReCoDe-spatial-transcriptomics

Reproducible Spatial Transcriptomics Pipeline with RSE Best Practices

This is a brief abstract of my exemplar, which includes a representative image.

This exemplar was developed at Imperial College London by Sara Patti in collaboration with Adrian D'Alessandro from Research Software Engineering and Jesus Urtasun from Research Computing & Data Science at the Early Career Researcher Institute.

Learning Outcomes 🎓

After completing this exemplar, students will:

  • Analyze spatial transcriptomics data (Xenium)
  • Develop a reproducible pipeline
  • Implement RSE best practices (e.g. testing, continuous integration)

Target Audience 🎯

  1. Biologists interested in developing bioinformatic pipelines
  2. Research software engineers (RSEs) interested in analyzing spatial transcriptomics data

Prerequisites ✅

Academic 📚

  • Required skills/knowledge (e.g. programming languages, libraries, theory, courses)

System 💻

  • System requirements (e.g. Python 3.11+, Anaconda, 50 GB disk space, etc.)
  • Hardware or HPC requirements (if any)

Getting Started 🚀

e.g. Step-by-step guide:

  1. Start by (instruction).
  2. Visit the sections of this notebook in some particular order.
  3. Attempt exercises 1a, 1b, etc.
  4. Progress to advanced materials in the GitHub repository linked here.
  5. Compare with solutions available in the solutions folder.

Briefly describe how this project fits in your discipline, why you chose to work on it, and what other disciplines may find it useful.

Software Tools 🛠️

Python, squidpy, MuSpAn

Project Structure 🗂️

Overview of code organisation and structure.

.
├── notebooks
│   └── ex1.ipynb
├── src
│   ├── file1.py
│   ├── file2.cpp
│   ├── ...
│   └── data
├── docs
└── test

Code is organised into logical components:

  • notebooks for tutorials and exercises
  • src for core code, potentially divided into further modules
  • data within src for datasets
  • docs for documentation
  • test for testing scripts

Roadmap 🗺️

Preprocessing & Quality Control

Goal: Ensure clean, usable spatial gene expression data.

  • Run Xenium output through Xenium Ranger or other Xenium tools (Space Ranger targets Visium data, not Xenium)
  • Filter low-quality spots/cells
  • Normalize gene counts
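The filtering and normalisation steps above can be sketched on a plain cells-by-genes count matrix. This is a minimal illustration using NumPy only, not the pipeline's actual implementation: the function name `filter_and_normalize`, the `min_counts` threshold and the `target_sum` value are all hypothetical choices. In an AnnData-based workflow, scanpy's `pp.filter_cells`, `pp.normalize_total` and `pp.log1p` would typically play this role.

```python
import numpy as np


def filter_and_normalize(counts, min_counts=10, target_sum=100.0):
    """Drop low-count cells, then library-size normalise and log-transform.

    counts: (n_cells, n_genes) array of raw transcript counts.
    min_counts must be positive so that no kept cell has a zero total.
    Returns the normalised matrix and the boolean mask of kept cells.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1)
    keep = totals >= min_counts  # QC: remove cells with too few transcripts
    kept = counts[keep]
    # Scale each kept cell so its counts sum to target_sum.
    scaled = kept / totals[keep, None] * target_sum
    return np.log1p(scaled), keep


# Toy data (illustrative values only): 3 cells x 3 genes.
counts = np.array([
    [30, 10, 10],  # 50 transcripts: kept
    [1, 0, 2],     # 3 transcripts: filtered out
    [5, 5, 10],    # 20 transcripts: kept
])
normalised, keep = filter_and_normalize(counts)
```

Thresholds such as `min_counts` are dataset dependent and are usually chosen after inspecting the distribution of per-cell totals.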

Dimensionality Reduction & Clustering

Goal: Identify patterns and groups of similar gene expression profiles.

  • PCA + UMAP/t-SNE
  • Cluster by gene expression
  • Identify cell types with marker genes
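A rough sketch of the reduce-then-cluster idea, assuming a normalised cells-by-genes matrix. PCA is computed through an SVD, and a small k-means with deterministic farthest-point initialisation stands in for the graph-based clustering (e.g. Leiden) that single-cell toolkits usually apply. The helper names and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def pca(X, n_components=2):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)  # centre each gene
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def kmeans(X, k, n_iter=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centres = [X[0]]
    for _ in range(1, k):
        # Distance of every cell to its nearest already-chosen centre.
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centres], axis=0)
        centres.append(X[d.argmax()])
    centres = np.array(centres, dtype=float)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = X[labels == j].mean(axis=0)
    return labels


# Synthetic data: two groups of 20 cells with different marker genes.
rng = np.random.default_rng(0)
group_a = np.array([5.0, 5.0, 0.0, 0.0, 0.0]) + rng.normal(0, 0.1, size=(20, 5))
group_b = np.array([0.0, 0.0, 0.0, 5.0, 5.0]) + rng.normal(0, 0.1, size=(20, 5))
expr = np.vstack([group_a, group_b])

embedding = pca(expr, n_components=2)
labels = kmeans(embedding, k=2)
```

Cluster labels on their own are arbitrary integers; assigning cell types still requires comparing each cluster's expression against known marker genes.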

Spatial Mapping & Visualization

Goal: Map gene expression and clusters back to their spatial context.

  • Overlay expression and clusters on tissue image
  • Plot spatially enriched genes
  • Map cell types or states in space
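One simple way to put expression back into spatial context is to average a per-cell value on a regular grid over the cell centroids, giving an array that can be drawn on top of the tissue image. The sketch below assumes centroid coordinates in an `(n_cells, 2)` array. `bin_expression` and the toy values are hypothetical, and plotting is left out so the example has no dependencies beyond NumPy.

```python
import numpy as np


def bin_expression(xy, values, bins=(2, 2)):
    """Average a per-cell value on a regular spatial grid.

    xy: (n_cells, 2) centroid coordinates; values: (n_cells,) expression.
    Returns (grid, x_edges, y_edges); bins containing no cells are NaN.
    grid[i, j] corresponds to x bin i and y bin j.
    """
    x, y = xy[:, 0], xy[:, 1]
    sums, x_edges, y_edges = np.histogram2d(x, y, bins=bins, weights=values)
    n_cells, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])
    with np.errstate(invalid="ignore", divide="ignore"):
        grid = sums / n_cells  # 0/0 in empty bins becomes NaN
    return grid, x_edges, y_edges


# Toy data: three cells near the origin, two in the opposite corner.
xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0], [9.0, 10.0]])
values = np.array([1.0, 3.0, 2.0, 10.0, 20.0])
grid, x_edges, y_edges = bin_expression(xy, values, bins=(2, 2))
```

For display, the grid would typically be transposed so that rows correspond to y, for example with matplotlib's `imshow(grid.T, origin="lower")`.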

Differential Expression & Functional Analysis

Goal: Discover meaningful biology.

  • Spatially variable genes (SVGs)
  • DE between regions or conditions
  • Pathway or GO enrichment
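Spatially variable genes are often ranked with a spatial autocorrelation statistic such as Moran's I. The sketch below computes it for one gene using binary neighbour weights within a fixed radius. The function name, the radius and the toy grid are assumptions for illustration. squidpy provides `gr.spatial_neighbors` and `gr.spatial_autocorr` for the same purpose on real data.

```python
import numpy as np


def morans_i(values, xy, radius):
    """Moran's I with binary weights: cells within `radius` are neighbours.

    Values near +1 indicate spatial clustering, near -1 a dispersed
    (checkerboard-like) pattern, and near 0 no spatial structure.
    Cells at identical coordinates are not treated as neighbours.
    """
    values = np.asarray(values, dtype=float)
    xy = np.asarray(xy, dtype=float)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    w = ((d > 0) & (d <= radius)).astype(float)  # no self-links
    z = values - values.mean()
    return len(values) / w.sum() * (z @ w @ z) / (z @ z)


# Toy 4 x 4 grid of cells with two contrasting expression patterns.
xy = np.array([(i, j) for i in range(4) for j in range(4)], dtype=float)
clustered = (xy[:, 0] < 2).astype(float)  # expressed on one side only
checkerboard = xy.sum(axis=1) % 2         # alternating on/off

i_clustered = morans_i(clustered, xy, radius=1.0)
i_checkerboard = morans_i(checkerboard, xy, radius=1.0)
```

The dense pairwise distance matrix makes this quadratic in the number of cells, so it only suits small examples; real datasets need a sparse neighbour graph.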

Data 📊

List datasets used with:

  • Licensing info
  • Where they are included (in the repo or external links)

Best Practice Notes 📝

  • Code testing and/or test examples
  • Use of continuous integration (if any)
  • Any other software development best practices
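As an illustration of what a unit test for a pipeline step might look like, here is a small, dependency-free sketch. It assumes pytest-style `test_*` functions; the README does not name a test framework, and `normalize_total` here is a hypothetical helper written for this example, not a function from the repository.

```python
def normalize_total(counts, target_sum=1.0):
    """Scale a list of counts so that they sum to target_sum."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot normalise a cell with zero counts")
    return [c / total * target_sum for c in counts]


def test_normalised_counts_sum_to_target():
    result = normalize_total([2, 6, 2], target_sum=100)
    assert abs(sum(result) - 100) < 1e-9


def test_zero_count_cell_is_rejected():
    try:
        normalize_total([0, 0, 0])
    except ValueError:
        return
    raise AssertionError("expected ValueError for an all-zero cell")
```

Tests like these would normally live in the test directory, and a continuous integration workflow would run them on every push.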

Estimated Time ⏳

Task        Time
Reading     3 hours
Practising  3 hours

Additional Resources 🔗

  • Relevant sources, websites, images, AOB.

Licence 📄

This project is licensed under the BSD-3-Clause licence.
