
wanglab-broad/workshop25

2025 Wang Lab Computational Workshop

Welcome to the 2025 Wang Lab Computational Workshop. This workshop focuses on the biological interpretation of spatial omics data, moving beyond initial data preprocessing to extract meaningful biological insights.

1. Introduction

The primary purpose of this workshop is to guide researchers through the biological interpretation steps of spatial omics analysis. We assume that data preprocessing is complete and participants are starting with:

  • Cell-by-gene matrices: Expression data for each cell.
  • XYZ coordinates: Spatial location data for each cell.
  • Read assignment dataframes: Spatial location and gene identity for each transcript.

We will explore how to analyze these inputs to uncover tissue architecture and cell-cell interactions.
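The reads assignment table and the cell-by-gene matrix are linked: summing reads per (cell, gene) pair produces the matrix. The sketch below illustrates this under a hypothetical layout with one entry per read. Each read has an integer cell index (-1 marks a read outside any segmented cell) and a gene name. In practice these would be columns of a pandas DataFrame; plain lists keep the sketch small. The gene names and counts are toy data.

```python
import numpy as np


def reads_to_counts(cell_ids, genes, n_cells):
    """Aggregate per-read assignments into a dense cells x genes count matrix.

    cell_ids: integer cell index per read (-1 = not assigned to any cell).
    genes: gene name per read.
    Returns (counts, gene_names), with genes sorted alphabetically.
    """
    cell_ids = np.asarray(cell_ids)
    genes = np.asarray(genes)
    keep = cell_ids >= 0  # drop reads that fall outside segmented cells
    gene_names, gene_idx = np.unique(genes[keep], return_inverse=True)
    counts = np.zeros((n_cells, len(gene_names)), dtype=np.int64)
    # np.add.at accumulates correctly when the same (cell, gene) pair repeats
    np.add.at(counts, (cell_ids[keep], gene_idx), 1)
    return counts, gene_names


# Toy read table: 6 reads, the last one unassigned
cell_ids = [0, 0, 1, 2, 2, -1]
genes = ["Snap25", "Snap25", "Gfap", "Gfap", "Snap25", "Mbp"]
counts, gene_names = reads_to_counts(cell_ids, genes, n_cells=3)
print(gene_names.tolist())  # ['Gfap', 'Snap25']
print(counts.tolist())      # [[0, 2], [1, 0], [1, 1]]
```

Here `Mbp` disappears because its only read was unassigned. Whether to keep such genes as all-zero columns is a design choice.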

2. Basic Computational Environment Setup

2.1 Package Managers

Setting up a robust environment is crucial for reproducibility. We will discuss three common tools:

  • pip: The standard package installer for Python.
    • When to use: General purpose installation of Python packages from PyPI.
    • Pros: Universally available, simple.
    • Cons: Does not handle non-Python binary dependencies well.
  • miniforge: A community-driven minimal installer for Conda.
    • When to use: Creating isolated environments that require non-Python dependencies (e.g., libraries with C++ backends).
    • Pros: Uses the community conda-forge channel by default, which avoids Anaconda's commercial licensing terms; robust dependency resolution.
    • Cons: Can be slower to resolve environments compared to newer tools.
  • uv: An extremely fast Python package installer and resolver.
    • When to use: Rapidly setting up environments and installing dependencies.
    • Pros: Significantly faster than pip and conda.
    • Cons: Newer tool; like pip, it installs Python packages from package indexes such as PyPI, so non-Python dependencies that are not shipped as wheels must be handled separately.

Tip

Use miniforge or uv to manage distinct environments for each project to prevent dependency conflicts.

2.2 Standardized Platform

To ensure all participants have a consistent environment, we will use Google Colab, which eliminates local installation issues during the workshop. Because Colab already provides a managed Python runtime, plain pip installation is sufficient.

How to open notebooks:

  1. Navigate to the provided Google Drive folder and open one of the Colab notebooks.
  2. Click "File" > "Save a copy in Drive" to save your own editable version.
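Unpinned pip installs can resolve to different package versions on different days. For that reason, it helps to record the versions a notebook actually ran with. The sketch below does this using only the standard library. The `google.colab` check is a common community idiom, not an official API. The package list is only an example.

```python
import sys
from importlib import metadata


def running_in_colab() -> bool:
    """Heuristic: Colab's kernel imports google.colab, so it appears in sys.modules."""
    return "google.colab" in sys.modules


def package_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print("Colab runtime:", running_in_colab())
for name, version in package_versions(["anndata", "scanpy", "squidpy"]).items():
    print(f"{name}: {version}")
```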

3. Single-Cell Best Practices and Common Packages

3.1 scverse Ecosystem Overview

We follow the scverse philosophy, which promotes interoperability between life science tools. The ecosystem relies on shared data structures to ensure that tools for different tasks (e.g., RNA velocity, spatial stats) can work together seamlessly. For comprehensive guidance on analysis workflows, we recommend consulting sc-best-practices, a community-driven resource that provides detailed best practices for single-cell data analysis.

3.2 AnnData

AnnData (Annotated Data) is the core data structure for the scverse ecosystem.

  • What it is: A file format and in-memory object designed to handle large-scale matrix data with annotations.
  • Why it is used: It keeps data (matrix) and metadata (cell/gene info) aligned during filtering and analysis.
  • Data Model:
    • X: The main data matrix (cells x genes).
    • obs: Observations (cells) metadata (e.g., cluster labels, per-cell QC metrics).
    • var: Variables (genes) metadata.
    • obsm: Multi-dimensional observations annotation (e.g., PCA, UMAP, spatial coordinates; Squidpy looks for coordinates in obsm["spatial"] by default).
    • varm: Multi-dimensional variables annotation.
    • uns: Unstructured data (e.g., color palettes, analysis parameters).

3.3 Scanpy vs. Squidpy

  • Scanpy:
    • Function: Fundamental single-cell analysis toolkit.
    • Use Cases: Preprocessing, clustering, dimensionality reduction (UMAP/t-SNE), trajectory inference.
    • Best Practice: Use for all non-spatial analysis steps.
  • Squidpy:
    • Function: Spatial omics analysis toolkit built on Scanpy.
    • Use Cases: Spatial neighborhood enrichment, ligand-receptor interaction analysis, image feature extraction.
    • Differences: Squidpy leverages the spatial coordinates and image data that Scanpy typically ignores.
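The usual way to build a spatial graph from `adata.obsm["spatial"]` is Squidpy's `sq.gr.spatial_neighbors`, which stores the result as sparse matrices in `adata.obsp`. To make the idea concrete, the sketch below builds a k-nearest-neighbor graph from scratch in numpy. It then computes a simple statistic: the average fraction of a cell's neighbors that share its label. This is an illustrative stand-in, not Squidpy's implementation. Squidpy's `sq.gr.nhood_enrichment`, for instance, computes permutation-based z-scores instead.

```python
import numpy as np


def spatial_knn(coords, k):
    """Indices of each cell's k nearest neighbors in space (brute force, O(n^2) memory)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]


def same_label_fraction(neighbors, labels):
    """Mean fraction of each cell's neighbors that carry the cell's own label."""
    labels = np.asarray(labels)
    return float((labels[neighbors] == labels[:, None]).mean())


# Toy tissue: two "A" cells on the left, two "B" cells on the right
coords = [[0, 0], [1, 0], [5, 0], [6, 0]]
labels = ["A", "A", "B", "B"]
nbrs = spatial_knn(coords, k=1)
print(nbrs.ravel().tolist())                                  # [1, 0, 3, 2]
print(same_label_fraction(nbrs, labels))                      # 1.0
print(same_label_fraction(spatial_knn(coords, k=2), labels))  # 0.5
```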

4. Imaging Analysis Tools

4.1 skimage

scikit-image (skimage) is a widely used, general-purpose image processing library for Python.

  • Tasks: Image filtering, segmentation, intensity normalization, and feature extraction relevant to spatial omics images.
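The sketch below shows a classic skimage pipeline on a synthetic image with two bright squares. The steps are percentile normalization, Gaussian smoothing, Otsu thresholding, connected-component labeling, and region measurement. It assumes `scikit-image` and `numpy` are installed. The sigma, percentiles, and `min_area` values are illustrative and not tuned for real plaque or nucleus images.

```python
import numpy as np
from skimage import filters, measure


def segment_bright_objects(image, sigma=1.0, min_area=5):
    """Segment bright blobs in a 2D image; returns the label image and kept regions."""
    image = np.asarray(image, dtype=float)
    # Robust intensity normalization to [0, 1] using the 1st/99th percentiles
    lo, hi = np.percentile(image, (1, 99))
    norm = np.clip((image - lo) / (hi - lo + 1e-12), 0.0, 1.0)
    # Smooth to suppress pixel noise before thresholding
    smoothed = filters.gaussian(norm, sigma=sigma)
    # Otsu picks a global threshold separating foreground from background
    mask = smoothed > filters.threshold_otsu(smoothed)
    # Connected components -> integer label per object (0 = background)
    labels = measure.label(mask)
    regions = [r for r in measure.regionprops(labels) if r.area >= min_area]
    return labels, regions


# Synthetic image: two bright 10x10 squares on a noisy dark background
rng = np.random.default_rng(0)
img = rng.normal(0.0, 0.05, size=(64, 64))
img[10:20, 10:20] += 1.0
img[40:50, 40:50] += 1.0

labels, regions = segment_bright_objects(img)
for r in regions:
    print(r.label, r.area, tuple(round(c, 1) for c in r.centroid))
```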

4.2 Fiji and CellProfiler

  • Fiji / CellProfiler:
    • What they are: Powerful GUI-based tools for image analysis.
    • When to use: For complex image segmentation pipelines or when visual inspection and manual parameter tuning are required before automated processing.
    • Complementary Workflow: Use these tools to generate cell masks or ground-truth data, which can then be imported into Python/AnnData for downstream analysis.
    • Commonly used plugins in Fiji:
      • IJPB-Plugins (MorphoLibJ): A collection of plugins for mathematical morphology and image analysis, useful for filtering, segmentation, and geometrical measurements.
      • StarDist segmentation: A deep learning-based tool that detects and segments star-convex objects (e.g., cell nuclei) in 2D and 3D images, and performs well when objects are densely packed.
      • Labkit: A user-friendly tool for pixel classification and segmentation using random forests, allowing users to train classifiers by drawing on the image.
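A label mask exported from Fiji or CellProfiler can be loaded into Python as an integer array (0 = background, 1..n = cells). The loading step, e.g. via a TIFF reader, is assumed and not shown. From that array, per-cell features can be derived directly. The sketch below uses `np.bincount` to compute area, centroid, and the mean intensity of a matching image channel. scikit-image's `measure.regionprops` provides the same measurements; this version only makes the arithmetic explicit.

```python
import numpy as np


def per_cell_features(label_mask, intensity):
    """Area, centroid (row, col), and mean intensity for labels 1..max in a 2D mask."""
    label_mask = np.asarray(label_mask, dtype=np.int64)
    labels = label_mask.ravel()
    values = np.asarray(intensity, dtype=float).ravel()
    rows, cols = np.indices(label_mask.shape)
    n = int(labels.max())
    # bincount sums a weight per label; index 0 collects the background
    area = np.bincount(labels, minlength=n + 1)
    row_sum = np.bincount(labels, weights=rows.ravel(), minlength=n + 1)
    col_sum = np.bincount(labels, weights=cols.ravel(), minlength=n + 1)
    int_sum = np.bincount(labels, weights=values, minlength=n + 1)
    # Unused label IDs have area 0 and yield NaN instead of raising
    with np.errstate(invalid="ignore", divide="ignore"):
        centroid = np.stack([row_sum / area, col_sum / area], axis=1)
        mean_intensity = int_sum / area
    return {"area": area[1:], "centroid": centroid[1:], "mean_intensity": mean_intensity[1:]}


mask = np.array([[1, 1, 0],
                 [1, 0, 2],
                 [0, 2, 2]])
signal = np.array([[2.0, 4.0, 9.0],
                   [6.0, 9.0, 1.0],
                   [9.0, 3.0, 5.0]])
feats = per_cell_features(mask, signal)
print(feats["area"].tolist())            # [3, 3]
print(feats["mean_intensity"].tolist())  # [4.0, 3.0]
```

Averaging a protein channel within each mask is one simple way to approach per-cell signal, such as tau in the first example dataset. Whether to report the mean or the integrated sum is an analysis choice.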

5. Example Datasets

We will analyze two primary datasets to demonstrate different workflows.

5.1 STARmap PLUS — TauPS2APP Model

This dataset focuses on an Alzheimer's disease model.

  • Practice Script: Open In Colab

Workflow:

  1. Cell type classification: Identifying neuronal and glial subtypes.
  2. Detecting disease-associated cell types: Pinpointing populations specific to the disease state.
  3. Exploring spatial distribution: Mapping where these cells are located in the tissue.
  4. Plaque image segmentation: Identifying amyloid plaques from imaging channels.
  5. Intercellular spatial relationships: Analyzing how disease-associated cells cluster around plaques.
  6. Quantifying tau protein per cell: Measuring pathological protein load.
  7. Integration with human datasets: Comparing mouse model findings to human AD data.
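One simple way to frame step 5 is to ask whether disease-associated cells sit closer to plaques than other cells do. The numpy sketch below assumes that cell centroids and plaque centroids (e.g., from a segmentation like step 4) share one coordinate system. It also uses a hypothetical boolean label marking disease-associated cells and an arbitrary radius. This is not necessarily the analysis used in the practice script. A real comparison would add a significance test, for example by permuting the labels. For large datasets it would also use a KD-tree such as `scipy.spatial.cKDTree` instead of brute-force distances.

```python
import numpy as np


def distance_to_nearest(points, targets):
    """Euclidean distance from each point to its closest target (brute force)."""
    points = np.asarray(points, dtype=float)
    targets = np.asarray(targets, dtype=float)
    diff = points[:, None, :] - targets[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)


def fraction_within(dist, in_group, radius):
    """Fraction of in-group and out-of-group cells within `radius` of a target."""
    near = np.asarray(dist) <= radius
    in_group = np.asarray(in_group, dtype=bool)
    return near[in_group].mean(), near[~in_group].mean()


cells = [[0, 0], [1, 1], [10, 10], [2, 0], [20, 20], [11, 9]]
plaques = [[0, 1], [10, 10]]
disease_associated = [True, True, True, False, False, False]  # toy labels

dist = distance_to_nearest(cells, plaques)
frac_dis, frac_other = fraction_within(dist, disease_associated, radius=2.0)
print(round(frac_dis, 3), round(frac_other, 3))  # 1.0 0.333
```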

5.2 STARmap + RIBOmap — Grin2a+/− Model

This dataset uses multimodal data in a neurodevelopmental model.

  • Practice Script: Open In Colab

Workflow:

  1. Cross-modality integration: Combining STARmap (RNA) and RIBOmap (Ribosome-bound RNA) data.
  2. Cell type classification: Defining cell identities based on integrated data.
  3. SPIN for tissue-region identification: Spatial integration to define anatomical regions.
  4. Differential gene expression (DGE): Finding genes that change between genotypes.
  5. Gene Ontology enrichment analysis: Interpreting DGE results biologically.
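For step 4, a common starting point for comparing genotypes within a cell type is Scanpy's `sc.tl.rank_genes_groups` (e.g., with `method="wilcoxon"`). Step 5 is often run as an over-representation test, which asks whether a GO term's gene set is more common among the differentially expressed genes than expected by chance. The standard-library sketch below implements that test as a one-sided hypergeometric tail, followed by Benjamini-Hochberg correction across terms. Dedicated GO tools add term-hierarchy and annotation handling that this omits. The gene universe should be the genes actually measured, which matters for targeted panels.

```python
from math import comb


def enrichment_pvalue(n_universe, n_term, n_selected, n_overlap):
    """One-sided hypergeometric P(X >= n_overlap) for over-representation."""
    upper = min(n_term, n_selected)
    tail = sum(
        comb(n_term, i) * comb(n_universe - n_term, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / comb(n_universe, n_selected)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical numbers: 1000 measured genes, 40 annotated to a term,
# 100 differentially expressed genes, 10 of which carry the annotation
print(enrichment_pvalue(n_universe=1000, n_term=40, n_selected=100, n_overlap=10))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```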

6. Visualization Tools

Effective visualization is key to communicating biological findings. Here are some popular options:

  • Scanpy plotting API:
    • Use Case: High-level plotting functions specifically designed for single-cell and spatial omics data (e.g., sc.pl.umap(), sc.pl.spatial(), sc.pl.dotplot()). Integrates seamlessly with AnnData objects.
  • matplotlib and seaborn:
    • Use Case: Static publication-quality plots (scatter plots, violins, heatmaps).
  • plotly:
    • Use Case: Interactive plots allowing zooming and hovering, useful for data exploration.
  • napari:
    • Use Case: Multi-dimensional image viewer. Essential for overlaying gene expression points on top of raw tissue images to verify registration and segmentation.
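When cell coordinates live in `adata.obsm["spatial"]`, a plain matplotlib scatter is often enough for a quick spatial map of cell types, and it is easy to restyle for publication. The sketch below assumes `matplotlib` and `numpy`; the function name and styling choices are illustrative. One higher-level route to a similar plot is Scanpy's `sc.pl.embedding(adata, basis="spatial", color=...)`.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend for scripts; omit inside a notebook
import matplotlib.pyplot as plt
import numpy as np


def plot_spatial(coords, labels, ax=None, point_size=8, image_coords=True):
    """Scatter cells at their (x, y) positions, with one color per label."""
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    if ax is None:
        _, ax = plt.subplots(figsize=(5, 5))
    for label in np.unique(labels):  # sorted label order keeps colors consistent
        sel = labels == label
        ax.scatter(coords[sel, 0], coords[sel, 1], s=point_size, label=str(label))
    ax.set_aspect("equal")  # keep tissue geometry undistorted
    if image_coords:
        ax.invert_yaxis()  # pixel coordinates: y increases downward
    ax.legend(markerscale=2, frameon=False)
    return ax


rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(20, 2))
labels = ["Neuron"] * 10 + ["Astro"] * 10
ax = plot_spatial(coords, labels)
# ax.figure.savefig("spatial_map.png", dpi=300)  # e.g., for a figure panel
```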
