Welcome to the 2025 Wang Lab Computational Workshop. This workshop focuses on the biological interpretation of spatial omics data, moving beyond initial data preprocessing to extract meaningful biological insights.
The primary purpose of this workshop is to guide researchers through the biological interpretation steps of spatial omics analysis. We assume that data preprocessing is complete and participants are starting with:
- Cell-by-gene matrices: Expression data for each cell.
- XYZ coordinates: Spatial location data for each cell.
- Reads assignment dataframes: Spatial location and gene identity for each transcript.
We will explore how to analyze these inputs to uncover tissue architecture and cell-cell interactions.
Setting up a robust environment is crucial for reproducibility. We will discuss three common tools:
- pip: The standard package installer for Python.
- When to use: General purpose installation of Python packages from PyPI.
- Pros: Universally available, simple.
- Cons: Does not handle non-Python binary dependencies well.
- miniforge: A community-driven minimal installer for Conda.
- When to use: Creating isolated environments that require non-Python dependencies (e.g., libraries with C++ backends).
- Pros: Avoids Anaconda licensing issues, robust dependency resolution.
- Cons: Can be slower to resolve environments compared to newer tools.
- uv: An extremely fast Python package installer and resolver.
- When to use: Rapidly setting up environments and installing dependencies.
- Pros: Significantly faster than pip and conda.
- Cons: Newer ecosystem, may require specific workflows for complex binary dependencies.
Tip: Use miniforge or uv to manage distinct environments for each project to prevent dependency conflicts.
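Whichever installer is used, recording the installed package versions helps make an analysis reproducible. The following is a minimal sketch using only the Python standard library. The helper name `report_versions` and the package list are illustrative assumptions, not part of the workshop materials.

```python
from importlib import metadata


def report_versions(packages):
    """Return a dict mapping each package name to its installed version.

    Packages that are not installed in the current environment are
    reported as "not installed" instead of raising an error.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


# Illustrative list of packages discussed in this workshop (an assumption).
workshop_packages = ["anndata", "scanpy", "squidpy", "scikit-image"]

for package, version in report_versions(workshop_packages).items():
    print(f"{package}: {version}")
```

Saving this output alongside the results makes it easier to rebuild the same environment later.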
To ensure all participants have a consistent environment, we will use Google Colab, which eliminates local installation issues during the workshop. In Colab, a plain pip installation is sufficient.
How to open notebooks:
- Navigate to the provided Google Drive folder and open one Colab notebook.
- Click "File" > "Save a copy in Drive" to save your own editable version.
We follow the scverse philosophy, which promotes interoperability between life science tools. The ecosystem relies on shared data structures to ensure that tools for different tasks (e.g., RNA velocity, spatial stats) can work together seamlessly. For comprehensive guidance on analysis workflows, we recommend consulting sc-best-practices, a community-driven resource that provides detailed best practices for single-cell data analysis.
AnnData (Annotated Data) is the core data structure for the scverse ecosystem.
- What it is: A file format and in-memory object designed to handle large-scale matrix data with annotations.
- Why it is used: It keeps data (matrix) and metadata (cell/gene info) aligned during filtering and analysis.
- Data Model:
- X: The main data matrix (cells x genes).
- obs: Observations (cells) metadata (e.g., cluster labels, spatial coordinates).
- var: Variables (genes) metadata.
- obsm: Multi-dimensional observations annotation (e.g., PCA, UMAP, Spatial coordinates).
- varm: Multi-dimensional variables annotation.
- uns: Unstructured data (e.g., color palettes, analysis parameters).
- Scanpy:
- Function: Fundamental single-cell analysis toolkit.
- Use Cases: Preprocessing, clustering, dimensionality reduction (UMAP/t-SNE), trajectory inference.
- Best Practice: Use for all non-spatial analysis steps.
- Squidpy:
- Function: Spatial omics analysis toolkit built on Scanpy.
- Use Cases: Spatial neighborhood enrichment, ligand-receptor interaction analysis, image feature extraction.
- Differences: Squidpy leverages the spatial coordinates and image data that Scanpy typically ignores.
scikit-image (skimage) is the standard library for image processing in Python.
- Tasks: Image filtering, segmentation, intensity normalization, and feature extraction relevant to spatial omics images.
- Fiji / CellProfiler:
- What they are: Powerful GUI-based tools for image analysis.
- When to use: For complex image segmentation pipelines or when visual inspection and manual parameter tuning are required before automated processing.
- Complementary Workflow: Use these tools to generate cell masks or ground-truth data, which can then be imported into Python/AnnData for downstream analysis.
- Commonly used plugins in Fiji:
- IJPB-Plugins (MorphoLibJ): A collection of plugins for mathematical morphology and image analysis, useful for filtering, segmentation, and geometrical measurements.
- StarDist segmentation: A deep learning-based tool for object detection and segmentation (e.g., cell nuclei) in 2D and 3D images, effective for crowded environments.
- Labkit: A user-friendly tool for pixel classification and segmentation using random forests, allowing users to train classifiers by drawing on the image.
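The scikit-image tasks listed earlier (filtering, segmentation, feature extraction) can be sketched on a synthetic image. The example below smooths the image, applies an Otsu threshold, labels connected objects, and measures each one. Otsu thresholding is an assumed choice for illustration. A label mask exported from Fiji or CellProfiler could replace the `labels` array and be measured in the same way.

```python
import numpy as np
from skimage import draw, filters, measure

rng = np.random.default_rng(0)

# Synthetic image: dim background with two bright disks plus mild noise.
image = np.full((128, 128), 0.1)
for center in [(40, 40), (90, 85)]:
    rr, cc = draw.disk(center, 10, shape=image.shape)
    image[rr, cc] = 1.0
image += rng.normal(0, 0.02, size=image.shape)

# Filtering: Gaussian smoothing to suppress pixel noise.
smoothed = filters.gaussian(image, sigma=1)

# Segmentation: global Otsu threshold, then connected-component labeling.
mask = smoothed > filters.threshold_otsu(smoothed)
labels = measure.label(mask)

# Feature extraction: area, centroid, and mean raw intensity per object.
regions = measure.regionprops(labels)
for region in regions:
    mean_intensity = image[labels == region.label].mean()
    row, col = region.centroid
    print(
        f"object {region.label}: area={region.area}, "
        f"centroid=({row:.1f}, {col:.1f}), mean intensity={mean_intensity:.2f}"
    )
```

The per-object measurements can then be stored as columns in the obs table of an AnnData object, provided the object labels are matched to cell identifiers.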
We will analyze two primary datasets to demonstrate different workflows.
This dataset focuses on an Alzheimer's disease model.
Workflow:
- Cell type classification: Identifying neuronal and glial subtypes.
- Detecting disease-associated cell types: Pinpointing populations specific to the disease state.
- Exploring spatial distribution: Mapping where these cells are located in the tissue.
- Plaque image segmentation: Identifying amyloid plaques from imaging channels.
- Intercellular spatial relationships: Analyzing how disease-associated cells cluster around plaques.
- Quantifying tau protein per cell: Measuring pathological protein load.
- Integration with human datasets: Comparing mouse model findings to human AD data.
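One simple way to quantify how cells cluster around plaques is to compute each cell's distance to the nearest plaque centroid. The sketch below does this with a k-d tree from SciPy on synthetic coordinates. It is an illustrative stand-in rather than the workshop's method, and the 30-unit radius and all positions are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)

# Synthetic plaque centroids in a 1000 x 1000 field (illustrative only).
plaques = rng.uniform(0, 1000, size=(5, 2))

# "Disease-associated" cells are simulated close to plaques;
# "other" cells are scattered uniformly across the field.
nearest_plaque = rng.integers(0, len(plaques), size=200)
disease_cells = plaques[nearest_plaque] + rng.normal(0, 10, size=(200, 2))
other_cells = rng.uniform(0, 1000, size=(200, 2))

# Query the distance from every cell to its nearest plaque centroid.
tree = cKDTree(plaques)
dist_disease, _ = tree.query(disease_cells)
dist_other, _ = tree.query(other_cells)

# Fraction of each population within an assumed radius of a plaque.
radius = 30.0
frac_disease = np.mean(dist_disease <= radius)
frac_other = np.mean(dist_other <= radius)
print(f"within {radius} units: disease={frac_disease:.2f}, other={frac_other:.2f}")
```

With real data, the plaque centroids would come from the plaque segmentation step and the cell coordinates from the spatial coordinates stored with each cell.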
This dataset utilizes multi-modal data in a neurodevelopmental model.
Workflow:
- Cross-modality integration: Combining STARmap (RNA) and RIBOmap (Ribosome-bound RNA) data.
- Cell type classification: Defining cell identities based on integrated data.
- SPIN for tissue-region identification: Spatial integration to define anatomical regions.
- Differential gene expression (DGE): Finding genes that change between genotypes.
- Gene Ontology enrichment analysis: Interpreting DGE results biologically.
Effective visualization is key to communicating biological findings. Here are some popular options:
- Scanpy plotting API:
- Use Case: High-level plotting functions specifically designed for single-cell and spatial omics data (e.g., sc.pl.umap(), sc.pl.spatial(), sc.pl.dotplot()). Integrates seamlessly with AnnData objects.
- matplotlib and seaborn:
- Use Case: Static publication-quality plots (scatter plots, violins, heatmaps).
- plotly:
- Use Case: Interactive plots allowing zooming and hovering, useful for data exploration.
- napari:
- Use Case: Multi-dimensional image viewer. Essential for overlaying gene expression points on top of raw tissue images to verify registration and segmentation.
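As a small illustration of the static plotting option, the sketch below draws a spatial scatter plot of cells colored by cell type using matplotlib. The coordinates and labels are synthetic, and the output filename is an arbitrary assumption. The y-axis is inverted to follow the image convention of placing the origin at the top left, which may or may not suit a given dataset.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the script runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Synthetic cell coordinates and cell-type labels (illustrative only).
coords = rng.uniform(0, 1000, size=(300, 2))
cell_types = rng.choice(["neuron", "astrocyte", "microglia"], size=300)

fig, ax = plt.subplots(figsize=(5, 5))
for cell_type in np.unique(cell_types):
    selected = cell_types == cell_type
    ax.scatter(coords[selected, 0], coords[selected, 1], s=8, label=cell_type)

ax.set_aspect("equal")  # keep tissue proportions undistorted
ax.invert_yaxis()       # image convention: origin at the top left
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend(title="Cell type", loc="upper left", bbox_to_anchor=(1.0, 1.0))

fig.savefig("spatial_cell_types.png", dpi=150, bbox_inches="tight")
```

For AnnData objects, the Scanpy plotting functions listed above produce comparable figures with less code.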