This repository provides a reproducible workflow for identifying robust senescence markers from SenCat transcriptomic and proteomic data and for validating marker-based scoring in external IMR90 fibroblast datasets.
The workflow standardizes transcriptomic and proteomic measurements, applies consistent preprocessing, and uses a cross-cell-type machine learning strategy to identify markers that remain informative across biological contexts. A refined marker set is then used to derive stable marker weights and generate sample-level senescence scores in both reference and external validation datasets, with all primary outputs written to the analysis and plotting directories.
- SenCat transcriptomic data: primary RNA-level input used for marker discovery.
- SenCat proteomic data: primary protein-level input used for marker discovery.
- External validation data: IMR90 fibroblast datasets used to evaluate score transferability.
- Workflow configuration: centralized settings for inputs, analysis profiles, and output locations.
- Transcriptomics markers: analysis/transcriptomics.transcriptomics_loose5000f_tuned_common_features.results.csv
- Proteomics markers: analysis/proteomics.proteomics_loose5000f_tuned_common_features.results.csv
You can use our senescence markers to score your data for senescence.
Scoring is performed on an h5ad file containing normalized transcriptomics or proteomics counts. Expected h5ad structure:
- adata.X: sample-by-feature expression matrix
- adata.var_names: feature identifiers matching the marker IDs in the marker CSV index
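Before scoring, it can help to check how many marker IDs actually appear among your feature identifiers. The following is a minimal sketch using only the Python standard library; the helper names, the toy identifiers, and the `weight` column are assumptions made for illustration and are not taken from the repository. With a real dataset you would pass the contents of `adata.var_names` and the first column of the marker CSV instead of the toy values.

```python
import csv
import io


def read_marker_ids(csv_text):
    """Return the first-column values (the CSV index) of a marker table."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    return [row[0] for row in reader if row]


def marker_overlap(var_names, marker_ids):
    """Split marker IDs into those present in var_names and those missing."""
    available = set(var_names)
    matched = [m for m in marker_ids if m in available]
    missing = [m for m in marker_ids if m not in available]
    return matched, missing


# Toy stand-ins (hypothetical): a marker CSV with an assumed "weight" column
# and feature names as they might appear in adata.var_names.
toy_csv = "feature,weight\nCDKN1A,0.8\nLMNB1,-0.6\nGDF15,0.5\n"
toy_var_names = ["CDKN1A", "GDF15", "ACTB"]

matched, missing = marker_overlap(toy_var_names, read_marker_ids(toy_csv))
print("matched:", matched)
print("missing:", missing)
```

A low number of matched markers usually points to an identifier mismatch (for example gene symbols versus another ID scheme) rather than to a problem with the data itself.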
If your data are not normalized, you can use the normalize_counts.py script:
python workflow/scripts/data/normalize_counts.py \
--input-h5ad INPUT_H5AD \
--design DESIGN_FACTORS \
--output-h5ad NORMALIZED_H5AD \
--log logs/my_data.normalize.log \
--log-level INFO
- INPUT_H5AD specifies the path to your input h5ad file.
- DESIGN_FACTORS specifies the design factors for DESeq2, in the format x + z or ~x+z.
- NORMALIZED_H5AD specifies the path where your normalized data will be saved.
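The two accepted spellings of the design (x + z and ~x+z) describe the same additive design. The sketch below shows one way such a string could be reduced to a list of factor names; `parse_design` is a hypothetical helper written for illustration and is not the parsing code used by normalize_counts.py.

```python
def parse_design(design):
    """Split a design string such as 'x + z' or '~x+z' into factor names."""
    text = design.strip()
    if text.startswith("~"):
        text = text[1:]  # drop the optional leading tilde
    factors = [term.strip() for term in text.split("+")]
    if not all(factors):
        # Catches inputs like '', '~', or 'x +'
        raise ValueError(f"empty term in design: {design!r}")
    return factors


print(parse_design("x + z"))
print(parse_design("~x+z"))
```

Each factor name would be expected to correspond to a column of sample annotations in the input file; that mapping is an assumption here, so check the script's own help output for the authoritative behavior.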
Once your data are normalized, score them with the marker_classifier.py script:
python workflow/scripts/cls/marker_classifier.py \
--markers PATH_TO_ML_MARKERS \
--input-h5ad NORMALIZED_H5AD \
--output-results-csv OUTPUT_CSV
- PATH_TO_ML_MARKERS specifies the path to the ML markers. Use analysis/transcriptomics.transcriptomics_loose5000f_tuned_common_features.results.csv for transcriptomics and analysis/proteomics.proteomics_loose5000f_tuned_common_features.results.csv for proteomics.
- NORMALIZED_H5AD specifies the path to your normalized h5ad data.
- OUTPUT_CSV specifies the path to the output CSV file with per-sample score values (higher values indicate stronger similarity to the senescence-associated signature).
Notes:
- marker_classifier.py applies log1p internally.
- Marker matching is based on adata.var_names; non-overlapping markers are skipped automatically.
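To make the two notes concrete, here is a minimal sketch of a marker-weighted score for a single sample. It assumes the score is a weighted sum of log1p-transformed values over the markers that overlap with the sample's features; that formula, the function name `score_sample`, and the toy IDs and weights are illustrative assumptions, not the actual implementation in marker_classifier.py.

```python
import math


def score_sample(expression, weights):
    """Weighted sum of log1p-transformed values over overlapping markers.

    expression: feature ID -> normalized value for one sample
    weights:    marker ID -> weight (hypothetical values; the real weights
                come from the marker CSV)
    """
    # Markers absent from the sample are skipped rather than treated as zero.
    shared = [m for m in weights if m in expression]
    if not shared:
        raise ValueError("no markers overlap with the sample's features")
    return sum(weights[m] * math.log1p(expression[m]) for m in shared)


# Toy inputs: marker M3 is missing from the sample and is therefore skipped;
# feature OTHER is not a marker and is ignored.
toy_weights = {"M1": 1.0, "M2": -0.5, "M3": 2.0}
toy_sample = {"M1": math.e - 1, "M2": 0.0, "OTHER": 10.0}

score = score_sample(toy_sample, toy_weights)
print(score)
```

Because log1p is applied inside the scoring step, the input should hold normalized values that have not already been log-transformed.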
Anerillas, Carlos, et al. "SenCat: Cataloging human cell senescence through multiomic profiling of multiple senescent primary cell types." bioRxiv (2026): 2026-02. https://doi.org/10.64898/2026.02.05.703986