Commit fd930a1

Merge pull request #2 from maragkakislab/dev
Revise proteomics set and update machine learning parameters
2 parents d39ff34 + a87c436 commit fd930a1

9 files changed

+474
-4054
lines changed

README.md

Lines changed: 67 additions & 2 deletions
@@ -1,2 +1,67 @@
1-
# wf-bulk-senescence
2-
Detecting cellular senescence using transcriptomic and proteomic data.
1+
# Machine learning guided identification of senescence markers
2+
3+
This repository provides a reproducible workflow for identifying robust senescence markers from SenCat transcriptomic and proteomic data and for validating marker-based scoring in external IMR90 fibroblast datasets.
4+
5+
## Workflow
6+
7+
The workflow standardizes transcriptomic and proteomic measurements, applies consistent preprocessing, and uses a cross-cell-type machine learning strategy to identify markers that remain informative across biological contexts. A refined marker set is then used to derive stable marker weights and generate sample-level senescence scores in both reference and external validation datasets, with all primary outputs written to the analysis and plotting directories.
8+
9+
## Inputs
10+
11+
- **SenCat transcriptomic data**: primary RNA-level input used for marker discovery.
12+
- **SenCat proteomic data**: primary protein-level input used for marker discovery.
13+
- **External validation data**: IMR90 fibroblast datasets used to evaluate score transferability.
14+
- **Workflow configuration**: centralized settings for inputs, analysis profiles, and output locations.
15+
16+
## ML markers
17+
18+
- **Transcriptomics markers**: `analysis/transcriptomics.transcriptomics_loose5000f_tuned_common_features.results.csv`.
19+
- **Proteomics markers**: `analysis/proteomics.proteomics_loose5000f_tuned_common_features.results.csv`.
20+
21+
## Using ML markers for senescence scoring
22+
23+
You can use these markers to compute senescence scores for your own data.
24+
25+
### Prepare data
26+
27+
Scoring is performed on an `h5ad` file containing normalized transcriptomic or proteomic counts. Expected `h5ad` structure:
28+
29+
- `adata.X`: sample-by-feature expression matrix
30+
- `adata.var_names`: feature identifiers matching the marker IDs in the marker CSV index
31+
32+
If your data are not normalized, you can use the `normalize_counts.py` script:
33+
34+
```bash
35+
python workflow/scripts/data/normalize_counts.py \
36+
--input-h5ad INPUT_H5AD \
37+
--design DESIGN_FACTORS \
38+
--output-h5ad NORMALIZED_H5AD \
39+
--log logs/my_data.normalize.log \
40+
--log-level INFO
41+
```
42+
43+
- `INPUT_H5AD` specifies the path to your input `h5ad` file.
44+
- `DESIGN_FACTORS` specifies the design factors for DESeq2, in the format `x + z` or `~x+z`.
45+
- `NORMALIZED_H5AD` specifies the path where your normalized data will be saved.
46+
47+
### Get senescence scores
48+
49+
```bash
50+
python workflow/scripts/cls/marker_classifier.py \
51+
--markers PATH_TO_ML_MARKERS \
52+
--input-h5ad NORMALIZED_H5AD \
53+
--output-results-csv OUTPUT_CSV
54+
```
55+
56+
- `PATH_TO_ML_MARKERS` specifies the path to the ML markers. Use `analysis/transcriptomics.transcriptomics_loose5000f_tuned_common_features.results.csv` for transcriptomics and `analysis/proteomics.proteomics_loose5000f_tuned_common_features.results.csv` for proteomics.
57+
- `NORMALIZED_H5AD` specifies the path to your normalized `h5ad` data.
58+
- `OUTPUT_CSV` specifies the path to the output CSV file with per-sample `score` values (higher values indicate stronger similarity to the senescence-associated signature).
59+
60+
Notes:
61+
62+
- `marker_classifier.py` applies `log1p` internally.
63+
- Marker matching is based on `adata.var_names`; non-overlapping markers are skipped automatically.
64+
65+
## Manuscript
66+
67+
Anerillas, Carlos, et al. "SenCat: Cataloging human cell senescence through multiomic profiling of multiple senescent primary cell types." bioRxiv (2026): 2026-02. [https://doi.org/10.64898/2026.02.05.703986](https://doi.org/10.64898/2026.02.05.703986)
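
The README notes above state that `marker_classifier.py` applies `log1p` internally and skips markers that are absent from `adata.var_names`. The sketch below illustrates those two behaviors for a single sample in plain Python. It assumes the score is a weighted sum of `log1p`-transformed values over the overlapping markers; the commit does not show the script, so the formula, the function name `score_sample`, and the marker IDs and weights are hypothetical.

```python
import math


def score_sample(expression, marker_weights):
    """Score one sample from its normalized expression values.

    expression:     dict mapping feature ID -> normalized value
    marker_weights: dict mapping marker ID -> weight (hypothetical values)

    Returns the score and the list of markers that were actually used.
    """
    # Keep only markers that are present in the sample; the rest are skipped.
    used = [m for m in marker_weights if m in expression]
    # Assumed scoring rule: weighted sum of log1p-transformed values.
    score = sum(marker_weights[m] * math.log1p(expression[m]) for m in used)
    return score, used


# Made-up marker weights and expression values, for illustration only.
weights = {"GENE_A": 1.5, "GENE_B": -2.0, "GENE_C": 0.7}
sample = {"GENE_A": math.e - 1, "GENE_B": 0.0, "GENE_X": 10.0}

score, used = score_sample(sample, weights)
print(used)   # GENE_C is skipped because the sample does not contain it
print(score)
```

If the real script applies `log1p` itself, as the notes say, the input `h5ad` should hold normalized but not log-transformed values; otherwise the transform would be applied twice.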
Lines changed: 193 additions & 0 deletions
@@ -0,0 +1,193 @@
1+
obs,label,score
2+
WI38_P_1,0,-11.538828996768714
3+
WI38_P_2,0,-12.680862914726818
4+
WI38_P_3,0,-13.515741337374628
5+
WI38_P_4,0,-13.80908626869857
6+
WI38_CTIS_1,1,9.976109588163506
7+
WI38_CTIS_2,1,9.843645499837072
8+
WI38_CTIS_3,1,8.055420123720474
9+
WI38_CTIS_4,1,10.455420203602774
10+
WI38_IRIS_1,1,8.76446644078101
11+
WI38_IRIS_2,1,9.523357238727188
12+
WI38_IRIS_3,1,11.598354228476754
13+
WI38_IRIS_4,1,9.555065497539788
14+
WI38_OSIS_1,1,6.040993789963847
15+
WI38_OSIS_2,1,7.254767615045569
16+
WI38_OSIS_3,1,6.165637985157343
17+
WI38_OSIS_4,1,6.529893317067662
18+
WI38_EV_1,0,-4.782665348751444
19+
WI38_EV_2,0,-5.884829690295969
20+
WI38_EV_3,0,-6.234597061709163
21+
WI38_EV_4,0,-6.336933294992674
22+
WI38_OIS_1,1,6.799686306306363
23+
WI38_OIS_2,1,7.391797609179027
24+
WI38_OIS_3,1,5.216593771889272
25+
WI38_OIS_4,1,6.813066292960044
26+
BJ_P_1,0,-4.504449738693516
27+
BJ_P_2,0,-6.910898689808129
28+
BJ_P_3,0,-6.012232863989395
29+
BJ_P_4,0,-4.273641766054479
30+
BJ_CTIS_1,1,10.167934098654316
31+
BJ_CTIS_2,1,9.150842370496653
32+
BJ_CTIS_3,1,5.798574261490091
33+
BJ_CTIS_4,1,8.708248502686843
34+
BJ_IRIS_1,1,5.586745005682415
35+
BJ_IRIS_2,1,5.856049114520757
36+
BJ_IRIS_3,1,6.012991583916754
37+
BJ_IRIS_4,1,6.293086612028366
38+
BJ_OSIS_1,1,6.514881110180604
39+
BJ_OSIS_2,1,5.899112143500289
40+
BJ_OSIS_3,1,7.141317815028896
41+
BJ_OSIS_4,1,7.125964747596695
42+
BJ_EV_1,0,-4.7445031865403
43+
BJ_EV_2,0,-3.8630445722604465
44+
BJ_EV_3,0,-4.458858326106522
45+
BJ_EV_4,0,-6.059969375264109
46+
BJ_OIS_1,1,6.099999570754001
47+
BJ_OIS_2,1,4.652404752136463
48+
BJ_OIS_3,1,3.7548549139644627
49+
BJ_OIS_4,1,4.189128834496064
50+
HSAEC_P_1,0,-7.516757882919095
51+
HSAEC_P_2,0,-5.76287090122295
52+
HSAEC_P_3,0,-6.328329028280637
53+
HSAEC_P_4,0,-6.6986503454765485
54+
HSAEC_CTIS_1,1,8.80063410848882
55+
HSAEC_CTIS_2,1,8.743069771180759
56+
HSAEC_CTIS_3,1,8.715339636412093
57+
HSAEC_CTIS_4,1,8.386509486284409
58+
HSAEC_IRIS_1,1,8.165664558211054
59+
HSAEC_IRIS_2,1,6.90186582850553
60+
HSAEC_IRIS_3,1,5.904552204754607
61+
HSAEC_IRIS_4,1,6.3298656672416
62+
HEKn_P_1,0,-9.852561570548659
63+
HEKn_P_2,0,-9.981158177068274
64+
HEKn_P_3,0,-8.511943681036325
65+
HEKn_P_4,0,-9.540988380050143
66+
HEKn_CTIS_1,1,12.304916113481543
67+
HEKn_CTIS_2,1,11.761078708241792
68+
HEKn_CTIS_3,1,11.947732431597434
69+
HEKn_CTIS_4,1,12.544549322821881
70+
HEKn_IRIS_1,1,7.350849604945093
71+
HEKn_IRIS_2,1,6.860549755187968
72+
HEKn_IRIS_3,1,7.037442506966663
73+
HEKn_IRIS_4,1,6.806072686024458
74+
HCAEC_P_1,0,-4.157625575802478
75+
HCAEC_P_2,0,-7.8975503032894645
76+
HCAEC_P_3,0,-7.434982759396354
77+
HCAEC_P_4,0,-7.796409649740869
78+
HCAEC_CTIS_1,1,5.276929478069843
79+
HCAEC_CTIS_2,1,5.6548208491236664
80+
HCAEC_CTIS_3,1,4.857828261539319
81+
HCAEC_CTIS_4,1,6.162042762881479
82+
HCAEC_IRIS_1,1,9.292152544408026
83+
HCAEC_IRIS_2,1,8.364336711398892
84+
HCAEC_IRIS_3,1,8.228884161589338
85+
HCAEC_IRIS_4,1,8.493619551744727
86+
HUVEC_P_1,0,-7.207455334526678
87+
HUVEC_P_2,0,-7.576688905562343
88+
HUVEC_P_3,0,-6.816803012539782
89+
HUVEC_P_4,0,-6.875734140688108
90+
HUVEC_CTIS_1,1,9.137605081186898
91+
HUVEC_CTIS_2,1,10.53115220079936
92+
HUVEC_CTIS_3,1,9.046926795904163
93+
HUVEC_CTIS_4,1,9.437879822889416
94+
HUVEC_IRIS_1,1,5.9766356891366765
95+
HUVEC_IRIS_2,1,5.6979156512389775
96+
HUVEC_IRIS_3,1,5.661906263663141
97+
HUVEC_IRIS_4,1,5.675259494214629
98+
BMMSC_P_1,0,-8.176191846593678
99+
BMMSC_P_2,0,-8.279875684216606
100+
BMMSC_P_3,0,-8.545187626351273
101+
BMMSC_P_4,0,-9.002198852475061
102+
BMMSC_CTIS_1,1,5.477808479327356
103+
BMMSC_CTIS_2,1,6.636714454438904
104+
BMMSC_CTIS_3,1,5.691496435146608
105+
BMMSC_CTIS_4,1,6.4297194469528876
106+
BMMSC_IRIS_1,1,7.189968161029601
107+
BMMSC_IRIS_2,1,7.988945217501708
108+
BMMSC_IRIS_3,1,8.932761868849422
109+
BMMSC_IRIS_4,1,9.015141772948196
110+
HVSMC_P_1,0,-4.7212080471390765
111+
HVSMC_P_2,0,-4.496966667188161
112+
HVSMC_P_3,0,-5.8694982121022505
113+
HVSMC_P_4,0,-5.388814468123448
114+
HVSMC_CTIS_1,1,6.064729034752284
115+
HVSMC_CTIS_2,1,6.621491887611016
116+
HVSMC_CTIS_3,1,6.500885699925798
117+
HVSMC_CTIS_4,1,5.834681318410981
118+
HVSMC_IRIS_1,1,7.834533134124193
119+
HVSMC_IRIS_2,1,7.673343701354595
120+
HVSMC_IRIS_3,1,8.228171459940071
121+
HVSMC_IRIS_4,1,6.629298270242274
122+
HSKM_P_1,0,-8.574730734407458
123+
HSKM_P_2,0,-7.607446649837656
124+
HSKM_P_3,0,-9.379604728192938
125+
HSKM_P_4,0,-8.436637607235772
126+
HSKM_CTIS_1,1,8.891124076398677
127+
HSKM_CTIS_2,1,6.984936897178207
128+
HSKM_CTIS_3,1,7.606722861061789
129+
HSKM_CTIS_4,1,8.90944213888411
130+
HSKM_IRIS_1,1,5.225771102667557
131+
HSKM_IRIS_2,1,7.987018759582699
132+
HSKM_IRIS_3,1,7.07395764468443
133+
HSKM_IRIS_4,1,6.423080101371189
134+
PBMC_P_1,0,-11.286929069951588
135+
PBMC_P_2,0,-8.53977875547799
136+
PBMC_P_3,0,-8.475962758421906
137+
PBMC_P_4,0,-6.827553294455234
138+
PBMC_CTIS_1,1,5.187083718713232
139+
PBMC_CTIS_2,1,6.323060984657073
140+
PBMC_CTIS_3,1,5.575313663261117
141+
PBMC_CTIS_4,1,4.9551096134840416
142+
PBMC_IRIS_1,1,8.34699935401728
143+
PBMC_IRIS_2,1,8.643825535457637
144+
PBMC_IRIS_3,1,6.701784785040235
145+
PBMC_IRIS_4,1,8.898688098819003
146+
PreAdipo_P_1,0,-4.766951532411659
147+
PreAdipo_P_2,0,-4.643554862061235
148+
PreAdipo_P_3,0,-7.855656894624351
149+
PreAdipo_P_4,0,-4.5112007527617815
150+
PreAdipo_CTIS_1,1,13.099591564820118
151+
PreAdipo_CTIS_2,1,12.371837229170726
152+
PreAdipo_CTIS_3,1,8.004149908189088
153+
PreAdipo_CTIS_4,1,8.152527688585261
154+
PreAdipo_IRIS_1,1,11.335309114854754
155+
PreAdipo_IRIS_2,1,11.493084528307334
156+
PreAdipo_IRIS_3,1,11.395985110235083
157+
PreAdipo_IRIS_4,1,9.850866207148075
158+
NHO_P_1,0,-6.894779821956949
159+
NHO_P_2,0,-7.358869888964169
160+
NHO_P_3,0,-5.62518013554725
161+
NHO_P_4,0,-6.222317633289193
162+
NHO_CTIS_1,1,11.547394735704373
163+
NHO_CTIS_2,1,12.02648922404535
164+
NHO_CTIS_3,1,11.851122479222656
165+
NHO_CTIS_4,1,12.421568506098126
166+
NHO_IRIS_1,1,9.040847814888474
167+
NHO_IRIS_2,1,8.252338848223598
168+
NHO_IRIS_3,1,8.854967392843722
169+
NHO_IRIS_4,1,7.585584476958637
170+
NHA_P_1,0,-8.439846690173383
171+
NHA_P_2,0,-5.649021047198002
172+
NHA_P_3,0,-7.381991923834658
173+
NHA_P_4,0,-5.58432621825511
174+
NHA_CTIS_1,1,6.47963554567552
175+
NHA_CTIS_2,1,7.653796414126066
176+
NHA_CTIS_3,1,7.436622497961107
177+
NHA_CTIS_4,1,6.598421587689202
178+
NHA_IRIS_1,1,6.399342421257272
179+
NHA_IRIS_2,1,5.41159088667826
180+
NHA_IRIS_3,1,6.3497501330251485
181+
NHA_IRIS_4,1,7.5772660804756615
182+
HEMn_P_1,0,-4.016132218377048
183+
HEMn_P_2,0,-3.3433269142227697
184+
HEMn_P_3,0,-3.703078739193815
185+
HEMn_P_4,0,-3.7991737833017742
186+
HEMn_CTIS_1,1,7.713736200138218
187+
HEMn_CTIS_2,1,7.0118705849924785
188+
HEMn_CTIS_3,1,7.188450185221983
189+
HEMn_CTIS_4,1,6.83998595192966
190+
HEMn_IRIS_1,1,5.042549804199654
191+
HEMn_IRIS_2,1,5.108973793311555
192+
HEMn_IRIS_3,1,4.6213183403579094
193+
HEMn_IRIS_4,1,4.278458537945517
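
The results table above has three columns: `obs`, `label`, and `score`. The following sketch shows one way such a file could be summarized with the Python standard library, using a few rows copied from the table. Reading `obs` as `CellType_Condition_Replicate` is an assumption inferred from the sample names, and the helper name `summarize_scores` is hypothetical.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# A few rows copied from the results table above.
EXCERPT = """obs,label,score
WI38_P_1,0,-11.538828996768714
WI38_P_2,0,-12.680862914726818
WI38_CTIS_1,1,9.976109588163506
WI38_CTIS_2,1,9.843645499837072
BJ_EV_1,0,-4.7445031865403
BJ_OIS_1,1,6.099999570754001
"""


def summarize_scores(handle):
    """Return mean score per label and per (cell type, condition) group."""
    by_label = defaultdict(list)
    by_group = defaultdict(list)
    for row in csv.DictReader(handle):
        score = float(row["score"])
        by_label[int(row["label"])].append(score)
        # Assumed naming scheme: CellType_Condition_Replicate.
        cell_type, condition, _replicate = row["obs"].rsplit("_", 2)
        by_group[(cell_type, condition)].append(score)
    label_means = {k: mean(v) for k, v in by_label.items()}
    group_means = {k: mean(v) for k, v in by_group.items()}
    return label_means, group_means


label_means, group_means = summarize_scores(io.StringIO(EXCERPT))
for label, value in sorted(label_means.items()):
    print(f"label {label}: mean score {value:.3f}")
for (cell_type, condition), value in sorted(group_means.items()):
    print(f"{cell_type} {condition}: mean score {value:.3f}")
```

To run this on a real output file, pass an open file handle instead of the `io.StringIO` excerpt, for example `open(path, newline="")`. In the table above, every row with label 0 has a negative score and every row with label 1 has a positive score.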
