You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: README.md
+17-12Lines changed: 17 additions & 12 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -201,7 +201,7 @@ If there are any other tutorials of interest feel free to raise an issue!
201
201
202
202
## Background
203
203
204
-
SPECTRA is from a preprint, for more information on the preprint, the method behind SPECTRA, and the initials studies conducted with SPECTRA, check out the paper folder.
204
+
SPECTRA is [published](rdcu.be/d2D0z) in Nature Machine Intelligence. For more code about the method behind SPECTRA and the initials studies conducted with SPECTRA, check out the paper folder.
205
205
206
206
## Discussion and Development
207
207
@@ -223,15 +223,15 @@ All development discussions take place on GitHub in this repo in the issue track
223
223
224
224
2.*I have a foundation model that is pre-trained on a large amount of data. It is not feasible to do pairwise calculations of SPECTRA properties. How can I use SPECTRA?*
225
225
226
-
It is still possible to run SPECTRA on the foundation model (FM) and the evaluation dataset. You would use SPECTRA on the evaluation dataset then train and evaluate the foundation model on each SPECTRA split (either through linear probing, fine-tuning, or any other strategy) to calculate the AUSPC. Then you would determine the cross-split overlap between the pre-training dataset and the evaluation dataset. You would repeat this for multiple evaluation datasets, until you could plot FM AUSPC versus cross-split overlap to the evaluation dataset. For more details on what this would look like check out the [publication](https://www.biorxiv.org/content/10.1101/2024.02.25.581982v1), specifically section 5 of the results section. If there is large interest in this FAQ I can release a tutorial on this, just raise an issue!
226
+
It is still possible to run SPECTRA on the foundation model (FM) and the evaluation dataset. You would use SPECTRA on the evaluation dataset then train and evaluate the foundation model on each SPECTRA split (either through linear probing, fine-tuning, or any other strategy) to calculate the AUSPC. Then you would determine the cross-split overlap between the pre-training dataset and the evaluation dataset. You would repeat this for multiple evaluation datasets, until you could plot FM AUSPC versus cross-split overlap to the evaluation dataset. For more details on what this would look like check out the [publication](rdcu.be/d2D0z), specifically section 5 of the results section. If there is large interest in this FAQ I can release a tutorial on this, just raise an issue!
227
227
228
228
3.*I have a foundation model that is pre-trained on a large amount of data and **I do not have access to the pre-training data**. How can I use SPECTRA?*
229
229
230
230
This is a bit more tricky but there are [recent publications](https://arxiv.org/abs/2402.03563) that show these foundation models can represent uncertainty in the hidden representations they produce and a model can be trained to predict uncertainty from these representations. This uncertainty could represent the spectral property comparison between the pre-training and evaluation datasets. Though more work needs to be done, porting this work over would allow the application of SPECTRA in these settings. Again if there is large interest in this FAQ I can release a tutorial on this, just raise an issue!
231
231
232
232
4.*SPECTRA takes a long time to run is it worth it?*
233
233
234
-
The pairwise spectral property comparison is computationally expensive, but only needs to be done once. Generated SPECTRA splits are important resources that should be released to the public so others can utlilize them without spending resources. For more details on the runtime of the method check out the [publication](https://www.biorxiv.org/content/10.1101/2024.02.25.581982v1), specifically section 6 of the results section. The computation can be sped up with cpu cores, which is a feature that will be released.
234
+
The pairwise spectral property comparison is computationally expensive, but only needs to be done once. Generated SPECTRA splits are important resources that should be released to the public so others can utlilize them without spending resources. For more details on the runtime of the method check out the [publication](rdcu.be/d2D0z), specifically section 6 of the results section. The computation can be sped up with cpu cores, which is a feature that will be released.
235
235
236
236
If there are any other questions please raise them in the issues and I can address them. I'll keep adding to the FAQ as common questions begin to surface.
237
237
@@ -244,15 +244,20 @@ SPECTRA is under the MIT license found in the LICENSE file in this GitHub reposi
244
244
Please cite this paper when referring to SPECTRA.
245
245
246
246
```
247
-
@article {spectra,
248
-
author = {Yasha Ektefaie and Andrew Shen and Daria Bykova and Maximillian Marin and Marinka Zitnik and Maha R Farhat},
249
-
title = {Evaluating generalizability of artificial intelligence models for molecular datasets},
0 commit comments