Commit f743a1a

Update README.md
1 parent 7b5d873 commit f743a1a

1 file changed: 17 additions, 12 deletions


README.md

Lines changed: 17 additions & 12 deletions
@@ -201,7 +201,7 @@ If there are any other tutorials of interest feel free to raise an issue!
 
 ## Background
 
-SPECTRA is from a preprint, for more information on the preprint, the method behind SPECTRA, and the initials studies conducted with SPECTRA, check out the paper folder.
+SPECTRA is [published](https://rdcu.be/d2D0z) in Nature Machine Intelligence. For code related to the method behind SPECTRA and the initial studies conducted with SPECTRA, check out the paper folder.
 
 ## Discussion and Development

@@ -223,15 +223,15 @@ All development discussions take place on GitHub in this repo in the issue tracker
 
 2. *I have a foundation model that is pre-trained on a large amount of data. It is not feasible to do pairwise calculations of SPECTRA properties. How can I use SPECTRA?*
 
-It is still possible to run SPECTRA on the foundation model (FM) and the evaluation dataset. You would use SPECTRA on the evaluation dataset then train and evaluate the foundation model on each SPECTRA split (either through linear probing, fine-tuning, or any other strategy) to calculate the AUSPC. Then you would determine the cross-split overlap between the pre-training dataset and the evaluation dataset. You would repeat this for multiple evaluation datasets, until you could plot FM AUSPC versus cross-split overlap to the evaluation dataset. For more details on what this would look like check out the [publication](https://www.biorxiv.org/content/10.1101/2024.02.25.581982v1), specifically section 5 of the results section. If there is large interest in this FAQ I can release a tutorial on this, just raise an issue!
+It is still possible to run SPECTRA on the foundation model (FM) and the evaluation dataset. You would run SPECTRA on the evaluation dataset, then train and evaluate the foundation model on each SPECTRA split (through linear probing, fine-tuning, or any other strategy) to calculate the AUSPC. Then you would determine the cross-split overlap between the pre-training dataset and the evaluation dataset. Repeating this for multiple evaluation datasets lets you plot FM AUSPC versus cross-split overlap with each evaluation dataset. For more details on what this would look like, check out the [publication](https://rdcu.be/d2D0z), specifically section 5 of the results. If there is large interest in this FAQ I can release a tutorial on this, just raise an issue!
 
 3. *I have a foundation model that is pre-trained on a large amount of data and **I do not have access to the pre-training data**. How can I use SPECTRA?*
 
 This is a bit more tricky but there are [recent publications](https://arxiv.org/abs/2402.03563) that show these foundation models can represent uncertainty in the hidden representations they produce and a model can be trained to predict uncertainty from these representations. This uncertainty could represent the spectral property comparison between the pre-training and evaluation datasets. Though more work needs to be done, porting this work over would allow the application of SPECTRA in these settings. Again if there is large interest in this FAQ I can release a tutorial on this, just raise an issue!
 
 4. *SPECTRA takes a long time to run, is it worth it?*
 
-The pairwise spectral property comparison is computationally expensive, but only needs to be done once. Generated SPECTRA splits are important resources that should be released to the public so others can utlilize them without spending resources. For more details on the runtime of the method check out the [publication](https://www.biorxiv.org/content/10.1101/2024.02.25.581982v1), specifically section 6 of the results section. The computation can be sped up with cpu cores, which is a feature that will be released.
+The pairwise spectral property comparison is computationally expensive, but only needs to be done once. Generated SPECTRA splits are important resources that should be released to the public so others can utilize them without spending resources. For more details on the runtime of the method, check out the [publication](https://rdcu.be/d2D0z), specifically section 6 of the results. The computation can be sped up with multiple CPU cores, which is a feature that will be released.
 
 If there are any other questions please raise them in the issues and I can address them. I'll keep adding to the FAQ as common questions begin to surface.
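The AUSPC step in FAQ 2 reduces to integrating model performance over the sequence of SPECTRA splits. A minimal sketch, assuming you already have per-split performance at each spectral parameter value; the function name `auspc` and the numbers here are illustrative, not SPECTRA's actual API:

```python
def auspc(spectral_params, performances):
    """Area under the spectral performance curve (AUSPC).

    spectral_params: increasing spectral parameter values used to generate
        each SPECTRA split (e.g. 0.0 through 1.0).
    performances: model performance (e.g. AUROC) on the test set of the
        split generated at each spectral parameter.
    Integrates with the trapezoidal rule.
    """
    total = 0.0
    for (x0, y0), (x1, y1) in zip(
        zip(spectral_params, performances),
        zip(spectral_params[1:], performances[1:]),
    ):
        total += (x1 - x0) * (y0 + y1) / 2.0
    return total

if __name__ == "__main__":
    # Hypothetical model whose performance decays as cross-split overlap
    # is removed at higher spectral parameter values.
    print(auspc([0.0, 0.25, 0.5, 0.75, 1.0], [0.95, 0.90, 0.80, 0.65, 0.55]))
```

For the FM comparison described above, you would compute one AUSPC per evaluation dataset and plot it against that dataset's cross-split overlap with the pre-training data.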

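On FAQ 4's runtime point: the expensive part is the all-pairs spectral property comparison, and since each pair is independent it parallelizes cleanly across CPU cores. A generic standard-library sketch; the `similarity` function is a stand-in for a real spectral property comparison (e.g. sequence identity), not SPECTRA's implementation:

```python
from itertools import combinations
from multiprocessing import Pool

def similarity(pair):
    # Stand-in comparison: fraction of matching positions between two
    # equal-length strings. A real spectral property (sequence identity,
    # molecular similarity, ...) would go here.
    a, b = pair
    matches = sum(x == y for x, y in zip(a, b))
    return a, b, matches / max(len(a), len(b))

def pairwise_comparisons(samples, processes=4):
    # Every pair is independent, so the O(n^2) loop is distributed
    # across worker processes; chunksize batches pairs to reduce IPC.
    pairs = combinations(samples, 2)
    with Pool(processes=processes) as pool:
        return pool.map(similarity, pairs, chunksize=1024)

if __name__ == "__main__":
    seqs = ["ACDEFG", "ACDEFH", "MNPQRS"]
    for a, b, s in pairwise_comparisons(seqs, processes=2):
        print(a, b, round(s, 2))
```

Because the comparison matrix only needs to be computed once per dataset, releasing the resulting splits lets others skip this cost entirely.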
@@ -244,15 +244,20 @@ SPECTRA is under the MIT license found in the LICENSE file in this GitHub repository
 
 Please cite this paper when referring to SPECTRA.
 
 ```
-@article {spectra,
-author = {Yasha Ektefaie and Andrew Shen and Daria Bykova and Maximillian Marin and Marinka Zitnik and Maha R Farhat},
-title = {Evaluating generalizability of artificial intelligence models for molecular datasets},
-elocation-id = {2024.02.25.581982},
-year = {2024},
-doi = {10.1101/2024.02.25.581982},
-URL = {https://www.biorxiv.org/content/early/2024/02/28/2024.02.25.581982},
-eprint = {https://www.biorxiv.org/content/early/2024/02/28/2024.02.25.581982.full.pdf},
-journal = {bioRxiv}
+@ARTICLE{Ektefaie2024,
+  title     = "Evaluating generalizability of artificial intelligence models
+               for molecular datasets",
+  author    = "Ektefaie, Yasha and Shen, Andrew and Bykova, Daria and Marin,
+               Maximillian G and Zitnik, Marinka and Farhat, Maha",
+  journal   = "Nat. Mach. Intell.",
+  publisher = "Springer Science and Business Media LLC",
+  volume    = 6,
+  number    = 12,
+  pages     = "1512--1524",
+  month     = dec,
+  year      = 2024,
+  copyright = "https://www.springernature.com/gp/researchers/text-and-data-mining",
+  language  = "en"
 }
 ```
