Open
Description
I think this might be overwriting the NCBIGene labels download at least, and probably other sources as well.
There are three possible solutions here:
- Concatenate labels into the same file, but this would (1) require a lockfile to make sure the file is only written by a single thread at a time, and (2) require that the file is deleted whenever Babel is re-run, which will be very hard to guarantee.
- Concentrate OBO labels into a single directory, but that will make it difficult to debug where a particular label is coming from and would be an inelegant special-case -- if we later want to add another label source for a single source, we won't be able to do that.
  - This is my preferred solution for now (PR "Collect common files in a single location" #438).
- Change `labels` into a directory -- we can add whatever we want to that directory separately and use all of those labels when producing the final nodes. The only catch is that we won't be able to confirm whether all the `labels` files have been generated -- we'll have to check their existence separately (e.g. with some kind of "ubergraph-labels-downloaded" check file).
  - I started working on this at "Change labels, synonyms and descriptions into directories" #437, but I think the SQLite database approach makes more sense, so I'll abandon it for now.
- Concentrate labels, synonyms and descriptions into a single SQLite database for each prefix (`node-properties.db`). SQLite can handle multiple writes without getting confused, and this would be the start of the node property database. We could still use the `labels`, `synonyms` and `descriptions` files -- the only thing changing here is that Ubergraph (and any other "general" label/synonym/description changers) would only modify the `node-properties.db` file. So we guarantee that each