
Add support for multiple labels for each data source #428

@gaurav

I think the OBO data handler might be overwriting the NCBIGene labels download at least, and probably the labels for other sources as well:

https://github.com/TranslatorSRI/Babel/blob/c5fa4e9e44e5fbb71674ce62c4d75c99b178aef4/src/datahandlers/obo.py#L26-L30

There are four possible solutions here:

  1. Concatenate labels into the same file, but that would (1) require a lockfile to make sure the file is only written by a single thread at a time, and (2) require us to make sure that the file is deleted if Babel is re-run, which will be very hard to guarantee.
  2. Concentrate OBO labels into a single directory, but that will make it difficult to debug where a particular label is coming from and would be an inelegant special case -- if we later want to add another label source for a single data source, we won't be able to do that.
  3. Change labels into a directory -- we can add whatever we want to that directory separately and use all of those labels when producing the final nodes. The only catch is that we won't be able to confirm whether all the label files have been generated -- we'll have to check their existence separately (e.g. with some kind of "ubergraph-labels-downloaded" check file).
  4. Concentrate labels, synonyms and descriptions into a single SQLite database for each prefix (node-properties.db). SQLite can handle multiple writes without getting confused, and this would be the start of the node property database. We could still use the labels, synonyms and descriptions files -- the only thing changing here is that Ubergraph (and any other "general" label/synonym/description changers) would only modify the node-properties.db file. So we guarantee that each
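
To make option 3 more concrete, here is a rough sketch of what a labels directory with a separate check file could look like. None of this is existing Babel code: the function names, the one-TSV-per-label-source layout and the `.done` marker suffix are all assumptions for illustration.

```python
from pathlib import Path


def write_labels(labels_dir, label_source, labels):
    """Write one TSV file per label source, then a marker file.

    Hypothetical helper: `labels` is an iterable of (curie, label) pairs.
    """
    labels_dir = Path(labels_dir)
    labels_dir.mkdir(parents=True, exist_ok=True)
    with open(labels_dir / f"{label_source}.tsv", "w", encoding="utf-8") as f:
        for curie, label in labels:
            f.write(f"{curie}\t{label}\n")
    # The marker is created last, so its presence implies the TSV is complete.
    (labels_dir / f"{label_source}.done").touch()


def read_all_labels(labels_dir, expected_sources):
    """Merge every label file in the directory, keyed by CURIE.

    Raises if any expected label source has not written its marker yet.
    """
    labels_dir = Path(labels_dir)
    missing = [s for s in expected_sources
               if not (labels_dir / f"{s}.done").exists()]
    if missing:
        raise FileNotFoundError(f"Label sources not finished: {missing}")
    merged = {}
    for label_file in sorted(labels_dir.glob("*.tsv")):
        with open(label_file, encoding="utf-8") as f:
            for line in f:
                curie, label = line.rstrip("\n").split("\t", 1)
                # Keep the originating file name so labels stay debuggable.
                merged.setdefault(curie, []).append((label_file.stem, label))
    return merged
```

Each writer only ever touches its own file, so no lockfile is needed, and the file name records where each label came from.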

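For option 4, a minimal sketch of a per-prefix node-properties.db might look like the following. The table layout, column names and function names are assumptions, not a settled design; it only uses Python's built-in `sqlite3` module, and covers labels only (synonyms and descriptions could presumably follow the same pattern).

```python
import sqlite3


def open_node_properties(db_path):
    """Open (or create) a node-properties database for one prefix."""
    # `timeout` makes a writer wait for another writer's lock instead of
    # failing immediately; 30 seconds is an arbitrary choice.
    conn = sqlite3.connect(db_path, timeout=30)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS labels (
               curie  TEXT NOT NULL,
               source TEXT NOT NULL,
               label  TEXT NOT NULL,
               PRIMARY KEY (curie, source)
           )"""
    )
    conn.commit()
    return conn


def add_labels(conn, source, labels):
    """Insert (curie, label) pairs for one label source in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO labels (curie, source, label) VALUES (?, ?, ?)",
            [(curie, source, label) for curie, label in labels],
        )


def labels_for(conn, curie):
    """Return every (source, label) pair recorded for a CURIE."""
    return conn.execute(
        "SELECT source, label FROM labels WHERE curie = ? ORDER BY source",
        (curie,),
    ).fetchall()
```

Because rows are keyed by `(curie, source)`, re-running one label source replaces only its own rows, and labels from other sources for the same CURIE are left alone.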