Some thoughts I have, mainly about the data we store and how we parse it:
Hi all
Currently, the database files (.csv files) each store several pieces of component information, including but not limited to multiple name variants, CAS numbers, molecular weights, and sometimes InChIKeys, SMILES, etc. In some cases an alternative name might be updated in one sheet but not in another. In other cases, molecular weights are reported with different levels of precision. Structurally, it would be better to centralize compound information in a single sheet. My idea would be to have identifiers in one column, each mapping to a common IUPAC name used throughout the other databases. Water, for instance, may have several entries in the sheet.
We store these in different rows to avoid string parsing (which is currently done in the database files with ~|~ markers). We use a single identifier, "water", across all other database files. Another "global" database can store the other constant compound information in its own columns, keeping all constant compound information centralized. A lookup would then entail "User input" -> Identifier -> Normalized name -> global information + local/model parameters.
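To make the lookup chain concrete, here is a minimal sketch in Python (chosen only for illustration, not the package's own language). Every file layout, column name, and function name below is hypothetical, and the model parameter value is a placeholder rather than real data. It assumes one identifier per row, so no `~|~` splitting is needed.

```python
import csv
import io

# Hypothetical identifier sheet: one alias per row, each mapped to the
# single normalized name used by every other database file.
IDENTIFIERS_CSV = """identifier,name
water,water
H2O,water
7732-18-5,water
dihydrogen monoxide,water
"""

# Hypothetical "global" sheet holding constant compound information.
GLOBAL_CSV = """name,cas,molar_mass,smiles
water,7732-18-5,18.015,O
"""

# Hypothetical model-specific sheet; param_a is a placeholder value.
MODEL_CSV = """name,param_a
water,1.0
"""


def normalize(text):
    """Make matching insensitive to case and surrounding whitespace."""
    return text.strip().casefold()


def load_identifier_map(csv_text):
    """Build alias -> normalized name from the identifier sheet."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        mapping[normalize(row["identifier"])] = row["name"]
    return mapping


def load_table(csv_text, key="name"):
    """Index a sheet by its normalized-name column."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}


def lookup(user_input, id_map, global_table, model_table):
    """User input -> identifier -> normalized name -> global + model data."""
    name = id_map.get(normalize(user_input))
    if name is None:
        raise KeyError(f"unknown compound identifier: {user_input!r}")
    record = dict(global_table[name])          # constant information
    record.update(model_table.get(name, {}))   # local/model parameters
    return record


ID_MAP = load_identifier_map(IDENTIFIERS_CSV)
GLOBAL_TABLE = load_table(GLOBAL_CSV)
MODEL_TABLE = load_table(MODEL_CSV)

if __name__ == "__main__":
    print(lookup("H2O", ID_MAP, GLOBAL_TABLE, MODEL_TABLE))
```

With this layout, adding a new alias means adding one row to the identifier sheet, and correcting a molecular weight touches only the global sheet. The sketch does not address how user-supplied databases would be merged into the identifier map.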
Just my opinion, but this would make the database much cleaner and would centralize control and updating. I am not sure what the implications of this choice would be for speed, or for user-supplied databases.
Interested to hear your thoughts on this approach.