Here is the state of OTT matching with the current extinct tree:
Over all input files, 73.849% of names were matched exactly, 4.358% were matched via synonyms, and 21.792% had no match.
Note also that synonym matches are not always reliable. For instance, Sphenacodon (a synapsid) is treated as a synonym of Epigonus (a fish), which seems utterly random.
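To make the three categories concrete, here is a minimal sketch of how such percentages could be tallied. It assumes each name is first looked up among exact OTT names and then among OTT synonyms; `classify`, `match_stats` and the name sets are hypothetical stand-ins, not the actual TreeBuild code.

```python
from collections import Counter


def classify(name, exact, synonyms):
    """Return 'exact', 'synonym' or 'none' for one taxon name (hypothetical helper)."""
    if name in exact:
        return "exact"
    if name in synonyms:
        return "synonym"
    return "none"


def match_stats(names, exact, synonyms):
    """Percentage of names in each match category, rounded to 3 decimals."""
    counts = Counter(classify(n, exact, synonyms) for n in names)
    total = len(names)
    return {
        cat: round(100 * counts[cat] / total, 3)
        for cat in ("exact", "synonym", "none")
    }
```

Note that a name counted under "synonym" is only as good as the synonym itself, which is exactly the Sphenacodon/Epigonus problem.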
This results in major issues with the tree:
- Missing images
- Missing links, or links that go to the wrong place when you click on a leaf
Now, thinking about solving this...
In the full tree, we start with OTT, then use the taxonomy to get NCBI, GBIF and IRMNG IDs, then the provider ID CSV to get the EOL IDs, and finally the wiki dump to get QIDs.
But in the Extinct case, we start with Wikipedia and get essentially everything from there. So this complex chain is unnecessary, and it causes bad behavior because of the missing OTTs.
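A simplified sketch of why a missing OTT is so damaging in the chained approach, compared with a direct lookup. The dictionaries stand in for the OTT taxonomy, the source-ID-to-QID mapping from the wiki dump, and a Wikipedia-page-to-QID mapping; all function names, keys and values are hypothetical placeholders, and the EOL step is left out for brevity.

```python
def qid_via_ott_chain(name, ott_by_name, sources_by_ott, qid_by_source):
    """Full-tree style: name -> OTT -> source IDs (NCBI/GBIF/IRMNG) -> QID."""
    ott = ott_by_name.get(name)
    if ott is None:
        # No OTT match: every later step (source IDs, EOL, QID) is lost too.
        return None
    for source_id in sources_by_ott.get(ott, []):
        qid = qid_by_source.get(source_id)
        if qid is not None:
            return qid
    return None


def qid_direct(page, qid_by_wikipedia_page):
    """Extinct-tree style: the Wikipedia page already gives us the QID."""
    return qid_by_wikipedia_page.get(page)
```

With the chain, a name like Sphenacodon either falls out entirely (no OTT) or, worse, inherits the IDs of whatever taxon its synonym points to; the direct lookup avoids both failure modes.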
One solution to this is to simply move away from OTTs, and instead use QIDs as our primary identifiers. To do this in a non-disruptive way that doesn't require many core code changes, we can just pretend that QIDs are OTTs. So in the DB, we'd just put QIDs wherever OTTs are used today, without even changing the schema.
In the `ordered_leaves` table, this would effectively mean that the `ott` and `wikidata` columns would have the same value.
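As a rough illustration, the Extinct TreeBuild step could write rows like this. This is a sketch using an in-memory SQLite table with only a hypothetical subset of columns (`id`, `name`, `ott`, `wikidata`); the real `ordered_leaves` schema has more columns, and storing the numeric part of the QID (e.g. `Q12345` as `12345`) is an assumption.

```python
import sqlite3


def qid_to_int(qid: str) -> int:
    """'Q12345' -> 12345 (assumes the numeric form is what gets stored)."""
    if not qid.startswith("Q") or not qid[1:].isdigit():
        raise ValueError(f"not a QID: {qid!r}")
    return int(qid[1:])


def build_ordered_leaves(leaves):
    """leaves: iterable of (name, qid) pairs from the Extinct tree build."""
    db = sqlite3.connect(":memory:")
    # Hypothetical, trimmed-down version of the ordered_leaves table.
    db.execute(
        "CREATE TABLE ordered_leaves ("
        " id INTEGER PRIMARY KEY, name TEXT, ott INTEGER, wikidata INTEGER)"
    )
    # The QID goes into both the ott and the wikidata column.
    rows = [(name, qid_to_int(q), qid_to_int(q)) for name, q in leaves]
    db.executemany(
        "INSERT INTO ordered_leaves (name, ott, wikidata) VALUES (?, ?, ?)", rows
    )
    db.commit()
    return db
```

Nothing in the schema changes; only the values written into `ott` differ.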
The reason I think this will work is that OneZoom mostly uses the OTT as a unique and stable ID for taxa, but rarely, if ever, does anything that requires it to be a true OTT.
The bottom line is that we can potentially make all of this work with zero changes to the core OneZoom codebase. We would only need to change the Extinct TreeBuild logic so that it creates the database this way.
I have not tried this, so let's discuss whether it might have unexpected downsides.
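Since the idea relies on the QID behaving like a unique ID for each leaf, one cheap sanity check before writing the database would be to confirm that every leaf has a QID and that no two leaves share one. A minimal sketch; the function name and the `(name, qid)` input format are hypothetical.

```python
from collections import Counter


def check_qids_usable_as_keys(leaves):
    """Return (names missing a QID, QIDs shared by more than one leaf)."""
    missing = [name for name, qid in leaves if not qid]
    counts = Counter(qid for _, qid in leaves if qid)
    duplicated = sorted(q for q, n in counts.items() if n > 1)
    return missing, duplicated
```

If both lists come back empty, the QIDs can stand in for OTTs as unique keys for this tree.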