To help us deduplicate papers (among other things), it would be great to establish a paper's formal identifiers with high certainty and as cheaply as possible. For example, can we get a paper's DOI with high reliability, even when the paper shares its title, filename, authors, or other properties with other papers in our corpus?
We currently make requests to services like Crossref, Altmetric, or OpenAlex for this, but even those require some estimation on our side. So we might want a general-purpose feature that finds and cross-checks basic bibliographic metadata more robustly.
I suspect this will need some internal iteration: e.g., if the title matches, do the abstract, DOI, or other fields match too? How far do we go, and how many disagreements are enough to convince us that two records are different papers?
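
To make the "how many disagreements" question concrete, here is a rough sketch of one possible field-by-field check. Everything in it is an assumption for discussion, not existing code: the `Record` fields, the `compare` function, the 0.9 similarity threshold, the `max_disagreements` cutoff, and the rule that two differing DOIs mean different papers (which would, for instance, split a preprint from its published version).

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


@dataclass
class Record:
    """Hypothetical minimal view of one paper's metadata."""
    title: Optional[str] = None
    doi: Optional[str] = None
    abstract: Optional[str] = None
    year: Optional[int] = None


def norm_text(s: str) -> str:
    # Lowercase, turn punctuation into spaces, collapse whitespace.
    return " ".join(re.sub(r"[^a-z0-9]+", " ", s.lower()).split())


def norm_doi(s: str) -> str:
    # DOIs are case-insensitive; drop common URL or "doi:" prefixes.
    return re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", s.strip().lower())


def compare(
    a: Record,
    b: Record,
    text_threshold: float = 0.9,   # assumed cutoff for "texts match"
    max_disagreements: int = 1,    # assumed tolerance for conflicting fields
) -> Tuple[str, List[str], List[str]]:
    """Return (verdict, agreeing_fields, disagreeing_fields).

    Only fields present on BOTH records get a vote; missing fields
    are treated as "no evidence", not as disagreement.
    """
    agree: List[str] = []
    disagree: List[str] = []

    def vote(name: str, same: bool) -> None:
        (agree if same else disagree).append(name)

    if a.doi and b.doi:
        vote("doi", norm_doi(a.doi) == norm_doi(b.doi))
    for name in ("title", "abstract"):
        x, y = getattr(a, name), getattr(b, name)
        if x and y:
            ratio = SequenceMatcher(None, norm_text(x), norm_text(y)).ratio()
            vote(name, ratio >= text_threshold)
    if a.year is not None and b.year is not None:
        vote("year", a.year == b.year)

    if "doi" in agree:
        # Matching DOI is strong evidence, unless too much else conflicts.
        verdict = "same" if len(disagree) <= max_disagreements else "uncertain"
    elif "doi" in disagree or len(disagree) > max_disagreements:
        verdict = "different"
    elif "title" in agree and len(agree) >= 2:
        # No DOI to compare: require the title plus one more agreeing field.
        verdict = "same"
    else:
        verdict = "uncertain"
    return verdict, agree, disagree


if __name__ == "__main__":
    ours = Record(title="Deep Learning for Cats",
                  doi="https://doi.org/10.1234/ABC", year=2020)
    theirs = Record(title="Deep learning for cats.",
                    doi="10.1234/abc", year=2020)
    print(compare(ours, theirs))
```

A title-only match deliberately comes back as `"uncertain"` here, which is the case from the first paragraph where several papers share a title. The open questions are then mostly about tuning: which fields get a vote, how strongly each is weighted, and what value of `max_disagreements` we are comfortable with.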