I work for a university library, and I'm exploring using dedupe to identify partner MARC records that are duplicates of our existing records.
I've had some success using dedupe in combination with the pymarc library.
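For context, the workflow I've been experimenting with looks roughly like the sketch below. It's simplified: the 245$a/100$a/020$a field mapping is just illustrative, and the calls follow the dedupe 2.x RecordLink API (dict-style field definitions, prepare_training, console_label), so details may differ for other versions.

```python
import dedupe
from pymarc import MARCReader


def marc_to_dict(record):
    """Flatten a pymarc Record into the plain dict of strings dedupe expects."""
    def first_subfield(tag, code):
        for field in record.get_fields(tag):
            values = field.get_subfields(code)
            if values:
                return values[0]
        return None

    return {
        "title": first_subfield("245", "a"),
        "author": first_subfield("100", "a"),
        "isbn": first_subfield("020", "a"),
    }


def load_marc(path):
    with open(path, "rb") as fh:
        return {f"{path}:{i}": marc_to_dict(rec) for i, rec in enumerate(MARCReader(fh))}


our_records = load_marc("our_catalog.mrc")
partner_records = load_marc("partner_records.mrc")

# Which comparisons the model should learn weights for.
fields = [
    {"field": "title", "type": "String"},
    {"field": "author", "type": "String", "has missing": True},
    {"field": "isbn", "type": "Exact", "has missing": True},
]

linker = dedupe.RecordLink(fields)
linker.prepare_training(our_records, partner_records)
dedupe.console_label(linker)  # interactive labelling of candidate pairs
linker.train()

# write_settings() produces an opaque binary blob -- this is the part
# I'd like to be able to show in a human-readable form.
with open("learned_settings", "wb") as sf:
    linker.write_settings(sf)
```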
My issue is that I don't think I can get buy-in to use this work in production without more transparency about the decisions the machine learning algorithm has made from the training data. Basically, I want some version of the write_settings method that writes those settings in a more human-readable form, even if the reader needs some expertise to interpret them.
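For example, even a rough report like the following would probably be enough for me to make the case. I'm guessing at where the learned state lives on the trained object (predicates, classifier, and its weight attributes), so the attribute names here are assumptions rather than documented API:

```python
def describe_learned_settings(linker, out_path="settings_report.txt"):
    """Dump the learned blocking rules and classifier parameters as plain text.

    The attribute names below (predicates, classifier, weights/coef_) are my
    guesses at where dedupe keeps its learned state; they may vary by version.
    """
    with open(out_path, "w", encoding="utf-8") as out:
        # Blocking predicates: how candidate record pairs are grouped
        # before any scoring happens.
        out.write("Blocking predicates:\n")
        for predicate in getattr(linker, "predicates", []):
            out.write(f"  {predicate}\n")

        # Classifier parameters: how strongly each field comparison
        # pushes a pair toward "match" or "distinct".
        out.write("\nClassifier parameters:\n")
        classifier = getattr(linker, "classifier", None)
        for attr in ("weights", "bias", "coef_", "intercept_"):
            if hasattr(classifier, attr):
                out.write(f"  {attr}: {getattr(classifier, attr)}\n")
```

Being able to run something like describe_learned_settings(linker) right after linker.train() and hand that text file to a cataloger or systems librarian is roughly the level of transparency I'm after.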