Epic: Fix importation of bogus authors and cleanup old data #331

@LeadSongDog

There are tens of thousands of bogus author records with names matching "* Publishing" or "* Books", and somewhat fewer matching "* Editions" and its other-language equivalents.
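As a starting point for finding these records, here is a minimal sketch of a name filter. Only "Publishing", "Books" and "Editions" come from the description above; "Verlag", the corporate trailers (LLC, Inc, Ltd, GmbH) and the function name `looks_like_publisher` are assumptions for illustration, and the real list would need review.

```python
import re

# "Publishing", "Books" and "Editions" are taken from the issue text.
# "Verlag" is an assumed example of an other-language equivalent.
PUBLISHER_SUFFIXES = ("Publishing", "Books", "Editions", "Verlag")

# Assumed corporate trailers that may follow the suffix, e.g. "Books LLC."
CORPORATE_TRAILERS = r"(?:,?\s+(?:LLC|Inc|Ltd|GmbH)\.?)*"

# Require at least one word before the suffix, mirroring the "* Books" wildcard.
# Matching is case-sensitive on purpose, to keep false positives down.
PUBLISHER_NAME_RE = re.compile(
    r"\S\s+(?:" + "|".join(map(re.escape, PUBLISHER_SUFFIXES)) + r")"
    + CORPORATE_TRAILERS + r"\s*$"
)


def looks_like_publisher(name: str) -> bool:
    """Return True if an author name ends like a publisher name."""
    return bool(PUBLISHER_NAME_RE.search(name.strip()))
```

A match only makes a record a candidate. It does not prove the record is bogus, since corporate authorship is sometimes correct.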

Many originate from the import of low-quality records from BWB or AMZ, such as https://www.betterworldbooks.com/product/detail/9783110367737
which was imported as
https://openlibrary.org/books/OL34526350M/Quantenmechanik
where the authors include
https://openlibrary.org/authors/OL9711355A/Perseus_Books_Perseus_Books_LLC.

Many (about 30%) of these author records have no associated work record. Those are low-hanging fruit that could simply be removed in bulk.
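A rough sketch of how the work-less records could be selected, assuming the author names and per-author work counts have already been extracted into plain dictionaries (for example from a data dump). The keys, the sample names and the simple `ends_like_publisher` check are all hypothetical. Nothing here calls an Open Library API.

```python
from typing import Callable, Mapping


def ends_like_publisher(name: str) -> bool:
    # Simplified stand-in for a real publisher-name check (assumption).
    return name.rstrip(". ").endswith(("Publishing", "Books", "Editions"))


def removable_authors(
    author_names: Mapping[str, str],
    work_counts: Mapping[str, int],
    is_bogus: Callable[[str], bool] = ends_like_publisher,
) -> list[str]:
    """Return keys of publisher-like authors that have no work records."""
    return sorted(
        key
        for key, name in author_names.items()
        # A missing entry in work_counts is treated as zero works.
        if is_bogus(name) and work_counts.get(key, 0) == 0
    )


# Hypothetical sample data, not real Open Library records.
authors = {
    "/authors/OL1A": "Example Publishing",
    "/authors/OL2A": "Example Books",
    "/authors/OL3A": "Jane Doe",
}
counts = {"/authors/OL2A": 3}
candidates = removable_authors(authors, counts)
```

Here only `/authors/OL1A` is selected: `/authors/OL2A` still has works attached, and `/authors/OL3A` does not look like a publisher even though it has no works.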

More have only work records that are misattributed: the "author" is one of these publisher names, while the "publisher" is shown as "Independently Published", "CreateSpace" or the like. For these, there is often another work record with a similar title that shows the correct authorship. Some heuristics might help with these.
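One possible heuristic, sketched under assumptions: if the suspect record's publisher field is a self-publishing label, look for other work records whose normalized title is nearly identical. The record layout (dicts with `title`, `publisher`, `author`), the 0.9 threshold and the function names are hypothetical. The two labels come from the description above.

```python
from difflib import SequenceMatcher

# Labels named in the issue. Real data likely has more variants.
SELF_PUBLISHING_LABELS = ("independently published", "createspace")


def normalize_title(title: str) -> str:
    """Lowercase and collapse whitespace so trivial differences vanish."""
    return " ".join(title.lower().split())


def title_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize_title(a), normalize_title(b)).ratio()


def find_merge_candidates(suspect: dict, others: list[dict], threshold: float = 0.9) -> list[dict]:
    """Return records from `others` that may be the correct version of `suspect`."""
    publisher = suspect.get("publisher", "").lower()
    if not any(label in publisher for label in SELF_PUBLISHING_LABELS):
        return []  # Heuristic only applies to the self-published pattern.
    return [o for o in others if title_similarity(suspect["title"], o["title"]) >= threshold]


# Hypothetical sample records.
suspect = {"title": "Quantum  Mechanics", "publisher": "Independently Published",
           "author": "Example Books"}
others = [
    {"title": "quantum mechanics", "author": "A. Writer"},
    {"title": "Organic Chemistry", "author": "B. Writer"},
]
matches = find_merge_candidates(suspect, others)
```

Title similarity alone will produce false matches for generic titles, so the output is better treated as a review queue than as an automatic merge list.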

A substantial group, however, are corporate authorships by publisher staff writers with no public attribution to an individual. This is particularly common in bibliographies, reference works, study notes, and textbooks.

Suggestions?
