feat: use author identifiers in import API #10110
cdrini merged 47 commits into internetarchive:master
for more information, see https://pre-commit.ci
…ver hardcoded IDs
Open questions:
- key vs ol_id in author import record
- remote_ids vs identifiers in author import record
  - ^ For both of these, since there are subtle differences between e.g. remote_ids (authors, Dict[str, str]) and identifiers (works/editions, Dict[str, list[str]]), I think it might be easiest if we re-use the shape of our existing Open Library records. So remote_ids: dict[str, str] for authors, and key to hold the Open Library key.
- Should any identifier conflicts cause import error?
  - As a first stab, let's err on the side of caution, and error on any identifier conflicts.
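To make the shape difference concrete, here is a minimal sketch of an import record that follows the proposal above: authors carry key and remote_ids (one string per identifier type), while the edition carries identifiers (a list per type). The TypedDict class names, the identifier type names, and all sample values are placeholders made up for illustration; they are not taken from the Open Library schema.

```python
from typing import TypedDict


class AuthorImportRecord(TypedDict, total=False):
    # Hypothetical type name, for illustration only.
    name: str
    key: str                    # Open Library key of a known author
    remote_ids: dict[str, str]  # exactly one value per identifier type


class EditionImportRecord(TypedDict, total=False):
    # Hypothetical type name, for illustration only.
    title: str
    authors: list[AuthorImportRecord]
    identifiers: dict[str, list[str]]  # several values allowed per type


# Placeholder data: the key and identifier values below are invented.
record: EditionImportRecord = {
    "title": "An Example Book",
    "authors": [
        {
            "name": "Example Author",
            "key": "/authors/OL0000000A",
            "remote_ids": {"wikidata": "Q0"},
        }
    ],
    "identifiers": {"wikisource": ["en:Example_Book"]},
}
```

The practical consequence is that code handling author IDs can compare plain strings, whereas code handling edition identifiers has to compare lists.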
| return authors |
| # Look for OL ID first. |
| if (key := author.get("ol_id")) and ( |
We might want to name this one as key to be consistent with our book/thing records. Having the import endpoint mirror the shape of our core book records is convenient.
Suggested change:
| - if (key := author.get("ol_id")) and ( |
| + if (key := author.get("key")) and ( |
@Freso do you have any strong opinions on this? ^
(see also Drini's comment above about remote_ids vs identifiers!)
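As a rough sketch of the lookup order under discussion, the function below checks the key field first, as the reviewed code does. The later steps (matching on a shared remote ID, then falling back to an exact name match) are assumptions added for illustration, as are the function name, the in-memory list of known authors, and the sample data; the real import code queries the database and is more involved.

```python
from typing import Optional


def find_existing_author(incoming: dict, known_authors: list[dict]) -> Optional[dict]:
    """Return the first known author matching `incoming`, or None (hypothetical helper)."""
    # 1. Look for the Open Library key first.
    if key := incoming.get("key"):
        for candidate in known_authors:
            if candidate.get("key") == key:
                return candidate

    # 2. Assumed next step: any remote identifier shared with a known author.
    for id_type, value in incoming.get("remote_ids", {}).items():
        for candidate in known_authors:
            if candidate.get("remote_ids", {}).get(id_type) == value:
                return candidate

    # 3. Assumed fallback: exact name match, so records without IDs still resolve.
    name = incoming.get("name")
    if name:
        for candidate in known_authors:
            if candidate.get("name") == name:
                return candidate
    return None


# Placeholder data: keys and identifier values are invented.
known = [
    {"key": "/authors/OL1A", "name": "First Author", "remote_ids": {"wikidata": "Q1"}},
    {"key": "/authors/OL2A", "name": "Second Author"},
]
by_key = find_existing_author({"name": "First Author", "key": "/authors/OL2A"}, known)
```

In the usage line, the key wins over the name, so the record resolves to the second author even though the name matches the first.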
Co-authored-by: Drini Cami <cdrini@gmail.com>
…tps://github.com/pidgezero-one/openlibrary into 9448/feat/use-known-author-identifiers-in-import
I've made the requested changes, but importing fails now with |
| """Returns the author's remote IDs merged with a given remote IDs object, as well as a count for how many IDs had conflicts. |
| If incoming_ids is empty, or if there are more conflicts than matches, no merge will be attempted, and the output will be (author.remote_ids, -1). |
| """ |
| output = {**self.remote_ids} |
Ended up having to revert to this dict unpacking. For some reason self.remote_ids is being treated as a Thing and not a dict (despite every other operation on it in this codebase suggesting it should be a dict), so deepcopy fails. I'm stumped on why that's happening.
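For readers following along, here is a standalone sketch of the behaviour the docstring above describes, written as a plain function over two dicts rather than as a method. The function name, the choice to keep the existing value when two values conflict, and the sample IDs are assumptions; only the return contract (merged IDs plus a conflict count, or the unmerged IDs and -1) comes from the docstring.

```python
def merge_remote_ids_sketch(
    existing: dict[str, str], incoming: dict[str, str]
) -> tuple[dict[str, str], int]:
    """Merge `incoming` into a copy of `existing` and count conflicting IDs.

    Returns (copy of existing, -1) if `incoming` is empty or if conflicts
    outnumber matches. Hypothetical helper, not the PR's implementation.
    """
    # Shallow copy via unpacking; enough here because all values are strings.
    output = {**existing}
    if not incoming:
        return output, -1

    matches = 0
    conflicts = 0
    for id_type, value in incoming.items():
        if id_type not in output:
            output[id_type] = value  # new identifier type: just add it
        elif output[id_type] == value:
            matches += 1
        else:
            conflicts += 1  # assumption: the existing value is kept

    if conflicts > matches:
        return {**existing}, -1
    return output, conflicts


# Placeholder identifier values, invented for the example.
merged, conflict_count = merge_remote_ids_sketch(
    {"wikidata": "Q1", "viaf": "1"},
    {"wikidata": "Q1", "viaf": "2", "isni": "3"},
)
```

Unpacking with `{**mapping}` only needs the source object to provide `keys()` and item access, so it does not rely on the copy protocol that `deepcopy` uses.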
51705ac to 5bfaada
…to match type schema
878b527 to e54bc88
e54bc88 to 4895023
LGTM! We tested on a call and importing is working like a charm! We've got some tweaks/fixes to the wikisource import script which we'll push up in a separate PR. Great work + perseverance on this one, @pidgezero-one.
Note I decided to run with using remote_ids / key for consistency with our type schema, but we can always revisit.
Also decided to have match_remote_ids throw an error if there's a conflict for now, but this could be a mistake we'll want to change later 😁
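The strict policy described here, failing on any conflict, could look roughly like the sketch below. The exception class, the function name, and the sample IDs are hypothetical and chosen for illustration; this is not the code merged in the PR.

```python
class RemoteIdConflictError(ValueError):
    """Hypothetical error type: an incoming ID contradicts a stored one."""


def merge_or_raise(existing: dict[str, str], incoming: dict[str, str]) -> dict[str, str]:
    """Return existing IDs plus incoming IDs, raising on the first conflict."""
    merged = dict(existing)  # copy so the caller's dict is left untouched
    for id_type, value in incoming.items():
        current = merged.get(id_type)
        if current is not None and current != value:
            raise RemoteIdConflictError(
                f"conflicting {id_type}: stored {current!r}, incoming {value!r}"
            )
        merged[id_type] = value
    return merged


# Placeholder identifier values, invented for the example.
ok = merge_or_raise({"wikidata": "Q1"}, {"wikidata": "Q1", "viaf": "1"})
```

Raising early keeps a bad import from silently attaching a book to the wrong author; relaxing this later would mean deciding which source wins when values disagree.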
This should be squash merged
Corresponding model update PR: internetarchive/openlibrary-client#419
This strictly expands the import schema.
It is not a breaking change.
Import records that don't include author IDs will continue to work as they currently do.
Closes #9448
Closes #9411
Technical
Issues:
Importing books is successful and matching authors are being found and used as expected; however, navigating to the author's page from that new book's page does not show that new book on the author's page.
Solr updater delay, it appeared after a while!
Testing
I put the entire output of the wikisource script into /import/batch/new.
Stakeholders
@cdrini @Freso
Attribution Disclaimer: By proposing this pull request, I affirm to have made a best-effort and exercised my discretion to make sure relevant sections of this code which substantially leverage code suggestions, code generation, or code snippets from sources (e.g. Stack Overflow, GitHub) have been annotated with basic attribution so reviewers & contributors may have confidence and access to the correct context to evaluate and use this code.