Commit 6c70246

Add ISBN ocaid extractor bot
1 parent cbeb30b commit 6c70246

4 files changed: 107 additions, 0 deletions

isbnfromiabot/README.md

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
A set of scripts that adds `isbn_13` values to editions whose IA/ocaid source record contains an ISBN 13.
### How To Use
```bash
# Find Editions with IA ISBN, but no ISBN 13
./find_editions_with_isbnianot13.sh /path/to/ol_dump.txt.gz /path/to/filtered_dump.txt.gz
# Add ISBN 13s converted from the ia ocaid source
python isbn_ia_to_13.py --dump_path=/path/to/filtered_dump.txt.gz --dry_run=<bool> --limit=<int>
```
If `dry_run` is True, the script will run as normal, but no changes will be saved to OpenLibrary.
This is for debugging purposes. By default, `dry_run` is `True`.
`limit` is the maximum number of changes to OpenLibrary that will occur before the script quits.
By default, `limit` is set to `1`. Setting `limit` to `0` allows unlimited edits.
A log is automatically generated whenever `isbn_ia_to_13.py` executes.
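
The `dry_run` and `limit` rules described in the README can be illustrated with a small sketch. This is not code from the commit or from `olclient`: the `EditGate` class and its `save` method are hypothetical names, and the sketch assumes that dry-run saves also count toward `limit`.

```python
from typing import Callable


class EditGate:
    """Hypothetical helper (not part of olclient) mirroring the README's rules."""

    def __init__(self, dry_run: bool = True, limit: int = 1) -> None:
        self.dry_run = dry_run  # README default: True, so nothing is written
        self.limit = limit      # README default: 1; 0 means unlimited edits
        self.changed = 0

    def save(self, save_fn: Callable[[], None]) -> bool:
        """Call save_fn unless dry_run is set; return False once limit is reached."""
        if not self.dry_run:
            save_fn()
        # Assumption: a skipped (dry-run) save still counts as one change.
        self.changed += 1
        return self.limit == 0 or self.changed < self.limit
```

A caller would stop its loop as soon as `save` returns `False`, which is one way to read "the script quits" once `limit` changes have occurred.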

isbnfromiabot/find_editions_with_isbnianot13.sh

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
#!/bin/bash

if [[ -z $1 ]]
then
  echo "No dump file provided"
  exit 1
fi
if [[ -z $2 ]]
then
  echo "No output file provided"
  exit 1
fi

OL_DUMP=$1
OUTPUT=$2
# \d is not part of POSIX ERE (grep -E), so the 13 digits are matched with [0-9]{13}
zgrep ^/type/edition "$OL_DUMP" |
  grep -E '"ia:isbn_[0-9]{13}"' |
  grep -v -E '"isbn_13":' |
  grep -v -E '"isbn_10"' |
  pv |
  gzip > "$OUTPUT"
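
The selection performed by the pipeline above can also be sketched in Python, for example to test it on a few rows without `zgrep` or `pv`. This is a minimal sketch and not part of the commit: the function name `editions_needing_isbn` is hypothetical, and it assumes the five tab-separated dump columns that `isbn_ia_to_13.py` reads, with the JSON record in the fifth. Unlike the grep pipeline, which matches raw text anywhere in the row, it parses the JSON and inspects `source_records` directly.

```python
import json
import re
from typing import Iterable, Iterator

# Same pattern the bot uses: the "ia:isbn_" prefix followed by exactly 13 digits.
IA_ISBN = re.compile(r"ia:isbn_[0-9]{13}")


def editions_needing_isbn(rows: Iterable[str]) -> Iterator[str]:
    """Yield edition rows that cite an ia:isbn_ record and have no ISBN fields."""
    for row in rows:
        if not row.startswith("/type/edition"):
            continue
        # Assumed column layout: type, key, revision, last_modified, JSON.
        record = json.loads(row.rstrip("\n").split("\t")[4])
        if "isbn_13" in record or "isbn_10" in record:
            continue
        if any(IA_ISBN.fullmatch(s) for s in record.get("source_records", [])):
            yield row
```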

isbnfromiabot/isbn_ia_to_13.py

Lines changed: 70 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,70 @@
"""
IA isbn ref to isbn 13
NOTE: This script ideally works on an Open Library Dump that only contains editions with an IA isbn ref and no isbn_13
"""
import gzip
import json
import re

import isbnlib
import olclient


class ConvertISBNiato13Job(olclient.AbstractBotJob):
    def run(self) -> None:
        """Looks for any IA ISBN to convert to 13"""
        self.write_changes_declaration()
        header = {"type": 0, "key": 1, "revision": 2, "last_modified": 3, "JSON": 4}
        comment = "extract ISBN 13 from IA source_record"
        with gzip.open(self.args.file, "rb") as fin:
            for row_num, row in enumerate(fin):
                row = row.decode().split("\t")
                _json = json.loads(row[header["JSON"]])
                if _json["type"]["key"] != "/type/edition":
                    continue

                # _json is a dict, so test key membership (hasattr on a dict is always False here)
                if "isbn_13" in _json:
                    # we only update editions with no existing isbn 13s (for now at least)
                    continue

                if "source_records" in _json:
                    source_records = _json.get("source_records", None)
                else:
                    continue
                regex = "ia:isbn_[0-9]{13}"
                isbn_13 = False
                for source_record in source_records:
                    if re.fullmatch(regex, source_record):
                        # drop the 8-character "ia:isbn_" prefix
                        isbn_13 = source_record[8:]
                        break

                if not isbn_13:
                    continue

                if not isbnlib.is_isbn13(isbn_13):
                    continue

                olid = _json["key"].split("/")[-1]
                edition = self.ol.Edition.get(olid)
                if edition.type["key"] != "/type/edition":
                    continue

                if hasattr(edition, "isbn_13"):
                    # don't update editions that already have an isbn 13
                    continue

                isbns_13 = [isbn_13]

                setattr(edition, "isbn_13", isbns_13)
                self.logger.info("\t".join([olid, source_record, str(isbns_13)]))
                self.save(lambda: edition.save(comment=comment))


if __name__ == "__main__":
    job = ConvertISBNiato13Job()

    try:
        job.run()
    except Exception as e:
        job.logger.exception(e)
        raise e
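
The extraction and validation steps in `run` can be tried in isolation with the standard library only. The sketch below is an illustration, not the bot's code: `isbn13_checksum_ok` and `extract_isbn13` are hypothetical names, and the check implements only the standard ISBN-13 check-digit rule (weights alternating 1 and 3, sum divisible by 10). It makes no claim about what else `isbnlib.is_isbn13` may verify.

```python
import re
from typing import List, Optional

IA_ISBN = re.compile(r"ia:isbn_([0-9]{13})")


def isbn13_checksum_ok(isbn: str) -> bool:
    """Return True if isbn is 13 ASCII digits with a valid ISBN-13 check digit."""
    if not re.fullmatch(r"[0-9]{13}", isbn):
        return False
    # Digits in even positions (0-based) weigh 1, odd positions weigh 3.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(isbn))
    return total % 10 == 0


def extract_isbn13(source_records: List[str]) -> Optional[str]:
    """Return the ISBN from the first ia:isbn_<13 digits> record, if its checksum holds."""
    for source_record in source_records:
        match = IA_ISBN.fullmatch(source_record)
        if match:
            # Like the bot, stop at the first matching record.
            isbn = match.group(1)
            return isbn if isbn13_checksum_ok(isbn) else None
    return None
```

For example, `extract_isbn13(["ia:isbn_9780306406157"])` yields `"9780306406157"`, whose weighted digit sum is 100.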

isbnfromiabot/requirements.txt

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
openlibrary-client==0.0.30
isbnlib==3.10.14
