This repository was archived by the owner on Jan 12, 2023. It is now read-only.
Unexpected metadata for non-existent ebooks #57
Description
The metadata database contains records for non-existent ebooks.
Exists: https://www.gutenberg.org/ebooks/1
Doesn't exist: https://www.gutenberg.org/ebooks/182
Example code:
from __future__ import print_function

from gutenberg.query import get_metadata


def get_all_metadata(etextno):
    for feature_name in ['author', 'formaturi', 'language',
                         'rights', 'subject', 'title']:
        print("{}\t{}\t{}".format(
            etextno,
            feature_name,
            get_metadata(feature_name, etextno)))
    print()


get_all_metadata(1)  # US Declaration of Independence
get_all_metadata(182)  # no such ebook

Actual output:
1 author frozenset([u'United States President (1801-1809)'])
1 formaturi frozenset([u'http://www.gutenberg.org/ebooks/1.txt.utf-8', u'http://www.gutenberg.org/ebooks/1.epub.noimages', u'http://www.gutenberg.org/6/5/2/6527/6527-t/6527-t.tex', u'http://www.gutenberg.org/ebooks/1.html.noimages', u'http://www.gutenberg.org/files/1/1.zip', u'http://www.gutenberg.org/ebooks/1.epub.images', u'http://www.gutenberg.org/ebooks/1.rdf', u'http://www.gutenberg.org/ebooks/1.kindle.noimages', u'http://www.gutenberg.org/files/1/1.txt', u'http://www.gutenberg.org/ebooks/1.html.images', u'http://www.gutenberg.org/6/5/2/6527/6527-t.zip', u'http://www.gutenberg.org/ebooks/1.kindle.images'])
1 language frozenset([u'en'])
1 rights frozenset([u'Public domain in the USA.'])
1 subject frozenset([u'E201', u'United States. Declaration of Independence', u'United States -- History -- Revolution, 1775-1783 -- Sources', u'JK'])
1 title frozenset([u'The Declaration of Independence of the United States of America'])
182 author frozenset([])
182 formaturi frozenset([])
182 language frozenset([u'en'])
182 rights frozenset([u'None'])
182 subject frozenset([])
182 title frozenset([])
Expected output:
I'd expect get_metadata("language", 182) and get_metadata("rights", 182) to both return frozenset([]) instead of frozenset([u'en']) and frozenset([u'None']), respectively.
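Until the data is fixed, that expectation could be approximated on the caller's side. The sketch below assumes a record counts as a real ebook only if it has a title or at least one download URI. The helper name `get_metadata_or_empty` and its `lookup` parameter are hypothetical: `lookup` stands in for any callable with the same `(feature_name, etextno)` signature as `gutenberg.query.get_metadata`, and the stand-in data only mirrors the output above.

```python
def get_metadata_or_empty(feature_name, etextno, lookup):
    """Return lookup(feature_name, etextno), or an empty frozenset if the
    etext number looks like a phantom record (hypothetical helper)."""
    # Assumed heuristic: a real ebook has a title or at least one file URI.
    if not (lookup('title', etextno) or lookup('formaturi', etextno)):
        # Hide stray defaults such as language 'en' or rights 'None'.
        return frozenset()
    return lookup(feature_name, etextno)


# Hypothetical stand-in for the metadata database, mirroring the output above.
FAKE_DB = {
    (1, 'title'): frozenset([u'The Declaration of Independence of the United States of America']),
    (1, 'formaturi'): frozenset([u'http://www.gutenberg.org/files/1/1.txt']),
    (1, 'language'): frozenset([u'en']),
    (182, 'language'): frozenset([u'en']),
    (182, 'rights'): frozenset([u'None']),
}


def fake_lookup(feature_name, etextno):
    # Missing keys yield an empty frozenset, like the ebook 182 rows above.
    return FAKE_DB.get((etextno, feature_name), frozenset())


print(get_metadata_or_empty('language', 182, fake_lookup))  # empty frozenset
print(get_metadata_or_empty('language', 1, fake_lookup))    # contains 'en'
```

With the real package, `lookup=get_metadata` would be passed instead. The heuristic costs two extra lookups per call and only hides the symptom; the stray records would still be in the database.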
Or better, since there's no such ebook, perhaps it should return None or raise an exception, maybe IndexError or something custom like NoEbookIndex. Or just don't add the record to the database in the first place, and let the lookup raise whatever it normally raises when an index is not found.
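The exception variant could look roughly like this. `NoEbookIndex` is only the name floated in this issue, not an existing class in the package, and `get_metadata_strict`, `lookup`, and the stand-in data are hypothetical; the same title-or-files heuristic is assumed. Deriving from `LookupError` (the common base of `IndexError` and `KeyError`) means callers that already catch `LookupError` would keep working.

```python
class NoEbookIndex(LookupError):
    """Raised when an etext number has no ebook behind it (hypothetical)."""


def get_metadata_strict(feature_name, etextno, lookup):
    """Like lookup(feature_name, etextno), but raise NoEbookIndex for
    etext numbers that look like phantom records."""
    # Assumed heuristic: a real ebook has a title or at least one file URI.
    if not (lookup('title', etextno) or lookup('formaturi', etextno)):
        raise NoEbookIndex(etextno)
    return lookup(feature_name, etextno)


# Hypothetical stand-in for the metadata database, mirroring the output above.
FAKE_DB = {
    (1, 'title'): frozenset([u'The Declaration of Independence of the United States of America']),
    (1, 'language'): frozenset([u'en']),
    (182, 'language'): frozenset([u'en']),
    (182, 'rights'): frozenset([u'None']),
}


def fake_lookup(feature_name, etextno):
    # Missing keys yield an empty frozenset, like the ebook 182 rows above.
    return FAKE_DB.get((etextno, feature_name), frozenset())


print(get_metadata_strict('language', 1, fake_lookup))
try:
    get_metadata_strict('language', 182, fake_lookup)
except NoEbookIndex as exc:
    print('no such ebook: {}'.format(exc.args[0]))
```

Filtering phantom records when the database is built would make a wrapper like this unnecessary; the sketch only shows what the raising behaviour would look like to a caller.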