Skip to content
This repository was archived by the owner on Jan 12, 2023. It is now read-only.
This repository was archived by the owner on Jan 12, 2023. It is now read-only.

Unexpected metadata for non-existent ebooks #57

@hugovk

Description

@hugovk

The metadata database contains records for non-existent ebooks.

Exists: https://www.gutenberg.org/ebooks/1
Doesn't exist: https://www.gutenberg.org/ebooks/182

Example code:

from __future__ import print_function
from gutenberg.query import get_metadata

def get_all_metadata(etextno):
    for feature_name in ['author', 'formaturi', 'language',
                         'rights', 'subject', 'title']:
        print("{}\t{}\t{}".format(
            etextno,
            feature_name,
            get_metadata(feature_name, etextno)))
    print()

get_all_metadata(1)  # US Declaration of Independence
get_all_metadata(182)  # no such ebook

Actual output:

1       author  frozenset([u'United States President (1801-1809)'])
1       formaturi       frozenset([u'http://www.gutenberg.org/ebooks/1.txt.utf-8', u'http://www.gutenberg.org/ebooks/1.e
pub.noimages', u'http://www.gutenberg.org/6/5/2/6527/6527-t/6527-t.tex', u'http://www.gutenberg.org/ebooks/1.html.noimag
es', u'http://www.gutenberg.org/files/1/1.zip', u'http://www.gutenberg.org/ebooks/1.epub.images', u'http://www.gutenberg
.org/ebooks/1.rdf', u'http://www.gutenberg.org/ebooks/1.kindle.noimages', u'http://www.gutenberg.org/files/1/1.txt', u'h
ttp://www.gutenberg.org/ebooks/1.html.images', u'http://www.gutenberg.org/6/5/2/6527/6527-t.zip', u'http://www.gutenberg
.org/ebooks/1.kindle.images'])
1       language        frozenset([u'en'])
1       rights  frozenset([u'Public domain in the USA.'])
1       subject frozenset([u'E201', u'United States. Declaration of Independence', u'United States -- History -- Revolut
ion, 1775-1783 -- Sources', u'JK'])
1       title   frozenset([u'The Declaration of Independence of the United States of America'])

182     author  frozenset([])
182     formaturi       frozenset([])
182     language        frozenset([u'en'])
182     rights  frozenset([u'None'])
182     subject frozenset([])
182     title   frozenset([])

Expected output:

I'd expect get_metadata("language", 182) and get_metadata("rights", 182) to both return frozenset([]) instead of frozenset([u'en']) and frozenset([u'None']).

Or better, as there's no such ebook, perhaps it should it return None or raise an exception: maybe IndexError or something custom like NoEbookIndex, or just don't add it to the database in the first place and let that raise whatever it would raise when an index is not found.

Metadata

Metadata

Assignees

No one assigned

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions