Hey,
Thanks a lot for creating and maintaining this library, it's very useful!
I was looking into the code that generates the <nav epub:type="page-list" id="pages" hidden="hidden"> element inside the nav file, since I got a rather random entry there.
The current code picked up this element:
<div id="cop" class="copyright" epub:type="copyright-page">...</div>

And added this:
# nav.xhtml
...
<nav epub:type="page-list" id="pages" hidden="hidden">
  <h2>Pages</h2>
  <ol>
    <li>
      <a href="xhtml/blabla.xhtml#cop">cop</a>
    </li>
  </ol>
</nav>

This got me to:
def get_pages(item):
    body = parse_html_string(item.get_body_content())
    pages = []
    for elem in body.iter():
        if 'epub:type' in elem.attrib:
            if elem.get('id') is not None:
                _text = None
                if elem.text is not None and elem.text.strip() != '':
                    _text = elem.text.strip()
                if _text is None:
                    _text = elem.get('aria-label')
                if _text is None:
                    _text = get_headers(elem)
                pages.append((item.get_name(), elem.get('id'), _text or elem.get('id')))
    return pages

I can't understand the intention here... The if 'epub:type' in elem.attrib check seems to assume that anything that specifies any epub:type is of type pagebreak?
It seems to me that a better heuristic would be to pick the first pagebreak in the chapter file and use that value (or that value - 1?) as the page value. Alternatively, maybe we should add a page_list attribute that works similarly to links or toc, where one could specify this manually...?
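
To make the first suggestion concrete, here is a rough sketch of a stricter filter that only accepts elements whose epub:type includes the pagebreak token. It is not the library's code: `PageBreakCollector`, `get_pagebreaks` and `first_pagebreak` are hypothetical names, it uses the standard-library `html.parser` instead of the library's own parsing helper so it can run on its own, and it assumes the label comes from aria-label or title (falling back to the id) rather than from the element text.

```python
from html.parser import HTMLParser


class PageBreakCollector(HTMLParser):
    """Collect (id, label) pairs for elements marked as page breaks."""

    def __init__(self):
        super().__init__()
        self.pages = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # epub:type is a space-separated list of tokens, so split it
        # instead of only checking that the attribute is present.
        types = (attributes.get("epub:type") or "").split()
        elem_id = attributes.get("id")
        if "pagebreak" in types and elem_id is not None:
            # Assumption: the label lives in aria-label or title;
            # fall back to the id when neither is set.
            label = attributes.get("aria-label") or attributes.get("title") or elem_id
            self.pages.append((elem_id, label))


def get_pagebreaks(file_name, body_html):
    """Return (file_name, id, label) tuples for page breaks only."""
    collector = PageBreakCollector()
    collector.feed(body_html)
    collector.close()
    return [(file_name, elem_id, label) for elem_id, label in collector.pages]


def first_pagebreak(file_name, body_html):
    """Return the first page break in a chapter, or None if there is none."""
    pages = get_pagebreaks(file_name, body_html)
    return pages[0] if pages else None
```

With this filter the copyright div from above would be skipped, because its epub:type is copyright-page and not pagebreak. Whether the page-list should contain every page break or only the first one per chapter file is still the open question.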