`utils.get_pages(item)` function seems incorrect

Hey,

Thanks a lot for creating and maintaining this library, it's very useful!

I was looking into the code that generates the `<nav epub:type="page-list" id="pages" hidden="hidden">` element inside the nav file, because I got a rather random entry there.

The current code picked up this element:
```xhtml
<div id="cop" class="copyright" epub:type="copyright-page">...</div>
```
It then added this entry to the nav file:
```xhtml
<!-- nav.xhtml -->
...
    <nav epub:type="page-list" id="pages" hidden="hidden">
      <h2>Pages</h2>
      <ol>
        <li>
          <a href="xhtml/blabla.xhtml#cop">cop</a>
        </li>
      </ol>
    </nav>
```

This led me to `get_pages` in `utils`:

```python
def get_pages(item):
    body = parse_html_string(item.get_body_content())
    pages = []

    for elem in body.iter():
        # Matches *any* element carrying an epub:type attribute,
        # not only epub:type="pagebreak" markers.
        if 'epub:type' in elem.attrib:
            if elem.get('id') is not None:
                _text = None

                if elem.text is not None and elem.text.strip() != '':
                    _text = elem.text.strip()

                if _text is None:
                    _text = elem.get('aria-label')

                if _text is None:
                    _text = get_headers(elem)

                pages.append((item.get_name(), elem.get('id'), _text or elem.get('id')))

    return pages
```
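To double-check that the condition really picks up the copyright `div`, here is a small stdlib reproduction of just the `epub:type` + `id` test. It is an approximation, not the library code: it uses `xml.etree.ElementTree` instead of the parser behind `parse_html_string`, so the attribute shows up under the namespaced key `{http://www.idpf.org/2007/ops}type` rather than the literal `'epub:type'`. The sample markup is my own minimal example.

```python
import xml.etree.ElementTree as ET

# ElementTree resolves the "epub" prefix, so epub:type appears under this key.
EPUB_TYPE_NS = '{http://www.idpf.org/2007/ops}type'

SAMPLE = '''<body xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
  <div id="cop" class="copyright" epub:type="copyright-page">...</div>
</body>'''

# Mirrors the current condition: any epub:type value plus an id qualifies.
picked = [elem.get('id') for elem in ET.fromstring(SAMPLE).iter()
          if EPUB_TYPE_NS in elem.attrib and elem.get('id') is not None]
print(picked)  # ['cop']
```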

I can't figure out the intention here. The `if 'epub:type' in elem.attrib` check seems to treat any element that has an `id` and *any* `epub:type` value (here `copyright-page`) as if it were a `pagebreak`. Is that intended?
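A minimal sketch of what a stricter check could look like, assuming the goal is to list only page-break markers. `epub:type` holds a whitespace-separated list of terms, so the sketch splits it and looks for `pagebreak` instead of testing for the attribute's presence. `get_pagebreaks` is a hypothetical name; it takes an already-parsed ElementTree root instead of an ebooklib item, and it checks both the literal `'epub:type'` key and the namespaced key because which one appears depends on the parser (an assumption about how the library's parser behaves). The label fallback (text, then `aria-label`, then `title`, then the `id`) is also my assumption.

```python
import xml.etree.ElementTree as ET

# Namespaced key ElementTree uses for epub:type when the "epub" prefix
# is bound to the EPUB 3 OPS namespace.
EPUB_TYPE_NS = '{http://www.idpf.org/2007/ops}type'


def epub_types(elem):
    """Return the set of epub:type terms on an element (may be empty)."""
    # A non-namespace-aware HTML parser may keep the literal 'epub:type'
    # key, while an XML parser produces the namespaced key; check both.
    value = elem.get('epub:type') or elem.get(EPUB_TYPE_NS) or ''
    # epub:type is a whitespace-separated list of terms.
    return set(value.split())


def get_pagebreaks(root, item_name):
    """Hypothetical variant of get_pages: only pagebreak markers count."""
    pages = []
    for elem in root.iter():
        if 'pagebreak' not in epub_types(elem) or elem.get('id') is None:
            continue
        text = (elem.text or '').strip()
        # Fall back to aria-label, then title, then the id itself.
        label = (text or elem.get('aria-label') or elem.get('title')
                 or elem.get('id'))
        pages.append((item_name, elem.get('id'), label))
    return pages


SAMPLE = '''<body xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
  <div id="cop" class="copyright" epub:type="copyright-page">...</div>
  <span id="page12" epub:type="pagebreak" title="12"/>
</body>'''

print(get_pagebreaks(ET.fromstring(SAMPLE), 'xhtml/blabla.xhtml'))
# [('xhtml/blabla.xhtml', 'page12', '12')]
```

With this filter the copyright `div` is skipped and only the `pagebreak` span ends up in the page list.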

It seems to me that a better heuristic would be to pick the first `pagebreak` in the chapter file and use its value (or that value minus 1?) as the page value. Alternatively, maybe we could add a `page_list` attribute that works similarly to `links` or `toc`, so that the page list can be specified manually?
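A rough sketch of the "first pagebreak per chapter" heuristic, again using stdlib ElementTree on a made-up chapter rather than ebooklib's parser. `first_pagebreak` is a hypothetical helper, and the label fallback order (text, `aria-label`, `title`, `id`) is an assumption.

```python
import xml.etree.ElementTree as ET

EPUB_TYPE_NS = '{http://www.idpf.org/2007/ops}type'


def first_pagebreak(root):
    """Return (id, label) of the first pagebreak marker, or None.

    Hypothetical helper; the label fallback order is an assumption.
    """
    for elem in root.iter():  # iter() walks the tree in document order
        value = elem.get('epub:type') or elem.get(EPUB_TYPE_NS) or ''
        if 'pagebreak' in value.split() and elem.get('id') is not None:
            label = ((elem.text or '').strip() or elem.get('aria-label')
                     or elem.get('title') or elem.get('id'))
            return elem.get('id'), label
    return None


CHAPTER = '''<body xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
  <p>Text that starts on page 11...</p>
  <span id="p12" epub:type="pagebreak" aria-label="12"/>
  <p>...</p>
  <span id="p13" epub:type="pagebreak" aria-label="13"/>
</body>'''

print(first_pagebreak(ET.fromstring(CHAPTER)))  # ('p12', '12')
```

This links to the marker itself, so the text before it (which belongs to page 11 in this example) gets no entry; whether to emit one for that preceding page is exactly the "value minus 1" question above.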

`utils.get_pages(item)` function seems incorrect #339