handle_ref() cannot handle &gid= #2263

When content contains a link formatted like this: `http://www.linkedin.com/groupAnswers?viewQuestionAndAnswers=&discussionID=12741155&gid=87954&trk=EML_anet_qa_ttle-0Pt79xs2RVr6JBpnsJt7dBpSBA)`, the `&gid=` part makes Pelican's `HTMLParser` subclass crash when truncating the summary for feeds (this popped up while helping ramonsuarez with his bug #2258):

```
CRITICAL: ValueError: substring not found
Traceback (most recent call last):
  File "/home/me/data/pip/lib/python3.6/site-packages/pelican/utils.py", line 556, in handle_entityref
    codepoint = html_entities.name2codepoint[name]
KeyError: 'gid'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/me/data/pip/bin/pelican", line 11, in <module>
    sys.exit(main())
  File "/home/me/data/pip/lib/python3.6/site-packages/pelican/__init__.py", line 487, in main
    pelican.run()
  File "/home/me/data/pip/lib/python3.6/site-packages/pelican/__init__.py", line 179, in run
    p.generate_output(writer)
  File "/home/me/data/pip/lib/python3.6/site-packages/pelican/generators.py", line 599, in generate_output
    self.generate_feeds(writer)
  File "/home/me/data/pip/lib/python3.6/site-packages/pelican/generators.py", line 300, in generate_feeds
    self.settings['FEED_ALL_ATOM'])
  File "/home/me/data/pip/lib/python3.6/site-packages/pelican/writers.py", line 123, in write_feed
    self._add_item_to_the_feed(feed, elements[i])
  File "/home/me/data/pip/lib/python3.6/site-packages/pelican/writers.py", line 52, in _add_item_to_the_feed
    description = item.summary
  File "/home/me/data/pip/lib/python3.6/site-packages/pelican/contents.py", line 310, in summary
    return self.get_summary(self.get_siteurl())
  File "/home/me/data/pip/lib/python3.6/site-packages/pelican/utils.py", line 173, in __call__
    value = self.func(*args)
  File "/home/me/data/pip/lib/python3.6/site-packages/pelican/contents.py", line 306, in get_summary
    self.settings['SUMMARY_MAX_LENGTH'])
  File "/home/me/data/pip/lib/python3.6/site-packages/pelican/utils.py", line 583, in truncate_html_words
    truncator.feed(s)
  File "/home/me/data/pip/lib/python3.6/site-packages/pelican/utils.py", line 484, in feed
    HTMLParser.feed(self, *args, **kwargs)
  File "/usr/lib/python3.6/html/parser.py", line 111, in feed
    self.goahead(0)
  File "/usr/lib/python3.6/html/parser.py", line 219, in goahead
    self.handle_entityref(name)
  File "/home/me/data/pip/lib/python3.6/site-packages/pelican/utils.py", line 558, in handle_entityref
    self.handle_ref('')
  File "/home/me/data/pip/lib/python3.6/site-packages/pelican/utils.py", line 543, in handle_ref
    ref_end = self.rawdata.index(';', offset) + 1
ValueError: substring not found
```
It looks like the issue was introduced in [9d0804de7: When truncating, consider hypens, apostrophes and HTML entities. ](https://github.com/getpelican/pelican/commit/9d0804de7)
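
A minimal sketch of what seems to trigger this, using only the standard library. It assumes the truncator runs `HTMLParser` with `convert_charrefs=False` (the call to `handle_entityref` in the traceback suggests so); `RefRecorder` and the sample text are made up for illustration and are not Pelican code. With that setting, the parser appears to report any `&name` followed by a non-alphanumeric character as an entity reference, whether or not a `;` follows:

```python
from html.entities import name2codepoint
from html.parser import HTMLParser


class RefRecorder(HTMLParser):
    """Hypothetical helper: records the names passed to handle_entityref()."""

    def __init__(self):
        # Assumption: entity references are handled manually, not auto-converted.
        super().__init__(convert_charrefs=False)
        self.names = []

    def handle_entityref(self, name):
        self.names.append(name)


text = "see ?discussionID=12741155&gid=87954&trk=EML for details"

recorder = RefRecorder()
recorder.feed(text)
recorder.close()

# 'gid' is reported as an entity name although it is not a known HTML entity.
print(recorder.names)
print("gid" in name2codepoint)

# There is no ';' anywhere after '&gid', so str.index() raises ValueError,
# matching the last frame of the traceback.
try:
    text.index(";", text.index("&gid"))
except ValueError as exc:
    print("ValueError:", exc)
```

So `gid` is simply missing from the entity table, and there is no `;` for `index()` to find.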

As I do not fully understand this, @andreacorbellini do you think [this simple change](https://github.com/ix5/pelican/commit/239b072e61c0da1fd658b7830bf93f9d821d1fd4) to use `find()` instead of `index()` in `handle_ref()` is sufficient? I have been able to get pelican to work using this, but I don’t know whether this is the right approach.
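
To illustrate the difference between the two calls, here is a small sketch. `find_ref_end` is a hypothetical helper, not the code from the linked commit or from `pelican/utils.py`. It only shows that `find()` returns `-1` instead of raising, so the `+ 1` from the original line needs a guard, otherwise it would quietly yield `0`:

```python
def find_ref_end(rawdata, offset):
    """Hypothetical helper: return the index just past the ';' that ends a
    reference starting at `offset`, or None if no ';' follows."""
    pos = rawdata.find(";", offset)
    if pos == -1:
        # No terminator: let the caller treat the '&...' text as plain data.
        return None
    return pos + 1


# A well-formed entity has a terminator.
print(find_ref_end("&amp; more", 0))

# A query-string fragment like the one in this report has none.
print(find_ref_end("&gid=87954&trk=EML", 0))

# Without the guard, find() + 1 silently gives 0 instead of an error.
print("&gid=87954".find(";", 0) + 1)
```

Whether returning early for the unterminated case is the behaviour the truncator should have is exactly the part I am unsure about.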

And is `&gid` a protected codepoint somehow?

Maybe @mosra, since you're working on something related to unescaping [right now](https://github.com/getpelican/pelican/pull/2260), you could have a look at this?
