Description
(This popped up while helping ramonsuarez with his bug #2258.)
When a link is formatted like this: http://www.linkedin.com/groupAnswers?viewQuestionAndAnswers=&discussionID=12741155&gid=87954&trk=EML_anet_qa_ttle-0Pt79xs2RVr6JBpnsJt7dBpSBA , the &gid= part makes the Pelican HTMLParser hiccup when truncating for feeds:
CRITICAL: ValueError: substring not found
Traceback (most recent call last):
File "/home/me/data/pip/lib/python3.6/site-packages/pelican/utils.py", line 556, in handle_entityref
codepoint = html_entities.name2codepoint[name]
KeyError: 'gid'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/me/data/pip/bin/pelican", line 11, in <module>
sys.exit(main())
File "/home/me/data/pip/lib/python3.6/site-packages/pelican/__init__.py", line 487, in main
pelican.run()
File "/home/me/data/pip/lib/python3.6/site-packages/pelican/__init__.py", line 179, in run
p.generate_output(writer)
File "/home/me/data/pip/lib/python3.6/site-packages/pelican/generators.py", line 599, in generate_output
self.generate_feeds(writer)
File "/home/me/data/pip/lib/python3.6/site-packages/pelican/generators.py", line 300, in generate_feeds
self.settings['FEED_ALL_ATOM'])
File "/home/me/data/pip/lib/python3.6/site-packages/pelican/writers.py", line 123, in write_feed
self._add_item_to_the_feed(feed, elements[i])
File "/home/me/data/pip/lib/python3.6/site-packages/pelican/writers.py", line 52, in _add_item_to_the_feed
description = item.summary
File "/home/me/data/pip/lib/python3.6/site-packages/pelican/contents.py", line 310, in summary
return self.get_summary(self.get_siteurl())
File "/home/me/data/pip/lib/python3.6/site-packages/pelican/utils.py", line 173, in __call__
value = self.func(*args)
File "/home/me/data/pip/lib/python3.6/site-packages/pelican/contents.py", line 306, in get_summary
self.settings['SUMMARY_MAX_LENGTH'])
File "/home/me/data/pip/lib/python3.6/site-packages/pelican/utils.py", line 583, in truncate_html_words
truncator.feed(s)
File "/home/me/data/pip/lib/python3.6/site-packages/pelican/utils.py", line 484, in feed
HTMLParser.feed(self, *args, **kwargs)
File "/usr/lib/python3.6/html/parser.py", line 111, in feed
self.goahead(0)
File "/usr/lib/python3.6/html/parser.py", line 219, in goahead
self.handle_entityref(name)
File "/home/me/data/pip/lib/python3.6/site-packages/pelican/utils.py", line 558, in handle_entityref
self.handle_ref('')
File "/home/me/data/pip/lib/python3.6/site-packages/pelican/utils.py", line 543, in handle_ref
ref_end = self.rawdata.index(';', offset) + 1
ValueError: substring not found
It looks like the issue was introduced in 9d0804de7: When truncating, consider hypens, apostrophes and HTML entities.
As I do not fully understand this code: @andreacorbellini, do you think this simple change to use find() instead of index() in handle_ref() is sufficient? I have been able to get Pelican to work with it, but I don't know whether it is the right approach.
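For reference, here is a minimal sketch of how the two calls differ when no semicolon follows the offset. The input string and the `ref_end_or_none` helper are made up for illustration; this is not Pelican's code.

```python
# Illustrative input: a query string containing "&gid" but no ";" anywhere.
rawdata = "groupAnswers?discussionID=12741155&gid=87954"
offset = rawdata.index("&gid")

# str.index raises ValueError when the substring is missing.
try:
    rawdata.index(";", offset)
    index_raised = False
except ValueError:
    index_raised = True

# str.find returns -1 instead, so adding 1 silently yields 0.
ref_end_with_find = rawdata.find(";", offset) + 1


def ref_end_or_none(rawdata, offset):
    """Hypothetical helper: end position of the reference, or None."""
    end = rawdata.find(";", offset)
    return None if end == -1 else end + 1
```

So find() makes the crash go away, but the computed end position becomes 0 rather than a real offset. Whether that matters depends on how handle_ref() uses the value afterwards, which I can't judge; an explicit check for the "no semicolon" case, as in the helper above, might be safer.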
And is &gid a protected codepoint somehow?
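As far as the standard library goes, gid does not look special: html.entities.name2codepoint simply has no entry for it (hence the KeyError above), and with convert_charrefs=False the parser reports any &name followed by a non-alphanumeric character through handle_entityref(). A small sketch, using a hypothetical collector class rather than Pelican's truncator, that passes unknown names through unchanged:

```python
from html import entities
from html.parser import HTMLParser


class TolerantTextCollector(HTMLParser):
    """Hypothetical stand-in for illustration, not Pelican's truncator."""

    def __init__(self):
        # convert_charrefs=False makes the parser call handle_entityref().
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        codepoint = entities.name2codepoint.get(name)
        if codepoint is None:
            # Unknown name such as "gid": keep the text literally.
            self.parts.append("&" + name)
        else:
            self.parts.append(chr(codepoint))


def collect_text(html):
    parser = TolerantTextCollector()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


unknown = collect_text("a?x=1&gid=87954 b")
known = collect_text("Tom &amp; Jerry")
```

Here the unknown reference comes back as the original text, while a real entity such as &amp; is still decoded.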
Maybe @mosra, since you're working on something related to unescaping right now, you could have a look at this?