I've been running feedfinder on the dmoz database to find and catalog RSS/Atom feeds.
Unfortunately, feedfinder keeps getting stuck in what appears to be an infinite loop on garbage HTML, and it then requires a manual kill.
I'm fully aware of the rules of garbage in / garbage out. But it would be convenient if feedfinder.py would realize that something is wrong and abort itself instead of processing for days on a single url.
Example:
http://gjhs.mesa.k12.co.us/
I've got about 20,000 other URLs that this problem applies to as well.
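As a stopgap until feedfinder can abort on its own, one option is to run each URL in a child process with a hard time limit, so a stuck parse is killed automatically instead of by hand. The following is only a minimal sketch, not feedfinder's own behavior: `run_with_timeout` and `find_feeds` are hypothetical helper names, and `find_feeds` assumes that `feedfinder.py` sits in the working directory, accepts a URL as its only command-line argument, and prints one feed URL per line.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd in a child process and return its stdout as text.

    Returns None if the child does not finish within timeout_seconds.
    """
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return None
    return result.stdout


def find_feeds(url, timeout_seconds=60):
    """Hypothetical wrapper around feedfinder.py for a single URL.

    Assumption: feedfinder.py is in the current directory, takes the URL
    as its only argument, and prints one feed URL per line.
    """
    output = run_with_timeout(
        [sys.executable, "feedfinder.py", url], timeout_seconds
    )
    if output is None:
        return None  # timed out; the caller can log the URL and move on
    return [line for line in output.splitlines() if line.strip()]
```

With something like this, a batch job over the whole URL list could record every URL for which `find_feeds` returns `None` and continue, rather than stalling on the first bad page.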