feedfinder.py 100% CPU usage for days.

I've been running feedfinder on the dmoz database to find and catalog RSS/Atom feeds. 

Unfortunately, feedfinder keeps getting stuck in what appears to be an infinite loop on garbage HTML, and it then requires a manual kill.

I'm fully aware of the rules of garbage in / garbage out, but it would be convenient if feedfinder.py realized that something was wrong and aborted instead of processing a single URL for days.

Example:
http://gjhs.mesa.k12.co.us/

I've got about 20,000 other URLs that this problem applies to as well.
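
As a possible stopgap until feedfinder can bail out on its own, each URL could be processed in a separate child process with a hard time limit. This is only a rough sketch, not part of feedfinder: `run_with_timeout` and `find_feeds` are hypothetical helper names, the 60-second limit is arbitrary, and it assumes feedfinder.py sits in the current directory and prints one feed URL per line when given a URL as its argument.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout=60.0):
    """Run cmd as a child process; return its non-empty output lines, or None on timeout."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before re-raising,
        # so a stuck process does not keep burning CPU.
        return None
    return [line for line in done.stdout.splitlines() if line.strip()]


def find_feeds(url, timeout=60.0):
    # Assumption: "feedfinder.py <url>" prints one feed URL per line.
    return run_with_timeout([sys.executable, "feedfinder.py", url], timeout)
```

Calling `find_feeds("http://gjhs.mesa.k12.co.us/")` in a loop over the URL list would then give up on a stuck page after the time limit and return `None`, so the batch can log the URL and move on instead of needing a manual kill. A separate process is used because a loop stuck inside the parser cannot be reliably interrupted from within the same interpreter.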


feedfinder.py 100% CPU usage for days. #10
