URLBear: use library to extract links #1342

Open
@jayvdb

Description

As we hit more corner cases in URL extraction, our regex approach will become a limiting factor, and we should switch to an external library. We would need to port our regression tests to that library.
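
To illustrate the kind of corner case meant here, below is a minimal sketch. It assumes a deliberately naive pattern (`NAIVE_URL`, hypothetical, not the regex URLBear actually uses) and a hypothetical helper `extract_links`. It shows how trailing punctuation from the surrounding prose ends up inside the extracted link:

```python
import re

# Hypothetical, deliberately naive pattern for illustration only.
# This is NOT the regex URLBear actually uses.
NAIVE_URL = re.compile(r"https?://\S+")


def extract_links(text):
    """Return every substring the naive pattern treats as a URL."""
    return NAIVE_URL.findall(text)


sample = "See https://example.com/docs. Also (https://example.org/a)."

# The sentence-ending period and the closing parenthesis are captured
# as part of the links, so a link checker would request the wrong URLs.
print(extract_links(sample))
```

Each such case can be patched with another regex tweak, but the patches accumulate, which is the maintenance cost a dedicated library would take over.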

https://pypi.python.org/pypi/urlextract might be useful.

Or we may find a 'recursive' web scraper which handles file types other than .html.
I.e. there may be a function in https://scrapy.org which handles plain text documents.
