@dgoiko dgoiko commented Jan 24, 2020

Fixes #425 by adding Matchers that throw a RuntimeException on timeout, plus a TimeoutablePathRule (a subclass of PathRule) that uses them.

By default the system does not use them; they can be enabled via RobotstxtConfig.

NOTE: The code for the timeoutable Matchers is based on this Stack Overflow answer, and it decreases regexp performance. Ideally we would include a native, efficient, timeoutable regex library, but this is a valid workaround.
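For readers unfamiliar with the technique, here is a minimal sketch of one common way to make `java.util.regex` matching time out: wrap the input in a `CharSequence` that checks a deadline on every `charAt` call and throws a `RuntimeException` once it has passed. The class names below (`MatchTimeoutException`, `DeadlineCharSequence`, `TimeoutMatchers`) are hypothetical and may differ from the code in this PR. The per-character clock check is also the likely source of the performance cost mentioned above.

```java
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Stand-in unchecked exception for this sketch (hypothetical name). */
class MatchTimeoutException extends RuntimeException {
    MatchTimeoutException(String message) {
        super(message);
    }
}

/** Wraps the input and aborts matching once the deadline has passed. */
final class DeadlineCharSequence implements CharSequence {
    private final CharSequence inner;
    private final long deadlineNanos;

    DeadlineCharSequence(CharSequence inner, long deadlineNanos) {
        this.inner = inner;
        this.deadlineNanos = deadlineNanos;
    }

    @Override
    public char charAt(int index) {
        // The regex engine reads the input through charAt, so this check
        // runs repeatedly while a pattern backtracks.
        if (System.nanoTime() - deadlineNanos > 0) {
            throw new MatchTimeoutException("regex matching timed out");
        }
        return inner.charAt(index);
    }

    @Override
    public int length() {
        return inner.length();
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        // Sub-sequences share the same absolute deadline.
        return new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos);
    }

    @Override
    public String toString() {
        return inner.toString();
    }
}

/** Hypothetical factory for matchers that time out. */
final class TimeoutMatchers {
    private TimeoutMatchers() {
    }

    static Matcher matcher(Pattern pattern, CharSequence input, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        return pattern.matcher(new DeadlineCharSequence(input, deadline));
    }
}
```

A caller would use `TimeoutMatchers.matcher(pattern, path, 1000).matches()` and handle `MatchTimeoutException` wherever a timeout needs special treatment.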

TimeoutablePathRule adds support for timing out regexps that run for too long. You can configure whether a timeout counts as a match or as a non-match.

Personally I'd throw RegexpTimeoutException, but that may break external subclasses, so I decided to stick to returning false. The static version, matchesRobotsPattern, throws RegexpTimeoutException if configured to fail on timeout, since that will not break any existing code.
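The "timeout means match or non-match" behaviour described above could look roughly like the following sketch. It is only an illustration under assumptions: `TimeoutPolicySketch`, `RegexTimedOut`, and the `matchOnTimeout` parameter are hypothetical names, not the PR's actual API, and the deadline check reuses the `charAt` wrapper idea rather than whatever the PR implements.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: map a regex timeout to a configured boolean result. */
final class TimeoutPolicySketch {

    /** Stand-in for the PR's RegexpTimeoutException (hypothetical name). */
    static final class RegexTimedOut extends RuntimeException {
        RegexTimedOut() {
            super("regex matching timed out");
        }
    }

    private TimeoutPolicySketch() {
    }

    /**
     * Returns the normal match result, or matchOnTimeout if matching
     * did not finish within timeoutMillis.
     */
    static boolean matches(Pattern pattern, String path,
                           long timeoutMillis, boolean matchOnTimeout) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        try {
            return pattern.matcher(guard(path, deadline)).matches();
        } catch (RegexTimedOut e) {
            // Instance-style behaviour: swallow the timeout and fall back
            // to the configured outcome instead of propagating.
            return matchOnTimeout;
        }
    }

    /** Wraps the input so every character read checks the deadline. */
    private static CharSequence guard(final CharSequence s, final long deadline) {
        return new CharSequence() {
            @Override
            public char charAt(int index) {
                if (System.nanoTime() - deadline > 0) {
                    throw new RegexTimedOut();
                }
                return s.charAt(index);
            }

            @Override
            public int length() {
                return s.length();
            }

            @Override
            public CharSequence subSequence(int start, int end) {
                return guard(s.subSequence(start, end), deadline);
            }

            @Override
            public String toString() {
                return s.toString();
            }
        };
    }
}
```

Treating a timeout as a match is the conservative choice for a disallow rule (the crawler skips the URL), while treating it as a non-match keeps the crawl going; which one is right depends on how strictly robots.txt should be honoured.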
UserAgentDirectives creates a TimeoutablePathRule if configured to do so.
Timeout for regexp is now configurable in RobotstxtConfig.

RobotstxtParser passes the RobotstxtConfig arguments to use TimeoutablePathRule if necessary.

Style fix in TimeoutablePathRule
@dgoiko changed the title from "Timeoutable regex" to "Timeoutable regular expressions in RobotstxtServer" on Jan 24, 2020
Development

Successfully merging this pull request may close these issues.

Exponential backtracking in regex blocks Thread