Commit b481f91

NUTCH-3083 Add RobotRulesParser to bin/nutch
Add command *robotsparser* to bin/nutch, invoking the main method of org.apache.nutch.protocol.RobotRulesParser
1 parent 5263b7c commit b481f91

1 file changed, +3 -0 lines changed

src/bin/nutch

@@ -86,6 +86,7 @@ if [ $# = 0 ]; then
   echo " indexchecker check the indexing filters for a given url"
   echo " filterchecker check url filters for a given url"
   echo " normalizerchecker check url normalizers for a given url"
+  echo " robotsparser parse a robots.txt file and check whether urls are allowed or not"
   echo " domainstats calculate domain statistics from crawldb"
   echo " protocolstats calculate protocol status code stats from crawldb"
   echo " crawlcomplete calculate crawl completion stats from crawldb"
@@ -268,6 +269,8 @@ elif [ "$COMMAND" = "filterchecker" ] ; then
   CLASS=org.apache.nutch.net.URLFilterChecker
 elif [ "$COMMAND" = "normalizerchecker" ] ; then
   CLASS=org.apache.nutch.net.URLNormalizerChecker
+elif [ "$COMMAND" = "robotsparser" ] ; then
+  CLASS=org.apache.nutch.protocol.RobotRulesParser
 elif [ "$COMMAND" = "domainstats" ] ; then
   CLASS=org.apache.nutch.util.DomainStatistics
 elif [ "$COMMAND" = "protocolstats" ] ; then
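
The change follows the script's existing pattern: a command name given on the command line selects the Java class whose main method is run. A minimal standalone sketch of that mapping is below. The function name resolve_class and the use of a case statement are assumptions made for illustration; bin/nutch itself uses the elif chain shown in the diff, and only the four classes that appear in the diff are included.

```shell
#!/bin/sh
# Hypothetical helper, not part of bin/nutch: map a command name to the
# Java class it dispatches to. Only mappings visible in the diff are listed.
resolve_class() {
  case "$1" in
    filterchecker)     echo "org.apache.nutch.net.URLFilterChecker" ;;
    normalizerchecker) echo "org.apache.nutch.net.URLNormalizerChecker" ;;
    robotsparser)      echo "org.apache.nutch.protocol.RobotRulesParser" ;;
    domainstats)       echo "org.apache.nutch.util.DomainStatistics" ;;
    *)                 return 1 ;;  # unknown command: print nothing, signal failure
  esac
}

# Resolve the command added by this commit and report the selected class.
if CLASS=$(resolve_class robotsparser); then
  echo "would run main method of $CLASS"
fi
```

An unknown command makes the helper return a non-zero status without printing anything, so a caller can tell it apart from a valid mapping.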
