TorCrawl.py is a Python script that crawls and extracts (regular or onion) webpages through the TOR network.
If you are a terminal maniac, you know that things have to be simple and clear.
With a single argument you can read an .onion webpage or a regular one through the TOR network, and using pipes you can pass the output to any other tool you prefer.
If you want to crawl the links of a webpage, use `-c` and **BAM** — you get a file with all the internal links. You can also use `-d` to set the crawl depth, and so on. There is also the `-p` argument to wait some seconds before the next crawl.
```
# Crawler Started from http://www.github.com/ with step 2 and wait 2
# Step 1 completed with: 11 results
# Step 2 completed with: 112 results
# File created on /path/to/project/links.txt
```
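The crawl/depth/pause behaviour described above can be sketched in plain Python. This is a minimal, offline illustration — not TorCrawl's implementation — so `fetch()` is a stand-in that serves hard-coded HTML instead of going through TOR, and the page URLs are made up for the example:

```python
import re
import time

# Hypothetical pages, so the example runs without any network access.
PAGES = {
    "http://example.onion/": '<a href="http://example.onion/a">a</a>'
                             '<a href="http://example.onion/b">b</a>',
    "http://example.onion/a": '<a href="http://example.onion/c">c</a>',
    "http://example.onion/b": "",
    "http://example.onion/c": "",
}

def fetch(url):
    # Stand-in for fetching a page through TOR.
    return PAGES.get(url, "")

def crawl(start, depth=1, pause=0):
    """Breadth-first crawl to `depth` steps, pausing between requests."""
    seen = {start}
    frontier = [start]
    for step in range(1, depth + 1):
        next_frontier = []
        for url in frontier:
            time.sleep(pause)  # the wait between crawls (like -p)
            for link in re.findall(r'href="([^"]+)"', fetch(url)):
                if link not in seen:
                    seen.add(link)
                    next_frontier.append(link)
        print(f"# Step {step} completed with: {len(next_frontier)} results")
        frontier = next_frontier
    return sorted(seen)

links = crawl("http://example.onion/", depth=2)
```

Each step expands the links found in the previous one, which is why the step 2 count in the output above is larger than step 1.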
## Installation:
To install this script, clone the repository:
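A typical sequence would look like the following. The repository URL and the presence of a `requirements.txt` are assumptions here — check the project page for the exact steps:

```shell
# Assumed repository URL; adjust to your fork if needed
git clone https://github.com/MikeMeliz/TorCrawl.py.git
cd TorCrawl.py
# Assumes the project ships a requirements.txt
pip install -r requirements.txt
```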
arg | Long | Description
----|------|------------
-c |--crawl| Crawl website (Default output on /links.txt)
-d |--cdepth| Set depth of the crawl (Default: 1)
-p |--pause| Seconds the crawler waits between requests (Default: 0)
-l |--log| Log file with visited URLs and their response code
## Usage:
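Putting the arguments from the table together, an invocation matching the example output earlier might look like this. The `-u` flag for the target URL is an assumption not shown in the table above — run the script with `-h` to confirm the exact flag names:

```shell
# Crawl a page through TOR, following links two levels deep,
# pausing two seconds between requests (-u is assumed here)
python torcrawl.py -u http://www.github.com/ -c -d 2 -p 2
```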
Feel free to contribute to this project! Just fork it and make your changes on your fork.
## License:
“GPL” stands for “General Public License”. Using the GNU GPL requires that all released improved versions also be free software. [source & more](https://www.gnu.org/licenses/gpl-faq.html)