
Commit bc7a6f1

VersionChangelog
Signed-off-by: MikeMeliz <[email protected]>
1 parent 882a9af commit bc7a6f1

File tree

1 file changed: +25 -5 lines changed


README.md

+25 -5
@@ -5,7 +5,7 @@
 -->
 # TorCrawl.py
 
-[![Version](https://img.shields.io/badge/version-1.0-green.svg?style=plastic)]() [![license](https://img.shields.io/github/license/MikeMeliz/TorCrawl.py.svg?style=plastic)]()
+[![Version](https://img.shields.io/badge/version-1.2-green.svg?style=plastic)]() [![Python](https://img.shields.io/badge/python-v3-blue.svg?style=plastic)]() [![license](https://img.shields.io/github/license/MikeMeliz/TorCrawl.py.svg?style=plastic)]()
 
 ## Basic Information:
 TorCrawl.py is a Python script to crawl and extract (regular or onion) webpages through the TOR network.
@@ -21,12 +21,23 @@ If you are a terminal maniac you know that things have to be simple and clear. P
 
 With a single argument you can read an .onion webpage or a regular one through the TOR network, and using pipes you can pass the output to any other tool you prefer.
 
-![ExtractAndGrep](https://cloud.githubusercontent.com/assets/9204902/21080715/c34511ca-bfbe-11e6-9fec-230e6430d5dc.png)
+```shell
+$ torcrawl -u http://www.github.com/ | grep 'google-analytics'
+<meta name="google-analytics" content="UA-XXXXXX- ">
+```
 
 If you want to crawl the links of a webpage, use `-c` and **BAM** you get all of its internal links in a file. You can even use `-d` to crawl those as well, and so on. There is also the `-p` argument, to wait some seconds before the next crawl.
 
-![CrawlwDepthwPause](https://cloud.githubusercontent.com/assets/9204902/21080526/f2b80908-bfb9-11e6-8bc0-fd3eebe182cc.png)
-
+```shell
+$ torcrawl -v -u http://www.github.com/ -c -d 2 -p 2
+# TOR is ready!
+# URL: http://www.github.com/
+# Your IP: XXX.XXX.XXX.XXX
+# Crawler Started from http://www.github.com/ with step 2 and wait 2
+# Step 1 completed with: 11 results
+# Step 2 completed with: 112 results
+# File created on /path/to/project/links.txt
+```
 
 ## Installation:
 To install this script, you need to clone this repository:
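The hunk's context cuts off right before the clone command itself; based on the repository named in the badges, it is presumably something along these lines (URL and directory name assumed, not shown in this diff):

```shell
# Assumed clone-and-enter steps for the repository referenced above;
# the actual commands sit just below this hunk in the README.
$ git clone https://github.com/MikeMeliz/TorCrawl.py.git
$ cd TorCrawl.py
```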
@@ -59,7 +70,7 @@ arg | Long | Description
 -c |--crawl| Crawl website (Default output on /links.txt)
 -d |--cdepth| Set depth of crawl's travel (Default: 1)
 -p |--pause| The length of time the crawler will pause (Default: 0)
--l |--log| A save log will let you see which URLs were visited
+-l |--log| Log file with visited URLs and their response code
 
 ## Usage:
 
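A minimal sketch of how the reworded `-l` option combines with the crawl flags from the same table; the command line uses only flags documented above, while the log's filename and line layout are assumptions made here for illustration:

```shell
# Crawl through TOR while also keeping a log of visited URLs (flags from the table above).
$ torcrawl -v -u http://www.github.com/ -c -p 2 -l
# Per the new description, each log entry pairs a visited URL with its
# HTTP response code, e.g. "http://www.github.com/ 200" (format assumed).
```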

@@ -153,3 +164,12 @@ Feel free to contribute on this project! Just fork it, make any change on your f
 
 ## License:
 “GPL” stands for “General Public License”. Using the GNU GPL will require that all the released improved versions be free software. [source & more](https://www.gnu.org/licenses/gpl-faq.html)
+
+## Changelog:
+```
+v1.2:
+* Migrated to Python3
+* Option to generate log file (-l)
+* PEP8 Fixes
+* Fix double folder generation (http:// domain.com)
+```
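Since the first changelog entry is the Python 3 migration, calling the script explicitly through a Python 3 interpreter should now be the expected invocation; the script filename and flags below are assumptions based on the examples earlier in the README:

```shell
# Assumed direct invocation after the Python 3 migration;
# the examples above use the short form "torcrawl".
$ python3 torcrawl.py -v -u http://www.github.com/ -c -d 2 -p 2
```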
