The script implements a Depth-First Search (DFS) algorithm to save crawled pages as nodes.
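A minimal sketch of what storing pages as nodes during a DFS traversal could look like (the `Node` class, its fields, and the `fetch_links` callable below are illustrative assumptions, not the repository's actual code):

```python
# Illustrative sketch only; names and structure are assumptions.
from dataclasses import dataclass, field


@dataclass
class Node:
    url: str
    title: str = ""
    internal_links: list["Node"] = field(default_factory=list)


def dfs_crawl(url: str, visited: set[str], fetch_links) -> Node:
    """Depth-first traversal: follow each internal link fully before moving on."""
    visited.add(url)
    node = Node(url=url)
    for link in fetch_links(url):  # fetch_links stands in for the real page fetcher
        if link not in visited:
            node.internal_links.append(dfs_crawl(link, visited, fetch_links))
    return node
```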
1) Results are saved in CSV/JSON file format, where each row/object represents one page with the following columns/keys:
- link
- title
- number of internal links
- number of external links
- number of times the URL was referenced by other pages*
*if a page contains multiple references to the same page, it is counted as one
An example CSV file is provided in the repository.
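As an illustration of the layout described above, one row could be written with the standard library like this (the shortened column names and the values are placeholders, not real crawl output):

```python
# Sketch of the CSV layout; column names and values are placeholders.
import csv

columns = ["link", "title", "internal_links", "external_links", "referenced_by"]

with open("results.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=columns)
    writer.writeheader()
    writer.writerow({
        "link": "https://example.com/",
        "title": "Example Domain",
        "internal_links": 5,
        "external_links": 2,
        "referenced_by": 3,
    })
```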
Example:
- To run this script, use the crawl command:
$ python app.py crawl --page <FullURL> --format <csv/json> --output <path_to_file>
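For instance, a hypothetical invocation (the URL and output path are placeholders) could look like:
$ python app.py crawl --page https://example.com --format csv --output results.csv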
Main page (5)
  subpage1 (2)
    subpage1_1 (0)
    subpage1_2 (0)
  subpage2 (1)
    subpage2_1 (0)
The subpages represent actual page URLs, and the numbers in parentheses represent the number of internal pages at the current level.
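A short sketch of how such an indented tree could be printed recursively, reusing the illustrative `Node` class from the sketch above (the number is interpreted here as the total count of internal pages found beneath a node, which matches the example output; that interpretation is an assumption):

```python
# Sketch only; builds on the illustrative Node class above.
def count_internal(node) -> int:
    # Assumed meaning of the number in parentheses: total internal pages below this node.
    return sum(1 + count_internal(child) for child in node.internal_links)


def print_tree(node, depth: int = 0) -> None:
    print(f"{'  ' * depth}{node.url} ({count_internal(node)})")
    for child in node.internal_links:
        print_tree(child, depth + 1)
```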
Example:
- To run this script, use the print-tree command:
$ python app.py print-tree --page <FullURL>
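Since Typer is listed among the dependencies, the two commands are presumably registered along these lines (a sketch only, not the repository's actual app.py; the function bodies are left as placeholders):

```python
# Sketch of a Typer CLI exposing the two commands described above.
import typer

app = typer.Typer()


@app.command()
def crawl(page: str = typer.Option(...),
          format: str = typer.Option("csv"),
          output: str = typer.Option("results.csv")):
    """Crawl the given page and save the results as CSV/JSON."""
    ...  # call the crawler and the CSV/JSON writer here


@app.command(name="print-tree")
def print_tree(page: str = typer.Option(...)):
    """Crawl the given page and print the page tree."""
    ...  # call the crawler and the tree printer here


if __name__ == "__main__":
    app()
```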
- Python 3.10
- Aiohttp 3.8.3
- Typer 0.6.1
- Numpy 1.23.3
- Unittest
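Pinned in a requirements.txt, the third-party dependencies would look roughly like this (unittest ships with Python, so it would not normally appear in the file):

```
aiohttp==3.8.3
typer==0.6.1
numpy==1.23.3
```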
- To run this project, you need to install Python, then create and activate a virtual environment
$ python3 -m venv env
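Activation depends on your shell; on Linux/macOS it is typically:
$ source env/bin/activate
(on Windows, use env\Scripts\activate instead)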
- Clone the repo and install the packages from requirements.txt
$ git clone https://github.com/fortyfortyy/CLI-web-crawler-script.git
$ cd CLI-web-crawler-script
$ pip install -r requirements.txt
- Go to the web_clawler_script directory
$ cd web_clawler_script
- Then go back to Examples & How to run and run the commands from there
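To run the unit tests (Unittest is listed above), the standard invocation should work, assuming the tests are discoverable from the project directory:
$ python -m unittest discover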
Email: [email protected]

