- Crawl Websites: Recursively crawl a website starting from a given URL.
- Extract Links: Extract all internal links from HTML pages.
- Generate Reports: Generate a sorted report of pages based on the number of inbound links.
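As a rough illustration of the report step, here is a minimal sketch that sorts a `pages` object by inbound link count. The helper names `sortPages` and `printReport` are hypothetical, and the descending order and output wording are assumptions rather than the project's confirmed behavior.

```javascript
// Hypothetical helper: turn a { page: inboundCount } object into a list of
// [page, count] pairs, most-linked page first (descending order is an assumption).
function sortPages(pages) {
  return Object.entries(pages).sort(([, a], [, b]) => b - a);
}

// Hypothetical helper: print one line per page in sorted order.
function printReport(pages) {
  for (const [page, count] of sortPages(pages)) {
    console.log(`Found ${count} inbound link(s) to ${page}`);
  }
}
```

Calling `printReport({ "example.test": 3, "example.test/about": 1 })` would list `example.test` first.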
crawl(baseURL, currentURL, pages)
Crawls a website starting from currentURL and returns a report of all internal pages.
- baseURL: The base URL of the website (e.g., https://blog.boot.dev).
- currentURL: The current URL to crawl (e.g., https://blog.boot.dev/about).
- pages: An object to track visited pages and their inbound links.
Note
If no starting point is specified, the crawl starts from baseURL by default.
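The recursion described above could look roughly like the sketch below. It is not the repository's actual implementation: `normalizeURL`, `fetchHTML`, and the fourth `getHTML` parameter are hypothetical additions, the regex-based link extraction is a dependency-free simplification (a real HTML parser is more robust), and the global `fetch` assumes Node 18 or later. Counting the first visit as 1 is also an assumption.

```javascript
// Hypothetical helper: reduce a URL to "host/path" so that variants of the
// same page (trailing slash, protocol) share one key in `pages`.
function normalizeURL(url) {
  const { hostname, pathname } = new URL(url);
  const path = pathname.endsWith("/") ? pathname.slice(0, -1) : pathname;
  return `${hostname}${path}`;
}

// Hypothetical helper: return the page's HTML, or null for errors and non-HTML responses.
async function fetchHTML(url) {
  const response = await fetch(url);
  const contentType = response.headers.get("content-type") || "";
  if (!response.ok || !contentType.includes("text/html")) return null;
  return response.text();
}

// Sketch of crawl(baseURL, currentURL, pages). currentURL defaults to baseURL.
// getHTML is an assumed extra parameter so the fetcher can be swapped out.
async function crawl(baseURL, currentURL = baseURL, pages = {}, getHTML = fetchHTML) {
  const base = new URL(baseURL);
  const current = new URL(currentURL, base);

  // Only internal pages are tracked; external links end this branch.
  if (current.hostname !== base.hostname) return pages;

  // A page seen before only gets its counter bumped and is not fetched again.
  const key = normalizeURL(current.href);
  if (key in pages) {
    pages[key]++;
    return pages;
  }
  pages[key] = 1;

  let html;
  try {
    html = await getHTML(current.href);
  } catch {
    return pages; // network failure: keep what has been collected so far
  }
  if (!html) return pages;

  // Simplified link extraction: capture the href value of each <a> tag.
  const hrefPattern = /<a\s[^>]*href\s*=\s*["']([^"']+)["']/gi;
  for (const match of html.matchAll(hrefPattern)) {
    let next;
    try {
      next = new URL(match[1], current); // resolves relative links
    } catch {
      continue; // skip malformed hrefs
    }
    await crawl(baseURL, next.href, pages, getHTML);
  }
  return pages;
}
```

Passing the fetcher as a parameter is a design choice made for this sketch only: it lets the recursion be exercised against an in-memory map of pages without touching the network.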
- clone the repo and install the dependencies
git clone https://github.com/habibayman/web-crawler
npm i
- run using a sample URL
node app.js https://www.wagslane.dev
- run the tests
npm test