The Site Map Generator is a Python application with a graphical user interface (GUI) that generates a site map for a given website. It crawls pages within the specified domain, up to a maximum link depth, and compiles a list of the URLs it finds, giving a clear overview of the website's structure.
- User-friendly PyQt6-based GUI
- Recursive crawling of websites within the same domain
- Progress bar to show crawling status
- Error handling for invalid URLs and network issues
- Multithreaded design for responsive UI during crawling
- Results displayed in an easy-to-read format
To run from source:
- Python 3.6 or higher
- PyQt6
- Requests
- BeautifulSoup4
To build from source:
- All of the above, plus:
- PyInstaller (e.g., PyInstaller >= 4.0)
- Ensure you have Python installed on your system.
- Install the required packages for running from source:
pip install PyQt6 requests beautifulsoup4
- To run the application, you can either run it directly from source (see Usage) or build an executable (see Building from Source).
- To run from source:
Navigate to the project directory in your terminal and execute:
python main.py
- Alternatively, after building the executable (see 'Building from Source' below):
Run the SiteMapGenerator executable (e.g., SiteMapGenerator.exe on Windows, SiteMapGenerator on macOS/Linux) from the dist folder.
- Enter the full URL of the website you want to generate a site map for (e.g., http://example.com) in the input field.
- Click the "Generate Site Map" button.
- Wait for the crawling process to complete. You can monitor the progress in the progress bar and the text area below.
- Once completed, the site map will be displayed in the text area.
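The progress updates described in the steps above rely on the crawl running outside the GUI thread. The sketch below illustrates that pattern with the standard library only (threading and queue) so that it runs without PyQt6; it is not the project's CrawlerWorker, and the names crawl_worker, run_with_progress, visit, and on_update are hypothetical. In a PyQt6 application the same idea is usually expressed with a worker on a QThread that emits signals instead of writing to a queue.

```python
import queue
import threading


def crawl_worker(urls, visit, updates):
    """Run in a background thread and report progress through `updates`."""
    total = len(urls)
    try:
        for done, url in enumerate(urls, start=1):
            visit(url)  # hypothetical per-page work (fetch, parse, ...)
            updates.put(("progress", done * 100 // total))
    finally:
        # Always signal completion so the consumer never waits forever.
        updates.put(("finished", None))


def run_with_progress(urls, visit, on_update):
    """Start the worker and forward its messages to `on_update`."""
    updates = queue.Queue()
    worker = threading.Thread(
        target=crawl_worker, args=(urls, visit, updates), daemon=True
    )
    worker.start()
    while True:
        # A real GUI would poll this queue from a timer instead of blocking.
        kind, value = updates.get()
        on_update(kind, value)
        if kind == "finished":
            break
    worker.join()
```

A caller could pass a function that updates a progress bar as on_update; here any callable that accepts the message kind and value will do.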
If you wish to build the executable from the source code:
- Ensure you have Python and the required packages (PyQt6, Requests, BeautifulSoup4) installed.
- Clone this repository or download and extract the source files.
- Install PyInstaller:
pip install pyinstaller
- Navigate to the root directory of the project in your terminal.
- Run the setup script:
python setup.py
- The executable (e.g., SiteMapGenerator.exe on Windows) will be created in a dist subfolder.
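The contents of the project's setup.py are not shown in this README. As a rough idea of what such a script might do, the sketch below invokes PyInstaller from Python through PyInstaller.__main__.run. The helper names pyinstaller_args and build are hypothetical, and the chosen options (--onefile, --windowed, --noconfirm) are assumptions rather than the project's actual settings.

```python
def pyinstaller_args(entry_script="main.py", app_name="SiteMapGenerator"):
    """Build the argument list handed to PyInstaller (assumed options)."""
    return [
        entry_script,
        "--name", app_name,
        "--onefile",    # bundle everything into a single executable
        "--windowed",   # do not open a console window for the GUI
        "--noconfirm",  # overwrite earlier build output without prompting
    ]


def build():
    """Run PyInstaller with the arguments above."""
    # Imported here so the module can be loaded without PyInstaller installed.
    import PyInstaller.__main__

    PyInstaller.__main__.run(pyinstaller_args())
```

Calling build() from the project root would be equivalent to running pyinstaller with the same options on the command line.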
- The crawler is currently set to a default maximum depth of 3 levels internally when run from the GUI (this is configurable in the CrawlerWorker instantiation in main.py). The Crawler class itself defaults to 5 if used programmatically.
- Only pages within the exact same domain (netloc) as the initial URL are crawled. Subdomains are treated as external.
- The application may take a while to complete for very large websites.
- Error messages for specific page fetch failures are printed to the console (if running from source) or logged by PyInstaller (if running as an executable and an issue occurs), while the GUI shows a general error if the crawl cannot proceed.
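The depth limit and the exact-netloc rule from the notes above can be sketched as a small breadth-first crawl. This is an illustration under stated assumptions, not the project's Crawler class: it uses the standard library's html.parser in place of BeautifulSoup, takes a hypothetical fetch callable instead of calling Requests directly, and assumes that pages at the maximum depth are listed but their links are not followed.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def same_domain(url, root_netloc):
    # Exact netloc match: "blog.example.com" is external to "example.com".
    return urlparse(url).netloc == root_netloc


def crawl(start_url, fetch, max_depth=3):
    """Breadth-first crawl; `fetch(url)` returns HTML text or raises OSError."""
    root_netloc = urlparse(start_url).netloc
    seen = {start_url}
    pending = deque([(start_url, 0)])
    while pending:
        url, depth = pending.popleft()
        if depth >= max_depth:
            continue  # listed in the result, but its links are not followed
        try:
            html = fetch(url)
        except OSError as exc:
            print(f"Failed to fetch {url}: {exc}")
            continue
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" suffixes.
            link, _fragment = urldefrag(urljoin(url, href))
            if same_domain(link, root_netloc) and link not in seen:
                seen.add(link)
                pending.append((link, depth + 1))
    return sorted(seen)
```

Passing the fetch function in as a parameter keeps the sketch testable with an in-memory dictionary of pages; a real run would supply a function that downloads the page and raises on network errors.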
- Invalid URL: Ensure you are entering a full and valid URL, including http:// or https://.
- No Site Map Generated / Errors:
- Check your internet connection.
- The website might be blocking automated crawlers (check for 403 errors in console if running from source).
- The website might not have any crawlable internal links or might be a single-page application not easily navigable by this type of crawler.
- Application Freezes (Unlikely): The application uses multithreading to keep the GUI responsive. If it appears to freeze, check for system resource issues or very high network latency.
- Building Executable Fails:
- Ensure PyInstaller is installed correctly and is the latest version (pip install --upgrade pyinstaller).
- Verify all dependencies (PyQt6, requests, beautifulsoup4) are installed in the environment where you are running python setup.py.
- Consult the output from setup.py for specific error messages from PyInstaller.
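For the "Invalid URL" case in the list above, the following is a minimal sketch of the kind of check that rejects input without a scheme or host. The function name is_valid_url is hypothetical, and the application's own validation may differ.

```python
from urllib.parse import urlparse


def is_valid_url(url):
    """Return True if `url` has an http(s) scheme and a host part."""
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

With this check, "example.com" is rejected because it has no scheme, while "http://example.com" is accepted.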
Contributions to improve the Site Map Generator are welcome. Please feel free to submit pull requests or create issues for bugs and feature requests.
This project is open-source and available under the MIT License.