Google Issue Tracker Scraper & Analyzer

A tool that scrapes, cleans, categorizes, analyzes, and visualizes issues from the Google Issue Tracker. It takes a CSV file of issue IDs or URLs as input, enriches each row with scraped data, and produces charts and summary tables.

Features

Automated Dependency Management: Checks for required Python packages and installs them automatically if missing.
Robust Data Loading: Handles various CSV encodings (UTF-8, Latin-1, CP1252, etc.) to ensure data is loaded correctly.
Smart Duplicate Removal: Identifies and removes duplicate issues based on ID, Title, or Description.
URL Detection: Automatically detects URL columns or constructs valid Google Issue Tracker URLs from plain IDs.
Dual Scraping Modes:
- Standard Mode (main.py): Uses requests and BeautifulSoup for fast, multithreaded scraping.
- Selenium Mode (selenium_main.py): Uses a headless Chrome browser for handling dynamic content or bypassing potential blocks.
Intelligent Categorization:
- Pixel Model Detection: Identifies Pixel device models (e.g., Pixel 6, Pixel 7, Pixel 8) mentioned in the text.
- Severity Estimation: Estimates issue severity (High, Medium, Low) based on keyword analysis.
- Issue Categorization: Classifies issues into categories like Network, Hardware, UI/Graphics, Installation, etc.
Visualization & Reporting:
- Generates distribution Bar Charts and Pie Charts.
- Exports cleaned and enriched data to CSV.
- Creates summary statistics tables.
Logging: Detailed logs are kept in output/logs/ for troubleshooting and auditing.
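
The keyword-based categorization listed above can be illustrated with a minimal sketch. The keyword tables, the regular expression, and the function names (`detect_pixel_models`, `estimate_severity`, `categorize`) are assumptions made for illustration; the lists and rules actually used by main.py are not shown in this README and may differ.

```python
import re

# Hypothetical keyword tables; the project's real lists are not documented here.
SEVERITY_KEYWORDS = {
    "High": ("crash", "data loss", "bootloop", "unusable"),
    "Medium": ("slow", "lag", "intermittent", "battery drain"),
}
CATEGORY_KEYWORDS = {
    "Network": ("wifi", "wi-fi", "bluetooth", "5g", "signal"),
    "Hardware": ("camera", "speaker", "battery", "sensor"),
    "UI/Graphics": ("flicker", "animation", "layout", "display"),
    "Installation": ("install", "update", "ota"),
}
# Matches "Pixel 6", "pixel 6a", "Pixel 8 Pro", and similar mentions.
PIXEL_RE = re.compile(r"\bpixel\s*(\d+a?)(\s+pro)?\b", re.IGNORECASE)


def _mentions(text, keywords):
    """True if any keyword occurs in text as a whole word or phrase."""
    return any(
        re.search(r"\b" + re.escape(word) + r"\b", text, re.IGNORECASE)
        for word in keywords
    )


def detect_pixel_models(text):
    """Return the sorted, de-duplicated Pixel models mentioned in text."""
    found = set()
    for number, pro in PIXEL_RE.findall(text):
        found.add("Pixel " + number.lower() + (" Pro" if pro else ""))
    return sorted(found)


def estimate_severity(text):
    """Return High or Medium on the first keyword hit, otherwise Low."""
    for level, keywords in SEVERITY_KEYWORDS.items():
        if _mentions(text, keywords):
            return level
    return "Low"


def categorize(text):
    """Return the first category whose keywords appear, otherwise Other."""
    for category, keywords in CATEGORY_KEYWORDS.items():
        if _mentions(text, keywords):
            return category
    return "Other"
```

Matching on word boundaries rather than plain substrings avoids false hits such as "ota" inside "rotation", at the cost of missing inflected forms like "crashes" unless they are listed explicitly.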

Prerequisites

Python 3.x
Google Chrome (required for Selenium mode)

Installation

Clone the repository:

git clone https://github.com/Dev-31/google-issue-tracker-scraper.git
cd google-issue-tracker-scraper

Install dependencies:
```
pip install -r requirements.txt
```
Note: The script also includes a self-check and will attempt to install missing dependencies on the first run.
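
For readers curious how such a self-check might work, here is a minimal sketch. It is not taken from the project's code: the function names `find_missing` and `ensure_installed` and the mapping passed in the usage note are assumptions for illustration.

```python
import importlib.util
import subprocess
import sys


def find_missing(modules):
    """Return the top-level import names that cannot be found."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def ensure_installed(requirements):
    """Install the pip packages whose import name is missing.

    `requirements` maps import name -> pip package name, because the two
    can differ (BeautifulSoup is imported as bs4 but installed as
    beautifulsoup4). Returns the import names that were missing.
    """
    missing = find_missing(requirements)
    if missing:
        packages = [requirements[name] for name in missing]
        # Use the running interpreter's pip so the right environment is targeted.
        subprocess.check_call([sys.executable, "-m", "pip", "install", *packages])
    return missing
```

A call such as `ensure_installed({"requests": "requests", "bs4": "beautifulsoup4"})` would install only what is absent. Installing from requirements.txt up front remains the more predictable route, especially inside a virtual environment.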

Usage

Prepare Input:
- Place your input CSV file in the input/ directory.
- Ensure the CSV has a column with Issue IDs (e.g., ID) or URLs (e.g., url, link).
Run the Scraper:

Option A: Fast Mode (Requests), the Standard Mode listed under Features. Use this for bulk scraping when the content is static.
```
python main.py
```
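
The multithreaded fetching behind this mode can be sketched roughly as below. This is an illustration under assumptions, not the code in main.py: `fetch_html` uses the standard library's urllib as a stand-in for requests, and the names `scrape_all` and `max_workers=8` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10):
    """Download one page; a stdlib stand-in for requests.get(url).text."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape_all(urls, fetch=fetch_html, max_workers=8):
    """Fetch every URL concurrently; return {url: html, or None on failure}."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception:
                # One failing page should not abort the whole batch.
                results[url] = None
    return results
```

Threads suit this workload because each worker spends most of its time waiting on the network. Rows that come back as `None` are natural candidates for a retry in Browser Mode.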
Option B: Browser Mode (Selenium), the Selenium Mode listed under Features. Use this if Fast Mode fails to retrieve content or if the page requires JavaScript rendering.
```
python selenium_main.py
```
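
The input handling described under Prepare Input (encoding fallback, URL column detection, and building URLs from plain IDs) can be sketched as follows. The sketch uses only the standard library. The candidate column names, the encoding order, and the function names are assumptions for illustration; the scripts may detect columns differently.

```python
import csv
import io

# Tried in order; latin-1 goes last because it never fails to decode.
ENCODINGS = ("utf-8-sig", "cp1252", "latin-1")
ISSUE_URL = "https://issuetracker.google.com/issues/{}"
URL_COLUMNS = ("url", "link")              # hypothetical candidates
ID_COLUMNS = ("id", "issue id", "issue_id")  # hypothetical candidates


def decode_rows(raw, encodings=ENCODINGS):
    """Parse CSV bytes into dict rows using the first encoding that works."""
    for encoding in encodings:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue
        return list(csv.DictReader(io.StringIO(text, newline="")))
    raise ValueError(f"could not decode input with any of {encodings}")


def read_rows(path):
    """Read a CSV file from disk with the same encoding fallback."""
    with open(path, "rb") as handle:
        return decode_rows(handle.read())


def issue_url(row):
    """Return the row's URL, or build one from a numeric issue ID."""
    cells = {
        key.strip().lower(): (value or "").strip()
        for key, value in row.items()
        if key
    }
    for name in URL_COLUMNS:
        if cells.get(name, "").startswith("http"):
            return cells[name]
    for name in ID_COLUMNS:
        if cells.get(name, "").isdigit():
            return ISSUE_URL.format(cells[name])
    return None
```

Rows for which `issue_url` returns `None` have neither a usable URL nor a numeric ID, so they are worth logging rather than silently dropping.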

Output

After execution, check the output/ directory for results:

output/cleaned_data.csv: The fully processed dataset with scraped descriptions, labels, and categorization tags.
output/summary.csv: A high-level summary of issue counts by category.
output/charts/: Contains generated visualizations (Bar Chart, Pie Chart).
output/logs/: Detailed execution logs.
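
As a rough illustration of what the summary file contains, the sketch below counts rows per category and renders the result as CSV text. The column name `Category`, the `Uncategorized` fallback, and the function names are assumptions; the actual headers in output/summary.csv may differ.

```python
import csv
import io
from collections import Counter


def summarize(rows, column="Category"):
    """Count rows per value of `column`, most frequent first."""
    counts = Counter(row.get(column) or "Uncategorized" for row in rows)
    return counts.most_common()


def summary_csv(rows, column="Category"):
    """Render the per-category counts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column, "Count"])
    writer.writerows(summarize(rows, column))
    return buffer.getvalue()
```

The same counts could feed both the bar chart and the pie chart, which keeps the charts and the summary table consistent with each other.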

Project Structure

google-issue-tracker-scraper/
├── input/                  # Place your input CSV files here
├── output/                 # Generated results (CSV, Charts, Logs)
│   ├── charts/             # Generated graphs
│   └── logs/               # Execution logs
├── main.py                 # Core script (Requests + Multithreading)
├── selenium_main.py        # Alternative script (Selenium WebDriver)
├── requirements.txt        # Python dependencies
└── README.md               # Project documentation

License

MIT License
