This project analyzes how COVID-19 affected crime rates in Bellingham and Seattle, Washington. The analysis uses property sales data to quantify crime's impact on housing prices at the block level.
Install dependencies:
    pip install -e .
Update all data sources:
    python -m src.data.cli update --all
Check data status:
    python -m src.data.cli status
See docs/SCRAPER_CLI.md for complete CLI documentation.
├── data/ # Data pipeline
│ ├── 0_external/ # Third-party sources
│ ├── 1_raw/ # Original downloads
│ ├── 2_interim/ # Transformed data
│ └── 3_processed/ # Analysis-ready datasets
├── src/data/ # Unified web scraper CLI
│ ├── cli.py # Main CLI interface
│ ├── config.yaml # Scraper configuration
│ ├── scrapers/ # Scraper implementations
│ └── utils/ # Logging and helpers
├── notebooks/ # Jupyter analysis notebooks
├── models/ # Trained ML models
└── tests/ # Test suite
The unified CLI scraper provides:
- Automated data collection from three sources: Bellingham police reports, Seattle crime API, and Whatcom County property sales
- Retry logic with exponential backoff for failed requests
- Rate limiting to respect server resources
- YAML configuration for all scraper settings
- Comprehensive logging with automatic rotation
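The retry and rate-limiting behavior listed above can be illustrated with a minimal sketch. This is not the project's implementation: the names `retry_with_backoff` and `RateLimiter`, their parameters, and the default values are all assumptions chosen for illustration. The real settings presumably live in src/data/config.yaml.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on any exception with doubling delays (hypothetical helper)."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles on each failure: base, 2*base, 4*base, ...
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")


class RateLimiter:
    """Enforce a minimum interval between consecutive calls (hypothetical helper)."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Block until at least min_interval has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A scraper loop would call `limiter.wait()` before each request and wrap the request itself in `retry_with_backoff`. The `sleep` and `clock` parameters are injectable so the timing logic can be tested without real delays.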
Bellingham Crime Data
- Source: City of Bellingham Police Activity Scanner
- Coverage: 2015-2024
- Output: data/2_interim/COB_CrimeReport.csv
Seattle Crime Data
- Source: Seattle Open Data API
- Coverage: Complete historical dataset
- Output: data/1_raw/Seattle_Crime_Data.csv
Property Sales Data
- Source: Whatcom County Assessor
- Coverage: Bellingham residential sales
- Output: data/2_interim/Bellingham_Property_Part1.csv
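As a quick sanity check before opening the notebooks, the three output paths listed above can be verified with a few lines of Python. This is a sketch that is independent of the CLI's own status command. The dictionary keys and the function name `find_missing_outputs` are assumptions; only the file paths come from this README.

```python
from pathlib import Path

# Output paths as listed in this README; the keys are illustrative labels.
EXPECTED_OUTPUTS = {
    "bellingham_crime": "data/2_interim/COB_CrimeReport.csv",
    "seattle_crime": "data/1_raw/Seattle_Crime_Data.csv",
    "property_sales": "data/2_interim/Bellingham_Property_Part1.csv",
}


def find_missing_outputs(project_root: str = ".") -> list:
    """Return the labels of expected output files that are not present."""
    root = Path(project_root)
    return sorted(
        name
        for name, rel_path in EXPECTED_OUTPUTS.items()
        if not (root / rel_path).is_file()
    )
```

Calling `find_missing_outputs()` from the repository root returns an empty list once all three sources have been collected.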
CrimeData_EDA.ipynb explores 139,487 crime records from Bellingham police activity.
HousingData_EDA.ipynb merges housing and crime data, then trains ML models to quantify crime's impact on property prices.
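The merge step can be pictured as counting crimes per block and attaching that count to each sale. The sketch below uses only the standard library and is not the notebook's code: the `block` key, the `crime_count` field, and both function names are hypothetical, and the real column names in the CSV files may differ.

```python
from collections import Counter


def crimes_per_block(crime_rows, block_key="block"):
    """Count crime records per block identifier (block_key is a hypothetical column)."""
    return Counter(row[block_key] for row in crime_rows)


def attach_crime_counts(sales_rows, counts, block_key="block"):
    """Return copies of the sales rows with a crime_count field added.

    Blocks with no recorded crimes get a count of 0.
    """
    return [
        {**sale, "crime_count": counts.get(sale[block_key], 0)}
        for sale in sales_rows
    ]
```

The resulting rows pair each sale price with a block-level crime count, which is the kind of feature a price model could then use.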
Run tests:
    pytest tests/data/ -v
Add a new scraper:
- Inherit from BaseScraper in src/data/scrapers/base_scraper.py
- Implement the scrape() method
- Register in SCRAPER_CLASSES dictionary (src/data/cli.py)
- Add configuration to src/data/config.yaml
- Write tests in tests/data/scrapers/
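The steps for adding a scraper might look roughly like the sketch below. Because the real `BaseScraper` interface is not shown in this README, the example defines a stand-in base class. Its constructor signature, the return type of `scrape()`, the `ExampleScraper` class, and the `run_scraper` helper are all assumptions. Check src/data/scrapers/base_scraper.py for the actual contract.

```python
from abc import ABC, abstractmethod


class BaseScraper(ABC):
    """Stand-in for the project's BaseScraper; the real interface may differ."""

    def __init__(self, config: dict) -> None:
        self.config = config

    @abstractmethod
    def scrape(self) -> list:
        """Collect records and return them as a list of dicts."""


class ExampleScraper(BaseScraper):
    """Hypothetical scraper that returns rows from its config instead of the network."""

    def scrape(self) -> list:
        return list(self.config.get("rows", []))


# Stand-in for the SCRAPER_CLASSES dictionary in src/data/cli.py.
SCRAPER_CLASSES = {"example": ExampleScraper}


def run_scraper(name: str, config: dict) -> list:
    """Look up a registered scraper by name and run it (hypothetical helper)."""
    scraper_cls = SCRAPER_CLASSES[name]
    return scraper_cls(config).scrape()
```

Keeping network access out of `scrape()` in a test double like this makes it straightforward to cover the registration and dispatch path under tests/data/scrapers/ without hitting a live source.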
Read the Medium analysis of COVID-19's impact on local crime statistics.
Project structure follows the cookiecutter data science template.