A modern, open-source toolkit for extracting, cleaning, and analyzing crime data in the state of Ceará, Brazil. Crime Analytics streamlines the process of converting official PDF crime reports into structured CSV/Excel datasets and provides tools for exploratory analysis, visualization, and reporting—whether you work locally, in the cloud, or on Google Colab.
- Automated PDF Extraction: Batch-download government crime report PDFs and convert them into clean, analysis-ready CSV or Excel files.
- Data Cleaning & Standardization: Remove duplicates, handle missing values, and harmonize column names and formats.
- Exploratory & Statistical Analysis: Generate summary statistics, spot trends, and assess data quality.
- Temporal & Spatial Insights: Analyze crime patterns over time and across locations (AIS, municipalities, neighborhoods).
- Interactive Visualizations: Create charts, heatmaps, and dashboards for deeper insights.
- Google Colab & Jupyter Support: Run the full workflow interactively in notebooks.
- Automated Reporting: Export findings as text summaries, charts, and data files.

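As an illustration of the cleaning step described above, here is a minimal sketch that uses only the Python standard library. It is not the toolkit's own implementation: the helper names `normalize_column` and `clean_rows`, and the sample headers, are assumptions made for this example.

```python
import re
import unicodedata


def normalize_column(name: str) -> str:
    """Turn a raw header such as ' Data da Ocorrência ' into 'data_da_ocorrencia'."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Collapse every run of non-alphanumeric characters into one underscore.
    return re.sub(r"[^0-9a-zA-Z]+", "_", ascii_only).strip("_").lower()


def clean_rows(rows: list[dict]) -> list[dict]:
    """Normalize headers, map blank cells to None, and drop exact duplicate rows."""
    cleaned, seen = [], set()
    for row in rows:
        record = {}
        for key, value in row.items():
            text = value.strip() if isinstance(value, str) else value
            record[normalize_column(key)] = text if text not in ("", None) else None
        # Keys are unique after sorting, so values are never compared.
        fingerprint = tuple(sorted(record.items(), key=lambda item: item[0]))
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(record)
    return cleaned
```

Rows that differ only in surrounding whitespace collapse into one record, and empty cells become `None` so that later steps can tell missing values apart from real text.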
data/
├── ais_ce.geojson
├── ais.json
├── ceara.geojson
└── fortaleza-neighborhood.geojson
lib/
├── crawler.py
└── pdf_converter.py
notebooks/
└── project.ipynb
src/
├── geojson_ais.py
├── main.py
├── maps.py
└── plot.py
This project uses uv for dependency management. Install everything with:

    uv sync

Alternatively, you can use plain pip:

    pip install -r requirements.txt

The crawler downloads PDF crime reports into the data/pdfs/ directory:

    uv run lib/crawler.py

Use the PDF converter to extract structured data from the downloaded reports:

    uv run lib/pdf_converter.py

This generates data/cvli.csv.
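
Once the CSV exists, a quick sanity check is to count records per area. The sketch below assumes the file has a column named `AIS`; the real header names in data/cvli.csv may differ, and `count_by_column` and `top_n` are hypothetical helpers, not part of the toolkit. It runs against an in-memory sample so it can be tried without the real data.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def count_by_column(handle: TextIO, column: str) -> Counter:
    """Count rows per distinct value of `column` in an open CSV file."""
    counts = Counter()
    for row in csv.DictReader(handle):
        value = (row.get(column) or "").strip()
        if value:  # skip rows where the column is blank or missing
            counts[value] += 1
    return counts


def top_n(handle: TextIO, column: str, n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent values of `column`, most frequent first."""
    return count_by_column(handle, column).most_common(n)


# In-memory stand-in for data/cvli.csv; the column names here are assumed.
SAMPLE = "AIS,MUNICIPIO\nAIS 1,Fortaleza\nAIS 2,Caucaia\nAIS 1,Fortaleza\n,Sobral\n"
ranking = top_n(io.StringIO(SAMPLE), "AIS", n=2)
print(ranking)
```

To run it on the generated file, open it with `open("data/cvli.csv", newline="", encoding="utf-8")` and pass the handle in place of the `io.StringIO` sample.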
Process the dataset, generate visualizations, and build outputs (maps, stats, charts).

    uv run src/main.py

Outputs include:

- CSV/Excel datasets (data/cvli.csv, data/ais_analysis.xlsx, etc.)
- GeoJSON files (data/ais_ce.geojson, data/ceara.geojson, etc.)
- Visualizations (data/top_ais_cvli.png, data/cvli_heatmap_ais.html)
- Interactive maps for spatial exploration of crime data.

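
A heatmap by AIS presumably rests on a table of counts per area and time period. The following sketch shows one way such a table could be built with the standard library. It is an assumption about the general approach, not the code in src/main.py: the function name `monthly_matrix`, the keys `AIS` and `DATA`, and the `dd/mm/yyyy` date format are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def monthly_matrix(records, area_key="AIS", date_key="DATA", date_format="%d/%m/%Y"):
    """Return {area: {"YYYY-MM": count}} for records that have a parseable date."""
    matrix = defaultdict(lambda: defaultdict(int))
    for record in records:
        try:
            when = datetime.strptime(record[date_key], date_format)
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or malformed date
        area = record.get(area_key)
        if not area:
            continue  # skip rows without an area label
        matrix[area][when.strftime("%Y-%m")] += 1
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {area: dict(months) for area, months in matrix.items()}
```

Each inner dictionary is one row of the heatmap. Rows with unparseable dates are dropped silently here; a real pipeline might prefer to log them, since they can point to extraction problems in the source PDFs.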
Contributions are welcome! Feel free to open issues, suggest improvements, or submit PRs to extend the toolkit.