Geospatial Data Quality Assessment & Interactive Profiling
Profile any geodataset with a single line of code
GeoQA is a Python package for automated quality assessment and interactive profiling of geospatial vector data. Think of it as ydata-profiling (formerly pandas-profiling) but purpose-built for geodata.
- Profile any vector dataset (Shapefile, GeoJSON, GeoPackage, etc.) with one line of code
- Validate geometry quality — invalid, empty, duplicate, and mixed-type detection
- Analyze attribute completeness, statistics, and distributions
- Visualize data on interactive maps with quality-issue highlighting
- Generate self-contained HTML quality reports with charts and tables
- Automate QA/QC workflows via CLI or Python API
GeoQA is the data-readiness gate in a two-stage geospatial quality control pipeline alongside OVC and OVC ArcGIS Pro.
flowchart LR
A["Geospatial\nData"] --> B
subgraph B["GeoQA"]
direction TB
B1["Score"]
B2["Profile"]
B3["Validate"]
B4["Report"]
end
B -- "Pre-check\nGate" --> C
B -- "Pre-check\nGate" --> E
subgraph C["OVC Python"]
C1["Road QC"]
C2["Building Overlaps"]
C3["Road Conflicts"]
end
subgraph E["OVC ArcGIS Pro"]
E1["5 Toolbox Tools"]
end
C --> D["QC Results\n& Web Maps"]
E --> D
style A fill:#fff,stroke:#e91e63,color:#000
style B fill:#2e9e49,stroke:#1b7a33,color:#fff
style B1 fill:#fff,stroke:#e91e63,color:#000
style B2 fill:#fff,stroke:#e91e63,color:#000
style B3 fill:#fff,stroke:#e91e63,color:#000
style B4 fill:#fff,stroke:#e91e63,color:#000
style C fill:#2962ff,stroke:#1a44b8,color:#fff
style C1 fill:#fff,stroke:#e91e63,color:#000
style C2 fill:#fff,stroke:#e91e63,color:#000
style C3 fill:#fff,stroke:#e91e63,color:#000
style E fill:#7c3aed,stroke:#5b21b6,color:#fff
style E1 fill:#fff,stroke:#e91e63,color:#000
style D fill:#fff,stroke:#2962ff,color:#000
| Feature | Description |
|---|---|
| One-liner profiling | geoqa.profile("data.shp") — instant dataset overview |
| Geometry validation | OGC-compliant validity checks, empty/null detection, duplicate finding |
| Attribute profiling | Data types, null analysis, unique values, descriptive statistics |
| Interactive maps | Folium-based maps with issue highlighting and quality coloring |
| HTML reports | Self-contained quality reports with charts and tables |
| CLI interface | geoqa profile data.shp — terminal access to all features |
| Auto-fix | Repair invalid geometries with profile.geometry_results |
| Spatial analysis | CRS info, extent, area/length statistics, centroid computation |
pip install geoqa
From source (development):
git clone https://github.com/AmmarYasser455/geoqa.git
cd geoqa
pip install -e ".[dev]"
Requirements: Python 3.9+. Depends on geopandas, shapely, folium, matplotlib, pandas, numpy, jinja2, click, and rich.
import geoqa
# Profile a dataset
profile = geoqa.profile("buildings.shp")
# View summary
profile.summary()
# Interactive map with issue highlighting
profile.show_map()
# Quality check details
profile.quality_checks()
# Generate HTML report
profile.to_html("quality_report.html")
# Attribute and geometry statistics
profile.attribute_stats()
profile.geometry_stats()
import geopandas as gpd
import geoqa
gdf = gpd.read_file("roads.geojson")
profile = geoqa.profile(gdf, name="City Roads")
profile.summary()
geoqa profile data.shp # Profile a dataset
geoqa report data.shp --output report.html # Generate HTML report
geoqa check data.geojson # Run quality checks only
geoqa show data.gpkg --output map.html # Open interactive map
GeoQA computes an overall quality score (0–100) based on:
| Component | Weight | Description |
|---|---|---|
| Geometry Validity | 40% | Percentage of valid geometries (OGC compliance) |
| Attribute Completeness | 30% | Percentage of non-null attribute values |
| CRS Defined | 15% | Whether a coordinate reference system is set |
| No Empty Geometries | 15% | Percentage of non-empty geometries |
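The weighting in the table above can be illustrated with a short sketch. This is not GeoQA's own implementation: the function name `quality_score`, its parameters, and the one-decimal rounding are assumptions made for illustration, and the package may handle edge cases (such as an empty dataset) differently.

```python
def quality_score(valid_pct, complete_pct, has_crs, non_empty_pct):
    """Combine the four components into a 0-100 score (hypothetical helper).

    valid_pct, complete_pct, and non_empty_pct are percentages in [0, 100];
    has_crs is a boolean, treated here as all-or-nothing.
    """
    crs_pct = 100.0 if has_crs else 0.0
    score = (
        0.40 * valid_pct        # Geometry Validity
        + 0.30 * complete_pct   # Attribute Completeness
        + 0.15 * crs_pct        # CRS Defined
        + 0.15 * non_empty_pct  # No Empty Geometries
    )
    return round(score, 1)


# 98% valid geometries, 90% attribute completeness, CRS set, no empty geometries
print(quality_score(98.0, 90.0, True, 100.0))
```

Under these assumptions, a missing CRS alone costs 15 points regardless of how clean the geometries and attributes are.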
| Check | Severity | Description |
|---|---|---|
| Geometry Validity | High | OGC Simple Features compliance |
| Empty Geometries | Medium | Geometries with no coordinates |
| Duplicate Geometries | Medium | Identical geometry pairs (WKB comparison) |
| CRS Defined | High | Coordinate reference system presence |
| Attribute Completeness | Varies | Null/missing value analysis |
| Mixed Geometry Types | Low | Multiple geometry types in one layer |
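Several of the geometry checks listed above can be approximated directly with Shapely. The sketch below is an illustration only, not GeoQA's code: `basic_geometry_checks` and the shape of its return value are hypothetical, and it covers only validity, emptiness, WKB-based duplicates, and mixed geometry types.

```python
from collections import Counter

from shapely.geometry import Point, Polygon


def basic_geometry_checks(geoms):
    """Run a few simple checks over a list of Shapely geometries (hypothetical helper)."""
    empty = [i for i, g in enumerate(geoms) if g.is_empty]
    # Empty geometries are reported separately, so skip them here.
    invalid = [i for i, g in enumerate(geoms) if not g.is_empty and not g.is_valid]

    # Duplicate detection by comparing WKB bytes; each pair is (first index, repeat index).
    first_seen = {}
    duplicates = []
    for i, g in enumerate(geoms):
        if g.is_empty:
            continue
        key = g.wkb
        if key in first_seen:
            duplicates.append((first_seen[key], i))
        else:
            first_seen[key] = i

    type_counts = Counter(g.geom_type for g in geoms if not g.is_empty)
    return {
        "invalid": invalid,
        "empty": empty,
        "duplicates": duplicates,
        "mixed_types": len(type_counts) > 1,
    }


square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
bowtie = Polygon([(0, 0), (1, 1), (1, 0), (0, 1)])  # self-intersecting ring
sample = [square, bowtie, Polygon(), Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]), Point(0, 0)]
result = basic_geometry_checks(sample)
print(result)
```

WKB comparison only catches byte-identical geometries; two polygons with the same shape but a different starting vertex or ring orientation would not be flagged by this sketch.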
GeoQA creates interactive Folium maps with auto-reprojection to WGS84, quality highlighting (invalid in red, valid in blue), interactive tooltips, multiple basemaps, and layer controls.
profile.show_map()
# Or use the visualization API directly
from geoqa.visualization import MapVisualizer
viz = MapVisualizer(profile.gdf, name="My Data")
quality_map = viz.create_quality_map(profile.geometry_results)
Generate comprehensive, self-contained HTML reports:
profile.to_html("report.html")
Reports include quality score badges, dataset overview cards, quality check tables with pass/fail/warn indicators, spatial extent information, attribute completeness bars, numeric column statistics, and geometry type distributions.
All vector formats readable by GeoPandas/Fiona: Shapefile, GeoJSON, GeoPackage, KML, GML, CSV with geometry, File Geodatabase, and more via GDAL/OGR drivers.
geoqa/
├── core.py # GeoProfile — main entry point
├── geometry.py # Geometry validation & quality checks
├── attributes.py # Attribute profiling & statistics
├── spatial.py # CRS, extent, area/length analysis
├── visualization.py # Folium-based interactive maps
├── report.py # HTML report generation (Jinja2)
├── charts.py # Matplotlib chart generation
├── cli.py # Click-based CLI interface
└── utils.py # Utility functions
Contributions are welcome. See CONTRIBUTING.md for guidelines.
git clone https://github.com/AmmarYasser455/geoqa.git
cd geoqa
pip install -e ".[dev]"
pytest
black geoqa/ tests/
Ammar Yasser
- GitHub: @AmmarYasser455
- LinkedIn: Ammar Yasser
GeoQA is inspired by the development methodology and open-source philosophy of Dr. Qiusheng Wu and the opengeos community. Key inspirations include leafmap, geemap, and ydata-profiling.
@software{geoqa2026,
title = {GeoQA: A Python Package for Geospatial Data Quality Assessment},
author = {Ammar Yasser Abdalazim},
year = {2026},
url = {https://github.com/AmmarYasser455/geoqa},
license = {MIT}
}