
thomasAmorrow/ExplorationGapAnalysis



Version DOI dataset DOI License: CC0-1.0

Docker Airflow PostGIS

Overview

The Exploration Gap Analysis (EGA) is a tool to establish a spatial coverage baseline for ocean exploration data holdings, support the monitoring of exploration and characterization progress in previously unexplored ocean areas, and aid in the identification of priority areas for future expeditions and data collection efforts. At its core, the EGA is a PostGIS database synthesizing deep-sea scientific observations from publicly available data archives.

The current version is still in development.


Methods and Tools

The EGA leverages containerized workflows, orchestrated using Apache Airflow, with processing steps written in Python and SQL. All tasks are managed via lightweight Airflow workers.
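To make the orchestration idea concrete, the sketch below models a pipeline as a directed acyclic graph and derives one valid run order, similar to the dependency ordering Airflow enforces for a DAG. It uses Python's standard `graphlib` instead of Airflow itself, and every task name is hypothetical; the actual EGA DAGs may be split and named differently.

```python
from graphlib import TopologicalSorter

# Hypothetical task names, not taken from the EGA repository.
# Each key maps a task to the set of upstream tasks it depends on.
INGEST_TASKS = {
    "ingest_gbif", "ingest_ncei_samples", "ingest_obis",
    "ingest_glodap", "ingest_gebco", "ingest_wcsd",
}

PIPELINE = {task: set() for task in INGEST_TASKS}
PIPELINE.update({
    "index_h3_res5": set(INGEST_TASKS),        # wait for every archive
    "score_res5": {"index_h3_res5"},           # presence/absence scoring
    "aggregate_res4_res3": {"score_res5"},     # roll up to coarser cells
    "export_outputs": {"aggregate_res4_res3"}, # GeoJSON, CSV, SQL dump
})


def execution_order(graph):
    """Return one run order in which every task follows all of its upstream tasks."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for task in execution_order(PIPELINE):
        print(task)
```

In Airflow the ingest tasks could run in parallel on separate workers; a static order like this only shows which dependencies must be satisfied first.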

Spatial data are indexed using the H3 hexagonal grid system at resolution 5. Each hexagon receives an Exploration Score for each observation type, based on whether observations of that type deeper than 200 m fall within it:

  • Score 1: Observation type is present
  • Score 0: Observation type is absent
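The presence/absence rule can be sketched in a few lines of Python. This is a minimal illustration, assuming observations have already been assigned to resolution 5 hexagons and carry depth as positive metres below the sea surface; the tuple layout, hexagon IDs, and type names are hypothetical.

```python
from collections import defaultdict

DEPTH_THRESHOLD_M = 200.0  # only observations deeper than 200 m count


def presence_scores(observations, hexagons, obs_types):
    """Score every hexagon 1 (present) or 0 (absent) for each observation type.

    observations: iterable of (hex_id, obs_type, depth_m) tuples, with depth_m
    as positive metres below the surface (an assumed convention).
    hexagons: all hexagon IDs to score, including ones with no observations.
    """
    present = defaultdict(set)  # hex_id -> types observed below the threshold
    for hex_id, obs_type, depth_m in observations:
        if depth_m > DEPTH_THRESHOLD_M:
            present[hex_id].add(obs_type)
    return {
        hex_id: {t: int(t in present[hex_id]) for t in obs_types}
        for hex_id in hexagons
    }


if __name__ == "__main__":
    obs = [("hex_a", "biology", 850.0), ("hex_a", "sonar", 150.0)]
    print(presence_scores(obs, ["hex_a", "hex_b"], ["biology", "sonar"]))
```

Passing the full list of hexagons matters: a hexagon with no qualifying observations must still appear with all-zero scores, otherwise unexplored areas would silently drop out of later averages.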

Averaging a hexagon's per-type scores yields its composite score. Scores at coarser H3 resolutions (4 and 3) are computed as the mean over all child hexagons: the sum of the child scores divided by the number of child hexagons.
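The two averaging steps can be sketched as follows. This is an illustration under stated assumptions, not the repository's SQL: the child-to-parent lookup `parent_of` is a hypothetical input (in practice it could come from an H3 library's parent function), and every child hexagon is assumed to have a score, including zeros.

```python
from collections import defaultdict
from statistics import mean


def composite_score(type_scores):
    """Average the 0/1 per-type scores of one hexagon."""
    return mean(type_scores.values())


def aggregate_to_parent(child_scores, parent_of):
    """Parent score = sum of child scores / number of child hexagons.

    child_scores: {child_hex_id: score}
    parent_of: {child_hex_id: parent_hex_id} (hypothetical lookup table)
    """
    groups = defaultdict(list)
    for child, score in child_scores.items():
        groups[parent_of[child]].append(score)
    return {parent: sum(scores) / len(scores) for parent, scores in groups.items()}


if __name__ == "__main__":
    print(composite_score({"biology": 1, "geology": 0, "sonar": 0, "edna": 1}))
    res5 = {"c1": 1.0, "c2": 0.0, "c3": 0.5}
    print(aggregate_to_parent(res5, {"c1": "p1", "c2": "p1", "c3": "p1"}))
```

In H3 a cell normally has 7 children at the next finer resolution (pentagons have 6), so resolution 3 scores could be obtained by applying `aggregate_to_parent` twice. Because the mean is linear, averaging composites and averaging per-type scores before compositing give the same parent value when every child is scored for every type.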


Inputs and Outputs

Inputs

| Observation Type | Data Source |
|---|---|
| Biological Occurrence Observations | GBIF |
| Geological Seafloor/Sub-seafloor Samples | NCEI Marine & Lacustrine Samples |
| Environmental DNA (eDNA) Sequences | OBIS |
| Water Biogeochemical Samples | GLODAP |
| Seafloor Bathymetry Coverage (ID grid) | GEBCO |
| Water Column Sonar Data | NCEI Water Column Sonar |

More types are in development and will be added in future releases.

Outputs

Output files are produced by the full processing pipeline at H3 resolution 5 (~11 km width). Finer resolutions (6+) can be generated but are typically unwieldy to analyze or visualize. Three file formats and the complete database are provided in the corresponding Zenodo dataset:

  • Hexagon GeoJSON: Full-resolution hexagon polygons with scores per observation type
  • Point GeoJSON: Lighter-weight file of centroid points, one per hexagon, with the same properties
  • CSV: Simple flat file with no geospatial data, only hexagon indices and scores
  • SQL: SQL dump of the entire database after assembly, cleaning, and processing
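As an illustration of how the point and CSV products relate, the sketch below turns per-hexagon records into a Point GeoJSON FeatureCollection (positions in longitude, latitude order, as GeoJSON requires) and a flat CSV that keeps only the hexagon index and scores. The record keys (`h3_index`, `lon`, `lat`, and the score fields) are assumptions, and centroid coordinates are taken as given rather than computed; the repository's actual export code may differ.

```python
import csv
import io


def to_point_geojson(rows):
    """Build a Point FeatureCollection; all non-coordinate keys become properties."""
    features = []
    for row in rows:
        props = {k: v for k, v in row.items() if k not in ("lon", "lat")}
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [row["lon"], row["lat"]]},
            "properties": props,
        })
    return {"type": "FeatureCollection", "features": features}


def to_csv(rows, fieldnames):
    """Write only the listed columns; coordinates and other keys are dropped."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    rows = [{"h3_index": "hex_a", "lon": -150.5, "lat": 20.25, "composite": 0.5}]
    print(to_point_geojson(rows))
    print(to_csv(rows, ["h3_index", "composite"]))
```

Serializing the returned dictionary with `json.dump` would produce a file that most GIS tools can open directly.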

Zenodo Dataset: DOI 10.5281/zenodo.15490755

View PostGIS Database Entity Relationship Diagram


Installation and Usage

EGA is deployed using Docker Compose, currently on an Ubuntu AWS EC2 instance, though it is compatible with any Docker-ready environment.

  • All required Dockerfiles and a docker-compose.yml are included
  • Some paths point to an S3 bucket but can be adapted to local filesystems
  • Airflow requires manual setup; see the Airflow docs
  • A sample .env file is included for environment configuration
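Since some paths point to an S3 bucket but can be adapted to local filesystems, one way to make that switch configurable is to resolve every data path against a single root read from the environment. The sketch below shows the idea; the variable name `EGA_DATA_ROOT`, its default value, and the helper itself are hypothetical and are not taken from the repository or its sample `.env` file.

```python
import os
from pathlib import PurePosixPath


def resolve_data_path(relative, env=None):
    """Join a relative data path onto a configurable root.

    EGA_DATA_ROOT (hypothetical) may hold an S3 prefix such as
    "s3://my-bucket/ega" or a local directory such as "/data/ega".
    """
    env = os.environ if env is None else env
    root = env.get("EGA_DATA_ROOT", "/data/ega")  # hypothetical default
    rel = relative.lstrip("/")
    if root.startswith("s3://"):
        # S3 keys are plain strings; avoid filesystem path normalization.
        return root.rstrip("/") + "/" + rel
    # POSIX-style joining, matching the Ubuntu deployment target.
    return str(PurePosixPath(root) / rel)


if __name__ == "__main__":
    print(resolve_data_path("raw/gbif.csv", {"EGA_DATA_ROOT": "s3://my-bucket/ega"}))
    print(resolve_data_path("raw/gbif.csv", {"EGA_DATA_ROOT": "/mnt/ega"}))
```

Keeping S3 prefixes as plain strings avoids `pathlib` collapsing the double slash in `s3://`.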

Visuals

[Figure: ArcGIS Pro view of nested H3 hexagons, resolution 5 (upper) and resolution 4 (lower)]

GeoJSON results files can be visualized with a number of different tools. Here, in ArcGIS Pro, we show the nested hierarchy of resolution 5 hexagons (upper) and resolution 4 hexagons (lower). Composite scores in the coarser hexagons depend on their contents, and completely unexplored hexagons are highlighted as the highest priority for future exploration work.
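Outside a GIS, the same priority view can be approximated by filtering the GeoJSON for hexagons whose composite score is zero. A minimal sketch, assuming the features carry properties named `h3_index` and `composite_score`; both names are hypothetical, so check the actual properties in the Zenodo files before use.

```python
import json


def unexplored_hexagons(geojson_text, score_key="composite_score", id_key="h3_index"):
    """Return IDs of features whose composite score is exactly zero.

    score_key and id_key are assumed property names, not confirmed ones.
    """
    collection = json.loads(geojson_text)
    return [
        feature["properties"][id_key]
        for feature in collection["features"]
        if feature["properties"].get(score_key) == 0
    ]


if __name__ == "__main__":
    with open("hexagons.geojson", encoding="utf-8") as fh:  # hypothetical file name
        print(unexplored_hexagons(fh.read()))
```

Comparing with `== 0` matches both integer `0` and float `0.0` scores.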

📌 If you generate a compelling visualization or want to share use cases, submit a PR or email us to feature it here!


Contributions

Interested in contributing?
Have data? Ideas? Feedback? Help us improve our understanding of the unknown deep ocean.

📬 Open an issue/PR to get involved.


Future Development

The dev branch is the most active; follow it for updates. Upcoming milestones include:

  1. DOI/citation extraction from contributing datasets
  2. Leaflet-based web map viewer
  3. ArcGIS-ready exports (Experience Builder, Online)
  4. SME-driven enhancements to scoring methods

Release Notes

0.2.0-beta

  • Functioning ingest from six public archives
  • Generates outputs in GeoJSON and CSV formats
  • Hosted on corresponding Zenodo Dataset

About

Exploration Gap Analysis - a tool for quantifying and prioritizing global ocean exploration science
