mihiarc/rpa-slr

County-Level Tidal Flooding Data Processing

A Python package for retrieving and processing NOAA high tide flooding (HTF) data at the county level. This package provides tools for both historical data analysis and future projections.

Features

  • Historical Data Processing

    • Fetch historical high tide flooding data from NOAA
    • Process data by region
      • regions are defined in config/region_mappings.yaml
      • each region has its own tide stations defined in config/{region}_tide_stations.yaml
    • Imputation module matches tide stations to reference points along the coast of each county
    • Assignment module assigns annual flood days to each county based on imputation module results
    • Outputs annual flood days by county in CSV
  • Projected Data Processing

    • Fetch projected high tide flooding data from NOAA
    • Process projections by region and scenario
      • regions are defined in config/region_mappings.yaml
      • each region has its own tide stations defined in config/{region}_tide_stations.yaml
      • scenarios: low, intLow, intermediate, intHigh, high (defined in NOAA API response)
    • Imputation module matches tide stations to reference points along the coast of each county
    • Assignment module assigns projected flood days to each county based on imputation module results
    • Outputs projected flood days by county in CSV

Installation

  1. Create a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  2. Install the package:
# For users
pip install .

# For developers
pip install -e ".[dev]"  # quotes keep shells such as zsh from expanding the brackets

Requirements

  • Python 3.8 or higher
  • Core dependencies:
    • requests >= 2.31.0
    • pandas >= 2.0.0
    • numpy >= 1.24.0
    • pyarrow >= 14.0.1
    • pyyaml >= 6.0.0
    • scipy >= 1.10.0
    • matplotlib >= 3.7.0

Development dependencies (installed with .[dev]):

  • pytest >= 7.0.0
  • black >= 23.0.0
  • flake8 >= 6.0.0
  • mypy >= 1.0.0

Configuration

The package uses YAML configuration files located in the config directory:

  • noaa_api_settings.yaml: NOAA API settings and data type configurations
  • region_mappings.yaml: Region mappings
  • {region}_tide_stations.yaml: Region-specific tide station configurations

Regional Configurations

Each region has its own configuration file that defines:

  • Tide station locations (latitude/longitude)
  • Station names and identifiers
  • Regional groupings/classifications
  • Coverage areas

Supported Regions:

  • Alaska (alaska_tide_stations.yaml)
  • Hawaii (hawaii_tide_stations.yaml)
  • Pacific Islands (pacific_islands_tide_stations.yaml)
  • Virgin Islands (virgin_islands_tide_stations.yaml)
  • Puerto Rico (puerto_rico_tide_stations.yaml)
  • Mid-Atlantic (mid_atlantic_tide_stations.yaml)
  • North Atlantic (north_atlantic_tide_stations.yaml)
  • South Atlantic (south_atlantic_tide_stations.yaml)
  • Gulf Coast (gulf_coast_tide_stations.yaml)
  • West Coast (west_coast_tide_stations.yaml)
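
As a minimal sketch of the naming convention above, the helper below maps a region identifier to its station file. The function name `station_config_path` and the `config_dir` parameter are hypothetical and not part of the package. The base directory is left configurable because the project tree further down shows the station files under a `tide_stations/` subdirectory.

```python
from pathlib import Path

# Region identifiers taken from the file names in the "Supported Regions" list.
SUPPORTED_REGIONS = (
    "alaska", "hawaii", "pacific_islands", "virgin_islands", "puerto_rico",
    "mid_atlantic", "north_atlantic", "south_atlantic", "gulf_coast", "west_coast",
)


def station_config_path(region, config_dir="config"):
    """Return the expected path of a region's tide station file (hypothetical helper)."""
    if region not in SUPPORTED_REGIONS:
        raise ValueError(f"Unknown region {region!r}; expected one of {SUPPORTED_REGIONS}")
    return Path(config_dir) / f"{region}_tide_stations.yaml"
```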

Usage

Historical Data Processing

Process historical HTF data for a specific region:

python -m src.noaa.historical.historical_htf_cli \
    --region gulf_coast \
    --start-year 1920 \
    --end-year 2024 \
    --output-dir output/historical \
    --format parquet

Projected Data Processing

Process projected HTF data for a specific region:

python -m src.noaa.projected.projected_htf_cli \
    --region hawaii \
    --start-decade 2020 \
    --end-decade 2100 \
    --output-dir output/projected \
    --format parquet

Data Quality Analysis

Analyze data quality for a specific region or station:

# Analyze a region
python -m src.analysis.generate_report \
    --region gulf_coast \
    --data-type historical \
    --format markdown \
    --verbose

The data quality analysis provides:
- Temporal coverage metrics
- Data completeness assessment
- Summary statistics

### Common Arguments

The tools above share the following arguments:

- `--region`: Region to process (required)
- `--output-dir`: Output directory for processed data
- `--format`: Output format (csv/parquet for data, markdown for analysis)
- `--verbose`: Enable verbose logging
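
For illustration only, the shared arguments could be declared with `argparse` roughly as follows. This is a sketch, not the package's actual parser: the `build_parser` name, the default values, and the single `choices` list are assumptions.

```python
import argparse


def build_parser():
    """Build a parser for the shared arguments (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Process NOAA HTF data for one region")
    parser.add_argument("--region", required=True, help="Region to process")
    # Default output directory is an assumption, not taken from the package.
    parser.add_argument("--output-dir", default="output", help="Output directory")
    parser.add_argument(
        "--format",
        choices=["csv", "parquet", "markdown"],
        default="csv",  # assumed default
        help="Output format (csv/parquet for data, markdown for analysis)",
    )
    parser.add_argument("--verbose", action="store_true", help="Enable verbose logging")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--region", "gulf_coast", "--format", "parquet"])
```

Because `--region` is marked `required=True`, omitting it makes `argparse` exit with a usage error, which matches the "(required)" note in the list.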

## Final Output Format

### Historical Data

The processed historical data includes:
- County
- Year
- Number of flood days
- Region identifier

### Projected Data

The projected data includes:
- County
- Decade
- Scenario (e.g., intermediate, intHigh; identifiers as returned by the NOAA API)
- Projected flood days
- Region identifier

## Development

### Project Structure

county_level_tidal_flooding/
├── config/                                  # Configuration files
│   ├── region_mappings.yaml                 # Region and county mappings
│   ├── noaa_api_settings.yaml               # NOAA API settings
│   └── tide_stations/
│       ├── alaska_tide_stations.yaml
│       ├── hawaii_tide_stations.yaml
│       ├── pacific_islands_tide_stations.yaml
│       ├── virgin_islands_tide_stations.yaml
│       ├── puerto_rico_tide_stations.yaml
│       ├── mid_atlantic_tide_stations.yaml
│       ├── north_atlantic_tide_stations.yaml
│       ├── south_atlantic_tide_stations.yaml
│       ├── gulf_coast_tide_stations.yaml
│       └── west_coast_tide_stations.yaml
├── output/
│   ├── historical/
│   ├── projected/
│   └── analysis/
├── src/
│   └── noaa/
│       ├── core/
│       │   ├── __init__.py
│       │   ├── noaa_client.py
│       │   ├── rate_limiter.py
│       │   └── cache_manager.py
│       ├── historical/
│       │   ├── __init__.py
│       │   ├── historical_htf_cli.py
│       │   ├── historical_htf_fetcher.py
│       │   └── historical_htf_processor.py
│       └── projected/
│           ├── __init__.py
│           ├── projected_htf_cli.py
│           ├── projected_htf_fetcher.py
│           └── projected_htf_processor.py
├── README.md
└── requirements.txt


## Analysis Pipeline

### Historical HTF Analysis
1. Processing historical HTF observations by region
   - Data structure: Annual counts of minor flooding events
   - Time range: 1920-2024
   - Source: NOAA Annual Flood Count Product (minor flooding only)
   - **Alaska Exception**: NOAA does not provide HTF data for Alaska stations. Alaska flood days are computed from raw water level observations using 99th percentile thresholds. See [Alaska HTF Methodology](docs/alaska_htf_methodology.md) for details.
2. Generating county-level historical estimates
   - Output: Annual minor flooding frequency by county
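
The Alaska exception can be illustrated with a small standard-library sketch. It is not the method documented in `docs/alaska_htf_methodology.md`. It assumes the threshold is the 99th percentile of all observations at a station and that a flood day is any day whose maximum water level exceeds that threshold. The function name and the synthetic data are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import quantiles


def flood_days_per_year(observations, percentile=99):
    """Count days per year whose maximum water level exceeds a percentile threshold.

    observations: list of (datetime.date, water_level) pairs (assumed layout).
    Returns (threshold, {year: flood_day_count}).
    """
    levels = [level for _, level in observations]
    # quantiles(..., n=100) returns the 1st..99th percentile cut points.
    threshold = quantiles(levels, n=100, method="inclusive")[percentile - 1]

    # Reduce multiple readings per day to the daily maximum.
    daily_max = {}
    for day, level in observations:
        daily_max[day] = max(level, daily_max.get(day, level))

    counts = defaultdict(int)
    for day, level in daily_max.items():
        if level > threshold:
            counts[day.year] += 1
    return threshold, dict(counts)


# Synthetic example: 202 daily readings, two of which are clearly elevated.
obs = [
    (date(2023, 1, 1) + timedelta(days=i), 5.0 if i in (10, 150) else 1.0)
    for i in range(202)
]
threshold, counts = flood_days_per_year(obs)
```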

### Projected HTF Analysis
1. Processing projected HTF data by region
   - Data structure: Decadal flooding frequency projections
   - Time range: 2020-2100
   - Source: NOAA Decadal Projections Product
   - Multiple sea level rise scenarios
2. Generating county-level projections
   - Output: Projected flooding frequency by county
   - Separate outputs for each sea level rise scenario

The separation into historical and projected analyses is necessary due to:
- Different data structures and temporal resolutions
- Distinct quality control requirements

### Preprocessing: Coastal Reference Points

Before running imputation, coastal reference points must be generated from shoreline data. The system uses two shoreline datasets:

1. **NOS80K** (NOAA Medium Resolution Shoreline) - for the continental US (CONUS)
2. **GSHHG** (Global Self-consistent, Hierarchical, High-resolution Geography) - for non-CONUS regions (Alaska, Hawaii, Puerto Rico, Virgin Islands, Pacific Islands)

Reference points are generated at 5 km intervals along the shoreline within each county boundary and stored in:

output/county_shoreline_ref_points/coastal_reference_points.parquet


See [Shoreline Data Documentation](docs/shoreline_data.md) for details on data sources and regeneration.
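
To make the fixed-interval sampling concrete, here is a minimal sketch that walks a shoreline polyline and emits a point every `spacing` units. It assumes the coordinates are already in a projected system measured in metres. The package's own implementation may differ, for example by relying on a geometry library, and the function name is hypothetical.

```python
import math


def points_along_line(coords, spacing=5000.0):
    """Return points every `spacing` units along a polyline, starting at its first vertex."""
    points = [coords[0]]
    dist_to_next = spacing  # distance left to travel before the next sample
    for (x0, y0), (x1, y1) in zip(coords, coords[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        pos = 0.0  # distance already covered along this segment
        while seg_len - pos >= dist_to_next:
            pos += dist_to_next
            t = pos / seg_len
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            dist_to_next = spacing
        # Carry the unused remainder of this segment into the next one.
        dist_to_next -= seg_len - pos
    return points
```

For a straight 12 km segment this produces samples at 0, 5, and 10 km. The final 2 km remainder carries no extra point.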

### Imputation Pipeline

1. Create station to county mapping
   - Load reference points along the coast of each county
   - Match stations to the nearest reference points using KD-tree spatial search
   - Calculate inverse distance weights (IDW) with a 100 km maximum distance and power = 2
   - Output: station to county mapping (parquet files per region)
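
The weighting step can be sketched as below. To stay free of dependencies, the sketch replaces the KD-tree search with a brute-force loop over stations and uses great-circle (haversine) distances. With SciPy available, `scipy.spatial.cKDTree` would be the natural substitute for the loop. Normalizing the weights so they sum to 1 is an assumption, as are the function names.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def idw_weights(point, stations, max_km=100.0, power=2):
    """Inverse distance weights for stations within max_km of a reference point.

    point: (lat, lon); stations: {station_id: (lat, lon)}.
    Returns {station_id: weight}, normalized to sum to 1 (assumed convention).
    """
    raw = {}
    for station_id, (lat, lon) in stations.items():
        d = haversine_km(point[0], point[1], lat, lon)
        if d <= max_km:
            # Guard against division by zero when a station sits on the point.
            raw[station_id] = 1.0 / max(d, 1e-6) ** power
    total = sum(raw.values())
    return {sid: w / total for sid, w in raw.items()} if raw else {}
```

With power = 2, a station twice as far away receives one quarter of the raw weight, so two stations at distances d and 2d end up with normalized weights of 0.8 and 0.2.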

### Assignment Pipeline

1. Assign flood days to each county based on the station-to-county mapping
   - Aggregate flood days by county and year
   - Output: county-level flood days in CSV
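
One plausible reading of the aggregation step is a weighted average of station flood days per county and year. The sketch below assumes each county already has one positive weight per station, and that weights are renormalized over the stations that report data for a given year. Both are assumptions, and the data layout and function name are hypothetical.

```python
def assign_flood_days(station_days, county_weights):
    """Weighted average of station flood days for each (county, year).

    station_days: {(station_id, year): flood_days}
    county_weights: {county: {station_id: weight}}, weights assumed positive
    """
    result = {}
    for county, weights in county_weights.items():
        # Years for which at least one of the county's stations has data.
        years = {year for (station_id, year) in station_days if station_id in weights}
        for year in sorted(years):
            numerator = denominator = 0.0
            for station_id, weight in weights.items():
                if (station_id, year) in station_days:
                    numerator += weight * station_days[(station_id, year)]
                    denominator += weight
            result[(county, year)] = numerator / denominator
    return result
```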

### Imputation Coverage Visualization

The package includes visualization tools for analyzing tide gauge coverage across different coastal regions:

- **Coverage Metrics**: 
  - Combines number of tide stations (n) and their mean weights (w̄) as CS = n × w̄
  - Weights decrease with distance from each station
  - Higher scores indicate better coverage

- **Regional Visualizations**:
  - Mid Atlantic (`imputation_map_mid_atlantic.py`)
  - North Atlantic (`imputation_map_north_atlantic.py`)
  - South Atlantic (`imputation_map_south_atlantic.py`)
  - Gulf Coast (`imputation_map_gulf_coast.py`)
  - West Coast (`imputation_map_west_coast.py`)
  - Puerto Rico (`imputation_map_puerto_rico.py`)
  - Virgin Islands (`imputation_map_virgin_islands.py`)
  - Hawaii (`imputation_map_hawaii.py`)

  Note: Alaska and Pacific Islands have verification scripts (`imputation_verify_*.py`) but not full map visualizations.
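
The coverage score defined under Coverage Metrics, CS = n × w̄, is simple to compute. The short sketch below uses a hypothetical function name and takes the list of station weights for one county. Since n × w̄ equals the sum of the weights, the score grows with both the number of contributing stations and their proximity.

```python
def coverage_score(weights):
    """Coverage score CS = n * mean(weights) for one county's station weights."""
    n = len(weights)
    if n == 0:
        return 0.0  # no contributing stations means no coverage
    mean_weight = sum(weights) / n
    return n * mean_weight
```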

Each visualization includes:
- Choropleth maps showing coverage scores
- Tide station locations and names
- Coverage statistics and metrics
- Region-specific projections and styling

Output maps are saved to `output/maps/imputation/` directory.
