County-Level Tidal Flooding Data Processing

A Python package for retrieving and processing NOAA high tide flooding (HTF) data at the county level. This package provides tools for both historical data analysis and future projections.

Features

Historical Data Processing
- Fetch historical high tide flooding data from NOAA
- Process data by region
  - regions are defined in config/region_mappings.yaml
  - each region has its own tide stations defined in config/{region}_tide_stations.yaml
- Imputation module matches tide stations to reference points along the coast of each county
- Assignment module assigns annual flood days to each county based on imputation module results
- Outputs annual flood days by county in CSV

Projected Data Processing
- Fetch projected high tide flooding data from NOAA
- Process projections by region and scenario
  - regions are defined in config/region_mappings.yaml
  - each region has its own tide stations defined in config/{region}_tide_stations.yaml
  - scenarios: low, intLow, intermediate, intHigh, high (defined in NOAA API response)
- Imputation module matches tide stations to reference points along the coast of each county
- Assignment module assigns projected flood days to each county based on imputation module results
- Outputs projected flood days by county in CSV

Installation

Create a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install the package:

# For users
pip install .

# For developers
pip install -e .[dev]

Requirements

Python 3.8 or higher
Core dependencies:
- requests >= 2.31.0
- pandas >= 2.0.0
- numpy >= 1.24.0
- pyarrow >= 14.0.1
- pyyaml >= 6.0.0
- scipy >= 1.10.0
- matplotlib >= 3.7.0

Development dependencies (installed with .[dev]):

pytest >= 7.0.0
black >= 23.0.0
flake8 >= 6.0.0
mypy >= 1.0.0

Configuration

The package uses YAML configuration files located in the config directory:

noaa_api_settings.yaml: NOAA API settings and data type configurations
region_mappings.yaml: Region mappings
{region}_tide_stations.yaml: Region-specific tide station configurations

Regional Configurations

Each region has its own configuration file that defines:

Tide station locations (latitude/longitude)
Station names and identifiers
Regional groupings/classifications
Coverage areas

Supported Regions:

Alaska (alaska_tide_stations.yaml)
Hawaii (hawaii_tide_stations.yaml)
Pacific Islands (pacific_islands_tide_stations.yaml)
Virgin Islands (virgin_islands_tide_stations.yaml)
Puerto Rico (puerto_rico_tide_stations.yaml)
Mid-Atlantic (mid_atlantic_tide_stations.yaml)
North Atlantic (north_atlantic_tide_stations.yaml)
South Atlantic (south_atlantic_tide_stations.yaml)
Gulf Coast (gulf_coast_tide_stations.yaml)
West Coast (west_coast_tide_stations.yaml)
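
As a minimal sketch of how a region name relates to its station file, the helper below builds the expected path from the naming pattern above. `station_config_path` and `SUPPORTED_REGIONS` are hypothetical names for illustration, not part of the package, and the default `config_dir` is an assumption.

```python
from pathlib import Path

# Region keys taken from the supported-regions list above.
SUPPORTED_REGIONS = (
    "alaska", "hawaii", "pacific_islands", "virgin_islands", "puerto_rico",
    "mid_atlantic", "north_atlantic", "south_atlantic", "gulf_coast", "west_coast",
)


def station_config_path(region: str, config_dir: str = "config") -> Path:
    """Return the expected path of a region's tide station file (hypothetical helper)."""
    # Accept display names such as "Gulf Coast" or "Mid-Atlantic".
    key = region.strip().lower().replace(" ", "_").replace("-", "_")
    if key not in SUPPORTED_REGIONS:
        raise ValueError(f"Unknown region {region!r}; expected one of {SUPPORTED_REGIONS}")
    return Path(config_dir) / f"{key}_tide_stations.yaml"


print(station_config_path("Gulf Coast"))
```

Pass a different `config_dir` if the station files live in a subdirectory of `config`.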

Usage

Historical Data Processing

Process historical HTF data for a specific region:

python -m src.noaa.historical.historical_htf_cli \
    --region gulf_coast \
    --start-year 1920 \
    --end-year 2024 \
    --output-dir output/historical \
    --format parquet

Projected Data Processing

Process projected HTF data for a specific region:

python -m src.noaa.projected.projected_htf_cli \
    --region hawaii \
    --start-decade 2020 \
    --end-decade 2100 \
    --output-dir output/projected \
    --format parquet

Data Quality Analysis

Analyze data quality for a specific region or station:

# Analyze a region
python -m src.analysis.generate_report \
    --region gulf_coast \
    --data-type historical \
    --format markdown \
    --verbose

The data quality analysis provides:
- Temporal coverage metrics
- Data completeness assessment
- Summary statistics

### Common Arguments

The command-line tools above share the following arguments:

- `--region`: Region to process (required)
- `--output-dir`: Output directory for processed data
- `--format`: Output format (csv/parquet for data, markdown for analysis)
- `--verbose`: Enable verbose logging
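
For readers who want to see how these flags fit together, here is a hedged `argparse` sketch that declares only the four shared arguments. It is not the package's actual parser: `build_common_parser` is a hypothetical name, and the default values are assumptions.

```python
import argparse


def build_common_parser() -> argparse.ArgumentParser:
    """Hypothetical parser holding only the shared arguments listed above."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--region", required=True, help="Region to process")
    # Default output directory is an assumption.
    parser.add_argument("--output-dir", default="output",
                        help="Output directory for processed data")
    # Default format is an assumption; choices come from the list above.
    parser.add_argument("--format", choices=["csv", "parquet", "markdown"],
                        default="csv", help="Output format")
    parser.add_argument("--verbose", action="store_true",
                        help="Enable verbose logging")
    return parser


args = build_common_parser().parse_args(["--region", "gulf_coast", "--format", "parquet"])
print(args.region, args.output_dir, args.format, args.verbose)
```

Because the parser is built with `add_help=False`, it could be passed through `parents=[...]` to a tool-specific parser that adds flags such as `--start-year` or `--start-decade`.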

## Final Output Format

### Historical Data

The processed historical data includes:
- County
- Year
- Number of flood days
- Region identifier

### Projected Data

The projected data includes:
- County
- Decade
- Scenario (e.g., intermediate, intHigh)
- Projected flood days
- Region identifier

## Development

### Project Structure

county_level_tidal_flooding/
├── config/                          # Configuration files
│   ├── region_mappings.yaml         # Region and county mappings
│   ├── noaa_api_settings.yaml       # NOAA API settings
│   └── tide_stations/
│       ├── alaska_tide_stations.yaml
│       ├── hawaii_tide_stations.yaml
│       ├── pacific_islands_tide_stations.yaml
│       ├── virgin_islands_tide_stations.yaml
│       ├── puerto_rico_tide_stations.yaml
│       ├── mid_atlantic_tide_stations.yaml
│       ├── north_atlantic_tide_stations.yaml
│       ├── south_atlantic_tide_stations.yaml
│       ├── gulf_coast_tide_stations.yaml
│       └── west_coast_tide_stations.yaml
├── output/
│   ├── historical/
│   ├── projected/
│   └── analysis/
├── src/
│   └── noaa/
│       ├── core/
│       │   ├── __init__.py
│       │   ├── noaa_client.py
│       │   ├── rate_limiter.py
│       │   └── cache_manager.py
│       ├── historical/
│       │   ├── __init__.py
│       │   ├── historical_htf_cli.py
│       │   ├── historical_htf_fetcher.py
│       │   └── historical_htf_processor.py
│       └── projected/
│           ├── __init__.py
│           ├── projected_htf_cli.py
│           ├── projected_htf_fetcher.py
│           └── projected_htf_processor.py
├── README.md
└── requirements.txt


## Analysis Pipeline

### Historical HTF Analysis
1. Processing historical HTF observations by region
   - Data structure: Annual counts of minor flooding events
   - Time range: 1920-2024
   - Source: NOAA Annual Flood Count Product (minor flooding only)
   - **Alaska Exception**: NOAA does not provide HTF data for Alaska stations. Alaska flood days are computed from raw water level observations using 99th percentile thresholds. See [Alaska HTF Methodology](docs/alaska_htf_methodology.md) for details.
2. Generating county-level historical estimates
   - Output: Annual minor flooding frequency by county
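
The Alaska exception can be illustrated with a rough sketch. It assumes the threshold is the 99th percentile of all observations at a station and that a flood day is any day whose maximum water level exceeds it. Both rules, the input layout, and the name `count_flood_days` are assumptions for illustration; the linked methodology document describes the actual procedure.

```python
from collections import defaultdict
from statistics import quantiles


def count_flood_days(observations):
    """Count days whose maximum water level exceeds the 99th percentile.

    observations: iterable of (date_string, water_level) pairs. The input
    layout and the daily-maximum rule are assumptions, not the package's API.
    """
    observations = list(observations)
    levels = [level for _, level in observations]
    # quantiles(n=100) returns 99 cut points; the last is the 99th percentile.
    threshold = quantiles(levels, n=100, method="inclusive")[-1]

    # Track the highest observed level for each day.
    daily_max = defaultdict(lambda: float("-inf"))
    for day, level in observations:
        daily_max[day] = max(daily_max[day], level)

    flood_days = sum(1 for peak in daily_max.values() if peak > threshold)
    return threshold, flood_days
```

Grouping the daily maxima by calendar year before counting would give annual flood days in the same shape as the NOAA product used for the other regions.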

### Projected HTF Analysis
1. Processing projected HTF data by region
   - Data structure: Decadal flooding frequency projections
   - Time range: 2020-2100
   - Source: NOAA Decadal Projections Product
   - Multiple sea level rise scenarios
2. Generating county-level projections
   - Output: Projected flooding frequency by county
   - Separate outputs for each sea level rise scenario

The separation into historical and projected analyses is necessary due to:
- Different data structures and temporal resolutions
- Distinct quality control requirements

### Preprocessing: Coastal Reference Points

Before running imputation, coastal reference points must be generated from shoreline data. The system uses two shoreline datasets:

1. **NOS80K** (NOAA Medium Resolution Shoreline) - for continental US
2. **GSHHG** (Global Self-consistent, Hierarchical, High-resolution Geography) - for non-CONUS regions (Alaska, Hawaii, Puerto Rico, Virgin Islands, Pacific Islands)

Reference points are generated at 5km intervals along the shoreline within each county boundary and stored in:

output/county_shoreline_ref_points/coastal_reference_points.parquet


See [Shoreline Data Documentation](docs/shoreline_data.md) for details on data sources and regeneration.
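
To make the 5km spacing concrete, the sketch below places points at a fixed interval along a single polyline. It assumes the shoreline vertices are already in a projected, metre-based coordinate system and ignores clipping to county boundaries; `points_along_line` is a hypothetical helper, not the package's generator.

```python
import math


def points_along_line(vertices, spacing_m=5000.0):
    """Return points every `spacing_m` metres along a polyline.

    vertices: list of (x, y) tuples in metres (projected coordinates, assumed).
    The first vertex is always included.
    """
    points = [vertices[0]]
    next_at = spacing_m   # distance along the line where the next point falls
    travelled = 0.0       # length of the segments already walked
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        # Emit every point that falls inside the current segment.
        while seg_len > 0 and next_at <= travelled + seg_len:
            t = (next_at - travelled) / seg_len
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            next_at += spacing_m
        travelled += seg_len
    return points


# An L-shaped, 12 km line yields points at 0, 5 and 10 km along its length.
print(points_along_line([(0.0, 0.0), (6000.0, 0.0), (6000.0, 6000.0)]))
```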

### Imputation Pipeline

1. Create station to county mapping
   - Load reference points along the coast of each county
   - Match stations to the nearest reference points using KD-tree spatial search
   - Calculate inverse distance weights (IDW) with 100km max distance, power=2
   - Output: station to county mapping (parquet files per region)
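
The weighting step can be sketched as follows, using the 100km cutoff and power of 2 stated above. For brevity this version swaps the KD-tree search for a brute-force loop over stations with haversine distances, which gives the same weights but scales worse. `haversine_km`, `idw_weights`, and the `min_km` floor are illustrative assumptions, not the package's API.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def idw_weights(point, stations, max_km=100.0, power=2, min_km=1e-6):
    """Raw inverse-distance weights for stations within max_km of a point.

    point: (lat, lon); stations: {station_id: (lat, lon)}.
    min_km avoids division by zero when a station sits on the point.
    """
    weights = {}
    for station_id, (lat, lon) in stations.items():
        dist = haversine_km(point[0], point[1], lat, lon)
        if dist <= max_km:
            weights[station_id] = 1.0 / max(dist, min_km) ** power
    return weights


stations = {"A": (0.0, 0.5), "B": (0.0, 1.0), "C": (0.25, 0.0)}
weights = idw_weights((0.0, 0.0), stations)
print(sorted(weights))  # B lies beyond 100 km and is dropped
```

With power 2, halving the distance quadruples the weight, so station C (about 28 km away) counts four times as much as station A (about 56 km away).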

### Assignment Pipeline

1. Assign flood days to each county based on the station-to-county mapping
   - Aggregate flood days by county and year
   - Output: county-level flood days in CSV
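
One way to read the aggregation step is as a weighted mean of station flood days per county and year. The sketch below makes that assumption explicit; the input layouts, the renormalization over stations that have data, and the name `assign_flood_days` are all hypothetical.

```python
from collections import defaultdict


def assign_flood_days(station_days, mapping):
    """Weighted mean of station flood days per (county, year).

    station_days: {(station_id, year): flood_days}
    mapping: iterable of (county, station_id, weight) rows
    Stations with no value for a year are skipped and the remaining
    weights are renormalized (an assumption).
    """
    years = sorted({year for _, year in station_days})
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for county, station_id, weight in mapping:
        for year in years:
            days = station_days.get((station_id, year))
            if days is None:
                continue
            weighted_sum[(county, year)] += weight * days
            weight_total[(county, year)] += weight
    return {
        key: weighted_sum[key] / weight_total[key]
        for key in weighted_sum
        if weight_total[key] > 0
    }


station_days = {("A", 2020): 10, ("B", 2020): 20, ("A", 2021): 4}
mapping = [("C1", "A", 0.75), ("C1", "B", 0.25)]
result = assign_flood_days(station_days, mapping)
print(result)
```

In this example station B has no 2021 value, so the 2021 estimate for county C1 falls back to station A alone.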

### Imputation Coverage Visualization

The package includes visualization tools for analyzing tide gauge coverage across different coastal regions:

- **Coverage Metrics**: 
  - The coverage score (CS) combines the number of tide stations (n) and their mean weight (w̄): CS = n × w̄
  - Weights decrease with distance from each station
  - Higher scores indicate better coverage

- **Regional Visualizations**:
  - Mid Atlantic (`imputation_map_mid_atlantic.py`)
  - North Atlantic (`imputation_map_north_atlantic.py`)
  - South Atlantic (`imputation_map_south_atlantic.py`)
  - Gulf Coast (`imputation_map_gulf_coast.py`)
  - West Coast (`imputation_map_west_coast.py`)
  - Puerto Rico (`imputation_map_puerto_rico.py`)
  - Virgin Islands (`imputation_map_virgin_islands.py`)
  - Hawaii (`imputation_map_hawaii.py`)

  Note: Alaska and Pacific Islands have verification scripts (`imputation_verify_*.py`) but not full map visualizations.

Each visualization includes:
- Choropleth maps showing coverage scores
- Tide station locations and names
- Coverage statistics and metrics
- Region-specific projections and styling

Output maps are saved to the `output/maps/imputation/` directory.
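
The coverage score formula from the metrics above, CS = n × w̄, can be written as a short function. This is a sketch only: `coverage_score` is a hypothetical name, and returning 0.0 for a county with no contributing stations is an assumption.

```python
from statistics import mean


def coverage_score(weights):
    """CS = n * mean(w) for one county's station weights.

    Returns 0.0 when no station contributes (an assumed convention).
    """
    if not weights:
        return 0.0
    return len(weights) * mean(weights)


print(coverage_score([0.5, 0.3]))
```

Since n × w̄ equals the sum of the weights, a county scores higher when it has more stations, closer stations, or both.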
