A Python package for retrieving and processing NOAA high tide flooding (HTF) data at the county level. This package provides tools for both historical data analysis and future projections.
- **Historical Data Processing**
  - Fetch historical high tide flooding data from NOAA
  - Process data by region
    - Regions are defined in `config/region_mappings.yaml`
    - Each region has its own tide stations defined in `config/{region}_tide_stations.yaml`
  - Imputation module matches tide stations to reference points along the coast of each county
  - Assignment module assigns annual flood days to each county based on the imputation module results
  - Outputs annual flood days by county in CSV
- **Projected Data Processing**
  - Fetch projected high tide flooding data from NOAA
  - Process projections by region and scenario
    - Regions are defined in `config/region_mappings.yaml`
    - Each region has its own tide stations defined in `config/{region}_tide_stations.yaml`
    - Scenarios: low, intLow, intermediate, intHigh, high (as defined in the NOAA API response)
  - Imputation module matches tide stations to reference points along the coast of each county
  - Assignment module assigns projected flood days to each county based on the imputation module results
  - Outputs projected flood days by county in CSV
- Create a virtual environment:

      python -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate

- Install the package:

      # For users
      pip install .
      # For developers
      pip install -e .[dev]

Requirements:
- Python 3.8 or higher
- Core dependencies:
  - requests >= 2.31.0
  - pandas >= 2.0.0
  - numpy >= 1.24.0
  - pyarrow >= 14.0.1
  - pyyaml >= 6.0.0
  - scipy >= 1.10.0
  - matplotlib >= 3.7.0

Development dependencies (installed with .[dev]):
- pytest >= 7.0.0
- black >= 23.0.0
- flake8 >= 6.0.0
- mypy >= 1.0.0
The package uses YAML configuration files located in the `config` directory:
- `noaa_api_settings.yaml`: NOAA API settings and data type configurations
- `region_mappings.yaml`: Region mappings
- `{region}_tide_stations.yaml`: Region-specific tide station configurations

Each region has its own configuration file that defines:
- Tide station locations (latitude/longitude)
- Station names and identifiers
- Regional groupings/classifications
- Coverage areas
Supported Regions:
- Alaska (`alaska_tide_stations.yaml`)
- Hawaii (`hawaii_tide_stations.yaml`)
- Pacific Islands (`pacific_islands_tide_stations.yaml`)
- Virgin Islands (`virgin_islands_tide_stations.yaml`)
- Puerto Rico (`puerto_rico_tide_stations.yaml`)
- Mid-Atlantic (`mid_atlantic_tide_stations.yaml`)
- North Atlantic (`north_atlantic_tide_stations.yaml`)
- South Atlantic (`south_atlantic_tide_stations.yaml`)
- Gulf Coast (`gulf_coast_tide_stations.yaml`)
- West Coast (`west_coast_tide_stations.yaml`)

Process historical HTF data for a specific region:

    python -m src.noaa.historical.historical_htf_cli \
        --region gulf_coast \
        --start-year 1920 \
        --end-year 2024 \
        --output-dir output/historical \
        --format parquet

Process projected HTF data for a specific region:

    python -m src.noaa.projected.projected_htf_cli \
        --region hawaii \
        --start-decade 2020 \
        --end-decade 2100 \
        --output-dir output/projected \
        --format parquet

Analyze data quality for a specific region or station:

    # Analyze a region
    python -m src.analysis.generate_report \
        --region gulf_coast \
        --data-type historical \
        --format markdown \
        --verbose

The data quality analysis provides:
- Temporal coverage metrics
- Data completeness assessment
- Summary statistics
### Common Arguments
The command-line tools support the following arguments:
- `--region`: Region to process (required)
- `--output-dir`: Output directory for processed data
- `--format`: Output format (csv/parquet for data, markdown for analysis)
- `--verbose`: Enable verbose logging
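
As a rough illustration of how these shared options could be declared, the sketch below builds an `argparse` parser. The four flag names come from the list above; the helper name `build_common_parser`, the default values, and the exact `choices` list are assumptions made for the example, not the package's actual code.

```python
import argparse


def build_common_parser(description: str) -> argparse.ArgumentParser:
    """Hypothetical helper that declares the four shared options."""
    parser = argparse.ArgumentParser(description=description)
    parser.add_argument("--region", required=True, help="Region to process")
    parser.add_argument(
        "--output-dir",
        default="output",  # assumed default, not taken from the package
        help="Output directory for processed data",
    )
    parser.add_argument(
        "--format",
        choices=["csv", "parquet", "markdown"],
        default="csv",  # assumed default, not taken from the package
        help="Output format (csv/parquet for data, markdown for analysis)",
    )
    parser.add_argument("--verbose", action="store_true", help="Enable verbose logging")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_common_parser("HTF demo").parse_args(
    ["--region", "gulf_coast", "--format", "parquet", "--verbose"]
)
print(args.region, args.output_dir, args.format, args.verbose)
```

Because `--region` is marked as required, omitting it makes `argparse` exit with a usage message before any processing starts.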
## Final Output Format
### Historical Data
The processed historical data includes:
- County
- Year
- Number of flood days
- Region identifier
### Projected Data
The projected data includes:
- County
- Decade
- Scenario (e.g., intermediate, intHigh)
- Projected flood days
- Region identifier
## Development
### Project Structure
    county_level_tidal_flooding/
    ├── config/                        # Configuration files
    │   ├── region_mappings.yaml       # Region and county mappings
    │   ├── noaa_api_settings.yaml     # NOAA API settings
    │   └── tide_stations/
    │       ├── alaska_tide_stations.yaml
    │       ├── hawaii_tide_stations.yaml
    │       ├── pacific_islands_tide_stations.yaml
    │       ├── virgin_islands_tide_stations.yaml
    │       ├── puerto_rico_tide_stations.yaml
    │       ├── mid_atlantic_tide_stations.yaml
    │       ├── north_atlantic_tide_stations.yaml
    │       ├── south_atlantic_tide_stations.yaml
    │       ├── gulf_coast_tide_stations.yaml
    │       └── west_coast_tide_stations.yaml
    ├── output/
    │   ├── historical/
    │   ├── projected/
    │   └── analysis/
    ├── src/
    │   └── noaa/
    │       ├── core/
    │       │   ├── __init__.py
    │       │   ├── noaa_client.py
    │       │   ├── rate_limiter.py
    │       │   └── cache_manager.py
    │       ├── historical/
    │       │   ├── __init__.py
    │       │   ├── historical_htf_cli.py
    │       │   ├── historical_htf_fetcher.py
    │       │   └── historical_htf_processor.py
    │       └── projected/
    │           ├── __init__.py
    │           ├── projected_htf_cli.py
    │           ├── projected_htf_fetcher.py
    │           └── projected_htf_processor.py
    ├── README.md
    └── requirements.txt

## Analysis Pipeline
### Historical HTF Analysis
1. Processing historical HTF observations by region
   - Data structure: Annual counts of minor flooding events
   - Time range: 1920-2024
   - Source: NOAA Annual Flood Count Product (minor flooding only)
   - **Alaska Exception**: NOAA does not provide HTF data for Alaska stations. Alaska flood days are computed from raw water level observations using 99th percentile thresholds. See [Alaska HTF Methodology](docs/alaska_htf_methodology.md) for details.
2. Generating county-level historical estimates
   - Output: Annual minor flooding frequency by county
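
The Alaska exception above can be illustrated with a small percentile-threshold count. The linked methodology document is the authoritative description; this is only a minimal sketch under stated assumptions: water levels arrive as a timestamp-indexed pandas Series, the threshold is the 99th percentile of the whole record, and a flood day is any calendar day whose maximum level exceeds that threshold. The function name `annual_flood_days` is hypothetical.

```python
import numpy as np
import pandas as pd


def annual_flood_days(levels: pd.Series, percentile: float = 99.0) -> pd.Series:
    """Count, per year, the days whose daily maximum exceeds a percentile threshold.

    Assumes `levels` is indexed by timestamp; the threshold is taken over the
    full record (an assumption, see the methodology document for the real rule).
    """
    threshold = np.nanpercentile(levels.to_numpy(dtype=float), percentile)
    daily_max = levels.resample("D").max()
    flooded = daily_max > threshold
    return flooded.groupby(flooded.index.year).sum().astype(int)


# Synthetic hourly record covering 2000 and 2001: flat at 1.0 with four spikes.
index = pd.date_range("2000-01-01", periods=17544, freq=pd.Timedelta(hours=1))
levels = pd.Series(1.0, index=index)
for stamp in ["2000-01-15 12:00", "2000-06-01 03:00", "2000-11-20 18:00", "2001-03-05 09:00"]:
    levels.loc[pd.Timestamp(stamp)] = 2.0

flood_days = annual_flood_days(levels)
print(flood_days)
```

In this synthetic record the spikes make up far less than 1% of the samples, so the threshold stays at the baseline level and only the spike days are counted.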
### Projected HTF Analysis
1. Processing projected HTF data by region
   - Data structure: Decadal flooding frequency projections
   - Time range: 2020-2100
   - Source: NOAA Decadal Projections Product
   - Multiple sea level rise scenarios
2. Generating county-level projections
   - Output: Projected flooding frequency by county
   - Separate outputs for each sea level rise scenario
The separation into historical and projected analyses is necessary due to:
- Different data structures and temporal resolutions
- Distinct quality control requirements
### Preprocessing: Coastal Reference Points
Before running imputation, coastal reference points must be generated from shoreline data. The system uses two shoreline datasets:
1. **NOS80K** (NOAA Medium Resolution Shoreline) - for continental US
2. **GSHHG** (Global Self-consistent, Hierarchical, High-resolution Geography) - for non-CONUS regions (Alaska, Hawaii, Puerto Rico, Virgin Islands, Pacific Islands)
Reference points are generated at 5 km intervals along the shoreline within each county boundary and stored in `output/county_shoreline_ref_points/coastal_reference_points.parquet`.
See [Shoreline Data Documentation](docs/shoreline_data.md) for details on data sources and regeneration.
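
To make the fixed-interval sampling concrete, the following sketch places points every 5 km along a polyline. It assumes the shoreline vertices are already in a projected coordinate system with metre units, and it leaves out clipping to county boundaries. The function name `points_along_line` is hypothetical, and the package's actual generator may work differently.

```python
import numpy as np


def points_along_line(xy: np.ndarray, spacing_m: float = 5000.0) -> np.ndarray:
    """Return points every `spacing_m` metres along a polyline.

    `xy` is an (N, 2) array of vertices in a projected CRS (metres assumed).
    """
    seg_len = np.hypot(*np.diff(xy, axis=0).T)              # length of each segment
    cum_len = np.concatenate([[0.0], np.cumsum(seg_len)])   # distance at each vertex
    targets = np.arange(0.0, cum_len[-1] + 1e-9, spacing_m)  # 0, 5000, 10000, ...
    x = np.interp(targets, cum_len, xy[:, 0])
    y = np.interp(targets, cum_len, xy[:, 1])
    return np.column_stack([x, y])


# L-shaped test line: 12 km east, then 9 km north (21 km in total).
shoreline = np.array([[0.0, 0.0], [12000.0, 0.0], [12000.0, 9000.0]])
ref_points = points_along_line(shoreline)
print(ref_points)
```

Distances are measured along the line rather than as straight-line offsets, so the spacing is preserved around the corner of the test shape.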
### Imputation Pipeline
1. Create station-to-county mapping
   - Load reference points along the coast of each county
   - Match stations to the nearest reference points using KD-tree spatial search
   - Calculate inverse distance weights (IDW) with a 100 km maximum distance and power = 2
   - Output: station-to-county mapping (Parquet files per region)
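
The matching and weighting steps above can be sketched with `scipy.spatial.cKDTree`. The 100 km cutoff and power of 2 come from the list above. Everything else is an assumption for illustration: the query direction (each reference point looks up nearby stations), the neighbour count `k`, the per-point normalization of weights, and the function names.

```python
import numpy as np
from scipy.spatial import cKDTree

EARTH_RADIUS_KM = 6371.0


def to_unit_xyz(latlon):
    """Convert an (N, 2) array of [lat, lon] in degrees to 3-D unit vectors."""
    lat, lon = np.radians(np.asarray(latlon, dtype=float)).T
    return np.column_stack(
        [np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat)]
    )


def idw_weights(station_latlon, ref_latlon, k=3, max_km=100.0, power=2.0):
    """Return (station indices, weights) for each reference point.

    Weights are 1 / distance**power for stations within `max_km`, then
    normalized to sum to 1 per point (all zeros if no station is in range).
    """
    k = min(k, len(station_latlon))
    n_ref = len(ref_latlon)
    tree = cKDTree(to_unit_xyz(station_latlon))
    chord, idx = tree.query(to_unit_xyz(ref_latlon), k=k)
    chord, idx = chord.reshape(n_ref, k), idx.reshape(n_ref, k)
    # Chord length on the unit sphere -> great-circle distance in km.
    dist_km = 2.0 * EARTH_RADIUS_KM * np.arcsin(np.clip(chord / 2.0, 0.0, 1.0))
    raw = np.where(dist_km <= max_km, 1.0 / np.maximum(dist_km, 1e-6) ** power, 0.0)
    total = raw.sum(axis=1, keepdims=True)
    weights = np.divide(raw, total, out=np.zeros_like(raw), where=total > 0)
    return idx, weights


# Made-up coordinates: two nearby stations, one distant station.
stations = [[29.0, -90.0], [29.5, -90.0], [35.0, -80.0]]
refs = [[29.1, -90.0], [10.0, -150.0]]  # second point has no station within 100 km
idx, weights = idw_weights(stations, refs)
print(idx)
print(np.round(weights, 3))
```

Building the tree on unit-sphere coordinates avoids the distortion of treating latitude and longitude as planar coordinates, which matters at high latitudes such as Alaska.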
### Assignment Pipeline
1. Assign flood days to each county based on the station-to-county mapping
   - Aggregate flood days by county and year
   - Output: county-level flood days in CSV
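
One plausible form of this aggregation is a weighted average of station flood days per county and year, sketched below with pandas. The column names (`county`, `station_id`, `weight`, `year`, `flood_days`) and the function name are hypothetical, and treating the county value as a weight-normalized mean is an assumption rather than a confirmed detail of the assignment module.

```python
import pandas as pd


def assign_flood_days(mapping: pd.DataFrame, station_days: pd.DataFrame) -> pd.DataFrame:
    """Weighted average of station flood days for each county and year."""
    merged = mapping.merge(station_days, on="station_id", how="inner")
    merged["weighted_days"] = merged["weight"] * merged["flood_days"]
    grouped = merged.groupby(["county", "year"], as_index=False)[
        ["weighted_days", "weight"]
    ].sum()
    # Divide by the weights actually present so missing station-years
    # do not drag the county value toward zero.
    grouped["flood_days"] = grouped["weighted_days"] / grouped["weight"]
    return grouped[["county", "year", "flood_days"]]


# Made-up mapping and station data for illustration only.
mapping = pd.DataFrame(
    {
        "county": ["X", "X", "Y"],
        "station_id": ["S1", "S2", "S2"],
        "weight": [0.75, 0.25, 1.0],
    }
)
station_days = pd.DataFrame(
    {
        "station_id": ["S1", "S2", "S1"],
        "year": [2020, 2020, 2021],
        "flood_days": [4, 8, 2],
    }
)
county_days = assign_flood_days(mapping, station_days)
print(county_days)
```

In the example, station S2 has no 2021 record, so county X falls back to S1 alone for that year and county Y has no 2021 row at all.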
### Imputation Coverage Visualization
The package includes visualization tools for analyzing tide gauge coverage across different coastal regions:
- **Coverage Metrics**:
  - The coverage score (CS) combines the number of tide stations (n) and their mean weight (w̄): CS = n × w̄
  - Weights decrease with distance from each station
  - Higher scores indicate better coverage
- **Regional Visualizations**:
  - Mid-Atlantic (`imputation_map_mid_atlantic.py`)
  - North Atlantic (`imputation_map_north_atlantic.py`)
  - South Atlantic (`imputation_map_south_atlantic.py`)
  - Gulf Coast (`imputation_map_gulf_coast.py`)
  - West Coast (`imputation_map_west_coast.py`)
  - Puerto Rico (`imputation_map_puerto_rico.py`)
  - Virgin Islands (`imputation_map_virgin_islands.py`)
  - Hawaii (`imputation_map_hawaii.py`)

Note: Alaska and Pacific Islands have verification scripts (`imputation_verify_*.py`) but not full map visualizations.
Each visualization includes:
- Choropleth maps showing coverage scores
- Tide station locations and names
- Coverage statistics and metrics
- Region-specific projections and styling
Output maps are saved to the `output/maps/imputation/` directory.
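
The coverage score defined under Coverage Metrics, CS = n × w̄, can be computed per county with a short pandas aggregation. This is a sketch only: the `county` and `weight` column names and the function name are hypothetical, and it assumes one row per station-county pair.

```python
import pandas as pd


def coverage_scores(mapping: pd.DataFrame) -> pd.DataFrame:
    """Compute CS = n * mean weight for each county."""
    stats = mapping.groupby("county")["weight"].agg(n="count", mean_weight="mean")
    stats["coverage_score"] = stats["n"] * stats["mean_weight"]
    return stats


# Made-up weights for two counties.
mapping = pd.DataFrame(
    {
        "county": ["X", "X", "X", "Y"],
        "weight": [0.5, 0.25, 0.75, 0.2],
    }
)
scores = coverage_scores(mapping)
print(scores)
```

Since n × w̄ equals the sum of the weights, a county served by several nearby stations scores higher than one served by a single distant station.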