A robust Python pipeline for processing climate raster data and calculating climate indices using the xclim library. This pipeline efficiently handles large climate datasets from external drives, supporting both GeoTIFF and NetCDF formats.
- Multi-format Support: Load climate data from GeoTIFF and NetCDF files
- Parallel Processing: Leverages Dask for efficient processing of large datasets
- Comprehensive Indices: Calculate 80 climate indices including:
  - Temperature indices (frost days, tropical nights, growing degree days, temperature variability)
  - Precipitation indices (consecutive dry/wet days, extreme precipitation)
  - Agricultural indices (growing season length, PET, corn heat units)
  - Drought indices (SPI at 5 time windows, comprehensive dry spell metrics)
  - Extreme event indices (heat waves, cold spells, spell frequency)
- Data Quality Control: Automatic outlier detection and missing value handling
- CF Compliance: Outputs follow Climate and Forecast (CF) conventions
- Flexible Configuration: YAML-based configuration for easy customization
- Clone the repository:

```bash
git clone https://github.com/yourusername/xclim-timber.git
cd xclim-timber
```

- Create a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

Generate baseline percentiles for extreme indices (required for the temperature, precipitation, and multivariate pipelines):

```bash
python calculate_baseline_percentiles.py
```

This is a one-time operation (~20-30 minutes) that calculates day-of-year percentiles from the 1981-2000 baseline period for temperature extremes, precipitation extremes, and multivariate thresholds. The results are cached as data/baselines/baseline_percentiles_1981_2000.nc (10.7 GB) for all future runs.
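For intuition, the day-of-year percentile calculation can be sketched in plain Python. This is a simplified illustration, not the code in calculate_baseline_percentiles.py: the function name and the `(year, day_of_year, value)` input layout are assumptions, and the sketch omits the multi-day smoothing window that production implementations commonly apply around each calendar day.

```python
from collections import defaultdict
from statistics import quantiles


def doy_percentile(records, percentile, start_year=1981, end_year=2000):
    """Return {day_of_year: threshold} from (year, day_of_year, value) records.

    Hypothetical helper: pools every baseline-period value that shares a
    day of year, then takes the requested percentile (1-99) of each pool.
    """
    pools = defaultdict(list)
    for year, doy, value in records:
        if start_year <= year <= end_year:
            pools[doy].append(value)

    thresholds = {}
    for doy, values in pools.items():
        # quantiles(n=100) returns the 99 cut points for percentiles 1..99
        cuts = quantiles(values, n=100, method="inclusive")
        thresholds[doy] = cuts[percentile - 1]
    return thresholds
```

Each day of year ends up with its own threshold, which is why the cached result is large: one value per grid cell, per calendar day, per percentile.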
- Run temperature pipeline (35 indices - Phase 9):

```bash
python temperature_pipeline.py
```

- Run precipitation pipeline (13 indices - Phase 6):

```bash
python precipitation_pipeline.py
```

- Run humidity pipeline (8 indices):

```bash
python humidity_pipeline.py
```

- Run human comfort pipeline (3 indices):

```bash
python human_comfort_pipeline.py
```

- Run multivariate pipeline (4 indices):

```bash
python multivariate_pipeline.py
```

- Run agricultural pipeline (5 indices - Phase 8):

```bash
python agricultural_pipeline.py
```

- Run drought pipeline (12 indices - Phase 10 Final):

```bash
python drought_pipeline.py
```

All pipelines default to processing the 1981-2024 data period. Use --start-year and --end-year to customize:

```bash
python temperature_pipeline.py --start-year 2000 --end-year 2020
```

Edit the configuration file to customize:
- Data paths: Location of input data on external drive
- File patterns: Patterns to identify climate variable files
- Processing options: Chunk sizes, Dask workers, resampling frequency
- Climate indices: Select which indices to calculate
- Output format: NetCDF or GeoTIFF
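As a rough illustration of how glob-style file patterns such as `*tas*.tif` could map files to climate variables, here is a minimal sketch using the standard library. The function name `classify_files` and the plain-dict pattern layout are assumptions for this example; the loader in src/data_loader.py may work differently.

```python
from fnmatch import fnmatchcase


def classify_files(filenames, file_patterns):
    """Group file names by the variable whose patterns they match.

    Hypothetical helper: `file_patterns` maps a variable name to a list of
    glob patterns, for example {"temperature": ["*tas*.tif", "*temp*.nc"]}.
    """
    matches = {variable: [] for variable in file_patterns}
    for name in filenames:
        for variable, patterns in file_patterns.items():
            # fnmatchcase is case-sensitive on every platform
            if any(fnmatchcase(name, pattern) for pattern in patterns):
                matches[variable].append(name)
    return matches
```

Broad patterns can overlap (a name containing both `tas` and `pr` would match two variables), so it is worth keeping patterns as specific as the file naming scheme allows.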
Example configuration snippet:
```yaml
data:
  input_path: /media/external_drive/climate_data
  output_path: ./outputs
  file_patterns:
    temperature: ['*tas*.tif', '*temp*.nc']
    precipitation: ['*pr*.tif', '*precip*.nc']
processing:
  chunk_size:
    time: 365
    lat: 100
    lon: 100
  dask:
    n_workers: 4
    memory_limit: 4GB
indices:
  temperature:
    - tg_mean
    - frost_days
    - growing_degree_days
  precipitation:
    - prcptot
    - rx1day
    - cdd
```

```python
from src.pipeline import ClimateDataPipeline

# Initialize pipeline
pipeline = ClimateDataPipeline('config.yaml')

# Run complete pipeline
pipeline.run()
```

```python
from src.config import Config
from src.data_loader import ClimateDataLoader

config = Config('config.yaml')
loader = ClimateDataLoader(config)

# Load temperature data
temp_data = loader.load_variable_data('temperature')
print(f"Loaded data shape: {dict(temp_data.dims)}")
```

```python
from src.indices_calculator import ClimateIndicesCalculator

# config and temp_data come from the data-loading example
calculator = ClimateIndicesCalculator(config)

# Calculate temperature indices
temp_indices = calculator.calculate_temperature_indices(temp_data)

# Save results
calculator.save_indices('outputs/indices.nc')
```

This pipeline currently implements 80 validated climate indices (35 temperature + 13 precipitation + 8 humidity + 3 human comfort + 4 multivariate + 5 agricultural + 12 drought), achieving 100% of the 80-index goal. All indices follow World Meteorological Organization (WMO) standards and CF (Climate and Forecast) conventions using the xclim library.
The pipeline processes these core climate variables:
Temperature Data:
- `tas`: Near-surface air temperature (daily mean)
- `tasmax`: Daily maximum near-surface air temperature
- `tasmin`: Daily minimum near-surface air temperature

Precipitation Data:
- `pr`: Daily precipitation amount

Humidity Data:
- `hus`: Specific humidity (kg/kg)
- `hurs`: Relative humidity (%)
Variable Name Flexibility: The system supports multiple naming conventions:
- Temperature: 'tas', 'temperature', 'temp', 'tmean', 'tasmax', 'tmax', 'tasmin', 'tmin'
- Precipitation: 'pr', 'precipitation', 'precip', 'prcp'
- Humidity: 'hus', 'huss', 'specific_humidity', 'hurs', 'relative_humidity', 'rh'
Basic Statistics (3):
- `tg_mean`: Annual mean temperature
- `tx_max`: Annual maximum temperature
- `tn_min`: Annual minimum temperature
Temperature Range Metrics (2):
- `daily_temperature_range`: Mean daily temperature range (tmax - tmin)
- `extreme_temperature_range`: Annual max(tmax) - min(tmin) (annual extremes span)
Threshold-Based Counts (6):
- `tropical_nights`: Number of nights with minimum temperature > 20°C
- `frost_days`: Number of days with minimum temperature < 0°C
- `ice_days`: Number of days with maximum temperature < 0°C
- `summer_days`: Number of days with maximum temperature > 25°C
- `hot_days`: Number of days with maximum temperature > 30°C
- `consecutive_frost_days`: Maximum consecutive frost days
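The threshold-based counts reduce to two operations on a daily series: counting days that meet a condition, and finding the longest unbroken run of such days. The sketch below shows both in plain Python. The helper names and the sample values are illustrative assumptions; the pipeline itself relies on xclim for these calculations.

```python
def count_days(values, condition):
    """Count the days whose value satisfies `condition`."""
    return sum(1 for value in values if condition(value))


def longest_run(values, condition):
    """Length of the longest unbroken run of days satisfying `condition`."""
    longest = current = 0
    for value in values:
        current = current + 1 if condition(value) else 0
        longest = max(longest, current)
    return longest


def is_frost(tasmin_c):
    """Frost day: daily minimum temperature below 0 degrees Celsius."""
    return tasmin_c < 0.0


# Made-up daily minimum temperatures in degrees Celsius
tasmin = [-2.0, -1.0, 3.0, -0.5, -4.0, -3.0, 1.0]

frost_days = count_days(tasmin, is_frost)
consecutive_frost_days = longest_run(tasmin, is_frost)
```

Swapping the condition (for example `tasmax > 25.0` for summer days) gives the other counts in this group.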
Frost Season Indices (4):
- `frost_season_length`: Duration from first to last frost (agricultural planning)
- `frost_free_season_start`: Julian day of last spring frost (planting date)
- `frost_free_season_end`: Julian day of first fall frost (harvest planning)
- `frost_free_season_length`: Days between last spring and first fall frost
Degree Day Metrics (4):
- `growing_degree_days`: Accumulated temperature above 10°C threshold (crop development)
- `heating_degree_days`: Accumulated temperature below 17°C threshold (energy demand)
- `cooling_degree_days`: Accumulated temperature above 18°C threshold (cooling energy demand)
- `freezing_degree_days`: Accumulated temperature below 0°C (winter severity)
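All four degree-day metrics share one formula: sum, over the days in the period, the amount by which the daily mean temperature exceeds (or falls short of) a threshold. A minimal sketch, assuming daily mean temperatures in degrees Celsius and using a hypothetical `degree_days` helper rather than the xclim functions the pipeline calls:

```python
def degree_days(daily_mean_temps_c, threshold_c, above=True):
    """Accumulate the daily temperature excess (or deficit) past a threshold."""
    total = 0.0
    for temp in daily_mean_temps_c:
        excess = temp - threshold_c if above else threshold_c - temp
        total += max(excess, 0.0)  # days on the wrong side contribute nothing
    return total


# Made-up daily mean temperatures in degrees Celsius
tas = [5.0, 12.0, 20.0, -3.0]

growing = degree_days(tas, 10.0, above=True)
heating = degree_days(tas, 17.0, above=False)
cooling = degree_days(tas, 18.0, above=True)
freezing = degree_days(tas, 0.0, above=False)
```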
Extreme Percentile-Based Indices (6) - Uses 1981-2000 Baseline:
- `tx90p`: Warm days (daily maximum temperature > 90th percentile)
- `tn90p`: Warm nights (daily minimum temperature > 90th percentile)
- `tx10p`: Cool days (daily maximum temperature < 10th percentile)
- `tn10p`: Cool nights (daily minimum temperature < 10th percentile)
- `warm_spell_duration_index`: Warm spell duration (≥6 consecutive warm days)
- `cold_spell_duration_index`: Cold spell duration (≥6 consecutive cold days)
Advanced Temperature Extremes (8) - Phase 7:
- `growing_season_start`: First day when temperature exceeds 5°C for 5+ consecutive days (ETCCDI standard)
- `growing_season_end`: First day after July 1st when temperature drops below 5°C for 5+ consecutive days
- `cold_spell_frequency`: Number of discrete cold spell events (temperature < -10°C for 5+ days)
- `hot_spell_frequency`: Number of hot spell events (tasmax > 30°C for 3+ days)
- `heat_wave_frequency`: Number of heat wave events (tasmin > 22°C AND tasmax > 30°C for 3+ days)
- `freezethaw_spell_frequency`: Number of freeze-thaw cycles (tasmax > 0°C AND tasmin ≤ 0°C on same day)
- `last_spring_frost`: Last day in spring when tasmin < 0°C (critical for agriculture)
- `daily_temperature_range_variability`: Average day-to-day variation in daily temperature range (climate stability)
Temperature Variability (2) - Phase 9:
- `temperature_seasonality`: Annual temperature coefficient of variation (standard deviation as percentage of mean) - ANUCLIM BIO4 variable
- `heat_wave_index`: Total days that are part of a heat wave (5+ consecutive days with tasmax > 25°C)
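The coefficient of variation behind `temperature_seasonality` can be written out in a few lines. This sketch makes two assumptions that should be checked against the xclim documentation: temperatures are converted to Kelvin before dividing (a Celsius mean near zero would make the ratio meaningless), and the population standard deviation is used.

```python
from statistics import fmean, pstdev


def temperature_seasonality(daily_mean_temps_c):
    """Standard deviation as a percentage of the mean, computed in Kelvin.

    Illustrative only: the Kelvin conversion and population standard
    deviation are assumptions of this sketch.
    """
    kelvin = [temp + 273.15 for temp in daily_mean_temps_c]
    return 100.0 * pstdev(kelvin) / fmean(kelvin)
```

Because the mean is taken in Kelvin, typical values are small (a few percent) even for strongly seasonal climates.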
Basic Statistics (4):
- `prcptot`: Total annual precipitation (wet days ≥ 1mm)
- `rx1day`: Maximum 1-day precipitation amount
- `rx5day`: Maximum 5-day precipitation amount
- `sdii`: Simple daily intensity index (average precipitation on wet days)
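These four statistics follow directly from their definitions, as the sketch below shows for a short made-up series in millimetres. The helper names are hypothetical; the pipeline computes the indices through xclim.

```python
def wet_day_stats(daily_pr_mm, wet_threshold=1.0):
    """Return (prcptot, sdii) using only days at or above the wet threshold."""
    wet = [p for p in daily_pr_mm if p >= wet_threshold]
    prcptot = sum(wet)
    sdii = prcptot / len(wet) if wet else 0.0
    return prcptot, sdii


def max_rolling_sum(values, window):
    """Largest sum over any `window` consecutive values."""
    if len(values) < window:
        raise ValueError("series is shorter than the window")
    current = sum(values[:window])
    best = current
    for i in range(window, len(values)):
        # Slide the window: add the entering day, drop the leaving day
        current += values[i] - values[i - window]
        best = max(best, current)
    return best


# Made-up daily precipitation in mm
pr = [0.0, 0.5, 2.0, 10.0, 0.0, 0.0, 4.0, 1.0]

prcptot, sdii = wet_day_stats(pr)
rx1day = max(pr)
rx5day = max_rolling_sum(pr, 5)
```

Note that the 0.5 mm day counts toward `rx5day` but not toward `prcptot` or `sdii`, since it falls below the 1 mm wet-day threshold.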
Consecutive Events (2):
- `cdd`: Maximum consecutive dry days (< 1mm)
- `cwd`: Maximum consecutive wet days (≥ 1mm)
Extreme Percentile-Based Indices (2) - Uses 1981-2000 Baseline:
- `r95p`: Very wet days (precipitation > 95th percentile of wet days)
- `r99p`: Extremely wet days (precipitation > 99th percentile of wet days)
Fixed Threshold Indices (2):
- `r10mm`: Heavy precipitation days (≥ 10mm)
- `r20mm`: Very heavy precipitation days (≥ 20mm)
Enhanced Precipitation Analysis (3) - Phase 6:
- `dry_days`: Total number of dry days (< 1mm)
- `wetdays`: Total number of wet days (≥ 1mm)
- `wetdays_prop`: Proportion of days that are wet
Dewpoint Statistics (4):
- `dewpoint_mean`: Annual mean dewpoint temperature
- `dewpoint_min`: Annual minimum dewpoint temperature
- `dewpoint_max`: Annual maximum dewpoint temperature
- `humid_days`: Days with dewpoint > 18°C (uncomfortable humidity)
Vapor Pressure Deficit (4):
- `vpdmax_mean`: Annual mean maximum VPD
- `extreme_vpd_days`: Days with VPD > 4 kPa (plant water stress)
- `vpdmin_mean`: Annual mean minimum VPD
- `low_vpd_days`: Days with VPD < 0.5 kPa (high moisture/fog potential)
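As background on what the VPD thresholds measure, vapor pressure deficit is the gap between the saturation vapor pressure at air temperature and the actual vapor pressure, which equals the saturation vapor pressure at the dewpoint. The sketch below uses the Tetens-type formula from FAO-56; whether the humidity pipeline derives VPD this way or reads VPD fields directly from the input data is not stated here, so treat this purely as an illustration.

```python
from math import exp


def saturation_vapor_pressure_kpa(temp_c):
    """Saturation vapor pressure in kPa (FAO-56 Tetens-type formula)."""
    return 0.6108 * exp(17.27 * temp_c / (temp_c + 237.3))


def vapor_pressure_deficit_kpa(temp_c, dewpoint_c):
    """VPD in kPa from air temperature and dewpoint, both in Celsius."""
    saturation = saturation_vapor_pressure_kpa(temp_c)
    actual = saturation_vapor_pressure_kpa(dewpoint_c)
    return max(saturation - actual, 0.0)
```

On a hot, dry afternoon (for example 40°C with a dewpoint near 10°C) this formula yields values well above the 4 kPa stress threshold, while air near saturation gives values close to zero.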
Heat Stress Assessment:
- `heat_index`: Heat index combining temperature and humidity effects (apparent temperature)
- `humidex`: Canadian humidex index for apparent temperature
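To make the humidex concrete, here is a sketch of the commonly published Canadian formulation, which adds a humidity term based on dewpoint to the air temperature. The function name is hypothetical and the constants are those of the widely cited formula; the pipeline's own output comes from xclim, which may differ in small details.

```python
from math import exp


def humidex(temp_c, dewpoint_c):
    """Humidex from air temperature and dewpoint, both in degrees Celsius."""
    dewpoint_k = dewpoint_c + 273.15
    # Vapor pressure in hPa derived from the dewpoint
    vapor_pressure_hpa = 6.11 * exp(5417.7530 * (1.0 / 273.16 - 1.0 / dewpoint_k))
    return temp_c + (5.0 / 9.0) * (vapor_pressure_hpa - 10.0)
```

For 30°C air with a 20°C dewpoint the result is roughly 38, meaning the humidity makes it feel several degrees warmer than the thermometer reading.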
Humidity Validation:
relative_humidity: Relative humidity calculated from dewpoint temperature (QC metric)
Compound Climate Extremes - Uses 1981-2000 Baseline:
- `cold_and_dry_days`: Days with temperature below 25th percentile AND precipitation below 25th percentile (compound drought conditions)
- `cold_and_wet_days`: Days with temperature below 25th percentile AND precipitation above 75th percentile (flooding risk, winter storms)
- `warm_and_dry_days`: Days with temperature above 75th percentile AND precipitation below 25th percentile (drought/fire weather)
- `warm_and_wet_days`: Days with temperature above 75th percentile AND precipitation above 75th percentile (compound extreme events)
Scientific Context: These multivariate indices capture compound climate extremes that result from the interaction of multiple climate variables. They are increasingly important for climate change impact assessment, as compound events often have disproportionate impacts compared to single-variable extremes.
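The compound-day logic amounts to testing two conditions on the same day against that day's baseline thresholds. The following sketch assumes the thresholds have already been aligned to the daily series (one threshold per day); the function name and argument layout are illustrative, not the pipeline's API.

```python
def compound_days(tas, pr, tas_threshold, pr_threshold, warm=True, dry=True):
    """Count days meeting both a temperature and a precipitation condition.

    Pass the 75th-percentile temperature thresholds for warm days or the
    25th-percentile ones for cold days; likewise the 25th-percentile
    precipitation thresholds for dry days or the 75th for wet days.
    """
    count = 0
    for t, p, t_thr, p_thr in zip(tas, pr, tas_threshold, pr_threshold):
        temp_ok = t > t_thr if warm else t < t_thr
        precip_ok = p < p_thr if dry else p > p_thr
        if temp_ok and precip_ok:
            count += 1
    return count
```

For example, `compound_days(tas, pr, tas_p75, pr_p25)` counts warm-and-dry days, and `compound_days(tas, pr, tas_p25, pr_p75, warm=False, dry=False)` counts cold-and-wet days.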
Growing Season Analysis (1):
growing_season_length: Total days between first and last occurrence of 6+ consecutive days with temperature above 5°C (ETCCDI standard)
Water Balance (1):
potential_evapotranspiration: Annual potential evapotranspiration using Baier-Robertson 1965 method (temperature-only, suitable for regions without wind/radiation data)
Crop-Specific Indices (1):
corn_heat_units: Annual accumulated corn heat units for crop development and maturity prediction (USDA standard, widely used in North American agriculture)
Spring Thaw Monitoring (1):
thawing_degree_days: Sum of degree-days above 0°C (permafrost monitoring, spring melt timing, critical for northern latitudes)
Growing Season Water Availability (1):
growing_season_precipitation: Total precipitation during growing season (April-October, northern hemisphere)
Agricultural Value: These indices support agricultural decision-making including crop variety selection, planting timing, irrigation scheduling, and harvest planning. They are particularly valuable for adapting to climate change impacts on agriculture.
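For readers unfamiliar with corn heat units, the daily value averages a linear response to the nighttime minimum and a quadratic response to the daytime maximum, and the annual index sums these daily values. The sketch below uses the commonly published coefficients with base temperatures of 4.44°C and 10°C; these constants are assumptions of the example (some sources round the minimum base to 4.4°C), so consult the xclim documentation for the exact values the pipeline uses.

```python
def daily_corn_heat_units(tasmin_c, tasmax_c):
    """Daily corn heat units from minimum and maximum temperature (Celsius)."""
    # Nighttime contribution: linear above a 4.44 degC base, zero below it
    y_min = 1.8 * (tasmin_c - 4.44) if tasmin_c > 4.44 else 0.0
    # Daytime contribution: quadratic above a 10 degC base, zero below it
    if tasmax_c > 10.0:
        excess = tasmax_c - 10.0
        y_max = 3.33 * excess - 0.084 * excess ** 2
    else:
        y_max = 0.0
    return (y_min + y_max) / 2.0


def accumulated_corn_heat_units(tasmin_series, tasmax_series):
    """Sum the daily corn heat units over a period."""
    return sum(
        daily_corn_heat_units(tmin, tmax)
        for tmin, tmax in zip(tasmin_series, tasmax_series)
    )
```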
Standardized Precipitation Index (5 windows):
- `spi_1month`: 1-month SPI for short-term agricultural drought monitoring
- `spi_3month`: 3-month SPI for seasonal agricultural drought (most common)
- `spi_6month`: 6-month SPI for medium-term agricultural/hydrological drought
- `spi_12month`: 12-month SPI for long-term hydrological drought monitoring
- `spi_24month`: 24-month SPI for multi-year persistent drought conditions
Dry Spell Analysis (4 indices):
- `cdd`: Maximum consecutive dry days (< 1mm precipitation) - ETCCDI standard
- `dry_spell_frequency`: Number of distinct dry spell events (≥3 consecutive days with < 1mm precipitation)
- `dry_spell_total_length`: Total days in all dry spells per year (cumulative dry spell duration)
- `dry_days`: Total number of dry days per year (< 1mm precipitation threshold)
Precipitation Intensity (3 indices):
- `sdii`: Simple daily intensity index - average precipitation on wet days (ETCCDI standard)
- `max_7day_pr_intensity`: Maximum precipitation over any 7-day rolling period (flood risk assessment)
- `fraction_heavy_precip`: Fraction of annual precipitation from heavy events (> 75th percentile)
Drought Monitoring Value: SPI is a widely used standard for drought monitoring and follows the McKee et al. (1993) methodology. Multiple time windows enable detection of agricultural (1-6 months), hydrological (6-12 months), and long-term persistent (12-24 months) drought conditions. All SPI calculations use a 30-year calibration period (1981-2010) with gamma distribution fitting. Dry spell metrics characterize drought events by frequency, duration, and intensity.
All 80 planned climate indices have been successfully implemented across 7 comprehensive pipelines:
- ✅ 35 Temperature Indices (Phases 1-3, 7, 9)
- ✅ 13 Precipitation Indices (Phase 6)
- ✅ 8 Humidity Indices (Phase 2)
- ✅ 3 Human Comfort Indices (Phase 4)
- ✅ 4 Multivariate Indices (Phase 5)
- ✅ 5 Agricultural Indices (Phase 8)
- ✅ 12 Drought Indices (Phase 10)
Implementation Note: Three drought indices (dry_spell_frequency, dry_spell_total_length, max_7day_pr_intensity) were implemented using manual calculations to work around xclim unit compatibility issues, ensuring full coverage without compromising scientific accuracy.
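A manual dry spell calculation of the kind mentioned in the note can be expressed with run-length grouping. The sketch below is an illustration under assumed inputs (daily precipitation already converted to millimetres), not the code in drought_pipeline.py; the function name and defaults are hypothetical, with the 1 mm and 3-day values taken from the index definitions.

```python
from itertools import groupby


def dry_spell_lengths(daily_pr_mm, dry_threshold=1.0, min_length=3):
    """Lengths of all dry spells lasting at least `min_length` days."""
    lengths = []
    # groupby splits the series into consecutive runs of dry / not-dry days
    for is_dry, run in groupby(daily_pr_mm, key=lambda p: p < dry_threshold):
        run_length = sum(1 for _ in run)
        if is_dry and run_length >= min_length:
            lengths.append(run_length)
    return lengths


# Made-up daily precipitation in mm
pr = [0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0]

spells = dry_spell_lengths(pr)
dry_spell_frequency = len(spells)
dry_spell_total_length = sum(spells)
```

Working on plain millimetre values sidesteps unit handling entirely, which is one plausible reason to prefer a manual calculation when unit metadata is inconsistent.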
Processing Architecture:
- All indices calculated using the scientifically-validated xclim library
- Annual frequency: Most indices use annual calculations (`freq='YS'`)
- Robust error handling: Each index calculation includes comprehensive error handling
- CF-compliant metadata: All outputs follow Climate and Forecast conventions
```
xclim-timber/
├── src/                          # Core pipeline modules
│   ├── config.py                 # Configuration management
│   ├── data_loader.py            # Data loading from various formats
│   ├── preprocessor.py           # Data cleaning and standardization
│   ├── indices_calculator.py     # Climate indices calculation
│   └── pipeline.py               # Main orchestration
├── scripts/                      # Processing and analysis scripts
│   ├── csv_formatter.py          # CSV format converter (long ↔ wide)
│   ├── efficient_extraction.py   # Optimized point extraction
│   ├── fast_point_extraction.py  # Alternative extraction method
│   └── visualize_temp.py         # Results visualization
├── data/                         # Data files
│   ├── test_data/                # Test datasets and coordinates
│   └── sample_data/              # Sample data for development
├── outputs/                      # Processed results
├── logs/                         # Processing logs
├── docs/                         # Documentation
├── benchmarks/                   # Performance benchmarks
└── requirements.txt              # Dependencies
```
- Chunking: Data is automatically chunked for efficient memory usage
- Parallel Processing: Dask enables parallel computation across multiple cores
- Lazy Evaluation: Operations are queued and executed efficiently
- Memory Management: Large datasets are processed without loading entirely into memory
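When tuning chunk sizes, it helps to estimate how much memory one chunk occupies and how many chunks a dataset splits into. The sketch below does that arithmetic for the chunk sizes shown in the example configuration, assuming 8-byte (float64) values; both helper names and the dataset shape in the usage note are hypothetical.

```python
from math import ceil, prod


def chunk_nbytes(chunk_size, bytes_per_value=8):
    """Bytes held by one chunk, given a {dimension: length} mapping."""
    return prod(chunk_size.values()) * bytes_per_value


def n_chunks(shape, chunk_size):
    """Number of chunks a dataset of `shape` splits into."""
    return prod(ceil(shape[dim] / chunk_size[dim]) for dim in chunk_size)


chunk_size = {"time": 365, "lat": 100, "lon": 100}
megabytes_per_chunk = chunk_nbytes(chunk_size) / 1e6
```

With these settings one chunk holds 365 × 100 × 100 values, or about 29 MB as float64. Each worker typically keeps several chunks in memory at once, so the per-worker memory limit should be many times the chunk size. A call such as `n_chunks({"time": 730, "lat": 250, "lon": 100}, chunk_size)` shows how many tasks a given grid would generate.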
```bash
# Show help
python src/pipeline.py --help

# Run with verbose logging
python src/pipeline.py -c config.yaml --verbose

# Specify output directory
python src/pipeline.py -c config.yaml -o /path/to/output

# Process specific variables
python src/pipeline.py -c config.yaml -v temperature -v precipitation
```
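For readers extending the command-line interface, the flags used above could be declared with `argparse` roughly as follows. This is a sketch, not the parser in src/pipeline.py: only `-c`, `-o`, `-v`, and `--verbose` appear in the commands above, while the long option names `--config`, `--output`, and `--variables`, the defaults, and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Build an illustrative parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(
        description="Climate data processing pipeline (illustrative sketch)"
    )
    parser.add_argument("-c", "--config", default="config.yaml",
                        help="path to the YAML configuration file")
    parser.add_argument("-o", "--output",
                        help="directory for processed results")
    # action="append" lets -v be repeated: -v temperature -v precipitation
    parser.add_argument("-v", "--variables", action="append",
                        help="variable to process (repeatable)")
    parser.add_argument("--verbose", action="store_true",
                        help="enable verbose logging")
    return parser
```

Calling `build_parser().parse_args(["-c", "config.yaml", "-v", "temperature", "-v", "precipitation"])` collects both variables into a single list.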
4. **Format CSV outputs** (optional):
```bash
# Convert to both long and wide formats
python scripts/csv_formatter.py --input-dir outputs --output-dir outputs/formatted
```

Access the Dask dashboard during processing to monitor:
- Worker activity
- Memory usage
- Task progress
- Performance metrics
Dashboard typically available at: http://localhost:8787
Memory issues:
- Reduce chunk sizes in configuration
- Decrease number of Dask workers
- Process variables separately

Data not found:
- Check file patterns in configuration
- Verify data path accessibility
- Review logs for loading errors

Slow processing:
- Increase Dask workers for more parallelism
- Optimize chunk sizes for your data
- Consider temporal/spatial subsetting
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
This project is licensed under the MIT License.
- Built on xclim for climate index calculations
- Uses xarray for N-dimensional data handling
- Powered by Dask for parallel computing
- Supports rioxarray for geospatial operations
If you use this pipeline in your research, please cite:
xclim-timber: Climate Data Processing Pipeline
https://github.com/yourusername/xclim-timber
And the xclim library:
Bourgault et al., (2023). xclim: xarray-based climate data analytics.
Journal of Open Source Software, 8(85), 5415, https://doi.org/10.21105/joss.05415