|
1 | | -# Weather Data Collector - Spain |
| 1 | +# Spanish Weather Data Collection System |
2 | 2 |
|
3 | | -This repository provides scripts to download, update, and manage weather data from AEMET weather stations across Spain, producing three comprehensive datasets for analysis and research. |
| 3 | +Automated collection and processing of Spanish meteorological data from the AEMET (Agencia Estatal de Meteorología) OpenData API. |
4 | 4 |
|
| 5 | +## Overview |
5 | 6 |
|
6 | | -## 📊 Current Data Collection Status |
| 7 | +This system collects three standardized weather datasets covering all Spanish weather stations and municipalities: |
7 | 8 |
|
8 | | -*Last updated: 2025-08-25 21:23:49* |
| 9 | +- **Daily Station Data**: Historical daily measurements from 4,000+ weather stations |
| 10 | +- **Municipal Forecasts**: 7-day forecasts for all 8,000+ Spanish municipalities |
| 11 | +- **Hourly Station Data**: High-frequency measurements for recent periods |
9 | 12 |
|
10 | | -### Dataset 1: Daily Station Data |
| 13 | +Data is automatically collected, quality-controlled, and aggregated into standardized CSV files ready for analysis. |
| 14 | + |
| 15 | +## Data Outputs |
| 16 | + |
| 17 | +The system produces three main datasets with standardized variable names: |
| 18 | + |
| 19 | +### 1. `daily_station_historical.csv` |
| 20 | +Daily weather measurements from Spanish meteorological stations. |
| 21 | + |
| 22 | +**Key Variables**: `date`, `station_id`, `temp_mean`, `temp_max`, `temp_min`, `precipitation`, `humidity_mean`, `wind_speed`, `pressure_max`, `pressure_min` |
| 23 | + |
| 24 | +**Coverage**: 4,000+ stations across Spain |
| 25 | +**Time Range**: Recent daily observations |
| 26 | +**Update**: Daily at 2 AM |
| 27 | + |
| 28 | +### 2. `daily_municipal_extended.csv` |
| 29 | +Municipal-level weather data combining forecasts with station aggregations. |
| 30 | + |
| 31 | +**Key Variables**: `date`, `municipality_id`, `temp_mean`, `temp_max`, `temp_min`, `humidity_mean`, `wind_speed` |
| 32 | + |
| 33 | +**Coverage**: 8,000+ Spanish municipalities |
| 34 | +**Data Priority**: Station aggregations take precedence over forecasts when both exist |
| 35 | +**Update**: Daily at 2 AM |
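
The data-priority rule above (station aggregations win over forecasts for the same day and municipality) could be sketched as follows. This is a minimal illustration, not the repository's implementation: the `prefer_station` function, the `source` column, and the four-column layout are assumptions made for the example.

```bash
# Sketch: keep one row per (date, municipality_id), preferring station rows.
# Assumed CSV layout on stdin (hypothetical): date,municipality_id,temp_mean,source
# where source is either "station" or "forecast".
prefer_station() {
  awk -F',' '
    NR == 1 { print; next }              # pass the header through unchanged
    {
      key = $1 FS $2                     # date + municipality_id
      if (!(key in row)) {               # first row seen for this key
        order[++n] = key
        row[key] = $0
        src[key] = $4
      } else if ($4 == "station" && src[key] != "station") {
        row[key] = $0                    # a station row replaces a forecast row
        src[key] = $4
      }
    }
    END { for (i = 1; i <= n; i++) print row[order[i]] }
  '
}

# Usage (file name is illustrative):
#   prefer_station < combined_municipal.csv > daily_municipal_extended.csv
```

Rows keep their first-seen order, so a forecast that is later superseded stays in its original position in the output.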
| 36 | + |
| 37 | +### 3. `hourly_station_ongoing.csv` |
| 38 | +High-frequency station measurements for detailed analysis. |
| 39 | + |
| 40 | +**Key Variables**: `datetime`, `station_id`, `variable_type`, `value` |
| 41 | + |
| 42 | +**Coverage**: Selected weather stations |
| 43 | +**Update**: Daily at 2 AM |
| 44 | + |
| 45 | +## Data Flow |
| 46 | + |
| 47 | +``` |
| 48 | +AEMET OpenData API |
| 49 | + ↓ |
| 50 | + Data Collection |
| 51 | + (scripts/r/*.R) |
| 52 | + ↓ |
| 53 | + Quality Control |
| 54 | + & Standardization |
| 55 | + ↓ |
| 56 | + Municipal Aggregation |
| 57 | + (Station → Municipal) |
| 58 | + ↓ |
| 59 | + Final Datasets |
| 60 | + (data/output/*.csv) |
| 61 | +``` |
| 62 | + |
| 63 | +## Technical Implementation |
| 64 | + |
| 65 | +### Collection System |
| 66 | +- **Language**: R with SLURM job scheduling |
| 67 | +- **API Access**: AEMET OpenData with rate limiting |
| 68 | +- **Performance**: The climaemet package provides a 48x speedup for municipal forecasts |
| 69 | +- **Execution Time**: 2-4 hours total (previously 33+ hours) |
| 70 | + |
| 71 | +### Data Processing |
| 72 | +- **Variable Standardization**: English names with documented units |
| 73 | +- **Quality Control**: Temperature and precipitation validation |
| 74 | +- **Gap Management**: Automatic detection and filling of missing data |
| 75 | +- **Municipality Codes**: CUMUN format from AEMET (documented for merge compatibility) |
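
As a rough illustration of the temperature and precipitation validation listed above, the sketch below drops rows that fail simple plausibility checks. The thresholds, the `qc_filter` name, the `QC_TEMP_MIN`/`QC_TEMP_MAX` variables, and the column order are all assumptions for the example; the actual checks live in the R scripts and may differ.

```bash
# Sketch: drop rows that fail basic plausibility checks, logging rejects to stderr.
# Assumed column order on stdin (hypothetical):
#   date,station_id,temp_mean,temp_max,temp_min,precipitation
# Temperature bounds are illustrative defaults, overridable via environment.
qc_filter() {
  awk -F',' -v lo="${QC_TEMP_MIN:--40}" -v hi="${QC_TEMP_MAX:-55}" '
    NR == 1 { print; next }                                   # keep the header
    {
      ok = 1
      if ($4 != "" && ($4 + 0 < lo || $4 + 0 > hi)) ok = 0    # temp_max out of range
      if ($5 != "" && ($5 + 0 < lo || $5 + 0 > hi)) ok = 0    # temp_min out of range
      if ($4 != "" && $5 != "" && $5 + 0 > $4 + 0) ok = 0     # min above max
      if ($6 != "" && $6 + 0 < 0) ok = 0                      # negative precipitation
      if (ok) print
      else print "QC reject: " $0 > "/dev/stderr"
    }
  '
}
```

Empty fields are treated as missing rather than invalid, so a row with a gap in one variable still passes the remaining checks.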
| 76 | + |
| 77 | +### Automation |
| 78 | +- **Daily Collection**: 2:00 AM via SLURM scheduler |
| 79 | +- **Gap Filling**: Weekly on Sundays at 1:00 AM |
| 80 | +- **Documentation Updates**: Daily at 6:00 AM |
| 81 | + |
| 82 | +## Getting Started |
| 83 | + |
| 84 | +### Prerequisites |
| 85 | +- SLURM HPC environment with R/4.4.2 and GDAL/3.10.0 |
| 86 | +- AEMET OpenData API key (stored in `auth/keys.R`) |
| 87 | +- Required R packages: tidyverse, climaemet, meteospain, data.table |
| 88 | + |
| 89 | +### Installation |
| 90 | +1. Clone the repository |
| 91 | +2. Configure API key in `auth/keys.R` |
| 92 | +3. Install crontab automation: |
| 93 | +```bash |
| 94 | +# Add these lines to crontab -e |
| 95 | +0 2 * * * cd /path/to/project && sbatch scripts/bash/update_weather_hybrid.sh |
| 96 | +0 6 * * * cd /path/to/project && sbatch scripts/bash/update_readme_summary.sh |
| 97 | +0 1 * * 0 cd /path/to/project && sbatch scripts/bash/fill_gaps.sh |
| 98 | +``` |
| 99 | + |
| 100 | +### Manual Execution |
| 101 | +```bash |
| 102 | +# Full data collection |
| 103 | +sbatch scripts/bash/update_weather_hybrid.sh |
| 104 | + |
| 105 | +# Gap analysis and filling |
| 106 | +sbatch scripts/bash/fill_gaps.sh |
| 107 | + |
| 108 | +# Update documentation |
| 109 | +sbatch scripts/bash/update_readme_summary.sh |
| 110 | +``` |
| 111 | + |
| 112 | +## File Structure |
| 113 | + |
| 114 | +``` |
| 115 | +scripts/ |
| 116 | +├── r/ # R collection and analysis scripts |
| 117 | +├── bash/ # SLURM job scripts |
| 118 | +└── archive/ # Archived/unused scripts |
| 119 | +
|
| 120 | +data/ |
| 121 | +├── output/ # Final standardized datasets |
| 122 | +├── backup/ # Data backups and archives |
| 123 | +└── input/ # Reference data (station lists, etc.) |
| 124 | +
|
| 125 | +docs/ # Technical documentation |
| 126 | +auth/ # API credentials (excluded from git) |
| 127 | +logs/ # SLURM job outputs |
| 128 | +``` |
| 129 | + |
| 130 | +## Variable Documentation |
| 131 | + |
| 132 | +All datasets use standardized English variable names. Municipality IDs use CUMUN codes from AEMET. See `docs/variable_standardization.md` for the complete mapping from the original AEMET variable names. |
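
To give a feel for what the renaming involves, here is a small sketch that rewrites a CSV header from AEMET field names to the standardized names listed under Data Outputs. The pairing shown is an assumption based on AEMET's daily climatological fields; `docs/variable_standardization.md` remains the authoritative mapping, and `standardize_header` is a hypothetical helper name.

```bash
# Sketch: rename AEMET header fields to standardized English names.
# The mapping below is an assumed pairing for illustration only.
standardize_header() {
  awk -F',' -v OFS=',' '
    BEGIN {
      map["fecha"]      = "date"
      map["indicativo"] = "station_id"
      map["tmed"]       = "temp_mean"
      map["tmax"]       = "temp_max"
      map["tmin"]       = "temp_min"
      map["prec"]       = "precipitation"
      map["hrMedia"]    = "humidity_mean"
      map["velmedia"]   = "wind_speed"
      map["presMax"]    = "pressure_max"
      map["presMin"]    = "pressure_min"
    }
    # Only the header row is rewritten; unmapped columns keep their names.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i in map) $i = map[$i] }
    { print }
  '
}
```

Data rows pass through untouched, so the helper can be placed in front of any downstream step that expects the standardized names.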
| 133 | + |
| 134 | +## Performance Notes |
| 135 | + |
| 136 | +- **Municipal forecasts**: 48x performance improvement using climaemet package |
| 137 | +- **Daily execution**: 2-4 hours total vs 33+ hours with previous approach |
| 138 | +- **Data priority**: Station measurements replace forecasts as they become available |
| 139 | +- **Gap management**: Prevents redundant historical data collection |
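
One way gap detection can avoid redundant collection is to list only the dates that are absent from an existing file, so that only those dates are requested again. The sketch below assumes GNU `date` (for `-d`) and a first CSV column holding `YYYY-MM-DD` dates; `missing_dates` is a hypothetical helper, not a script in this repository.

```bash
# Sketch: print every date in [start, end] that has no row in the CSV on stdin.
# Assumes GNU date and that column 1 is an ISO date (YYYY-MM-DD).
missing_dates() {
  local start end have end_s d
  start="$1"
  end="$2"
  have=$(awk -F',' 'NR > 1 { print $1 }' | sort -u)    # dates already present
  end_s=$(date -u -d "$end" +%s)
  d="$start"
  while [ "$(date -u -d "$d" +%s)" -le "$end_s" ]; do
    # Exact, fixed-string, whole-line match against the known dates
    printf '%s\n' "$have" | grep -qxF "$d" || echo "$d"
    d=$(date -u -d "$d +1 day" +%F)                    # advance one day
  done
}

# Usage (file name and range are illustrative):
#   missing_dates 2025-08-17 2025-08-25 < data/output/daily_station_historical.csv
```

This version checks coverage across all stations together; a per-station check would need the same loop keyed on `station_id`.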
| 140 | + |
| 141 | +## License |
| 142 | + |
| 143 | +MIT License - see LICENSE file for details. |
11 | 144 | - **Records**: 2,250 station-days |
12 | 145 | - **Stations**: 838 weather stations |
13 | 146 | - **Coverage**: 2025-08-17 to 2025-08-25 |
|