Commit a6ec9dd

Reorganizes code and rewrites documentation for clarity
1 parent 48d1302 commit a6ec9dd

73 files changed: +391 -446 lines changed

ACTIVE_SCRIPTS.md

Whitespace-only changes.

CRONTAB_LINES_TO_ADD.txt

Lines changed: 10 additions & 2 deletions
@@ -1,8 +1,16 @@
 # WEATHER DATA COLLECTION - HYBRID SYSTEM
 # Add these lines to your existing crontab with: crontab -e
 
-# Main hybrid collection (daily at 2 AM) - All three datasets
-0 2 * * * cd /home/j.palmer/research/weather-data-collector-spain && sbatch update_weather_hybrid.sh
+# Main hybrid collection - Daily at 2:00 AM
+# Collects all three datasets (station daily, municipal forecasts, hourly)
+# Expected completion: 2-4 hours vs 33+ hours with old approach
+0 2 * * * cd /home/j.palmer/research/weather-data-collector-spain && sbatch scripts/bash/update_weather_hybrid.sh
+
+# Daily README update (6 AM) - Updates README with current status after collection
+0 6 * * * cd /home/j.palmer/research/weather-data-collector-spain && sbatch scripts/bash/update_readme_summary.sh
+
+# Weekly gap filling (Sundays at 1 AM) - Fills any missing data without redundancy
+0 1 * * 0 cd /home/j.palmer/research/weather-data-collector-spain && sbatch scripts/bash/fill_gaps.sh
 
 # Daily README update (6 AM) - Updates README with current status after collection
 0 6 * * * cd /home/j.palmer/research/weather-data-collector-spain && sbatch scripts/update_readme_summary.sh
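
The file header asks for these lines to be added to an existing crontab by hand with `crontab -e`. As a minimal sketch of doing that repeatably, the helper below appends an entry only when the exact line is not already present. The function name `add_cron_line` is hypothetical and not part of this repository; it reads a crontab listing on stdin and writes the updated listing to stdout, so it never touches the live crontab on its own.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): append a cron entry to a
# crontab listing only if the exact line is not already present.
# Reads the current listing on stdin and writes the updated listing to stdout.
add_cron_line() {
  local entry="$1" current
  current="$(cat)"
  if printf '%s\n' "$current" | grep -Fxq -- "$entry"; then
    printf '%s\n' "$current"               # already installed: leave unchanged
  elif [ -z "$current" ]; then
    printf '%s\n' "$entry"                 # empty crontab: entry is the only line
  else
    printf '%s\n%s\n' "$current" "$entry"  # append after the existing entries
  fi
}

# Assumed usage: pipe the live crontab through the helper and reinstall it.
# crontab -l 2>/dev/null \
#   | add_cron_line "0 1 * * 0 cd /path/to/project && sbatch scripts/bash/fill_gaps.sh" \
#   | crontab -
```

Because the comparison is an exact whole-line match (`grep -Fx`), entries that differ only in their script path are treated as distinct. The two README update entries in this file, one under `scripts/bash/` and one under `scripts/`, would both be kept and may be worth checking by hand.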

README.md

Lines changed: 138 additions & 5 deletions
@@ -1,13 +1,146 @@
-# Weather Data Collector - Spain
+# Spanish Weather Data Collection System
 
-This repository provides scripts to download, update, and manage weather data from AEMET weather stations across Spain, producing three comprehensive datasets for analysis and research.
+Automated collection and processing of Spanish meteorological data from AEMET (Agencia Estatal de Meteorología) OpenData API.
 
+## Overview
 
-## 📊 Current Data Collection Status
+This system collects three standardized weather datasets covering all Spanish weather stations and municipalities:
 
-*Last updated: 2025-08-25 21:23:49*
+- **Daily Station Data**: Historical daily measurements from 4,000+ weather stations
+- **Municipal Forecasts**: 7-day forecasts for all 8,000+ Spanish municipalities
+- **Hourly Station Data**: High-frequency measurements for recent periods
 
-### Dataset 1: Daily Station Data
+Data is automatically collected, quality-controlled, and aggregated into standardized CSV files ready for analysis.
+
+## Data Outputs
+
+The system produces three main datasets with standardized variable names:
+
+### 1. `daily_station_historical.csv`
+Daily weather measurements from Spanish meteorological stations.
+
+**Key Variables**: `date`, `station_id`, `temp_mean`, `temp_max`, `temp_min`, `precipitation`, `humidity_mean`, `wind_speed`, `pressure_max`, `pressure_min`
+
+**Coverage**: 4,000+ stations across Spain
+**Time Range**: Recent daily observations
+**Update**: Daily at 2 AM
+
+### 2. `daily_municipal_extended.csv`
+Municipal-level weather data combining forecasts with station aggregations.
+
+**Key Variables**: `date`, `municipality_id`, `temp_mean`, `temp_max`, `temp_min`, `humidity_mean`, `wind_speed`
+
+**Coverage**: 8,000+ Spanish municipalities
+**Data Priority**: Station aggregations take precedence over forecasts when both exist
+**Update**: Daily at 2 AM
+
+### 3. `hourly_station_ongoing.csv`
+High-frequency station measurements for detailed analysis.
+
+**Key Variables**: `datetime`, `station_id`, `variable_type`, `value`
+
+**Coverage**: Selected weather stations
+**Update**: Daily at 2 AM
+
+## Data Flow
+
+```
+AEMET OpenData API
+
+Data Collection
+(scripts/r/*.R)
+
+Quality Control
+& Standardization
+
+Municipal Aggregation
+(Station → Municipal)
+
+Final Datasets
+(data/output/*.csv)
+```
+
+## Technical Implementation
+
+### Collection System
+- **Language**: R with SLURM job scheduling
+- **API Access**: AEMET OpenData with rate limiting
+- **Performance**: climaemet package provides 48x speedup for municipal forecasts
+- **Execution Time**: 2-4 hours total (previously 33+ hours)
+
+### Data Processing
+- **Variable Standardization**: English names with documented units
+- **Quality Control**: Temperature and precipitation validation
+- **Gap Management**: Automatic detection and filling of missing data
+- **Municipality Codes**: CUMUN format from AEMET (documented for merge compatibility)
+
+### Automation
+- **Daily Collection**: 2:00 AM via SLURM scheduler
+- **Gap Filling**: Weekly on Sundays at 1:00 AM
+- **Documentation Updates**: Daily at 6:00 AM
+
+## Getting Started
+
+### Prerequisites
+- SLURM HPC environment with R/4.4.2 and GDAL/3.10.0
+- AEMET OpenData API key (stored in `auth/keys.R`)
+- Required R packages: tidyverse, climaemet, meteospain, data.table
+
+### Installation
+1. Clone repository
+2. Configure API key in `auth/keys.R`
+3. Install crontab automation:
+```bash
+# Add these lines to crontab -e
+0 2 * * * cd /path/to/project && sbatch scripts/bash/update_weather_hybrid.sh
+0 6 * * * cd /path/to/project && sbatch scripts/bash/update_readme_summary.sh
+0 1 * * 0 cd /path/to/project && sbatch scripts/bash/fill_gaps.sh
+```
+
+### Manual Execution
+```bash
+# Full data collection
+sbatch scripts/bash/update_weather_hybrid.sh
+
+# Gap analysis and filling
+sbatch scripts/bash/fill_gaps.sh
+
+# Update documentation
+sbatch scripts/bash/update_readme_summary.sh
+```
+
+## File Structure
+
+```
+scripts/
+├── r/ # R collection and analysis scripts
+├── bash/ # SLURM job scripts
+└── archive/ # Archived/unused scripts
+
+data/
+├── output/ # Final standardized datasets
+├── backup/ # Data backups and archives
+└── input/ # Reference data (station lists, etc.)
+
+docs/ # Technical documentation
+auth/ # API credentials (excluded from git)
+logs/ # SLURM job outputs
+```
+
+## Variable Documentation
+
+All datasets use standardized English variable names. Municipality IDs use CUMUN codes from AEMET. See `docs/variable_standardization.md` for complete mapping from original AEMET variable names.
+
+## Performance Notes
+
+- **Municipal forecasts**: 48x performance improvement using climaemet package
+- **Daily execution**: 2-4 hours total vs 33+ hours with previous approach
+- **Data priority**: Station measurements replace forecasts as they become available
+- **Gap management**: Prevents redundant historical data collection
+
+## License
+
+MIT License - see LICENSE file for details.
 - **Records**: 2,250 station-days
 - **Stations**: 838 weather stations
 - **Coverage**: 2025-08-17 to 2025-08-25
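
The README added in this commit states that station aggregations take precedence over forecasts when both exist. The project implements this in R, and that code is not shown in this diff. Purely as an illustration of the rule, here is a minimal shell sketch. It assumes two CSV files that share a header and begin with the columns `date,municipality_id`, and it assumes that the pair (`date`, `municipality_id`) is the matching key. The function name `merge_priority` and both file arguments are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: merge_priority STATION_CSV FORECAST_CSV
# Both files are assumed to share the header and column order
# date,municipality_id,... (an assumption; the real files may differ).
# Every station row is kept; a forecast row is kept only when no station row
# has the same (date, municipality_id) key.
merge_priority() {
  awk -F, '
    # Station file: record each data key, emit the header and all rows.
    NR == FNR { if (FNR > 1) seen[$1 FS $2] = 1; print; next }
    # Forecast file: skip its header.
    FNR == 1  { next }
    # Forecast rows are printed only for keys with no station row.
    !(($1 FS $2) in seen)
  ' "$1" "$2"
}
```

The output lists all station rows first, followed by the forecast-only rows, so it is not sorted by date. The sketch also assumes the station file has at least a header line and that no field contains an embedded comma.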
