Redesigns system to store data from each AEMET API separately, to keep original variable names, and to be robust
Cleared old standardized data - Moved daily_station_historical.csv and backups out of the way
Verified the new system works - The historical collection ran successfully with original AEMET variable names
Launched full 4-dataset collection - Job 17923 is now running with a 6-hour time limit
-Automated collection and processing of Spanish meteorological data from AEMET (Agencia Estatal de Meteorología) OpenData API.
+Automated collection and processing of Spanish meteorological data from AEMET (Agencia Estatal de Meteorología) OpenData API with **original variable names preserved for data integrity**.

+## Overview
+
+This system collects **four distinct weather datasets** from separate AEMET APIs, maintaining original variable names to ensure data integrity and traceability:

-## Current Data Status
+1. **Historical Daily Stations**: Long-term daily observations from AEMET historical climatological API
+2. **Current Daily Stations**: Recent daily data aggregated from hourly observations (gap-filling)
+3. **Hourly Station Ongoing**: Real-time hourly measurements from current observation API
+4. **Municipal Forecasts**: 7-day forecasts for all Spanish municipalities (ongoing validation collection)

-*Last updated: 2025-08-26 21:00:46.834172 *
+**Key Principle**: Each dataset preserves original AEMET variable names and is kept separate to avoid data mixing issues.

-### daily station historical
-- **Records**: 4543
-- **Variables**: 37
-- **Last Modified**: 2025-08-26 20:56:47
-- **File Size**: 0.91 MB
+## Data Outputs

-### daily municipal extended
-- **Records**: 18232
-- **Variables**: 26
-- **Last Modified**: 2025-08-26 20:56:48
-- **File Size**: 3.01 MB
+The system produces four main datasets with **original AEMET variable names**:

-### hourly station ongoing
-- **Records**: 0
-- **Variables**: 5
-- **Last Modified**: 2025-08-26 20:56:48
-- **File Size**: 0 MB
+### 1. `daily_stations_historical.csv.gz`
+Daily weather measurements from AEMET historical climatological API.
 **All datasets preserve original AEMET variable names** for data integrity. See [docs/variable_names_reference.md](docs/variable_names_reference.md) for complete variable explanations.

-**Coverage**: Selected weather stations
-**Update**: Daily at 2 AM
+**Key Original Variables**:
+- `fecha` = Date
+- `indicativo`/`idema` = Station ID
+- `tmed`/`ta` = Temperature
+- `prec` = Precipitation
+- `hrMedia`/`hr` = Humidity
+- `velmedia`/`vv` = Wind speed
+- `municipio` = Municipality ID
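Because the same quantity appears under different original names depending on the API (for example `tmed` in one source and `ta` in another), downstream code has to look values up by alias rather than rely on a renamed column. The following is a minimal sketch of that idea, not the project's own code: the project scripts are written in R, Python is used here only for illustration, and the `ALIASES` table, the `lookup` helper, and the sample rows are all hypothetical. The keys of `ALIASES` are lookup labels only and are never written back as column names.

```python
import csv
import io

# Alternative original names for the same quantity, taken from the
# "Key Original Variables" list. The dictionary keys are hypothetical
# lookup labels, not column names.
ALIASES = {
    "station_id": ("indicativo", "idema"),
    "temperature": ("tmed", "ta"),
    "humidity": ("hrMedia", "hr"),
    "wind_speed": ("velmedia", "vv"),
}


def lookup(row, quantity):
    """Return (original_name, value) for the first alias present in the row."""
    for name in ALIASES[quantity]:
        if row.get(name, "") != "":
            return name, row[name]
    return None, None


# Hypothetical one-row samples; real files carry many more columns.
historical = io.StringIO("fecha,indicativo,tmed,prec\n2025-08-01,3195,27.4,0.0\n")
hourly = io.StringIO("idema,ta,prec,hr,vv\n3195,24.1,0.0,40,2.5\n")

hist_row = next(csv.DictReader(historical))
hour_row = next(csv.DictReader(hourly))

print(lookup(hist_row, "temperature"))
print(lookup(hour_row, "temperature"))
```

Returning the original column name alongside the value keeps the source of each measurement traceable, which is the stated reason for not standardizing names.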

 ## Data Flow

 ```
-AEMET OpenData API
-↓
-Data Collection
-(scripts/r/*.R)
+AEMET APIs (Separate Sources)
 ↓
-Quality Control
-& Standardization
+┌─────────────────────────────────────────┐
+│ 1. Historical API → daily_stations_     │
+│                     historical.csv.gz   │
+├─────────────────────────────────────────┤
+│ 2. Hourly API     → hourly_station_     │
+│    (aggregated)     ongoing.csv.gz      │
+│                   → daily_stations_     │
+│                     current.csv.gz      │
+├─────────────────────────────────────────┤
+│ 3. Municipal API  → daily_municipal_    │
+│                     forecast.csv.gz     │
+└─────────────────────────────────────────┘
 ↓
-Municipal Aggregation
-(Station → Municipal)
+Quality Validation
+(Original names preserved)
 ↓
-Final Datasets
-(data/output/*.csv)
+Four Separate Datasets
+(No cross-contamination)
 ```
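The separation shown in the diagram can be sketched as one writer call per source, each producing its own compressed file whose header is whatever field names the API returned. This is an illustrative Python sketch under stated assumptions, not the repository's R implementation: `write_dataset`, `read_header`, and the sample records are hypothetical, and only the output filenames come from the diff.

```python
import csv
import gzip
import os
import tempfile


def write_dataset(records, path):
    """Write records to a gzip-compressed CSV, keeping keys exactly as received."""
    fieldnames = []
    for rec in records:
        for key in rec:
            if key not in fieldnames:
                fieldnames.append(key)  # first-seen order, no renaming
    with gzip.open(path, "wt", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
    return fieldnames


def read_header(path):
    """Return the header row of a gzip-compressed CSV."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as fh:
        return next(csv.reader(fh))


# Hypothetical sample records, one list per source; never merged.
sources = {
    "daily_stations_historical.csv.gz": [
        {"fecha": "2025-08-01", "indicativo": "3195", "tmed": "27.4"},
    ],
    "hourly_station_ongoing.csv.gz": [
        {"idema": "3195", "ta": "24.1", "hr": "40"},
    ],
}

out_dir = tempfile.mkdtemp()
headers = {}
for filename, records in sources.items():
    path = os.path.join(out_dir, filename)
    write_dataset(records, path)
    headers[filename] = read_header(path)

print(headers)
```

Writing each source through the same function but to a different file keeps the code path shared while the data stays separate, so a schema change in one API cannot leak into another dataset.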

 ## Technical Implementation

 ### Collection System
 - **Language**: R with SLURM job scheduling
 - **API Access**: AEMET OpenData with rate limiting
-- **Performance**: climaemet package provides 48x speedup for municipal forecasts
-- **Execution Time**: 2-4 hours total (previously 33+ hours)
+- **Data Integrity**: Original variable names preserved to prevent confusion
+- **Separation**: Each API source produces distinct datasets to avoid mixing issues

 ### Data Processing
-- **Variable Standardization**: English names with documented units
-- **Quality Control**: Temperature and precipitation validation
-- **Gap Management**: Automatic detection and filling of missing data
-- **Municipality Codes**: CUMUN format from AEMET (documented for merge compatibility)
+- **No Variable Renaming**: Keeps original AEMET names for traceability
+- **Quality Control**: Basic validation without altering source structure
+- **Gap Management**: Separate current daily dataset covers historical-to-present gap
+- **Municipality Codes**: Preserved as provided by each API (different formats noted)
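The gap-management approach, where a current daily dataset is aggregated from hourly observations to cover the period after the historical data ends, could look roughly like the sketch below. It is a guess at the shape of the logic rather than the project's R code. Assumptions: the hourly timestamp column is called `fint` (the diff does not name it), temperature is averaged and precipitation is summed per station and day, and the function name and sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate_hourly_to_daily(hourly_rows, last_historical_date, time_col="fint"):
    """Aggregate hourly rows to daily rows for days after the historical data ends.

    time_col is an assumed name for the hourly timestamp column.
    """
    groups = defaultdict(list)
    for row in hourly_rows:
        day = row[time_col][:10]  # "YYYY-MM-DD" prefix of an ISO timestamp
        if day > last_historical_date:  # ISO dates compare correctly as strings
            groups[(row["idema"], day)].append(row)

    daily = []
    for (station, day), rows in sorted(groups.items()):
        daily.append({
            "idema": station,
            "fecha": day,
            "ta": round(mean(float(r["ta"]) for r in rows), 2),   # daily mean
            "prec": round(sum(float(r["prec"]) for r in rows), 2),  # daily total
        })
    return daily


# Hypothetical hourly sample: the first row falls on the last historical
# day and is therefore left to the historical dataset.
rows = [
    {"idema": "3195", "fint": "2025-08-25T10:00:00", "ta": "30.0", "prec": "0.0"},
    {"idema": "3195", "fint": "2025-08-26T10:00:00", "ta": "20.0", "prec": "0.5"},
    {"idema": "3195", "fint": "2025-08-26T11:00:00", "ta": "22.0", "prec": "1.0"},
]

daily = aggregate_hourly_to_daily(rows, "2025-08-25")
print(daily)
```

Keeping the aggregated rows in their own file, with the hourly API's names (`idema`, `ta`), means the gap-filling data never has to be reconciled with the historical API's `indicativo` and `tmed` columns at collection time.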

 ### Automation
-- **Daily Collection**: 2:00 AM via SLURM scheduler
-- **Gap Filling**: Weekly on Sundays at 1:00 AM
-- **Documentation Updates**: Daily at 6:00 AM
+- **Daily Collection**: Runs all 4 dataset collections
+- **Validation Collection**: Municipal forecasts accumulated over time for model validation
+- **Documentation**: Auto-updated with original variable references
 ├── r/ # R collection scripts (4 separate datasets)
 ├── bash/ # SLURM job scripts
-└── archive/ # Archived/unused scripts
+└── archive/ # Legacy collection methods

 data/
-├── output/ # Final standardized datasets
-├── backup/ # Data backups and archives
+├── output/ # Four main datasets with original variable names:
+│   ├── daily_stations_historical.csv.gz
+│   ├── daily_stations_current.csv.gz
+│   ├── hourly_station_ongoing.csv.gz
+│   └── daily_municipal_forecast.csv.gz
 └── input/ # Reference data (station lists, etc.)

-docs/ # Technical documentation
+docs/ # Documentation including variable name reference
 auth/ # API credentials (excluded from git)
 logs/ # SLURM job outputs
 ```

 ## Variable Documentation

-All datasets use standardized English variable names. Municipality IDs use CUMUN codes from AEMET. See `docs/variable_standardization.md` for complete mapping from original AEMET variable names.
+All datasets preserve **original AEMET variable names** for data integrity. See [docs/variable_names_reference.md](docs/variable_names_reference.md) for complete explanations of what each original variable represents.
+
+**No variable renaming or standardization** - this ensures data traceability and prevents confusion about what each measurement represents.