
Commit 99d3633

Claude and ilaflott committed
Add documentation for YAML schema validation implementation
Agent-Logs-Url: https://github.com/ilaflott/fremorizer/sessions/00b6554f-37df-4b3b-a2aa-2c1e63750707
Co-authored-by: ilaflott <6273252+ilaflott@users.noreply.github.com>
1 parent 2c1a762 commit 99d3633

1 file changed: 172 additions, 0 deletions

YAML_SCHEMA_VALIDATION.md

# YAML Schema Validation Implementation

## Overview

This document describes the implementation of YAML schema validation for fremorizer, which eliminates the hard dependency on `fre-cli`'s `yamltools` module while maintaining backward compatibility.

## Issue Context

The original issue requested a "YAML approach with schema validation, no fre-cli". The goals were to:
1. Remove the hard dependency on `fre-cli` for YAML processing
2. Implement robust schema validation for YAML configuration files
3. Maintain backward compatibility with existing workflows

## Implementation

### New Modules

#### 1. `fremorizer/cmor_yaml_schema.py`
Defines the JSON schema for CMOR configuration and provides validation functions.

**Key functions:**
- `get_cmor_schema()`: Returns the complete JSON schema definition
- `validate_cmor_yaml(config_dict)`: Validates a configuration dictionary against the schema

**Schema features:**
- Validates that all required fields are present (`mip_era`, `directories`, `exp_json`, `table_targets`)
- Restricts `mip_era` to the enum values `CMIP6` and `CMIP7`
- Validates year formats (four-digit `YYYY` pattern)
- Requires `gridding` dictionaries to be complete when present
- Supports optional fields (`start`, `stop`, `calendar_type`, `freq`, `gridding`)

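The schema features listed above can be sketched as a plain Python dictionary in JSON Schema (draft-07) syntax. This is a minimal illustration, not the contents of `get_cmor_schema()`: the field names come from this document, but each field's type, the assumption that years are strings, and the `gridding` keys (`grid_label`, `grid_desc`, `nom_res`) are assumptions. The helpers `missing_required`, `is_valid_year`, and `validate_with_jsonschema` are hypothetical.

```python
import re

# Hypothetical sketch of a CMOR config schema in JSON Schema (draft-07) syntax.
# Field names come from this document; field types, string-valued years and
# the gridding keys are ASSUMPTIONS for illustration, not get_cmor_schema().
YEAR_PATTERN = r"^[0-9]{4}$"

CMOR_SCHEMA_SKETCH = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "required": ["mip_era", "directories", "exp_json", "table_targets"],
    "properties": {
        "mip_era": {"enum": ["CMIP6", "CMIP7"]},
        "directories": {"type": "object"},
        "exp_json": {"type": "string"},
        "table_targets": {"type": "array"},
        # Optional fields; years are assumed to be strings such as "1850".
        "start": {"type": "string", "pattern": YEAR_PATTERN},
        "stop": {"type": "string", "pattern": YEAR_PATTERN},
        "calendar_type": {"type": "string"},
        "freq": {"type": "string"},
        # Only checked when "gridding" is present; then all (assumed) keys are required.
        "gridding": {
            "type": "object",
            "required": ["grid_label", "grid_desc", "nom_res"],
        },
    },
}


def missing_required(config: dict, schema: dict = CMOR_SCHEMA_SKETCH) -> list:
    """Return the top-level required keys that config lacks (stdlib-only check)."""
    return [key for key in schema["required"] if key not in config]


def is_valid_year(value: str) -> bool:
    """Apply the schema's four-digit year pattern with the re module."""
    return re.fullmatch(YEAR_PATTERN, value) is not None


def validate_with_jsonschema(config: dict) -> None:
    """Full validation; raises jsonschema.ValidationError if config is invalid."""
    import jsonschema  # third-party; the dependency this commit adds

    jsonschema.validate(instance=config, schema=CMOR_SCHEMA_SKETCH)
```

Because the `gridding` keys are listed under `required` only inside the `gridding` subschema, a config without `gridding` passes, while a partial `gridding` dictionary fails, which matches the "complete when present" rule.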
#### 2. `fremorizer/cmor_yaml_consolidator.py`
Provides native YAML loading and validation.

**Key functions:**
- `load_and_validate_yaml(yamlfile, ...)`: Loads and validates a YAML file
- `_expand_variables_in_config(config)`: Expands environment variables recursively

**Features:**
- Single-file YAML loading
- Automatic schema validation
- Environment variable expansion (e.g., `$HOME`, `${VAR_NAME}`)
- Optional output file generation
- Extra parameters kept for compatibility with the `fre-cli` function signature

**Limitations compared to fre-cli:**
- Does not support multi-file YAML consolidation
- Does not support platform-, target-, or experiment-specific overrides
- No variable substitution beyond environment variables

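Recursive environment-variable expansion of the kind `_expand_variables_in_config` performs can be sketched with `os.path.expandvars`, which handles both `$VAR` and `${VAR}` and leaves unset variables unchanged. The function name `expand_env_vars` is hypothetical, and whether fremorizer's implementation treats unset variables, dictionary keys, and non-string values the same way is an assumption.

```python
import os
from typing import Any


def expand_env_vars(value: Any) -> Any:
    """Hypothetical stand-in for _expand_variables_in_config.

    Recursively expands $VAR and ${VAR} in every string value of a
    config tree. Unset variables are left as-is (os.path.expandvars
    behavior), and dictionary keys are not expanded.
    """
    if isinstance(value, str):
        return os.path.expandvars(value)
    if isinstance(value, dict):
        return {key: expand_env_vars(item) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env_vars(item) for item in value]
    # Numbers, booleans, None, etc. pass through unchanged.
    return value
```

Only environment variables are substituted here, which mirrors the last limitation listed above.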
### Modified Modules

#### `fremorizer/cmor_yamler.py`
Updated to use `fre-cli`'s `consolidate_yamls` when `fre-cli` is installed and to fall back to the native loader otherwise.

**Changes:**
- Line 27: Added import for `load_and_validate_yaml`
- Lines 96-111: Replaced the hard dependency with conditional logic:
  - If `fre-cli` is available: use `consolidate_yamls` (advanced features)
  - If `fre-cli` is NOT available: use native `load_and_validate_yaml`
  - A missing `fre-cli` no longer raises an `ImportError`; both paths work

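The conditional logic in lines 96-111 follows the usual optional-dependency pattern: attempt the import and fall back on `ImportError`. The helper below is a hypothetical, self-contained sketch of that pattern using `importlib`; it is not the code in `cmor_yamler.py`, and the module path that provides `consolidate_yamls` inside `fre-cli` is not given in this document, so a caller would substitute it.

```python
import importlib
from typing import Callable


def pick_yaml_loader(module_name: str, attr_name: str, fallback: Callable) -> Callable:
    """Return module_name.attr_name when it can be imported, else fallback.

    Hypothetical helper illustrating "use fre-cli if installed, otherwise
    the native loader"; a missing module never propagates ImportError.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Optional dependency absent: use the native implementation.
        return fallback
    # Module present but attribute missing is also treated as "unavailable".
    return getattr(module, attr_name, fallback)
```

Swapping one loader for the other like this only works cleanly because, as noted above, the native loader accepts compatibility parameters matching the `fre-cli` signature.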
### Dependencies

#### Added to `pyproject.toml`:
```toml
'jsonschema', # For YAML schema validation
```

#### Added to `environment.yaml`:
```yaml
- conda-forge::jsonschema
```

### Tests

#### `fremorizer/tests/test_cmor_yaml_validation.py`
Comprehensive test suite covering:
- Schema retrieval and structure
- Valid configuration acceptance
- Missing required field rejection
- Invalid enum value rejection
- Invalid year format rejection
- Optional field handling
- YAML file loading
- File not found errors
- Invalid YAML syntax errors
- Schema validation failures
- Environment variable expansion
- Output file generation

**Test results:** All standalone tests pass successfully.

### Documentation

#### Updated `README.md`:
- Added `jsonschema` to requirements
- Added "YAML Processing" section explaining:
  - Native YAML loading with validation
  - Automatic fallback to fre-cli if available
  - Instructions for installing fre-cli for advanced features
  - Note about limitations of the native loader

#### Created `example_cmor_config.yaml`:
- Demonstrates complete YAML structure
- Includes inline comments explaining each field
- Shows both required and optional fields
- Provides examples for CMIP6 and CMIP7

## Backward Compatibility

The implementation is fully backward compatible:

1. **With fre-cli installed:** Uses `consolidate_yamls` (no change in behavior)
2. **Without fre-cli installed:** Uses the native loader (new functionality)
3. **Existing tests:** Continue to work with a mocked `consolidate_yamls`
4. **Configuration format:** The expected structure is unchanged

## Usage Examples

### Basic usage (native loader):
```bash
fremor yaml model.yaml --exp test --platform ncrc4 --target prod
```

### With advanced features (requires fre-cli):
```bash
pip install fre-cli
fremor yaml model.yaml --exp test --platform ncrc4 --target prod
```

### Validating a YAML file:
```python
from fremorizer.cmor_yaml_consolidator import load_and_validate_yaml

config = load_and_validate_yaml('model.yaml')
# Raises ValueError if invalid
```

## Migration Path

For users currently using fre-cli:
1. No changes required; fre-cli continues to work
2. Optional: Remove the fre-cli dependency for simpler workflows
3. Optional: Update YAML files to be standalone (no multi-file inheritance)

For new users:
1. Install fremorizer: `pip install fremorizer`
2. Create a YAML config using `example_cmor_config.yaml` as a template
3. Run: `fremor yaml your_config.yaml`
4. No fre-cli installation needed

## Future Enhancements

Potential improvements:
1. Multi-file YAML consolidation in the native loader
2. Platform/target/experiment override support
3. Enhanced variable substitution (beyond environment variables)
4. YAML schema auto-generation from Python types
5. Additional validation rules (path existence, value ranges)
6. Schema versioning for CMIP6 vs CMIP7 differences

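Item 1 would require merging several loaded YAML documents into one configuration. A minimal sketch, assuming later files override earlier ones and nested mappings merge key by key, is a recursive dictionary merge. The names `deep_merge` and `consolidate` are hypothetical, and these override rules are an assumption, not a description of how `fre-cli`'s `consolidate_yamls` combines files.

```python
import copy
from typing import Any, Dict, Iterable


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict in which override's values win.

    Nested dicts are merged key by key; lists and scalars in override
    replace those in base wholesale. Neither input is modified.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def consolidate(configs: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Fold already-loaded config dicts in order, later ones winning."""
    result: Dict[str, Any] = {}
    for cfg in configs:
        result = deep_merge(result, cfg)
    return result
```

Replacing lists wholesale, rather than concatenating them, keeps an override such as a narrowed `table_targets` list predictable; a real implementation might need per-key rules.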
## Testing Notes

The full test suite requires the CMOR library, which is not available in all environments.
Standalone tests confirm that:
- Schema validation works correctly
- YAML loading and validation work
- Environment variable expansion works
- Backward compatibility is maintained

CI will run the full suite in a conda environment with all dependencies installed.
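One way to keep the standalone tests runnable where CMOR is missing is to skip, rather than fail, the CMOR-dependent tests. The sketch below uses only the standard library (`importlib.util.find_spec` and `unittest.skipUnless`); the class and test names are hypothetical, and this is not necessarily how fremorizer's suite is organized. With pytest, `pytest.importorskip("cmor")` has the same effect.

```python
import importlib.util
import unittest

# True only when a module named "cmor" can be found on the import path.
HAVE_CMOR = importlib.util.find_spec("cmor") is not None


class StandaloneSchemaTests(unittest.TestCase):
    """Hypothetical standalone test: needs no CMOR, so it always runs."""

    def test_year_pattern(self):
        self.assertRegex("1850", r"^[0-9]{4}$")


@unittest.skipUnless(HAVE_CMOR, "CMOR library not installed")
class CmorDependentTests(unittest.TestCase):
    """Hypothetical CMOR-backed tests: reported as skipped, not failed, without CMOR."""

    def test_cmor_importable(self):
        import cmor  # noqa: F401
```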
