FAIR open radar data

📄 Cite This Work

This repository implements the Radar DataTree framework described in:

Ladino-Rincón & Nesbitt (2025). Radar DataTree: A FAIR and Cloud-Native Framework for Scalable Weather Radar Archives. arXiv:2510.24943 [cs.DC]. https://doi.org/10.48550/arXiv.2510.24943

Please cite this work if you use Raw2Zarr or the Radar DataTree model.

Motivation

Weather radar data are among the most scientifically valuable yet structurally underutilized Earth observation datasets. Radars are vital in meteorology: they detect severe weather early, enabling timely warnings that save lives and reduce property damage. Beyond real-time forecasting, radar data support critical applications including statistical analysis, climatology, and long-term atmospheric research.

Despite widespread public availability, operational radar archives remain fragmented, vendor-specific, and poorly aligned with FAIR principles (Findable, Accessible, Interoperable, Reusable). Traditional radar data storage involves millions of standalone binary files in proprietary or semi-standardized formats designed for real-time operations, not scientific reuse. Each radar volume scan, comprising data collected through multiple cone-like sweeps at various elevation angles, is stored as an individual file every 5-10 minutes. This file-centric model creates significant barriers: no temporal indexing, inconsistent metadata encoding, and extensive preprocessing required for analysis at scale.

Radar DataTree addresses these limitations by transforming operational radar archives into FAIR-compliant, cloud-optimized datasets. This framework extends the WMO FM-301/CfRadial 2.1 standard from individual radar volume scans to time-resolved, analysis-ready archives. Built on xarray.DataTree for hierarchical data representation and Icechunk for ACID-compliant transactional storage, Radar DataTree enables:

- Dataset-level organization: Entire radar archives as structured, time-indexed collections
- Cloud-native access: Zarr serialization optimized for parallel I/O and lazy evaluation
- Metadata preservation: Full FM-301/CF compliance with sweep-level detail
- Concurrent-safe writes: Icechunk transactions enable real-time ingestion without data corruption
- Scalable performance: Demonstrated 100x+ speedups over traditional file-based workflows

This approach leverages a modern Python ecosystem including Xarray, Xradar, Wradlib, and Zarr to implement a hierarchical, tree-like data model aligned with Analysis-Ready Cloud-Optimized (ARCO) principles and FAIR data stewardship.

Authors

Alfonso Ladino-Rincon, Max Grover

Collaborators

This project is under active development.

Features may change frequently, and some parts of the library may be incomplete.
Proceed with caution.

Critical storage requirements

Processing radar data can create massive storage requirements. Based on real-world experience:
- 1 month of data ≈ 800 GB of Zarr output
- 1 year of data ≈ 10+ TB of storage
- Processing long time periods can exhaust storage quickly

Recommendations:
- Start with hourly datasets to understand the storage footprint
- Plan storage capacity before production deployments
- Consider cloud storage costs for large-scale processing
- Monitor disk usage during runs
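
The figures above can be turned into a rough pre-flight check. The following is a minimal sketch that assumes output size scales linearly with archive length at the quoted 800 GB per month; `estimate_zarr_gb` and `has_room_for` are hypothetical helpers for illustration, not part of raw2zarr, and the 20% safety margin is an arbitrary choice. Only `shutil.disk_usage` from the standard library is used.

```python
import shutil

GB_PER_MONTH = 800.0  # taken from the "1 month of data ≈ 800 GB" figure above


def estimate_zarr_gb(days, gb_per_month=GB_PER_MONTH, days_per_month=30.0):
    """Linear estimate (in GB) of the Zarr output for `days` of radar data."""
    return days * gb_per_month / days_per_month


def has_room_for(path, required_gb, safety_factor=1.2):
    """Return True if the filesystem holding `path` has enough free space.

    `safety_factor` pads the requirement; 1.2 is an assumed margin.
    """
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb * safety_factor


# Estimate the footprint of a one-hour test run and of a full year.
one_hour_gb = estimate_zarr_gb(1 / 24)
one_year_gb = estimate_zarr_gb(365)
print(f"1 hour: ~{one_hour_gb:.1f} GB, 1 year: ~{one_year_gb / 1000:.1f} TB")
print("Enough space for the hourly test here:", has_room_for(".", one_hour_gb))
```

Calling `has_room_for` periodically during a long run is one simple way to monitor disk usage; actual sizes depend on the radar, scan strategy, and compression settings, so treat the estimate as an order of magnitude only.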

Radar DataTree Framework

The Radar DataTree framework provides a dataset-level abstraction for weather radar collections, extending the WMO FM-301 standard from individual radar volume scans to time-resolved archives.

Core Architecture

| Component | Role |
|---|---|
| FM-301/CfRadial 2.1 | File-level standard for radar volumes and sweeps |
| xarray.DataTree | Hierarchical in-memory representation of scan collections |
| Zarr | Chunked, compressed, cloud-native storage format |
| Icechunk | ACID-compliant transactional engine for versioned datasets |

Key Features

Time-Indexed Collections

Each radar archive is represented as a hierarchical tree of datasets aligned along a common time axis. Individual volume scans preserve their original FM-301 structure (sweep groups, coordinates, metadata) while being organized into a unified time-series dataset.

Cloud-Native Storage

Zarr serialization enables:

- Efficient partial reads and lazy evaluation
- Parallel I/O across distributed workers
- Compressed, chunked arrays optimized for cloud object storage

ACID Transactions with Icechunk

Icechunk provides:

- Safe concurrent writes from multiple workers
- Version-controlled datasets with atomic commits
- Real-time ingestion without data corruption
- Reproducible analysis with provenance tracking

Demonstrated Performance

Case studies on operational NEXRAD archives show:

- 100x+ speedup for Quasi-Vertical Profile (QVP) generation
- 70-150x speedup for Quantitative Precipitation Estimation (QPE)
- Sub-minute retrieval of multi-week time series from cloud storage
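
As an illustration of why a time-indexed layout suits QVP generation, the sketch below computes a quasi-vertical profile as an azimuthal average of one sweep over time, using NumPy on synthetic data. This is not the implementation benchmarked above: `quasi_vertical_profile` is a hypothetical helper, the input is random rather than real radar data, and the height calculation uses the simplification h ≈ r·sin(θ), which ignores Earth curvature and beam refraction.

```python
import numpy as np


def quasi_vertical_profile(field, ranges_m, elevation_deg):
    """Average a (time, azimuth, range) field over azimuth.

    Returns the (time, range) profile and the approximate beam height
    in metres for each range gate (flat-Earth simplification).
    """
    profile = np.nanmean(field, axis=1)
    height_m = ranges_m * np.sin(np.deg2rad(elevation_deg))
    return profile, height_m


# Synthetic reflectivity-like data: 6 scans, 360 rays, 100 range gates.
rng = np.random.default_rng(0)
field = rng.normal(20.0, 5.0, size=(6, 360, 100))
field[0, :10, :] = np.nan  # a few missing rays, skipped by nanmean

ranges_m = np.arange(100) * 250.0
qvp, height_m = quasi_vertical_profile(field, ranges_m, elevation_deg=20.0)
print(qvp.shape, height_m.shape)
```

When every scan already sits along one time axis, the whole time-height profile is a single reduction over the azimuth dimension, with no per-file loop.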

Supported Formats

- NEXRAD Level II (including dynamic scans: SAILS, MRLE, AVSET)
- SIGMET/IRIS
- ODIM_H5

Demo Notebooks

Explore interactive examples at the Radar DataTree Demo Repository:

- QVP computation from cloud-hosted archives
- QPE accumulation workflows
- Time-series extraction and analysis

Getting Started

Running on Your Own Machine

To run this material locally, follow these steps:

Clone the "raw2zarr" repository

```
git clone https://github.com/aladinor/raw2zarr.git
```

Move into the raw2zarr directory
```
cd raw2zarr
```
Create and activate your conda environment from the environment.yml file
```
conda env create -f environment.yml
conda activate raw2zarr
```
Move into the notebooks directory and start JupyterLab
```
cd notebooks/
jupyter lab
```

Processing Modes

The library supports two processing modes for converting radar data to Zarr format. Both modes use Icechunk for ACID-compliant transactional storage, ensuring data integrity during writes.

Sequential Processing (No Cluster Required)

For small datasets, testing, and development:

```python
from raw2zarr.builder.builder_utils import get_icechunk_repo
from raw2zarr.builder.convert import convert_files

# `files` must be defined beforehand: the list of radar files to convert
# (NEXRAD Level II volumes for the engine used here).

# Create repository
repo = get_icechunk_repo("output.zarr")

# Sequential processing
convert_files(
    radar_files=files,
    append_dim="vcp_time",
    repo=repo,
    process_mode="sequential",  # No cluster needed
    engine="nexradlevel2",
)
```
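
The `files` argument above is left undefined. One way to build it for a first hourly test is to parse the timestamp embedded in standard NEXRAD Level II file names (site ID followed by `YYYYMMDD_HHMMSS`) and keep only the scans inside a time window. This is a minimal sketch: `scan_time` and `select_window` are hypothetical helpers, the paths are made up, and whether `convert_files` needs its input sorted is not stated here, so the sort is a precaution.

```python
import re
from datetime import datetime, timedelta
from pathlib import PurePosixPath

# Site ID (4 characters) followed by a YYYYMMDD_HHMMSS timestamp.
_NEXRAD_NAME = re.compile(r"^(?P<site>[A-Z0-9]{4})(?P<stamp>\d{8}_\d{6})")


def scan_time(path):
    """Return the scan start time encoded in a NEXRAD Level II file name."""
    name = PurePosixPath(path).name
    match = _NEXRAD_NAME.match(name)
    if match is None:
        raise ValueError(f"not a NEXRAD Level II file name: {name!r}")
    return datetime.strptime(match["stamp"], "%Y%m%d_%H%M%S")


def select_window(paths, start, hours=1):
    """Return paths with start <= scan time < start + hours, sorted by time."""
    end = start + timedelta(hours=hours)
    timed = sorted((scan_time(p), p) for p in paths)
    return [p for t, p in timed if start <= t < end]


# Hypothetical file names, deliberately out of order.
candidates = [
    "data/KVNX20110520_010512_V06",
    "data/KVNX20110520_000023_V06",
    "data/KVNX20110520_000441_V06",
]
files = select_window(candidates, datetime(2011, 5, 20, 0), hours=1)
print(files)
```

Starting with a one-hour window keeps the first conversion small, in line with the storage recommendations earlier in this README.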

Parallel Processing (Cluster Required)

For large datasets and production use. This mode uses Icechunk's Session.fork() API for concurrent-safe parallel writes:

```python
from dask.distributed import LocalCluster

from raw2zarr.builder.builder_utils import get_icechunk_repo
from raw2zarr.builder.convert import convert_files

# `files` must be defined beforehand: the list of radar files to convert.

# Create repository and cluster
repo = get_icechunk_repo("output.zarr")
cluster = LocalCluster(n_workers=4, memory_limit="10GB")

try:
    convert_files(
        radar_files=files,
        append_dim="vcp_time",
        repo=repo,
        process_mode="parallel",
        cluster=cluster,  # Required for parallel mode
        engine="nexradlevel2",
    )
finally:
    # Always release the workers, even if the conversion fails
    cluster.close()
```

Performance: Parallel processing with Icechunk enables:

- Concurrent writes from multiple workers without data corruption
- 100x+ speedups for large-scale radar analysis tasks
- Safe real-time ingestion alongside ongoing analysis

References

Ladino-Rincón, A., & Nesbitt, S. W. (2025). Radar DataTree: A FAIR and Cloud-Native Framework for Scalable Weather Radar Archives. arXiv preprint arXiv:2510.24943 [cs.DC]. https://doi.org/10.48550/arXiv.2510.24943
Abernathey, R. P., et al. (2021). Cloud-Native Repositories for Big Scientific Data. Computing in Science & Engineering, 23(2), 26-35. doi:10.1109/MCSE.2021.3059437
