Description
Challenge 24 - CliMetLab - Machine Learning on weather and climate data
Stream 2 - Machine Learning for weather, climate and atmosphere applications
Goal
Extend a new Python ML package and help it mature.
Mentors and skills
- Mentors: @b8raoult @floriankrb
- Skills required:
- Python
- ML
- Handling of weather data and formats
Challenge description
CliMetLab is a Python package that aims to simplify access to climate and meteorological datasets, allowing users to focus on science instead of technical issues such as data access and data formats. It is mostly intended to be used in Jupyter notebooks and to be interoperable with all popular data analytics packages, such as NumPy, Pandas, Xarray, SciPy and Matplotlib, as well as machine learning frameworks, such as TensorFlow, Keras or PyTorch. Several tasks are proposed:
- Task 1: Extend CliMetLab so that it offers users high-level Matplotlib-based plotting functions to produce graphs and plots that are relevant to weather and climate applications (e.g. plume plots, ROC curves, …).
- Task 2: The Python package Intake is a lightweight set of tools for loading and sharing data in data science projects. Extend CliMetLab so that it interfaces seamlessly with Intake and allows users to access all Intake-compatible datasets.
- Task 3: Xarray uses the Zarr data format to allow parallel reads and parallel writes. Convert large, already available datasets to Xarray-readable Zarr format, define an appropriate configuration (chunking, compression, other) according to domain use cases, develop tools to benchmark the result when used on a cloud platform, and compare it to other formats (N5, GRIB, netCDF, GeoTIFF, etc.).
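To make Task 1 more concrete, here is a minimal sketch of what a high-level plume plot helper could look like. It is only an illustration under stated assumptions: `plot_plume` and its parameters are hypothetical names and not part of the CliMetLab API, the ensemble is synthetic random-walk data, and the choice of quantile bands is arbitrary.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display

import matplotlib.pyplot as plt
import numpy as np


def plot_plume(lead_hours, members, ax=None, ylabel="value",
               quantile_bands=((0.0, 1.0), (0.25, 0.75))):
    """Draw an ensemble plume: shaded quantile bands plus the ensemble median.

    Hypothetical helper for illustration only (not CliMetLab API).
    `members` must have shape (n_members, n_lead_times).
    """
    members = np.asarray(members, dtype=float)
    if members.ndim != 2 or members.shape[1] != len(lead_hours):
        raise ValueError("members must have shape (n_members, len(lead_hours))")
    if ax is None:
        _, ax = plt.subplots(figsize=(8, 4))

    # One shaded band per (low, high) quantile pair, computed across members.
    for low, high in quantile_bands:
        lower = np.quantile(members, low, axis=0)
        upper = np.quantile(members, high, axis=0)
        ax.fill_between(lead_hours, lower, upper, color="tab:blue",
                        alpha=0.25, linewidth=0,
                        label=f"{low:.0%}-{high:.0%} of members")

    # The median across members is drawn as a single line on top of the bands.
    ax.plot(lead_hours, np.median(members, axis=0),
            color="tab:blue", label="ensemble median")
    ax.set_xlabel("Forecast lead time (h)")
    ax.set_ylabel(ylabel)
    ax.legend()
    return ax


# Synthetic 50-member ensemble: random walks starting near 15 (assumed data).
rng = np.random.default_rng(0)
lead_hours = np.arange(0, 241, 6)  # 0 h to 240 h in 6 h steps
members = 15.0 + np.cumsum(rng.normal(0.0, 0.5, size=(50, lead_hours.size)), axis=1)

ax = plot_plume(lead_hours, members, ylabel="2 m temperature (°C)")
```

In a notebook, the returned axes can be customised further or saved with `ax.figure.savefig(...)`. Returning the `Axes` object rather than showing the figure is one possible design choice that keeps such a helper composable with other Matplotlib code.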