Download and convert satellite data for use in ML pipelines
Satellite data is a valuable resource for training machine learning models. Forecasting renewable generation requires knowledge of the weather conditions, and those weather conditions can be inferred and enriched using satellite data.
EUMETSAT provide a range of satellite data products, which are readily available in NAT image format. To make this data easier to use for training models, this consumer processes downloaded data into the Zarr format.
Note
This repo is in early development and so will undergo rapid changes. Breaking changes may occur in the CLI and the API without warning.
Install using the container image:
$ docker pull ghcr.io/openclimatefix/satellite-consumer
or, if you prefer the CLI, install the Python package with pip:
$ pip install git+https://github.com/openclimatefix/satellite-consumer.git
This will put the sat-consumer-cli command in your virtual environment's bin directory.
$ docker run \
-e SATCONS_COMMAND=consume \
-e SATCONS_SATELLITE=rss \
-e EUMETSAT_CONSUMER_KEY=<your-key> \
-e EUMETSAT_CONSUMER_SECRET=<your-secret> \
-v $(pwd)/work:/work \
ghcr.io/openclimatefix/satellite-consumer
This will download the latest available data for the rss satellite and store it in the container's /work directory, which the -v flag maps to ./work on the host.
For a description of all the possible configuration options, see Documentation.
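When the container is launched from a script rather than typed by hand, it can help to assemble the `docker run` arguments programmatically. The following is a minimal sketch, not part of this project: the helper name `build_docker_command` is hypothetical, and the only values taken from the example above are the image name, the environment variable names, and the /work mount point.

```python
import os
import shlex

IMAGE = "ghcr.io/openclimatefix/satellite-consumer"


def build_docker_command(env: dict, workdir: str, image: str = IMAGE) -> list:
    """Assemble a `docker run` argument list (hypothetical helper)."""
    cmd = ["docker", "run"]
    for name, value in env.items():
        # Each setting is passed to the container as an environment variable.
        cmd += ["-e", f"{name}={value}"]
    # Mount the host directory at /work, the container's working directory.
    cmd += ["-v", f"{os.path.abspath(workdir)}:/work", image]
    return cmd


command = build_docker_command(
    env={
        "SATCONS_COMMAND": "consume",
        "SATCONS_SATELLITE": "rss",
        # Credentials are read from the caller's environment if present.
        "EUMETSAT_CONSUMER_KEY": os.environ.get("EUMETSAT_CONSUMER_KEY", "<your-key>"),
        "EUMETSAT_CONSUMER_SECRET": os.environ.get("EUMETSAT_CONSUMER_SECRET", "<your-secret>"),
    },
    workdir="work",
)
# Print a shell-quoted version for inspection; pass `command` to
# subprocess.run(command, check=True) to actually start the container.
print(shlex.join(command))
```

Building the command as a list avoids shell quoting problems if a key or path contains spaces.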
The satellite consumer provides a number of commands, each performing a different processing step on the raw data. These commands (and their options) are listed by the CLI entrypoint:
$ sat-consumer-cli --help
When running the satellite consumer using the environment entrypoint (as in the docker container)
$ sat-consumer
the command is chosen via an environment variable. There are also a number of common configuration options that are shared between all commands:
Variable | Default | Description |
---|---|---|
SATCONS_COMMAND | | The command to run (consume/merge). |
SATCONS_SATELLITE | | The satellite to consume data from. |
SATCONS_WORKDIR | /mnt/disks/sat | The working directory. In the container, this is set to /work for easy mounting. |
SATCONS_HRV | false | Whether to download the HRV channel. |
EUMETSAT_CONSUMER_KEY | | The EUMETSAT consumer key. |
EUMETSAT_CONSUMER_SECRET | | The EUMETSAT consumer secret. |
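To illustrate how the shared variables fit together, here is a minimal sketch of reading them from the environment. It is not the project's actual configuration code: the `CommonSettings` class, the `load_common_settings` function, the set of strings accepted as true, and the decision to treat every variable without a default as required are all assumptions made for the example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CommonSettings:
    """Hypothetical container for the shared options in the table above."""
    command: str
    satellite: str
    workdir: str
    hrv: bool
    consumer_key: str
    consumer_secret: str


def _parse_bool(raw: str) -> bool:
    # Assumption: these spellings count as true; anything else is false.
    return raw.strip().lower() in {"1", "true", "yes"}


def load_common_settings(env: Optional[Mapping[str, str]] = None) -> CommonSettings:
    env = os.environ if env is None else env
    # Assumption: variables listed without a default must be set.
    required = ["SATCONS_COMMAND", "SATCONS_SATELLITE",
                "EUMETSAT_CONSUMER_KEY", "EUMETSAT_CONSUMER_SECRET"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError(f"Missing required variables: {', '.join(missing)}")
    if env["SATCONS_COMMAND"] not in ("consume", "merge"):
        raise ValueError("SATCONS_COMMAND must be 'consume' or 'merge'")
    return CommonSettings(
        command=env["SATCONS_COMMAND"],
        satellite=env["SATCONS_SATELLITE"],
        workdir=env.get("SATCONS_WORKDIR", "/mnt/disks/sat"),
        hrv=_parse_bool(env.get("SATCONS_HRV", "false")),
        consumer_key=env["EUMETSAT_CONSUMER_KEY"],
        consumer_secret=env["EUMETSAT_CONSUMER_SECRET"],
    )


# Usage with an explicit mapping instead of the real environment.
settings = load_common_settings({
    "SATCONS_COMMAND": "consume",
    "SATCONS_SATELLITE": "rss",
    "EUMETSAT_CONSUMER_KEY": "example-key",
    "EUMETSAT_CONSUMER_SECRET": "example-secret",
})
print(settings.workdir, settings.hrv)
```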
Each command then has its own set of configuration options:
Consume:
Downloads scans for a given time and window into a zarr store in the given working directory.
Variable | Default | Description |
---|---|---|
SATCONS_TIME | | The time to consume data for (when using the consume command). Leave unset to download latest available. |
SATCONS_WINDOW_MINS | 0 | The time window to consume data for in minutes (defaults to a single scan). |
SATCONS_WINDOW_MONTHS | 0 | The number of months to consume data for (takes precedence over SATCONS_WINDOW_MINS). |
SATCONS_VALIDATE | false | Whether to validate the downloaded data. |
SATCONS_RESCALE | false | Whether to rescale the downloaded data to the unit interval. |
SATCONS_NUM_WORKERS | 1 | The number of workers to use for processing. |
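The precedence rule between the two window options can be expressed in a few lines. This is only a sketch of the documented behaviour: the function name `effective_window` is hypothetical, and treating a value of 0 as "not set" is an assumption based on the defaults in the table.

```python
def effective_window(window_mins: int = 0, window_months: int = 0) -> tuple:
    """Return which window setting applies, as a (unit, amount) pair."""
    if window_mins < 0 or window_months < 0:
        raise ValueError("Window sizes must not be negative")
    # SATCONS_WINDOW_MONTHS takes precedence whenever it is set (assumed: > 0).
    if window_months > 0:
        return ("months", window_months)
    # Otherwise fall back to minutes; 0 means a single scan.
    return ("minutes", window_mins)


print(effective_window(window_mins=30, window_months=2))  # months win
print(effective_window(window_mins=30))                   # minutes apply
```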
Merge:
Merges consumed stores for a given time window into a single store in the working directory.
Variable | Default | Description |
---|---|---|
SATCONS_SATELLITE | | The satellite to consume data from. |
SATCONS_WINDOW_MINS | 210 | The time window to merge data for. |
SATCONS_CONSUME_MISSING | false | Whether to consume missing data. |
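As a rough illustration of what "missing data" means for a merge window, the sketch below lists the scan times expected in a window and reports those that are absent. It does not reflect how the consumer actually detects gaps: the function names, the 5-minute scan cadence, and the choice to count the window forward from a start time are all assumptions for the example.

```python
from datetime import datetime, timedelta, timezone


def expected_scan_times(start: datetime, window_mins: int = 210,
                        cadence_mins: int = 5) -> list:
    """List the scan timestamps expected in the window (assumed cadence)."""
    steps = window_mins // cadence_mins
    return [start + timedelta(minutes=cadence_mins * i) for i in range(steps)]


def missing_scan_times(expected: list, present: list) -> list:
    """Return the expected timestamps that are not in `present`, in order."""
    present_set = set(present)
    return [t for t in expected if t not in present_set]


start = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
expected = expected_scan_times(start)
# Pretend two scans were never consumed.
gaps = {start + timedelta(minutes=10), start + timedelta(minutes=60)}
present = [t for t in expected if t not in gaps]
missing = missing_scan_times(expected, present)
print([t.isoformat() for t in missing])
```

With SATCONS_CONSUME_MISSING enabled, timestamps like those in `missing` are the ones that would need to be consumed before merging.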
Currently the consumer is built to the specific data requirements of Open Climate Fix. However, adding a new EUMETSAT satellite shouldn't be too hard, provided it uses the same seviri_l1b_native format and sensor channels: just update the available satellites in config.py.
OCF recommends using uv for managing your virtual environments.
$ git clone git@github.com:openclimatefix/satellite-consumer.git
$ cd satellite-consumer
$ uv sync
The Python package contains a CLI entrypoint for ease of use when developing, which is available to your shell via the sat-consumer-cli command, assuming you have built the project in a virtual environment and activated it.
This project uses MyPy for static type checking and Ruff for linting. Installing the development dependencies makes them available in your virtual environment.
Use them via:
$ python -m mypy .
$ python -m ruff check .
Run these checks periodically while developing to catch errors early and avoid surprises in the CI pipeline. It may seem like a hassle at first, but it stops a whole class of bugs from slipping in unnoticed.
Running the tests requires some additional dependencies; be sure to pass --extra=dev to the pip install -e . command when creating your virtualenv (uv sync includes the development dependencies by default, so uv users can ignore this!).
Run the unit tests with:
$ python -m unittest discover -s src/satellite_consumer -p "test_*.py"
Note
If you have created your virtual environment using uv, the above can be run via the Makefile, using make typecheck, make lint, and make test respectively.
On the directory structure:
- The official PyPA discussion on "source" and "flat" layouts.
- PRs are welcome! See the Organisation Profile for details on contributing
- Find out about our other projects here
- Check out the OCF blog for updates
- Follow OCF on LinkedIn
Part of the Open Climate Fix community.