open-AIMS/ereefs-netcdf-aggregator

Performs temporal aggregation of the eReefs curvilinear NetCDF files and optionally projects them to a regular grid.


eReefs ncAggregate

A Java-based application for performing temporal aggregation and/or regridding of NetCDF files.

ncAggregate is part of the eReefs Platform developed by the Australian Institute of Marine Science (AIMS), and operates primarily in conjunction with the Job Planner🔒 and ncAnimate. The Job Planner plans the work to be performed by ncAggregate and ncAnimate, while ncAnimate creates visualisations from raw NetCDF files and from products generated by ncAggregate.

The work to be performed by ncAggregate as part of the AIMS eReefs Platform is defined as Product Definitions in the eReefs Definitions🔒 repository.


IMPORTANT!

The AIMS Knowledge Systems team is progressively open-sourcing the infrastructure of the AIMS eReefs Platform. Some related repositories mentioned in this and other README files may not yet be generally available to the public. If you would like early access to other parts of the system as a collaborator, please contact the Knowledge Systems team at [email protected].


Repository overview

.
|-- env
|   |-- dev               <-- Scripts useful for development/testing.
|
|-- src
|   |-- main
|       |-- java          <-- Source code for the application.
|       |-- resources     <-- Resources included in the packaged application.
|   |-- test
|       |-- java          <-- Unit test cases.
|       |-- resources     <-- Resources referenced by test cases.
|
|-- cloudformation.yaml   <-- Definition of AWS assets.
|-- Dockerfile            <-- Definition of the Docker image to wrap the packaged application.
|-- Jenkinsfile           <-- The Jenkins job definition for this project.
|-- maven-settings.xml    <-- Maven-specific settings.
|-- pom.xml               <-- The Maven definition file for this project.
|-- README.md             <-- This file.

Execution

ncAggregate is a Java-based application that can be executed from the command-line (see "Java-based execution") on any computer with the necessary pre-requisites installed (currently Java 8 and NetCDF libraries).

During normal operation within the AIMS eReefs Platform, ncAggregate and its pre-requisites are packaged as a Docker image that executes within an AWS ECS Cluster. This Docker image can also be used locally to execute ncAggregate from the command-line (see "Docker-based execution") without installing the pre-requisites (though the Docker runtime must be installed).

During normal operations, ncAggregate expects to obtain detailed instructions from a MongoDB database (see "MongoDB-based repository"). This is the recommended configuration for Production use, though a simple file-based solution (see "File-based repository") can be used for development. The database/repository is populated from the ereefs-definitions🔒 project and the Job Planner.

Furthermore, ncAggregate can operate as a stand-alone regridding utility (see "Stand-alone regridding").

MongoDB-based repository

Note: This scenario requires access to system components that have not yet been open-sourced.

  1. Start a MongoDB instance, such as that provided by the ereefs-vm🔒 project.
  2. Populate the MongoDB instance with Product Definitions. Refer to the ereefs-definitions🔒 project.
  3. Populate the MongoDB instance with Metadata from downloaded files. This is normally performed by ereefs-download-manager.
  4. Build the list of Tasks by running ereefs-job-planner🔒.

File-based repository

Note: This scenario requires access to system components that have not yet been open-sourced.

  1. Create a directory to contain the files. The scripts in this project expect this directory to be /data/ereefs/filedb by default, but this can be overridden.
  2. Populate the DB directory with Product Definitions. This is normally performed by ereefs-definitions🔒.
  3. Populate the DB directory with Metadata from downloaded files. This is normally performed by ereefs-download-manager.
  4. Build the list of Tasks by running ereefs-job-planner🔒.

Java-based execution

Note: the pre-requisites (Java 8 and NetCDF libraries) must be installed before ncAggregate can be run from the command-line as a Java-based application.

Package the ncAggregate code and dependencies as a JAR file using the following script:

$ <project root>/env/dev/maven-package.sh

WARNING: Maven requires access to packages hosted on GitHub. Unfortunately, GitHub requires credentials even for public packages. You will need to set the GITHUB_USERNAME and GITHUB_TOKEN environment variables before executing maven-package.sh.

To run ncAggregate as a Java application against a file-based database at /data/ereefs/filedb:

$ <project root>/env/dev/java-run-filedb.sh

To run ncAggregate as a Java application against a file-based database at /my-data:

$ <project root>/env/dev/java-run-filedb.sh -d /my-data

To run ncAggregate as a Java application against a MongoDB-based database:

$ <project root>/env/dev/java-run-mongodb.sh

Docker-based execution

Package the ncAggregate code and dependencies in a Docker image using the following scripts:

$ <project root>/env/dev/maven-package.sh
$ <project root>/env/dev/docker-build-image.sh

To run ncAggregate as a Docker container against a file-based database at /data/ereefs/filedb:

$ <project root>/env/dev/docker-run-filedb.sh -t <task id>

To run ncAggregate as a Docker container against a file-based database at /my-data:

$ <project root>/env/dev/docker-run-filedb.sh -d /my-data -t <task id>

To run ncAggregate as a Docker container against a MongoDB-based database:

$ <project root>/env/dev/docker-run-mongodb.sh -t <task id>

Stand-alone Regridding

Regridding is the process of converting a dataset from one grid to another. The motivation may be to convert a dataset from a curvilinear grid to a rectilinear grid, or to resample a dataset to a different grid resolution. ncAggregate supports regridding a curvilinear grid to a rectilinear grid at a customisable resolution. The mechanics of the regridding process are described in detail in the document titled Technical Guide to Derived Products from CSIRO eReefs Models.
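To illustrate the general idea, the following is a minimal, hypothetical sketch of curvilinear-to-rectilinear regridding using a cached nearest-neighbour mapping. All class and method names here are invented for illustration; the actual mechanics used by ncAggregate are described in the technical guide referenced above.

```java
import java.util.Arrays;

/**
 * Sketch (not the ncAggregate implementation) of regridding via a
 * nearest-neighbour mapping from curvilinear cell centres to the
 * centres of a regular output grid.
 */
public class RegridSketch {

    /**
     * For each output cell centre, find the index of the nearest input
     * cell centre. This is the expensive step, which is why ncAggregate
     * caches the resulting mapping to a file for reuse.
     */
    static int[] buildMapping(double[] inLat, double[] inLon,
                              double[] outLat, double[] outLon) {
        int[] mapping = new int[outLat.length];
        for (int o = 0; o < outLat.length; o++) {
            int best = -1;
            double bestDist = Double.MAX_VALUE;
            for (int i = 0; i < inLat.length; i++) {
                double dLat = inLat[i] - outLat[o];
                double dLon = inLon[i] - outLon[o];
                double dist = dLat * dLat + dLon * dLon;
                if (dist < bestDist) {
                    bestDist = dist;
                    best = i;
                }
            }
            mapping[o] = best;
        }
        return mapping;
    }

    /** Applying a cached mapping is cheap: a simple gather of values. */
    static double[] regrid(double[] inValues, int[] mapping) {
        double[] out = new double[mapping.length];
        for (int o = 0; o < mapping.length; o++) {
            out[o] = inValues[mapping[o]];
        }
        return out;
    }

    public static void main(String[] args) {
        // Two cell centres on a (trivially small) curvilinear input grid.
        double[] inLat = {-18.0, -18.5};
        double[] inLon = {147.0, 147.5};
        // Two cell centres on the regular output grid.
        double[] outLat = {-18.0, -18.5};
        double[] outLon = {147.0, 147.5};
        int[] mapping = buildMapping(inLat, inLon, outLat, outLon);
        double[] result = regrid(new double[]{26.1, 27.3}, mapping);
        System.out.println(Arrays.toString(result));
    }
}
```

This shows why the cache parameter matters: the mapping depends only on the two grids, so it can be computed once and reused for every file and every time step that shares the same input grid.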

ncAggregate accepts the following regridding parameters:

  • input - The input directory containing the NetCDF files to be regridded. Note that all NetCDF files in this directory must use the same grid.
  • output - The output directory for the generated NetCDF files.
  • cache - A file (and location) for caching the calculations used to map an input grid to an output grid. These calculations are the most intensive part of regridding, so caching allows for faster regridding in future.
  • resolution - The resolution (in decimal degrees) of the output grid. Default value is "0.03".

Example script

The file <project root>/env/dev/docker-run-regrid.sh shows an example configuration for regridding. The key section of the script is

docker run \
    -u $(id -u):$(id -g) \
    --name "ereefs-ncaggregate" \
    --memory=7.5GB \
    -v `pwd`:/data/ \
    --env EXECUTION_ENVIRONMENT=regrid \
    ereefs-ncaggregate --regrid --input=/data/orig --output=/data/out --cache=/data/regrid-mapper.dat

Notes:

  • docker run - execution is best performed via Docker, requiring the Docker image to be built using the following commands:
$ <project root>/env/dev/maven-package.sh
$ <project root>/env/dev/docker-build-image.sh
  • -v `pwd`:/data/ - maps the current directory (`pwd`) to /data in the Docker container. This is necessary for ncAggregate to access files outside of the Docker container.
  • --regrid - instructs ncAggregate to perform a regrid.
  • --input=/data/orig - identifies the input directory. The /data prefix matches the volume mapping above, so input files are read from the orig sub-directory of the current directory.
  • --output=/data/out - identifies the output directory. Again, the /data prefix matches the volume mapping above, so output files are written to the out sub-directory of the current directory.
  • --cache=/data/regrid-mapper.dat - identifies the name (and location) of the regridding map. Again, the /data prefix matches the volume mapping above, so the cache file is called regrid-mapper.dat and is created in the current directory.
  • A resolution has not been specified, allowing ncAggregate to use the default resolution.

Development

Guidelines

Please follow the standard GitHub workflow when working on this project.

Background

The domain objects used by ncAggregate are defined in eReefs POJO.

Workflow

The basic workflow of ncAggregate is as follows:

  1. Create the output NetCDF file shell.
  2. Execute the processing pipeline to populate the output NetCDF file.
  3. Upload the resulting output NetCDF file to S3.
  4. Write a Metadata record for the NetCDF file to the database.

Significant classes/interfaces

See Technical Explanations.

Data Extraction sites

Data Extraction Tasks are supported via the ExtractionSitesBuilderTask and the SiteBasedSummaryAccumulatorImpl classes. This borrows directly from the Zone-based summary logic (see ZoneBasedSummaryAccumulatorImpl).

For each site, ncAggregate increases the size of a search box by a specified step size until it either finds the specified minimum number of neighbours, or it reaches a specified maximum number of increases and stops. See the ExtractionSitesBuilderTask for more information.
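The expanding-search-box idea described above can be sketched as follows. This is a hypothetical illustration (class and parameter names are invented); refer to ExtractionSitesBuilderTask for the actual logic.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of an expanding search box around an extraction site: grow the
 * box by a fixed step until enough neighbouring cells are found, or a
 * maximum number of increases is reached.
 */
public class SiteNeighbourSearch {

    static List<Integer> findNeighbours(double siteLat, double siteLon,
                                        double[] cellLat, double[] cellLon,
                                        double stepDegrees,
                                        int minNeighbours, int maxIncreases) {
        List<Integer> neighbours = new ArrayList<>();
        double halfWidth = 0;
        for (int attempt = 0; attempt <= maxIncreases; attempt++) {
            halfWidth += stepDegrees;   // grow the box by the step size
            neighbours.clear();
            for (int i = 0; i < cellLat.length; i++) {
                if (Math.abs(cellLat[i] - siteLat) <= halfWidth
                        && Math.abs(cellLon[i] - siteLon) <= halfWidth) {
                    neighbours.add(i);
                }
            }
            if (neighbours.size() >= minNeighbours) {
                break;  // enough neighbours found; stop growing
            }
        }
        return neighbours;
    }

    public static void main(String[] args) {
        double[] cellLat = {-18.01, -18.10, -18.50};
        double[] cellLon = {147.01, 147.10, 147.50};
        // The first box (0.05 deg) catches one cell; the second (0.10 deg)
        // catches two, satisfying the minimum, so the search stops there.
        List<Integer> n = findNeighbours(-18.0, 147.0, cellLat, cellLon,
                                         0.05, 2, 5);
        System.out.println(n.size());
    }
}
```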

Parameters

The ApplicationContextBuilder class contains all parameters supported/expected by ncAggregate. The following parameters can only be set via environment variables:

| Env Variable | Description |
| --- | --- |
| EXECUTION_ENVIRONMENT | The unique prefix for keys in the parameter store. (mandatory) |
| TASK_ID | The unique Id of the Task to be processed. (mandatory) |
| DB_TYPE | The type of database to use: "file" indicates a file-based database. (optional, default is MongoDB) |
| DB_PATH | The path to the root of a file-based database. (mandatory if DB_TYPE is "file") |
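The mandatory/optional rules above could be resolved along the following lines. This is a hypothetical sketch, not the ApplicationContextBuilder implementation; all names are invented for illustration.

```java
import java.util.Map;

/**
 * Sketch of resolving environment-only parameters with defaults and
 * mandatory checks, mirroring the table above.
 */
public class EnvParamsSketch {

    static String resolve(Map<String, String> env, String key,
                          String defaultValue, boolean mandatory) {
        String value = env.getOrDefault(key, defaultValue);
        if (mandatory && value == null) {
            throw new IllegalStateException(key + " must be set");
        }
        return value;
    }

    public static void main(String[] args) {
        // A fixed map stands in for System.getenv() in this sketch.
        Map<String, String> env = Map.of(
            "EXECUTION_ENVIRONMENT", "test",
            "TASK_ID", "task-42",
            "DB_TYPE", "file",
            "DB_PATH", "/data/ereefs/filedb");

        String executionEnv = resolve(env, "EXECUTION_ENVIRONMENT", null, true);
        String taskId = resolve(env, "TASK_ID", null, true);
        // DB_TYPE defaults to MongoDB; DB_PATH is only mandatory for "file".
        String dbType = resolve(env, "DB_TYPE", "mongodb", false);
        String dbPath = resolve(env, "DB_PATH", null, dbType.equals("file"));
        System.out.println(dbType + " " + dbPath);
    }
}
```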

The following parameters can be set by either environment variables or via the AWS Parameter Store:

| Env Variable | Parameter Store | Description |
| --- | --- | --- |
| MONGODB_HOST | /global/mongodb/host | The name (or IP address) of the MongoDB host. |
| MONGODB_PORT | /global/mongodb/port | The port on which the MongoDB host is listening. Normal value is 27017. |
| MONGODB_DB | /global/mongodb/db | The isolated "schema" of the database. Normal value is "ereefs". |
| MONGODB_USER_ID | /ncAggregate/mongodb/userid | Application-specific user. Normal value is "ncaggregate". |
| MONGODB_PASSWORD | /ncAggregate/mongodb/password | Corresponding password. Value is randomly generated by an initialisation script in the ereefs-definitions project. |

Virtual Machine

The ereefs-vm🔒 project provides a virtual machine for Windows-based developers.

Pre-requisites

ncAggregate uses several shared libraries that are available via GitHub Packages for Maven. While Jenkins uses the maven-settings.xml file to provide access to these libraries, developers need to take the following steps for a local setup.

  1. Create a personal access token with read:packages permission only. Give the token a description like GitHub Package access for Maven.

  2. Copy the maven-settings.xml file to ~/.m2/settings.xml if necessary and replace the GITHUB_USERNAME and GITHUB_TOKEN placeholders with the values from step 1.

Testing

All test cases are run in a Maven Docker container.

$ <project root>/env/dev/maven-test.sh

Build/Package

Before ncAggregate can be built, the pre-requisites must be completed.

To build and package ncAggregate:

# Package as a Java application.
$ <project root>/env/dev/maven-package.sh

# Then package as a Docker image.
$ <project root>/env/dev/docker-build-image.sh

Deployment

Deployment is performed by Jenkins (see <project root>/Jenkinsfile). Any branch can be deployed to the TEST environment, but only the production branch can be deployed to the PROD environment.

Pre-requisites

  • An Amazon Web Services (AWS) Elastic Container Registry (ECR) repository named ereefs-netcdf-aggregator to which the Jenkins Continuous Integration (CI) server will publish Docker images. The URI for this ECR repository should be captured in the ECR_URL parameter in the Jenkinsfile of this project.
  • An AWS Identity and Access Management (IAM) Group named ecr-jenkins-publishers with the AmazonEC2ContainerRegistryPowerUser policy attached.
  • An AWS IAM User named ecr-jenkins-publisher who is a member of the ecr-jenkins-publishers group.
  • A Credential entry named ereefs-ecr-jenkins-publisher in the Jenkins CI server for the AWS IAM User ecr-jenkins-publisher. The name of this Jenkins credential should be captured in the ECR_CREDENTIALS parameter in the Jenkinsfile file of this project.
