Home
A total of 165 GB of environmental data has been compiled for use in environmental assessment of Cockburn Sound. These data are sourced from over a dozen agencies and disciplines.
The goal of this CSIEM environmental data management repository is to support data management by providing a compatible and interoperable data library. Version control and data governance are implemented here to ensure a comprehensive and integrated modelling platform.
The data system has a standard sub-folder structure, as shown below. For more information on aspects of the setup, refer to the wiki pages on the right-hand side, or see The Cockburn Sound Integrated Ecosystem Model Manual.
This data repository is based around the following structure:
```
csiem-data/
 171M  ./code
  18M  ./data-governance
  31M  ./data-mapping
 165M  ./summary-images
  98G  ./data-lake        ! Raw data not included in this GitHub repository: see access for further information.
  65G  ./data-warehouse   ! Ingested (standardised) data not included in this repository: see access for further information.
TOTAL = 165G
```
The repository is built around a framework that brings together three separate steps in the data "federation" process:
- Data Collation
- Data Governance & Reporting
- Data Ingestion & Integration
The relationship between the various initiatives, the CSIEM environmental data management framework, and downstream model applications is outlined schematically in the image below.

The aim of the data collation step is to bring data together in a coordinated way. Data sourced and collated from various government agencies, researchers, and industry groups is stored in a “data lake” in its raw format. Each data provider is assigned a unique agency identifier, and datasets are also grouped based on the main program or initiative the collection was associated with. Raw data is stored in a rigid folder structure based on these two identifiers:
`Agency/Program/<... data-sets ...>`
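For illustration, the minimal Python sketch below shows how a raw file location in the data lake could be resolved under this structure. The agency and program identifiers used here are hypothetical; the real codes are defined in the data-governance key documents.

```python
from pathlib import Path

# Root of the raw data store (not included in this GitHub repository).
LAKE_ROOT = Path("csiem-data/data-lake")

def lake_path(agency: str, program: str, dataset: str) -> Path:
    """Resolve the rigid Agency/Program/... location for a raw dataset."""
    return LAKE_ROOT / agency / program / dataset

# Hypothetical identifiers, for illustration only.
print(lake_path("AGENCY01", "PROGRAM-A", "2021_profiles.csv"))
# -> csiem-data/data-lake/AGENCY01/PROGRAM-A/2021_profiles.csv
```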

Data added to the data lake is recorded in the data catalogue, and the site-key and variable-key documents are updated. This is done in the data-governance folder.
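The sketch below shows how a catalogue record might be appended when a new dataset lands in the lake; the catalogue filename and column layout are assumptions, as the actual formats are defined in the data-governance folder.

```python
import csv
from datetime import date

# Hypothetical catalogue location and columns; the real catalogue,
# site-key and variable-key documents live in csiem-data/data-governance.
CATALOGUE = "csiem-data/data-governance/data_catalogue.csv"

def register_dataset(agency: str, program: str, dataset: str) -> None:
    """Append a record for a newly collated data-lake dataset."""
    with open(CATALOGUE, "a", newline="") as f:
        csv.writer(f).writerow([agency, program, dataset, date.today().isoformat()])

register_dataset("AGENCY01", "PROGRAM-A", "2021_profiles.csv")
```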
To standardise the data, content in the data-lake is ingested into the data-warehouse. Custom scripts prepared in code/import convert the data and assign a metadata signature to each record; standard vocabularies are applied during this ingestion phase. The ingested data can then be packaged into products for downstream use.
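As a sketch of the ingestion idea (the variable names, mapping, and file formats here are assumptions; the actual conversion logic lives in the scripts under code/import):

```python
import pandas as pd

# Hypothetical raw-to-standard variable mapping, standing in for the
# controlled vocabularies applied during ingestion.
VARIABLE_KEY = {"Temp (degC)": "WATER_TEMPERATURE", "DO (mg/L)": "DISSOLVED_OXYGEN"}

def ingest(lake_file: str, warehouse_file: str, agency: str, program: str) -> None:
    """Convert a raw data-lake file into the standardised warehouse format."""
    df = pd.read_csv(lake_file)
    df = df.rename(columns=VARIABLE_KEY)   # apply the standard vocabulary
    df["agency"] = agency                  # metadata signature fields
    df["program"] = program
    df.to_csv(warehouse_file, index=False)
```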