Home
A total of 165 GB of environmental data has been compiled for use in environmental assessment of Cockburn Sound. These data are sourced from over a dozen agencies and disciplines.
The goal of this CSIEM environmental data management repository is to support data management by providing a compatible and interoperable data library. Version control and data governance are implemented here to ensure a comprehensive and integrated modelling platform.
The data system has a standard sub-folder structure, as shown below. For more information on aspects of the setup, refer to the wiki pages on the right-hand side, or see The Cockburn Sound Integrated Ecosystem Model Manual.
This data repository is based around the following structure:
```
csiem-data/
 171M  ./code
  18M  ./data-governance
  31M  ./data-mapping
 165M  ./summary-images
  98G  ./data-lake        ! Raw data not included in this GitHub repository: see access for further information.
  65G  ./data-warehouse   ! Ingested (standardised) data not included in this repository: see access for further information.
TOTAL = 165G
```
The repository is built around a framework that brings together three separate steps in the data "federation" process:
- Data Collation
- Data Governance & Reporting
- Data Ingestion & Integration
The relationship between the various initiatives, the CSIEM environmental data management framework, and downstream model applications is outlined schematically in the image below.

The aim of the data collation step is to bring data together in a coordinated way. Data sourced and collated from various government agencies, researchers, and industry groups is stored in a “data lake” in its raw format. Each data provider is assigned a unique agency identifier, and datasets are also grouped based on the main program or initiative the collection was associated with. Raw data is stored in a rigid folder structure based on these two identifiers:
`Agency/Program/<... data-sets ...>`
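For illustration, the minimal Python sketch below shows how a raw file location in the data lake could be resolved under this structure. The agency and program identifiers used here are hypothetical; the real codes are defined in the data-governance key documents.

```python
from pathlib import Path

# Root of the raw data store (not included in this GitHub repository).
LAKE_ROOT = Path("csiem-data/data-lake")

def lake_path(agency: str, program: str, dataset: str) -> Path:
    """Resolve the rigid Agency/Program/... location for a raw dataset."""
    return LAKE_ROOT / agency / program / dataset

# Hypothetical identifiers, for illustration only.
print(lake_path("AGENCY01", "PROGRAM-A", "2021_profiles.csv"))
# -> csiem-data/data-lake/AGENCY01/PROGRAM-A/2021_profiles.csv
```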

Data added to the data lake is recorded in the data catalogue, and the site-key and variable-key documents are updated. This is done in the data-governance folder.
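The sketch below shows how a catalogue record might be appended when a new dataset lands in the lake; the catalogue filename and column layout are assumptions, as the actual formats are defined in the data-governance folder.

```python
import csv
from datetime import date

# Hypothetical catalogue location and columns; the real catalogue,
# site-key and variable-key documents live in csiem-data/data-governance.
CATALOGUE = "csiem-data/data-governance/data_catalogue.csv"

def register_dataset(agency: str, program: str, dataset: str) -> None:
    """Append a record for a newly collated data-lake dataset."""
    with open(CATALOGUE, "a", newline="") as f:
        csv.writer(f).writerow([agency, program, dataset, date.today().isoformat()])

register_dataset("AGENCY01", "PROGRAM-A", "2021_profiles.csv")
```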
To standardise the data, content in the data-lake is ingested into the data-warehouse. Custom scripts prepared in code/import convert the data and assign a metadata signature to each record; standard vocabularies are applied during this ingestion phase. The ingested data can then be packaged into products for downstream use.
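As a sketch of the ingestion idea (the variable names, mapping, and file formats here are assumptions; the actual conversion logic lives in the scripts under code/import):

```python
import pandas as pd

# Hypothetical raw-to-standard variable mapping, standing in for the
# controlled vocabularies applied during ingestion.
VARIABLE_KEY = {"Temp (degC)": "WATER_TEMPERATURE", "DO (mg/L)": "DISSOLVED_OXYGEN"}

def ingest(lake_file: str, warehouse_file: str, agency: str, program: str) -> None:
    """Convert a raw data-lake file into the standardised warehouse format."""
    df = pd.read_csv(lake_file)
    df = df.rename(columns=VARIABLE_KEY)   # apply the standard vocabulary
    df["agency"] = agency                  # metadata signature fields
    df["program"] = program
    df.to_csv(warehouse_file, index=False)
```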