Building Timeseries (BTS)

The Building TimeSeries (BTS) dataset covers three buildings over a three-year period, comprising more than ten thousand timeseries with hundreds of unique Brick classes. The metadata is standardized using the Brick schema.

Buildings play a crucial role in human well-being, influencing occupant comfort, health, and safety. Additionally, they contribute significantly to global energy consumption, accounting for one-third of total energy usage, and carbon emissions. Optimizing building performance presents a vital opportunity to combat climate change and promote human flourishing. However, research in building analytics has been hampered by the lack of accessible, available, and comprehensive real-world datasets on multiple building operations. To demonstrate the utility of this dataset, we performed benchmarks on two tasks: timeseries ontology classification and zero-shot forecasting. These tasks represent an essential initial step in addressing challenges related to interoperability in building analytics.

(last update of this .md file: 12 06 2024)

Dataset Link

https://github.com/cruiseresearchgroup/DIEF_BTS/

Data Card Author(s)

  • Arian Prabowo, UNSW: (Contributor)

Authorship

Dataset Owners

Team(s)

CSIRO, Energy and CRUISE research group

Contact Detail(s)

Author(s)

  • Arian Prabowo, UNSW
  • Xiachong Lin, UNSW
  • Imran Razzak, UNSW
  • Hao Xue, UNSW
  • Emily W. Yap, UOW
  • Matthew Amos, CSIRO
  • Flora D. Salim, UNSW

Funding Sources

This is a part of NSW DIEF project: https://research.csiro.au/dch/projects/nsw-dief/

Institution(s)

TBA

Funding or Grant Summary(ies)

TBA

Dataset Overview

Data Subject(s)

  • Non-Sensitive Data about people
  • Data about natural phenomena
  • Data about places and objects
  • Data about systems or products and their behaviors

Dataset Snapshot

| Category | Data |
|---|---|
| Number of Buildings | 3 |
| Size of Dataset | 18.77 GB |
| Number of Datapoints | 2 863 795 583 |
| Number of Timeseries | 14 547 |
| Number of Unique Brick Classes of the Timeseries | 215 |
| Start Date | 01-01-2021 |
| End Date | 18-01-2024 |
| Duration | 1 112 days |

Above: Summary statistics of the timeseries.
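
As a quick sanity check on the figures above, the duration and the average series length can be recomputed from the table. This is a minimal sketch: it assumes the dates are written day-month-year, and the variable names are illustrative only.

```python
from datetime import datetime

# Dates copied from the snapshot table; the day-month-year order is an assumption.
start = datetime.strptime("01-01-2021", "%d-%m-%Y").date()
end = datetime.strptime("18-01-2024", "%d-%m-%Y").date()

# Number of days between the start date and the end date.
duration_days = (end - start).days

# Counts copied from the snapshot table.
n_datapoints = 2_863_795_583
n_timeseries = 14_547

# Average number of datapoints per timeseries (individual series vary in length).
mean_points_per_series = n_datapoints / n_timeseries

print(duration_days)                   # 1112
print(round(mean_points_per_series))   # 196865
```

The recomputed duration matches the 1 112 days reported in the table, and the average works out to roughly 197 thousand datapoints per timeseries.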

| Category | BTS_A | BTS_B | BTS_C |
|---|---|---|---|
| Collection | 4 (2) | 2 (2) | 8 (1) |
| Equipment | 547 (24) | 159 (25) | 963 (41) |
| Location | 481 (9) | 68 (17) | 381 (26) |
| Point | 8374 (126) | 851 (57) | 10440 (159) |
| Alarm | 798 (16) | 5 (2) | 109 (8) |
| Command | 363 (6) | 97 (5) | 785 (13) |
| Parameter | 79 (6) | 36 (2) | 935 (17) |
| Sensor | 4396 (56) | 266 (25) | 4062 (68) |
| Setpoint | 772 (26) | 232 (16) | 1629 (41) |
| Status | 1628 (17) | 110 (6) | 2187 (19) |

Above: Summary statistics of the Brick Schema Metadata. (Bracketed numbers are the number of unique instances).

Content Description

Each datapoint in a timeseries is a pair of timestamp and value. A timeseries is a series of datapoints, and it has an associated StreamID. All the metadata about the StreamID is available in the metadata files. The metadata files follow the Brick Schema and are formatted as Turtle (.ttl) files.

Descriptive Statistics

Additional Notes: Not applicable as there are 14 547 fields (timeseries)

Sensitivity of Data

The dataset does not contain personally identifiable information.

Dataset Version and Maintenance

Maintenance Status

Regularly Updated - The full version of the dataset will be made available once the competition is completed and the embargo is lifted. No information about the upcoming competition is available yet, as it is still in the planning stage.

Example of Data Points

Primary Data Modality

  • Time Series

Sampling of Data Points

Data Fields

| Field Name | Field Value | Description |
|---|---|---|
| t | NumPy array of Timestamp | Timestamp |
| v | NumPy array of Float | Field Value |
| y | String | Brick Class |
| StreamID | String | UUID to link to the metadata. |

Typical Data Point

This is the string representation of a timeseries in the snippet. Each timeseries is a Python dictionary with 4 items.

{'t': array(['2021-01-01T00:03:16.305000000', '2021-01-01T00:13:44.899000000',
        '2021-01-01T00:23:16.203000000', ...,
        '2021-08-01T20:45:03.994000000', '2021-08-01T20:55:06.504000000',
        '2021-08-01T21:05:05.066000000'], dtype='datetime64[ns]'),
 'v': array([18.8, 18.8, 18.2, ..., 32. , 32. , 32. ]),
 'y': 'Max_Temperature_Setpoint_Limit',
 'StreamID': '213ac15b_3fbd_40b7_b59b_43ab87a09260'}
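
To illustrate how such a dictionary can be handled, the sketch below rebuilds a three-point series from the first timestamps and values shown above and computes the gaps between consecutive samples. It assumes NumPy is installed; the helper name `sampling_intervals_seconds` is hypothetical and not part of any dataset tooling.

```python
import numpy as np

# Truncated sample: only the first three timestamps and values from the
# snippet above are used; the real series is much longer.
series = {
    "t": np.array(
        [
            "2021-01-01T00:03:16.305000000",
            "2021-01-01T00:13:44.899000000",
            "2021-01-01T00:23:16.203000000",
        ],
        dtype="datetime64[ns]",
    ),
    "v": np.array([18.8, 18.8, 18.2]),
    "y": "Max_Temperature_Setpoint_Limit",
    "StreamID": "213ac15b_3fbd_40b7_b59b_43ab87a09260",
}


def sampling_intervals_seconds(ts):
    """Return the gaps between consecutive timestamps, in seconds."""
    t = np.sort(ts["t"])  # sort defensively in case timestamps are unordered
    return np.diff(t) / np.timedelta64(1, "s")


gaps = sampling_intervals_seconds(series)
print(series["StreamID"], series["y"], gaps)
```

For this sample the gaps are close to ten minutes but not identical, so code that consumes the data should probably not assume a fixed sampling rate.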

Atypical Data Point

Additional Notes: N/A

Motivations & Intentions

Motivations

Purpose(s)

  • Research

Domain(s) of Application

Timeseries Analysis, Buildings, Knowledge Graph, Spatiotemporal, Energy Use.

Motivating Factor(s)

Importance of building analytics. Building analytics, also known as data-driven smart building, involves the automated adjustment of building operations to minimize emissions and costs, optimize energy usage, and enhance indoor environmental quality and occupant experience, including comfort, health, and safety. This is particularly crucial given that buildings account for a third of global energy usage and a quarter of global carbon emissions, comparable to the transport sector. Optimizing building performance has the potential to significantly mitigate climate change and promote human well-being.

Literature gaps. This dataset addresses two critical gaps in building analytics research. The first is the scarcity of publicly available and freely accessible datasets on comprehensive real-world building operations. This limitation underscores the need for datasets covering multiple buildings to address the second gap: interoperability in building analytical models. Interoperability is crucial for scalability, allowing models to be applied across diverse buildings with differing characteristics such as climate, usage, size, regulations, budget, and architecture. Additionally, such datasets inherently possess properties of interest to machine learning research, such as domain shift, multimodality, imbalance, and long-tailedness.

Intended Use

Dataset Use(s)

  • Safe for research use

Suitable Use Case(s)

Building analytics and data-driven smart buildings. Read more about this on IEA EBC Annex81.

Unsuitable Use Case(s)

Production use, especially for buildings whose behaviour differs widely from those in the dataset, e.g. buildings not located in Australia.

Research and Problem Space(s)

Building analytics and data-driven smart buildings. Read more about this on IEA EBC Annex81.

Citation Guidelines

Will be made available when the paper is published.

Access, Retention, & Wipeout

Access

Access Type

  • External - Open Access

Documentation Link(s)

Prerequisite(s)

None.

Policy Link(s)

https://help.figshare.com/article/data-access-policy

Access Control List(s)

None

Retention

Duration

Indefinite

Policy Summary

Summary: The dataset will be hosted on https://figshare.com/ and retained according to their policy.

Wipeout and Deletion

N/A

Provenance

Collection

Method(s) Used

  • Telemetry

Methodology Detail(s)

Source: Senaps https://products.csiro.au/senaps/

Is this source considered sensitive or high-risk? [Yes]

Dates of Collection: [01 2018 - 01 2024]

Primary modality of collection data:

  • Time Series

Update Frequency for collected data:

  • Static: Data was collected once from the source.

Source: DCH https://research.csiro.au/dch/

Is this source considered sensitive or high-risk? [Yes]

Dates of Collection: [01 2018 - 01 2024]

Primary modality of collection data:

  • Graph Data

Update Frequency for collected data:

  • Static: Data was collected once from the source.

Source Description(s)

  • Senaps: Senaps https://products.csiro.au/senaps/. From the website: Senaps is an Internet of Things (IoT) Application Enablement and Data Management cloud-based platform developed and being commercialised by CSIRO’s Data61 Distributed Sensing Systems Group. Senaps is a framework which allows you to build your own product by getting data in, analysing and distributing it to custom user-facing applications. Built-in security, data storage and APIs are allowing companies in agriculture, environment, smart buildings and more, to focus on their competitive advantage. With a basic generic user interface, Senaps combines multiple datasets in a cloud environment with open APIs, allowing users to draw useful insights from data.
  • DCH: DCH https://research.csiro.au/dch/. From the website: CSIRO’s Data Clearing House (DCH) is a cloud-based digital platform for housing, managing and extracting valuable insights from smart building data. Allowing data ingestion from a variety of sources, the DCH stores this data in an open format allowing for interoperability and data discovery.

Collection Cadence

Static: Data was collected once from DCH and Senaps.

Data Integration

A semantic model of the building was created using DCH platform tooling. This created Brick schema class definitions (version 1.2.1) for points within the model, and linked these points to the timeseries data ingested via MQTTS.

Data Processing

This dataset comprises data collected on CSIRO's Data Clearing House (DCH) digital platform. Through connections to the Building Management Systems (BMS), timeseries data is collected from sensors, power, water and gas meters, and other devices within the buildings and uploaded using Message Queuing Telemetry Transport Secured (MQTTS). A semantic model of the building was created using DCH platform tooling. This created Brick schema class definitions (version 1.2.1) for points within the model, and linked these points to the timeseries data ingested via MQTTS.

Identifiers for both the points within the model and the timeseries were anonymised by generating Universally Unique Identifiers (UUIDs), and a three-year subset of the timeseries data was extracted from the DCH platform to produce this dataset. The data was not cleaned, in order to allow evaluation of different cleaning algorithms and evaluation of algorithms against data with realistic errors.
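
One way identifier anonymisation of this kind could be done is sketched below. This is not the procedure used to produce the dataset: the function name, the sample identifiers, and the choice to replace hyphens with underscores (to mirror the StreamID style shown in this data card) are all assumptions made for illustration.

```python
import uuid


def anonymise_ids(original_ids):
    """Map each distinct original identifier to a fresh random UUID string.

    Repeated identifiers reuse the same UUID so that links between the
    metadata and the timeseries are preserved after anonymisation.
    """
    mapping = {}
    for original in original_ids:
        if original not in mapping:
            # uuid4 is random, so the new ID reveals nothing about the old one.
            mapping[original] = str(uuid.uuid4()).replace("-", "_")
    return mapping


# Hypothetical source identifiers, invented for this example only.
mapping = anonymise_ids(["AHU-1/SupplyTemp", "AHU-1/ReturnTemp", "AHU-1/SupplyTemp"])
for original, anonymised in mapping.items():
    print(original, "->", anonymised)
```

Keeping the mapping consistent across the metadata and the timeseries files is what allows a StreamID to still point at the right Brick entity once the original names are gone.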

Collection Criteria

Data Selection

Comprehensive data from 3 buildings.

Data Inclusion

Comprehensive data from 3 buildings.

Data Exclusion

Some information has been excluded for anonymisation purposes.

Relationship to Source

Use & Utility(ies)

  • Building Management Systems: To manage building operations.

Benefit and Value(s)

  • Building Management Systems: To manage building operations.

Limitation(s) and Trade-Off(s)

  • The dataset has been anonymised.

Changes on Update(s)

N/A. There is no plan to update this dataset after the full release that follows the competition.

Human and Other Sensitive Attributes

The dataset does not contain personally identifiable information.

Extended Use

Use with Other Data

Safety Level

  • Unknown

Limitation(s) and Recommendation(s)

LBNL59: A similar dataset collected from Lawrence Berkeley National Laboratory Building 59

Hong, Tianzhen; Luo, Na; Blum, David; Wang, Zhe (2022). A three-year building operational performance dataset for informing energy efficiency [Dataset]. Dryad. https://doi.org/10.7941/D1N33Q

Forking & Sampling

Safety Level

  • Unknown

Use in ML or AI Systems

Dataset Use(s)

  • Training
  • Testing
  • Validation

Usage Guideline(s)

The dataset is sourced from only three buildings in Australia, limiting its geographical diversity. Consequently, models trained on this dataset may not generalize well to buildings in other regions with different climates, regulations, and building practices. This limitation implies that models should primarily be used for research purposes rather than direct deployment.

Transformations

Synopsis

Transformation(s) Applied

  • None

Annotations & Labeling

Engineers constructed the Brick schema model for each building.

Annotation Workforce Type

  • Human Annotations (Expert)

Annotation Characteristic(s)

| Annotation Type | Number |
|---|---|
| Number of buildings annotated | 3 |

Annotation Description(s)

(Annotation Type)

Description: The Brick metadata for each building was created using tools on the DCH platform.

Link: Relevant URL link.

Platforms, tools, or libraries:

  • DCH

Validation Types

Method(s)

  • Not Validated

Sampling Methods

Method(s) Used

  • Unsampled

Known Applications & Benchmarks

The Benchmark paper is still under review.

ML Application(s)

Timeseries Ontology Multi-label Classification, Zero-shot Forecasting.

Terms of Art

Concepts and Definitions referenced in this Data Card

Timeseries

Definition: A series of timestamp-value pairs.

Brick Schema

Definition (from the website): Brick is an open-source effort to standardize semantic descriptions of the physical, logical and virtual assets in buildings and the relationships between them. Brick consists of an extensible dictionary of terms and concepts in and around buildings, a set of relationships for linking and composing concepts together, and a flexible data model permitting seamless integration of Brick with existing tools and databases. Through the use of powerful Semantic Web technology, Brick can describe the broad set of idiosyncratic and custom features, assets and subsystems found across the building stock in a consistent matter.

Source: https://brickschema.org/

Interpretation: Can also be interpreted as a knowledge graph.

Turtle File

Definition (from the website): A Turtle document is a textual representations of an RDF graph.

Source: https://www.w3.org/TR/turtle/

Reflections on Data

No additional information.