The Building TimeSeries (BTS) dataset covers three buildings over a three-year period, comprising more than ten thousand timeseries that span hundreds of unique Brick classes. Moreover, the metadata is standardized using the Brick schema.
Buildings play a crucial role in human well-being, influencing occupant comfort, health, and safety. They also contribute significantly to global energy consumption and carbon emissions, accounting for one-third of global energy usage and a quarter of global carbon emissions. Optimizing building performance presents a vital opportunity to combat climate change and promote human flourishing. However, research in building analytics has been hampered by the lack of accessible, available, and comprehensive real-world datasets on multiple building operations. To demonstrate the utility of this dataset, we performed benchmarks on two tasks: timeseries ontology classification and zero-shot forecasting. These tasks represent an essential initial step in addressing challenges related to interoperability in building analytics.
(last update of this .md file: 12 06 2024)
https://github.com/cruiseresearchgroup/DIEF_BTS/
- Arian Prabowo, UNSW: (Contributor)
CSIRO, Energy and CRUISE research group
- Dataset Owner(s): Arian Prabowo, Matthew Amos, and Flora D. Salim
- Affiliation: UNSW, CSIRO
- Contact: [email protected], [email protected], [email protected]
- Website: CSIRO, Energy and CRUISE research group
- Arian Prabowo, UNSW
- Xiachong Lin, UNSW
- Imran Razzak, UNSW
- Hao Xue, UNSW
- Emily W. Yap, UOW
- Matthew Amos, CSIRO
- Flora D. Salim, UNSW
This is a part of NSW DIEF project: https://research.csiro.au/dch/projects/nsw-dief/
TBA
TBA
- Non-Sensitive Data about people
- Data about natural phenomena
- Data about places and objects
- Data about systems or products and their behaviors
Category | Data |
---|---|
Number of Buildings | 3 |
Size of Dataset | 18.77 GB |
Number of Datapoints | 2 863 795 583 |
Number of Timeseries | 14 547 |
Number of Unique Brick Classes of the Timeseries | 215 |
Start Date | 01-01-2021 |
End Date | 18-01-2024 |
Duration | 1 112 days |
Above: Summary statistics of the timeseries.
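As a quick sanity check on the table above, the duration and the average timeseries length can be recomputed from the listed figures. This is only an illustrative sketch: it assumes the dates are written as DD-MM-YYYY and that the duration is the plain difference between the end and start dates.

```python
from datetime import date

# Dates from the summary table, read as DD-MM-YYYY (an assumption).
start = date(2021, 1, 1)   # "01-01-2021"
end = date(2024, 1, 18)    # "18-01-2024"
duration_days = (end - start).days

# Counts copied from the summary table.
n_datapoints = 2_863_795_583
n_timeseries = 14_547
mean_points_per_series = n_datapoints / n_timeseries

print(f"Duration: {duration_days} days")
print(f"Mean datapoints per timeseries: {mean_points_per_series:,.0f}")
```

The mean is only a rough indicator; individual timeseries may differ widely in length and sampling rate.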
Category | BTS_A | BTS_B | BTS_C |
---|---|---|---|
Collection | 4 (2) | 2 (2) | 8 (1) |
Equipment | 547 (24) | 159 (25) | 963 (41) |
Location | 481 (9) | 68 (17) | 381 (26) |
Point | 8374 (126) | 851 (57) | 10440 (159) |
Alarm | 798 (16) | 5 (2) | 109 (8) |
Command | 363 (6) | 97 (5) | 785 (13) |
Parameter | 79 (6) | 36 (2) | 935 (17) |
Sensor | 4396 (56) | 266 (25) | 4062 (68) |
Setpoint | 772 (26) | 232 (16) | 1629 (41) |
Status | 1628 (17) | 110 (6) | 2187 (19) |
Above: Summary statistics of the Brick Schema Metadata. (Bracketed numbers are the number of unique instances).
Each datapoint in a timeseries is a pair of timestamp and value. A timeseries is a series of datapoints, and it has an associated StreamID. All the metadata about each StreamID is available in the metadata files. The metadata files follow the Brick Schema and are formatted as Turtle (.ttl) files.
Additional Notes: Not applicable as there are 14 547 fields (timeseries)
The dataset does not contain personally identifiable information.
Regularly Updated - The full version of the dataset will be made available once the competition is completed and the embargo is lifted. No information about the upcoming competition is available yet, as it is still in the planning stage.
- Time Series
- Demo Link: Check the snippet in https://github.com/cruiseresearchgroup/DIEF_BTS/
Field Name | Field Value | Description |
---|---|---|
t | Numpy array of Timestamp | Timestamp |
v | Numpy array of Float | Field Value |
y | String | Brick Class |
StreamID | String | UUID to link to the metadata. |
This is the string representation of a timeseries in the snippet. Each timeseries is a Python dictionary with 4 items.
{'t': array(['2021-01-01T00:03:16.305000000', '2021-01-01T00:13:44.899000000',
'2021-01-01T00:23:16.203000000', ...,
'2021-08-01T20:45:03.994000000', '2021-08-01T20:55:06.504000000',
'2021-08-01T21:05:05.066000000'], dtype='datetime64[ns]'),
'v': array([18.8, 18.8, 18.2, ..., 32. , 32. , 32. ]),
'y': 'Max_Temperature_Setpoint_Limit',
'StreamID': '213ac15b_3fbd_40b7_b59b_43ab87a09260'}
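To illustrate how such a dictionary might be handled, the sketch below rebuilds the first three datapoints of the example above and computes the gaps between consecutive timestamps. It assumes NumPy is installed; the helper name `sampling_intervals_seconds` is hypothetical and not part of the dataset's snippet.

```python
import numpy as np

# First three datapoints of the example above, in the same dictionary layout.
ts = {
    "t": np.array(
        [
            "2021-01-01T00:03:16.305",
            "2021-01-01T00:13:44.899",
            "2021-01-01T00:23:16.203",
        ],
        dtype="datetime64[ns]",
    ),
    "v": np.array([18.8, 18.8, 18.2]),
    "y": "Max_Temperature_Setpoint_Limit",
    "StreamID": "213ac15b_3fbd_40b7_b59b_43ab87a09260",
}


def sampling_intervals_seconds(series):
    """Return the gaps between consecutive timestamps as floats, in seconds."""
    deltas = np.diff(series["t"])  # timedelta64[ns] array
    return deltas / np.timedelta64(1, "s")


intervals = sampling_intervals_seconds(ts)
print(intervals)
```

The gaps are close to, but not exactly, ten minutes, which suggests that downstream models should not assume a perfectly regular sampling grid.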
Additional Notes: N/A
- Research
Timeseries Analysis, Buildings, Knowledge Graph, Spatiotemporal, Energy Use.
Importance of building analytics. Building analytics, also known as data-driven smart buildings, involves the automated adjustment of building operations to minimize emissions and costs, optimize energy usage, and enhance indoor environmental quality and occupant experience, including comfort, health, and safety. This is particularly crucial given that buildings account for a third of global energy usage and a quarter of global carbon emissions, comparable to the transport sector. Optimizing building performance has the potential to significantly mitigate climate change and promote human well-being.
Literature gaps. This dataset addresses two critical gaps in building analytics research. The first is the scarcity of publicly available and freely accessible datasets on comprehensive real-world building operations. This limitation underscores the need for datasets covering multiple buildings to address the second gap: interoperability in building analytical models. Interoperability is crucial for scalability, allowing models to be applied across diverse buildings with differing characteristics such as climate, usage, size, regulations, budget, and architecture. Additionally, such datasets inherently possess properties of interest to machine learning research, such as domain shift, multimodality, imbalance, and long-tailedness.
- Safe for research use
Building analytics and data-driven smart buildings. Read more about this on IEA EBC Annex81.
Production use, especially for buildings with widely different behaviour, e.g. those not located in Australia.
Building analytics and data-driven smart buildings. Read more about this on IEA EBC Annex81.
Will be made available when the paper is published.
- External - Open Access
- The dataset URL on the https://figshare.com/ repository will be made available after the competition.
- GitHub URL https://github.com/cruiseresearchgroup/DIEF_BTS/
None.
https://help.figshare.com/article/data-access-policy
None
Indefinite
Summary: The dataset will be hosted on https://figshare.com/ and retained according to their policy.
N/A
- Telemetry
Source: Senaps https://products.csiro.au/senaps/
Is this source considered sensitive or high-risk? [Yes]
Dates of Collection: [01 2018 - 01 2024]
Primary modality of collection data:
- Time Series
Update Frequency for collected data:
- Static: Data was collected once from the source.
Source: DCH https://research.csiro.au/dch/
Is this source considered sensitive or high-risk? [Yes]
Dates of Collection: [01 2018 - 01 2024]
Primary modality of collection data:
- Graph Data
Update Frequency for collected data:
- Static: Data was collected once from the source.
- Senaps: Senaps https://products.csiro.au/senaps/. From the website: Senaps is an Internet of Things (IoT) Application Enablement and Data Management cloud-based platform developed and being commercialised by CSIRO’s Data61 Distributed Sensing Systems Group. Senaps is a framework which allows you to build your own product by getting data in, analysing and distributing it to custom user-facing applications. Built-in security, data storage and APIs are allowing companies in agriculture, environment, smart buildings and more, to focus on their competitive advantage. With a basic generic user interface, Senaps combines multiple datasets in a cloud environment with open APIs, allowing users to draw useful insights from data.
- DCH: DCH https://research.csiro.au/dch/. From the website: CSIRO’s Data Clearing House (DCH) is a cloud-based digital platform for housing, managing and extracting valuable insights from smart building data. Allowing data ingestion from a variety of sources, the DCH stores this data in an open format allowing for interoperability and data discovery.
Static: Data was collected once from DCH and Senaps.
This dataset comprises data collected on CSIRO's Data Clearing House (DCH) digital platform. By connecting to the Building Management Systems (BMS), timeseries data is collected from sensors, power, water and gas meters, and other devices within the buildings and uploaded using Message Queuing Telemetry Transport Secured (MQTTS). A semantic model of the building was created using DCH platform tooling. This created Brick schema class definitions (version 1.2.1) for points within the model, and linked these points to the timeseries data ingested via MQTTS.
Both the identifiers of the points within the model and the timeseries identifiers were anonymised by generating Universally Unique Identifiers (UUIDs), and a three-year subset of the timeseries data was extracted from the DCH platform to produce this dataset. The data was deliberately not cleaned, to allow the evaluation of different cleaning algorithms and the evaluation of algorithms against data with realistic errors.
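The anonymisation step described above can be pictured with the following sketch. It is an assumption-laden illustration rather than the actual DCH procedure: the function name, the example point names, and the replacement of hyphens with underscores (to mirror the StreamID format shown in this data card) are all hypothetical.

```python
import uuid


def anonymise_identifiers(original_ids):
    """Map each original identifier to a freshly generated random UUID.

    Assumption: hyphens are replaced with underscores to mirror the
    StreamID format in this data card; the real pipeline may differ.
    """
    mapping = {}
    for original in original_ids:
        if original not in mapping:  # repeated identifiers reuse the same UUID
            mapping[original] = str(uuid.uuid4()).replace("-", "_")
    return mapping


# Hypothetical BMS point names, invented for illustration only.
mapping = anonymise_identifiers(
    ["AHU1/SupplyTemp", "AHU1/FanStatus", "AHU1/SupplyTemp"]
)
for original, anonymised in mapping.items():
    print(original, "->", anonymised)
```

Because the UUIDs are random, the mapping carries no information about the original names once the lookup table is discarded.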
Comprehensive data from 3 buildings.
Some information has been excluded for anonymisation purposes.
- Building Management Systems: To manage building operations.
- The dataset has been anonymised.
N/A. There is no plan to update this dataset after its full release following the competition.
The dataset does not contain personally identifiable information.
- Unknown
LBNL59: A similar dataset collected from Lawrence Berkeley National Laboratory Building 59
Hong, Tianzhen; Luo, Na; Blum, David; Wang, Zhe (2022). A three-year building operational performance dataset for informing energy efficiency [Dataset]. Dryad. https://doi.org/10.7941/D1N33Q
- Unknown
- Training
- Testing
- Validation
The dataset is sourced from only three buildings in Australia, limiting its geographical diversity. Consequently, models trained on this dataset may not generalize well to buildings in other regions with different climates, regulations, and building practices. This limitation implies that models should primarily be used for research purposes rather than direct deployment.
- None
Engineers constructed the Brick schema model for each building.
- Human Annotations (Expert)
Annotation Type | Number |
---|---|
Number of buildings annotated | 3 |
(Annotation Type)
Description: The Brick metadata for each building is created using tools on the DCH platform.
Platforms, tools, or libraries:
- DCH
- Not Validated
- Unsampled
The Benchmark paper is still under review.
Timeseries Ontology Multi-label Classification, Zero-shot Forecasting.
Definition: A series of timestamp and value pairs.
Definition (from the website): Brick is an open-source effort to standardize semantic descriptions of the physical, logical and virtual assets in buildings and the relationships between them. Brick consists of an extensible dictionary of terms and concepts in and around buildings, a set of relationships for linking and composing concepts together, and a flexible data model permitting seamless integration of Brick with existing tools and databases. Through the use of powerful Semantic Web technology, Brick can describe the broad set of idiosyncratic and custom features, assets and subsystems found across the building stock in a consistent matter.
Source: https://brickschema.org/
Interpretation: It can also be interpreted as a knowledge graph.
Definition (from the website): A Turtle document is a textual representations of an RDF graph.
Source: https://www.w3.org/TR/turtle/
No additional information.