Merge pull request #52 from hsf-training/conditions-database-example
Conditions database example
ruslan33 authored Jan 28, 2025
2 parents ff0cc63 + 81aa735 commit a528c0c
Showing 1 changed file with 115 additions and 36 deletions.
151 changes: 115 additions & 36 deletions _episodes/06-conditions-database.md
@@ -1,35 +1,107 @@
---
title: "Conditions Database Example Using SQLAlchemy"
teaching: x
exercises: x
teaching: 1.5 hours
exercises: 2
questions:
- ""
- ""
- "What are the key objects in a Conditions Database and how are they related?"
- "How can you use SQLAlchemy to model and query a simple Conditions Database?"
objectives:
- ""
- ""
- "Understand the role of Conditions Databases in high-energy physics."
- "Learn the key concepts: Global Tags, PayloadTypes, Payloads, and IOVs."
- "Model relationships between these objects using SQLAlchemy."
- "Perform basic queries to retrieve conditions data efficiently."
keypoints:
- ""
- ""
- "Conditions Databases store metadata for time-dependent data like alignment and calibration."
- "Global Tags group related PayloadTypes, which contain Payloads valid for specific IOVs."
---

# Lesson: Introduction to Conditions Databases in HEP

## Introduction

# Conditions Database Example Using SQLAlchemy
In high-energy physics, conditions databases (CDBs) play a critical role in managing non-event data. This includes calibration constants, alignment parameters, and detector conditions, which evolve over time. These databases ensure that analysis software can access the correct calibration and alignment data corresponding to the detector's state at any given time, enabling accurate physics measurements.

This lesson demonstrates how to create a simple Conditions Database using SQLAlchemy in Python.
The key objects in CDBs include **Global Tags**, **PayloadTypes**, **Payloads**, and **Intervals of Validity (IOVs)**. Together, these elements create a framework for managing and retrieving time-dependent data.

## Key Concepts

### Payloads

A **Payload** contains the actual conditions data, such as calibration constants or alignment parameters. Typically, a payload is stored as a file on the filesystem, accessible through a specific path and filename or URL. The CDB manages only the metadata associated with these files, rather than the files themselves. In the CDB, the Payload object is essentially the URL pointing to the file's location, enabling efficient retrieval without directly handling the data.

### PayloadTypes

A **PayloadType** represents a classification for grouping related payloads that belong to the same category of conditions, such as alignment parameters, calibration constants, or detector settings. By organizing payloads under a common type, the CDB simplifies data retrieval and management.

This grouping ensures that, in most cases, only one payload per system is required for a specific query. For example, when retrieving alignment data for a particular detector component, you typically need data corresponding to a specific run number. The system can efficiently filter and return only the relevant payload for that time range, rather than fetching all payloads across all time intervals. This approach enhances consistency, optimizes performance, and simplifies the management of multiple payloads for similar conditions.

### Interval of Validity (IOV)

An **IOV** defines the time range during which a particular payload is valid. It is typically specified in terms of run numbers, timestamps, or lumiblocks, ensuring that the correct data is applied for a given detector state.
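
As a minimal sketch of how an IOV lookup might work, assume each IOV is identified by the first run number for which its payload is valid, and that a payload stays valid until the next IOV begins. This open-ended convention is an assumption made here for illustration; an experiment may instead store an explicit end of validity. The helper `find_payload`, the run numbers, and the URLs are hypothetical.

```python
from bisect import bisect_right

# Hypothetical (start_run, payload_url) pairs, sorted by start_run.
# Assumption: a payload is valid from its start_run until the next start_run.
iovs = [
    (1, "http://example.com/calib_v1.root"),
    (5, "http://example.com/calib_v2.root"),
    (9, "http://example.com/calib_v3.root"),
]


def find_payload(iovs, run):
    """Return the URL of the payload valid for `run`, or None if none applies."""
    starts = [start for start, _ in iovs]
    # bisect_right gives the number of IOVs whose start_run is <= run.
    index = bisect_right(starts, run)
    if index == 0:
        return None  # run precedes the first IOV
    return iovs[index - 1][1]


print(find_payload(iovs, 7))  # payload whose IOV started at run 5
```

Keeping the IOVs sorted lets the lookup use a binary search, so only the one relevant payload is selected rather than scanning every validity range.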

### Global Tags

A **Global Tag** is a label that identifies a consistent set of conditions data. It provides a snapshot of the detector state by pointing to specific versions of payloads for different time intervals. Global Tags simplify data retrieval by offering a single entry point for accessing coherent sets of conditions.

## Connections Between Objects

- A **Global Tag** serves as a grouping mechanism that maps to multiple payloads, which are organized by **PayloadType**. Each **PayloadType** groups related payloads (e.g., alignment or calibration constants) to simplify data retrieval.
- Each **Payload** represents a specific piece of conditions data and is valid for the **Interval of Validity (IOV)** associated with it. This ensures that the correct payload is applied for a given run or timestamp.
- During data processing, the Conditions Database (CDB) retrieves the appropriate payload by matching the IOV to the required run or timestamp, ensuring consistency and accuracy.

```mermaid
erDiagram
GlobalTag ||--o{ PayloadType : has
PayloadType ||--o{ PayloadIOV : contains
```

For simplification, in the following example, we work with three objects:

1. **GlobalTag**: Serves as a grouping mechanism for a collection of **PayloadTypes**. In the diagram, this relationship is depicted as a 1-to-many connection, indicating that a single **GlobalTag** can aggregate multiple **PayloadTypes**, each representing a distinct category of conditions. This relationship is implemented in the database by having a foreign key in the **PayloadType** table referencing the **GlobalTag** ID.

2. **PayloadType**: Groups related payloads of the same type (e.g., alignment, calibration) and organizes them for specific conditions. A single **PayloadType** can have multiple **PayloadIOVs** linked to it, representing the actual data for different validity ranges. This relationship is similarly implemented using a foreign key in the **PayloadIOV** table referencing the **PayloadType** ID.

3. **PayloadIOV**: Combines the payload metadata with its validity range (IOV) and provides a URL pointing to the payload file. Conditions of the same type may change over time; each change is recorded as a new **PayloadIOV** whose URL points to the new payload file, so that processing always uses the correct data.

> ### How to Read the Diagram
>
> The diagram visually represents the relationships between these objects. Each block corresponds to a database table, and the connections between them indicate the nature of their relationships:
> - The relationship between **GlobalTag** and **PayloadType** shows that a single **GlobalTag** can group multiple **PayloadTypes**, but each **PayloadType** is associated with exactly one **GlobalTag** (1-to-many).
> - Similarly, the relationship between **PayloadType** and **PayloadIOV** indicates that a single **PayloadType** can group multiple **PayloadIOVs**, but each **PayloadIOV** is tied to one specific **PayloadType**.
>
> These relationships are implemented via foreign keys:
> - The **PayloadType** table includes a foreign key to the **GlobalTag** table.
> - The **PayloadIOV** table includes a foreign key to the **PayloadType** table.
>
> This structure ensures that data integrity is maintained and that each object is correctly linked in the database schema.
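
To make the foreign-key idea concrete, here is a rough sketch that uses Python's built-in `sqlite3` module rather than SQLAlchemy. The table and column names (`global_tag`, `payload_type`, `payload_iov`, `global_tag_id`, `payload_type_id`) are assumptions chosen for illustration and are not necessarily those used by the lesson's ORM models. It shows the database rejecting a `PayloadIOV` row that refers to a `PayloadType` that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: each child table carries a foreign key to its parent.
conn.executescript(
    """
    CREATE TABLE global_tag (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE payload_type (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        global_tag_id INTEGER NOT NULL REFERENCES global_tag(id)
    );
    CREATE TABLE payload_iov (
        id INTEGER PRIMARY KEY,
        payload_url TEXT NOT NULL,
        iov INTEGER NOT NULL,
        payload_type_id INTEGER NOT NULL REFERENCES payload_type(id)
    );
    """
)

conn.execute("INSERT INTO global_tag (id, name) VALUES (1, 'Conditions')")
conn.execute(
    "INSERT INTO payload_type (id, name, global_tag_id) VALUES (1, 'Alignment', 1)"
)

# A PayloadIOV that points at a non-existent PayloadType (id 99) is rejected.
try:
    conn.execute(
        "INSERT INTO payload_iov (payload_url, iov, payload_type_id) "
        "VALUES ('http://example.com/orphan.root', 1, 99)"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print("orphan rejected:", orphan_rejected)
```

With the constraint enforced, every `PayloadIOV` is guaranteed to belong to an existing `PayloadType`, which is the integrity property described above.
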
## Exercises

1. **Exercise 1: Reproducing the Example**
   - Follow the provided example in the next section to define the relationships between `GlobalTag`, `PayloadType`, and `PayloadIOV` using SQLAlchemy.
   - Recreate the database structure, populate it with the example data for alignment and calibration conditions, and verify that the tables and relationships are correctly implemented.

2. **Exercise 2: Querying Conditions Data**
   - Write a query to retrieve the latest `PayloadIOV` for a specific `GlobalTag` and `IOV`.
   - Extend the query to retrieve all payloads for a given `PayloadType`.

These exercises reinforce the concepts and demonstrate how Conditions Databases support real-world data management in high-energy physics experiments.

## Conditions Database Example Using SQLAlchemy

This example demonstrates how to create a simple CDB using SQLAlchemy in Python.
We will define three tables: `GlobalTag`, `PayloadType`, and `PayloadIOV`, and establish relationships
between them. We will then add example data and query the database to retrieve specific entries.

## Imports
### Imports
First, we import the necessary modules from SQLAlchemy.

```python
from sqlalchemy import create_engine, Column, Integer, String, ForeignKey
from sqlalchemy.orm import declarative_base, sessionmaker, relationship
```

## Define ORM Models
### Define ORM Models
We define our ORM models: `GlobalTag`, `PayloadType`, and `PayloadIOV`, along with the necessary relationships.
```python
from sqlalchemy.sql import func, and_
@@ -47,7 +119,7 @@ Session = sessionmaker(bind=engine)
session = Session()
Base = declarative_base()
```
## Define Tables
### Define Tables
We define all the tables in the database.

```python
@@ -82,63 +154,70 @@ class PayloadIOV(Base):
    # Relationship to PayloadType
    payload_type = relationship("PayloadType", back_populates="payload_iovs")
```
## Create Tables
### Create Tables
We create all the tables in the database.

```python
# Create all tables in the database
Base.metadata.drop_all(engine)
Base.metadata.create_all(engine)
```
## Adding Example Data
### Adding Example Data
We add some example data to the database for `GlobalTag`, `PayloadType`, and `PayloadIOV`.

```python
# Adding example data
global_tag = GlobalTag(name="DetectorConfiguration")
global_tag = GlobalTag(name="Conditions")
session.add(global_tag)

daq_payload_type = PayloadType(name="DAQSettings", global_tag=global_tag)
dcs_payload_type = PayloadType(name="DCSSettings", global_tag=global_tag)
calib_payload_type = PayloadType(name="Calibrations", global_tag=global_tag)
align_payload_type = PayloadType(name="Alignment", global_tag=global_tag)

session.add(daq_payload_type)
session.add(dcs_payload_type)
session.add(calib_payload_type)
session.add(align_payload_type)

daq_payload_iovs = [
calib_payload_iovs = [
    PayloadIOV(
        payload_url="http://example.com/daq1", iov=1, payload_type=daq_payload_type
        payload_url="http://example.com/calib_v1.root",
        iov=1,
        payload_type=calib_payload_type,
    ),
    PayloadIOV(
        payload_url="http://example.com/daq2", iov=2, payload_type=daq_payload_type
        payload_url="http://example.com/calib_v2.root",
        iov=2,
        payload_type=calib_payload_type,
    ),
    PayloadIOV(
        payload_url="http://example.com/daq3", iov=3, payload_type=daq_payload_type
        payload_url="http://example.com/calib_v3.root",
        iov=3,
        payload_type=calib_payload_type,
    ),
]

dcs_payload_iovs = [
    PayloadIOV(
        payload_url="http://example.com/dcs1", iov=1, payload_type=dcs_payload_type
    ),
align_payload_iovs = [
    PayloadIOV(
        payload_url="http://example.com/dcs2", iov=2, payload_type=dcs_payload_type
        payload_url="http://example.com/align_v1.root",
        iov=1,
        payload_type=align_payload_type,
    ),
    PayloadIOV(
        payload_url="http://example.com/dcs3", iov=3, payload_type=dcs_payload_type
        payload_url="http://example.com/align_v2.root",
        iov=3,
        payload_type=align_payload_type,
    ),
]

session.add_all(daq_payload_iovs)
session.add_all(dcs_payload_iovs)
session.add_all(calib_payload_iovs)
session.add_all(align_payload_iovs)
session.commit()
```
## Query the Database
### Query the Database
Finally, we query the database to get the latest `PayloadIOV` entries for each `PayloadType` for a specific `GlobalTag` and IOV.

```python
# Query to get the last PayloadIOV entries for each PayloadType for a specific GlobalTag and IOV
requested_iov = 2
requested_gt = "DetectorConfiguration"
requested_gt = "Conditions"

# Subquery to find the maximum IOV for each PayloadType
subquery = (
@@ -177,5 +256,5 @@ for global_tag_name, payload_type_name, payload_url, max_iov in query:
)
```

GlobalTag: DetectorConfiguration, PayloadType: DAQSettings, PayloadIOV URL: http://example.com/daq2, IOV: 2
GlobalTag: DetectorConfiguration, PayloadType: DCSSettings, PayloadIOV URL: http://example.com/dcs2, IOV: 2
GlobalTag: Conditions, PayloadType: Calibrations, PayloadIOV URL: http://example.com/calib_v2.root, IOV: 2
GlobalTag: Conditions, PayloadType: Alignment, PayloadIOV URL: http://example.com/align_v1.root, IOV: 1
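
For comparison, the same "latest IOV not exceeding the requested one, per PayloadType" lookup can be sketched in plain SQL with Python's built-in `sqlite3` module. This is not the lesson's SQLAlchemy query, whose body is collapsed in the diff above; the table and column names below are assumptions made for illustration. The data mirrors the calibration and alignment entries added earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data mirroring the example entries above.
conn.executescript(
    """
    CREATE TABLE global_tag (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE payload_type (
        id INTEGER PRIMARY KEY, name TEXT,
        global_tag_id INTEGER REFERENCES global_tag(id)
    );
    CREATE TABLE payload_iov (
        id INTEGER PRIMARY KEY, payload_url TEXT, iov INTEGER,
        payload_type_id INTEGER REFERENCES payload_type(id)
    );
    INSERT INTO global_tag VALUES (1, 'Conditions');
    INSERT INTO payload_type VALUES (1, 'Calibrations', 1), (2, 'Alignment', 1);
    INSERT INTO payload_iov (payload_url, iov, payload_type_id) VALUES
        ('http://example.com/calib_v1.root', 1, 1),
        ('http://example.com/calib_v2.root', 2, 1),
        ('http://example.com/calib_v3.root', 3, 1),
        ('http://example.com/align_v1.root', 1, 2),
        ('http://example.com/align_v2.root', 3, 2);
    """
)

requested_gt, requested_iov = "Conditions", 2

rows = conn.execute(
    """
    SELECT gt.name, pt.name, piov.payload_url, piov.iov
    FROM payload_iov AS piov
    JOIN payload_type AS pt ON pt.id = piov.payload_type_id
    JOIN global_tag AS gt ON gt.id = pt.global_tag_id
    JOIN (
        -- Latest IOV that does not exceed the requested one, per PayloadType
        SELECT payload_type_id, MAX(iov) AS max_iov
        FROM payload_iov
        WHERE iov <= ?
        GROUP BY payload_type_id
    ) AS latest
      ON latest.payload_type_id = piov.payload_type_id
     AND latest.max_iov = piov.iov
    WHERE gt.name = ?
    ORDER BY pt.id
    """,
    (requested_iov, requested_gt),
).fetchall()

for gt_name, pt_name, url, iov in rows:
    print(f"GlobalTag: {gt_name}, PayloadType: {pt_name}, PayloadIOV URL: {url}, IOV: {iov}")
```

Because the Alignment type has no entry at IOV 2, the subquery falls back to its most recent earlier entry (IOV 1), which is why the alignment row in the output above reports IOV 1 while the calibration row reports IOV 2.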
