🎯 First Monitoring
========================

This tutorial walks you through building a complete data quality monitoring setup from scratch. You'll learn each concept step by step and see how all the pieces fit together.

What We'll Build
----------------

We'll create a monitoring system for a **real-time IoT sensor network** that tracks:

- Temperature readings from multiple sensors
- Data volume and availability issues
- Value range violations and anomalies
- Sensor health and connectivity problems

.. admonition:: What You'll Learn
   :class: tip

   - How to configure Stream DaQ for your specific use case
   - How windows work and how they shape what you monitor
   - How to create meaningful quality assessments
   - How to interpret and act on monitoring results

Step 1: Understanding Your Data
-------------------------------

First, let's look at the data we want to monitor:

.. code-block:: python

    import pandas as pd
    from datetime import datetime, timedelta
    import numpy as np

    # Sample IoT sensor data
    def create_sensor_data():
        """Generate simulated IoT sensor readings with some injected quality issues."""
        np.random.seed(42)  # For reproducible results

        sensors = ['sensor_01', 'sensor_02', 'sensor_03', 'sensor_04']
        data = []
        # Fixed start time so timestamps and window boundaries are reproducible
        base_time = datetime(2024, 1, 15, 10, 0, 0)

        for i in range(100):
            for sensor in sensors:
                # Normal temperature: mean 21.5°C, std 2.0°C (mostly 18-25°C)
                temp = np.random.normal(21.5, 2.0)

                # Introduce some quality issues
                if sensor == 'sensor_02' and 30 <= i <= 40:
                    # sensor_02 gets stuck (frozen readings)
                    temp = 23.1
                elif sensor == 'sensor_03' and i > 70:
                    # sensor_03 starts giving extreme readings
                    temp = np.random.choice([45.0, -10.0, 23.0])
                elif sensor == 'sensor_04' and 20 <= i <= 25:
                    # sensor_04 goes offline (missing data)
                    continue

                data.append({
                    'sensor_id': sensor,
                    'temperature': round(temp, 1),
                    'timestamp': base_time + timedelta(seconds=i * 10),  # one reading per sensor every 10 s
                    'location': f'Building_{sensor[-1]}'
                })

        return pd.DataFrame(data)

    # Create our sample data
    sensor_data = create_sensor_data()
    print("Sample of our sensor data:")
    print(sensor_data.head(10))

Expected output (first rows only; the rest are elided):

.. code-block:: text

    Sample of our sensor data:
       sensor_id  temperature           timestamp    location
    0  sensor_01         22.5 2024-01-15 10:00:00  Building_1
    1  sensor_02         21.2 2024-01-15 10:00:00  Building_2
    2  sensor_03         22.8 2024-01-15 10:00:00  Building_3
    3  sensor_04         24.5 2024-01-15 10:00:00  Building_4
    ...

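Because the quality issues are injected deliberately, you can work out in advance where the monitor should flag them, which makes the results in Step 4 easier to check. Each loop index ``i`` corresponds to an offset of ``i * 10`` seconds. The stdlib sketch below converts the index ranges from ``create_sensor_data()`` into clock times; it assumes a start time of 2024-01-15 10:00:00 (matching the expected output above), and the names ``FAULTS`` and ``fault_span`` are illustrative only, not part of Stream DaQ.

.. code-block:: python

    from datetime import datetime, timedelta

    BASE_TIME = datetime(2024, 1, 15, 10, 0, 0)  # assumed start time, as in the expected output
    CADENCE = timedelta(seconds=10)               # loop index i maps to i * 10 seconds

    # (sensor, first i, last i) for each issue injected by create_sensor_data();
    # sensor_03 may return extreme values from i = 71 to the last iteration, i = 99.
    FAULTS = {
        "frozen": ("sensor_02", 30, 40),
        "extreme": ("sensor_03", 71, 99),
        "offline": ("sensor_04", 20, 25),
    }


    def fault_span(first_i, last_i):
        """Event times of the first and last affected (or, if offline, missing) readings."""
        return BASE_TIME + first_i * CADENCE, BASE_TIME + last_i * CADENCE


    for kind, (sensor, first_i, last_i) in FAULTS.items():
        start, end = fault_span(first_i, last_i)
        print(f"{kind:<8} {sensor}: {start:%H:%M:%S} -> {end:%H:%M:%S}")

With one-minute windows, this places sensor_02's frozen readings across the 10:05 and 10:06 windows and sensor_04's gap (10:03:20 to 10:04:10) across the 10:03 and 10:04 windows.
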
Step 2: Configure Your Monitor
------------------------------

Now let's set up Stream DaQ to monitor this data:

.. code-block:: python

    from streamdaq import StreamDaQ, DaQMeasures as dqm, Windows

    # Configure the monitoring setup
    daq = StreamDaQ().configure(
        window=Windows.tumbling(60),   # 60-second windows
        instance="sensor_id",          # Monitor each sensor separately
        time_column="timestamp",       # Use timestamp for windowing
        wait_for_late=10,              # Wait 10 seconds for late arrivals
        time_format=None               # Auto-detect datetime format
    )

**Let's understand each configuration parameter:**

.. grid:: 1 1 2 2
   :gutter: 3

   .. grid-item-card:: **window**: ``Windows.tumbling(60)``
      :class-header: bg-info text-white

      Creates **non-overlapping 60-second windows**. Each data point belongs to exactly one window.

   .. grid-item-card:: **instance**: ``"sensor_id"``
      :class-header: bg-info text-white

      **Monitor each sensor separately**. Quality metrics are calculated per sensor per window.

   .. grid-item-card:: **time_column**: ``"timestamp"``
      :class-header: bg-info text-white

      **The column holding each event's time**, used for windowing and ordering.

   .. grid-item-card:: **wait_for_late**: ``10``
      :class-header: bg-info text-white

      **Wait 10 seconds** for late-arriving data before finalizing a window.

   .. grid-item-card:: **time_format**: ``None``
      :class-header: bg-info text-white

      **No explicit format string**, so the datetime format of ``time_column`` is auto-detected.

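To make the window, instance, and lateness settings concrete, here is a stdlib-only sketch of event-time tumbling windows with per-instance counts and a grace period for late data. It is not how Stream DaQ implements windowing: the function names are hypothetical, event times are plain seconds since the stream started, and the lateness rule (a window ``[s, s + 60)`` is closed once the largest event time seen reaches ``s + 60 + 10``, after which its late events are dropped) is an assumption made for illustration.

.. code-block:: python

    from collections import defaultdict

    WINDOW = 60         # like Windows.tumbling(60)
    WAIT_FOR_LATE = 10  # like wait_for_late=10 (assumed to be seconds)


    def window_start(t, size=WINDOW):
        """Start of the one tumbling window that contains event time t."""
        return (t // size) * size


    def tumbling_counts(events, size=WINDOW, lateness=WAIT_FOR_LATE):
        """Count events per (instance, window start), processing them in arrival order.

        Assumed lateness model: a window [s, s + size) is finalized once the
        largest event time seen so far reaches s + size + lateness; events that
        arrive for an already-finalized window are dropped.
        """
        counts = defaultdict(int)
        dropped = []
        max_seen = float("-inf")
        for instance, t in events:
            start = window_start(t, size)
            if max_seen >= start + size + lateness:
                dropped.append((instance, t))  # its window has already closed
                continue
            counts[(instance, start)] += 1     # separate counter per instance and window
            max_seen = max(max_seen, t)
        return dict(counts), dropped


    # sensor_04 reports every 10 s but skips i = 20..25 (offsets 200-250 s)
    readings = [("sensor_04", i * 10) for i in range(100) if not 20 <= i <= 25]
    counts, _ = tumbling_counts(readings)
    print(counts[("sensor_04", 180)], counts[("sensor_04", 240)])  # 2 4

    # A reading stamped 59 s that arrives after one stamped 75 s misses the grace period
    late = [("sensor_01", 50), ("sensor_01", 65), ("sensor_01", 58),
            ("sensor_01", 75), ("sensor_01", 59)]
    counts, dropped = tumbling_counts(late)
    print(counts)   # {('sensor_01', 0): 2, ('sensor_01', 60): 2}
    print(dropped)  # [('sensor_01', 59)]

Window start 180 s corresponds to 10:03:00 for a stream that begins at 10:00:00, the window in which sensor_04 has only two readings. Note that the reading stamped 58 s is still counted: when it arrives, the largest event time seen is 65 s, short of the 70 s cutoff for the first window.
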
Step 3: Define Quality Measures
-------------------------------

Let's add quality checks that make sense for IoT sensor monitoring:

.. code-block:: python

    # Add data quality measures
    daq.add(
        measure=dqm.count('temperature'),
        assess=">3",            # 6 readings expected per minute (one every 10 s); require at least 4
        name="sufficient_data"
    ).add(
        measure=dqm.mean('temperature'),
        assess="(15.0, 30.0)",  # Window average should stay between 15°C and 30°C
        name="avg_temp_normal"
    ).add(
        measure=dqm.max('temperature'),
        assess="<=35.0",        # Max temp shouldn't exceed 35°C
        name="no_extreme_high"
    ).add(
        measure=dqm.min('temperature'),
        assess=">=-5.0",        # Min temp shouldn't go below -5°C
        name="no_extreme_low"
    ).add(
        measure=dqm.distinct_count('temperature'),
        assess=">1",            # Values should vary (detect frozen sensors)
        name="values_vary"
    )

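Each measure reduces one sensor's readings in one window to a single number. The sketch below computes plain-Python stand-ins for the five measures on three hand-written six-reading windows; ``window_measures`` is a hypothetical helper, not Stream DaQ's ``dqm`` implementation. It shows why ``distinct_count`` is the check that catches a frozen sensor, while ``max`` and ``min`` catch extreme readings in a window whose values still vary.

.. code-block:: python

    from statistics import mean


    def window_measures(temps):
        """Plain-Python stand-ins for the five measures over one window."""
        return {
            "sufficient_data": len(temps),      # like dqm.count
            "avg_temp_normal": mean(temps),     # like dqm.mean
            "no_extreme_high": max(temps),      # like dqm.max
            "no_extreme_low": min(temps),       # like dqm.min
            "values_vary": len(set(temps)),     # like dqm.distinct_count
        }


    # Hand-written windows of six readings each (10-second cadence, 60-second window)
    healthy = [22.5, 21.2, 22.8, 24.5, 21.0, 20.3]
    frozen = [23.1] * 6                                # like sensor_02 for 30 <= i <= 40
    extreme = [45.0, -10.0, 23.0, 45.0, 23.0, -10.0]   # like sensor_03 after i = 70

    for label, temps in [("healthy", healthy), ("frozen", frozen), ("extreme", extreme)]:
        print(label, window_measures(temps))

The frozen window has six readings but a single distinct value, so only ``values_vary`` fails. The extreme window has three distinct values and an average of about 19.3°C, so it passes the count, average, and variety checks; only the max and min checks flag it.
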
**Understanding Assessment Syntax:**

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Assessment
     - Meaning
   * - ``">3"``
     - Value must be greater than 3
   * - ``"(15.0, 30.0)"``
     - Value must be between 15.0 and 30.0 (exclusive)
   * - ``"<=35.0"``
     - Value must be less than or equal to 35.0
   * - ``">=-5.0"``
     - Value must be greater than or equal to -5.0
   * - ``">1"``
     - Value must be greater than 1

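To see how these strings turn into pass/fail decisions, here is a small hypothetical parser that covers only the forms listed in the table: a comparison operator followed by a number, or an exclusive range in parentheses. It is a sketch for building intuition, not Stream DaQ's own parser, which may accept additional syntax.

.. code-block:: python

    import operator
    import re

    # Two-character operators come first so ">=" is not read as ">".
    _COMPARISONS = [(">=", operator.ge), ("<=", operator.le),
                    (">", operator.gt), ("<", operator.lt)]
    _NUMBER = r"\s*(-?\d+(?:\.\d+)?)\s*"
    _EXCLUSIVE_RANGE = re.compile(rf"^\({_NUMBER},{_NUMBER}\)$")


    def make_check(spec):
        """Turn an assessment string such as ">3" or "(15.0, 30.0)" into a predicate."""
        spec = spec.strip()
        match = _EXCLUSIVE_RANGE.match(spec)
        if match:
            low, high = float(match.group(1)), float(match.group(2))
            return lambda value: low < value < high  # both bounds excluded
        for symbol, compare in _COMPARISONS:
            if spec.startswith(symbol):
                bound = float(spec[len(symbol):])
                return lambda value: compare(value, bound)
        raise ValueError(f"unsupported assessment: {spec!r}")


    checks = {name: make_check(spec) for name, spec in [
        ("sufficient_data", ">3"), ("avg_temp_normal", "(15.0, 30.0)"),
        ("no_extreme_high", "<=35.0"), ("no_extreme_low", ">=-5.0"),
        ("values_vary", ">1"),
    ]}

    print(checks["sufficient_data"](2))     # False: too few readings
    print(checks["avg_temp_normal"](30.0))  # False: range bounds are exclusive
    print(checks["values_vary"](1))         # False: frozen sensor
    print(checks["no_extreme_low"](-5.0))   # True: >= includes the bound

Checking the range pattern before the operators keeps the two forms from overlapping, and trying ``>=`` before ``>`` matters for specs like ``">=-5.0"``.
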
Step 4: Run the Monitoring
--------------------------

Now let's start monitoring and see the results:

.. code-block:: python

    print("🚀 Starting IoT sensor monitoring...")
    print("🌡️ Analyzing temperature data quality...")

    # Run the monitoring
    results = daq.watch_out(sensor_data)

    print("✅ Monitoring complete!")
    print("\nQuality assessment results:")
    print(results)

Expected output (abbreviated and illustrative; exact values depend on the generated data). Each check cell pairs the measured value with whether it passed its assessment:

.. code-block:: text

    🚀 Starting IoT sensor monitoring...
    🌡️ Analyzing temperature data quality...

    | sensor_id | window_start        | window_end          | sufficient_data | avg_temp_normal | no_extreme_high | no_extreme_low | values_vary |
    |-----------|---------------------|---------------------|-----------------|-----------------|-----------------|----------------|-------------|
    | sensor_01 | 2024-01-15 10:00:00 | 2024-01-15 10:01:00 | (6, True)       | (21.8, True)    | (24.5, True)    | (19.2, True)   | (6, True)   |
    | sensor_02 | 2024-01-15 10:05:00 | 2024-01-15 10:06:00 | (6, True)       | (23.1, True)    | (23.1, True)    | (23.1, True)   | (1, False)  |
    | sensor_03 | 2024-01-15 10:11:00 | 2024-01-15 10:12:00 | (6, True)       | (19.4, True)    | (45.0, False)   | (-10.0, False) | (3, True)   |
    | sensor_04 | 2024-01-15 10:03:00 | 2024-01-15 10:04:00 | (2, False)      | (21

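The exact type that ``watch_out`` returns is not shown in this tutorial, so the following stdlib sketch assumes you have turned the results into plain Python dicts in which every check cell is the ``(value, passed)`` pair shown in the table above. It lists each failed check with a likely cause based on the issues injected in Step 1; the ``failures`` helper and the ``DIAGNOSIS`` labels are illustrative, not part of Stream DaQ.

.. code-block:: python

    from collections import defaultdict

    CHECKS = ("sufficient_data", "avg_temp_normal", "no_extreme_high",
              "no_extreme_low", "values_vary")

    # Likely cause of each failed check, based on the issues injected in Step 1
    DIAGNOSIS = {
        "sufficient_data": "missing readings (sensor offline?)",
        "avg_temp_normal": "abnormal average temperature",
        "no_extreme_high": "out-of-range high reading",
        "no_extreme_low": "out-of-range low reading",
        "values_vary": "identical readings (sensor frozen?)",
    }

    # Rows transcribed from the example output; sensor_03 keeps only its two range checks
    rows = [
        {"sensor_id": "sensor_01", "window_start": "10:00:00",
         "sufficient_data": (6, True), "avg_temp_normal": (21.8, True),
         "no_extreme_high": (24.5, True), "no_extreme_low": (19.2, True),
         "values_vary": (6, True)},
        {"sensor_id": "sensor_02", "window_start": "10:05:00",
         "sufficient_data": (6, True), "avg_temp_normal": (23.1, True),
         "no_extreme_high": (23.1, True), "no_extreme_low": (23.1, True),
         "values_vary": (1, False)},
        {"sensor_id": "sensor_03", "window_start": "10:11:00",
         "no_extreme_high": (45.0, False), "no_extreme_low": (-10.0, False)},
    ]


    def failures(rows):
        """Yield (sensor_id, window_start, check, value) for every failed check."""
        for row in rows:
            for check in CHECKS:
                if check not in row:
                    continue
                value, passed = row[check]
                if not passed:
                    yield row["sensor_id"], row["window_start"], check, value


    alerts = defaultdict(list)
    for sensor, start, check, value in failures(rows):
        alerts[sensor].append(f"{start} {check}={value}: {DIAGNOSIS[check]}")

    for sensor, messages in alerts.items():
        print(sensor)
        for message in messages:
            print("  ", message)

Grouping by sensor keeps one alert list per device, which is usually the unit you act on (restart, recalibrate, or replace a sensor).
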