Commit 15c1148

Add a first bulk version for the getting-started subsection of the docs
1 parent 10c6c6c commit 15c1148

3 files changed

Lines changed: 452 additions & 15 deletions

Lines changed: 204 additions & 2 deletions
@@ -1,4 +1,206 @@
🎯 First Monitoring
========================

This tutorial walks you through building a complete data quality monitoring setup from scratch. You'll learn each concept step-by-step and understand how all the pieces fit together.

What We'll Build
----------------

We'll create a monitoring system for a **real-time IoT sensor network** that tracks:

- Temperature readings from multiple sensors
- Data volume and availability issues
- Value range violations and anomalies
- Sensor health and connectivity problems

.. admonition:: What You'll Learn
    :class: tip

    - How to configure Stream DaQ for your specific use case
    - How windows work and how they affect monitoring
    - How to create meaningful quality assessments
    - How to interpret and act on monitoring results

Step 1: Understanding Your Data
-------------------------------

First, let's look at the data we want to monitor:

.. code-block:: python

    import pandas as pd
    from datetime import datetime, timedelta
    import numpy as np

    # Sample IoT sensor data
    def create_sensor_data():
        """Generate realistic IoT sensor readings with some quality issues"""
        np.random.seed(42)  # For reproducible results

        sensors = ['sensor_01', 'sensor_02', 'sensor_03', 'sensor_04']
        data = []
        base_time = datetime.now()

        for i in range(100):
            for sensor in sensors:
                # Normal temperature: 18-25°C with some variation
                temp = np.random.normal(21.5, 2.0)

                # Introduce some quality issues
                if sensor == 'sensor_02' and 30 <= i <= 40:
                    # Sensor_02 gets stuck (frozen readings)
                    temp = 23.1
                elif sensor == 'sensor_03' and i > 70:
                    # Sensor_03 starts giving extreme readings
                    temp = np.random.choice([45.0, -10.0, 23.0])
                elif sensor == 'sensor_04' and 20 <= i <= 25:
                    # Sensor_04 goes offline (missing data)
                    continue

                data.append({
                    'sensor_id': sensor,
                    'temperature': round(temp, 1),
                    'timestamp': base_time + timedelta(seconds=i * 10),
                    'location': f'Building_{sensor[-1]}'
                })

        return pd.DataFrame(data)

    # Create our sample data
    sensor_data = create_sensor_data()
    print("Sample of our sensor data:")
    print(sensor_data.head(10))

Expected output (abbreviated; the timestamps will reflect the moment you run the script, because ``base_time`` is ``datetime.now()``):

.. code-block:: text

       sensor_id  temperature            timestamp    location
    0  sensor_01         22.5  2024-01-15 10:00:00  Building_1
    1  sensor_02         21.2  2024-01-15 10:00:00  Building_2
    2  sensor_03         22.8  2024-01-15 10:00:00  Building_3
    3  sensor_04         24.5  2024-01-15 10:00:00  Building_4
    ...

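The "sensor goes offline" issue injected by the generator above shows up as a gap in that sensor's timestamps. As a rough illustration that does not use Stream DaQ at all, the sketch below scans each sensor's readings for gaps longer than the expected 10-second interval. The helper name ``find_gaps`` and the small hand-built dataset are assumptions made for this example only.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_gaps(readings, expected_interval=timedelta(seconds=10)):
    """Return {sensor_id: [(gap_start, gap_end), ...]} for gaps longer than expected.

    `readings` is an iterable of (sensor_id, timestamp) pairs.
    Hypothetical helper for illustration; not part of Stream DaQ.
    """
    by_sensor = defaultdict(list)
    for sensor_id, ts in readings:
        by_sensor[sensor_id].append(ts)

    gaps = {}
    for sensor_id, stamps in by_sensor.items():
        stamps.sort()
        # Compare each timestamp with its successor.
        sensor_gaps = [
            (prev, curr)
            for prev, curr in zip(stamps, stamps[1:])
            if curr - prev > expected_interval
        ]
        if sensor_gaps:
            gaps[sensor_id] = sensor_gaps
    return gaps


base = datetime(2024, 1, 15, 10, 0, 0)
# sensor_04 skips steps 20..25, mirroring the generator in the tutorial.
readings = [
    (sensor, base + timedelta(seconds=i * 10))
    for i in range(30)
    for sensor in ("sensor_01", "sensor_04")
    if not (sensor == "sensor_04" and 20 <= i <= 25)
]

gaps = find_gaps(readings)
for sensor_id, sensor_gaps in gaps.items():
    for start, end in sensor_gaps:
        print(f"{sensor_id}: no data between {start:%H:%M:%S} and {end:%H:%M:%S}")
# prints: sensor_04: no data between 10:03:10 and 10:04:20
```

A per-window count check catches the same problem without tracking individual timestamps: a window that overlaps the outage simply contains fewer readings than expected.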
Step 2: Configure Your Monitor
------------------------------

Now let's set up Stream DaQ to monitor this data:

.. code-block:: python

    from streamdaq import StreamDaQ, DaQMeasures as dqm, Windows

    # Configure the monitoring setup
    daq = StreamDaQ().configure(
        window=Windows.tumbling(60),    # 60-second windows
        instance="sensor_id",           # Monitor each sensor separately
        time_column="timestamp",        # Use timestamp for windowing
        wait_for_late=10,               # Wait 10 seconds for late arrivals
        time_format=None                # Auto-detect datetime format
    )

**Let's look at the main configuration parameters:**

.. grid:: 1 1 2 2
    :gutter: 3

    .. grid-item-card:: **window**: ``Windows.tumbling(60)``
        :class-header: bg-info text-white

        Creates **non-overlapping 60-second windows**. Each data point belongs to exactly one window.

    .. grid-item-card:: **instance**: ``"sensor_id"``
        :class-header: bg-info text-white

        **Monitor each sensor separately**. Quality metrics are calculated per sensor per window.

    .. grid-item-card:: **time_column**: ``"timestamp"``
        :class-header: bg-info text-white

        **Which column contains the event time** for windowing and ordering.

    .. grid-item-card:: **wait_for_late**: ``10``
        :class-header: bg-info text-white

        **Wait 10 seconds** for late-arriving data before finalizing a window.

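To make the window and late-arrival settings more concrete, here is a minimal plain-Python sketch of how an event time could be mapped to a 60-second tumbling window, and how a 10-second grace period could decide when a window is closed. The function names, the epoch alignment of windows, and the "latest event time seen" rule are assumptions of this sketch; Stream DaQ's internal behaviour may differ.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def tumbling_window(ts, size_seconds=60):
    """Return the half-open [start, end) window that contains `ts`.

    Windows are aligned to the Unix epoch (an assumption of this sketch).
    """
    elapsed = int((ts - EPOCH).total_seconds())
    start = EPOCH + timedelta(seconds=elapsed // size_seconds * size_seconds)
    return start, start + timedelta(seconds=size_seconds)


def is_final(window_end, latest_event_time, wait_for_late=10):
    """A window is closed once event time has passed its end plus the grace period."""
    return latest_event_time >= window_end + timedelta(seconds=wait_for_late)


ts = datetime(2024, 1, 15, 10, 0, 50)
start, end = tumbling_window(ts)
print(start, "->", end)  # 2024-01-15 10:00:00 -> 2024-01-15 10:01:00

# Five seconds after the window ends, late readings are still accepted.
print(is_final(end, datetime(2024, 1, 15, 10, 1, 5)))   # False
# Ten seconds after the window ends, the window is finalized.
print(is_final(end, datetime(2024, 1, 15, 10, 1, 10)))  # True
```

With ``instance="sensor_id"``, each result row would then be keyed by the pair (sensor, window start), so four sensors produce up to four rows per minute.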
Step 3: Define Quality Measures
-------------------------------

Let's add quality checks that make sense for IoT sensor monitoring:

.. code-block:: python

    # Add data quality measures
    daq.add(
        measure=dqm.count('temperature'),
        assess=">3",            # Expect at least 4 readings per minute per sensor
        name="sufficient_data"
    ).add(
        measure=dqm.mean('temperature'),
        assess="(15.0, 30.0)",  # Average temp should be reasonable
        name="avg_temp_normal"
    ).add(
        measure=dqm.max('temperature'),
        assess="<=35.0",        # Max temp shouldn't exceed 35°C
        name="no_extreme_high"
    ).add(
        measure=dqm.min('temperature'),
        assess=">=-5.0",        # Min temp shouldn't go below -5°C
        name="no_extreme_low"
    ).add(
        measure=dqm.distinct_count('temperature'),
        assess=">1",            # Values should vary (detect frozen sensors)
        name="values_vary"
    )

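To see what these five checks amount to for a single sensor and a single window, the following plain-Python sketch computes the same statistics by hand and applies the thresholds from the configuration above. The function ``summarize_window`` is a hypothetical stand-in written for this illustration, not something Stream DaQ provides, and it assumes the window contains at least one reading.

```python
from statistics import mean


def summarize_window(temperatures):
    """Compute the five per-window measures and their pass/fail flags.

    Thresholds are copied from the tutorial's configuration; the function
    itself is a hypothetical illustration and expects a non-empty list.
    """
    count = len(temperatures)
    avg = round(mean(temperatures), 1)
    high = max(temperatures)
    low = min(temperatures)
    distinct = len(set(temperatures))
    return {
        "sufficient_data": (count, count > 3),
        "avg_temp_normal": (avg, 15.0 < avg < 30.0),
        "no_extreme_high": (high, high <= 35.0),
        "no_extreme_low": (low, low >= -5.0),
        "values_vary": (distinct, distinct > 1),
    }


# A frozen sensor: six identical readings in one minute.
frozen = summarize_window([23.1] * 6)
print(frozen["values_vary"])      # (1, False)
print(frozen["avg_temp_normal"])  # (23.1, True)
```

The frozen window passes every range check, which is why a separate distinct-count measure is needed to catch a stuck sensor.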
**Understanding Assessment Syntax:**

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Assessment
     - Meaning
   * - ``">3"``
     - Value must be greater than 3
   * - ``"(15.0, 30.0)"``
     - Value must be between 15.0 and 30.0 (exclusive)
   * - ``"<=35.0"``
     - Value must be less than or equal to 35.0
   * - ``">=-5.0"``
     - Value must be greater than or equal to -5.0
   * - ``">1"``
     - Value must be greater than 1

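One way to reason about the assessment strings in the table is to turn each one into a predicate. The sketch below is a hypothetical parser written for illustration; it is not Stream DaQ's own implementation. Support for square brackets (inclusive bounds) and for ``==`` is an assumption based on ordinary interval and comparison notation, since the table above only documents the forms shown.

```python
import operator
import re

# Two-character operators are listed first so ">=" is matched before ">".
_COMPARATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}

_INTERVAL = re.compile(r"^\s*([\[(])\s*([^,]+?)\s*,\s*([^,]+?)\s*([\])])\s*$")


def parse_assessment(spec):
    """Turn an assessment string into a predicate `value -> bool`."""
    match = _INTERVAL.match(spec)
    if match:
        left, low, high, right = match.groups()
        low, high = float(low), float(high)
        above = operator.ge if left == "[" else operator.gt
        below = operator.le if right == "]" else operator.lt
        return lambda value: above(value, low) and below(value, high)

    text = spec.strip()
    for symbol, compare in _COMPARATORS.items():
        if text.startswith(symbol):
            threshold = float(text[len(symbol):])
            return lambda value: compare(value, threshold)
    raise ValueError(f"Unrecognised assessment: {spec!r}")


in_range = parse_assessment("(15.0, 30.0)")
print(in_range(21.5), in_range(30.0))  # True False

not_too_cold = parse_assessment(">=-5.0")
print(not_too_cold(-10.0))  # False
```

Note that ``"(15.0, 30.0)"`` rejects the bounds themselves, so an average of exactly 30.0 fails the check.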
Step 4: Run the Monitoring
--------------------------

Now let's start monitoring and see the results:

.. code-block:: python

    print("🚀 Starting IoT sensor monitoring...")
    print("🌡️ Analyzing temperature data quality...")

    # Run the monitoring
    results = daq.watch_out(sensor_data)

    print("✅ Monitoring complete!")
    print("\nQuality assessment results:")
    print(results)

Expected output (abbreviated; values are illustrative):

.. code-block:: text

    🚀 Starting IoT sensor monitoring...
    🌡️ Analyzing temperature data quality...

    | sensor_id | window_start        | window_end          | sufficient_data | avg_temp_normal | no_extreme_high | no_extreme_low | values_vary |
    |-----------|---------------------|---------------------|-----------------|-----------------|-----------------|----------------|-------------|
    | sensor_01 | 2024-01-15 10:00:00 | 2024-01-15 10:01:00 | (6, True)       | (21.8, True)    | (24.5, True)    | (19.2, True)   | (6, True)   |
    | sensor_02 | 2024-01-15 10:05:00 | 2024-01-15 10:06:00 | (6, True)       | (23.1, True)    | (23.1, True)    | (23.1, True)   | (1, False)  |
    | sensor_03 | 2024-01-15 10:12:00 | 2024-01-15 10:13:00 | (6, True)       | (19.4, True)    | (45.0, False)   | (-10.0, False) | (3, True)   |
    | sensor_04 | 2024-01-15 10:03:00 | 2024-01-15 10:04:00 | (2, False)      | (21
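Each cell in the sample output pairs a measured value with a pass/fail flag. As a small sketch of how such rows could be turned into something actionable, the code below collects the failing checks per row. It assumes rows are available as plain dictionaries of ``(value, passed)`` tuples, which is a simplification made for this example; the actual type returned by ``watch_out`` is not shown in this tutorial. The two rows are copied from the sample output above.

```python
def failed_checks(row):
    """Return {check_name: measured_value} for every check whose flag is False.

    Cells that are not (value, passed) tuples, such as ids and window bounds,
    are ignored. Hypothetical helper for illustration only.
    """
    failures = {}
    for name, cell in row.items():
        if isinstance(cell, tuple) and len(cell) == 2 and cell[1] is False:
            failures[name] = cell[0]
    return failures


rows = [
    {
        "sensor_id": "sensor_01",
        "window_start": "2024-01-15 10:00:00",
        "sufficient_data": (6, True),
        "avg_temp_normal": (21.8, True),
        "no_extreme_high": (24.5, True),
        "no_extreme_low": (19.2, True),
        "values_vary": (6, True),
    },
    {
        "sensor_id": "sensor_02",
        "window_start": "2024-01-15 10:05:00",
        "sufficient_data": (6, True),
        "avg_temp_normal": (23.1, True),
        "no_extreme_high": (23.1, True),
        "no_extreme_low": (23.1, True),
        "values_vary": (1, False),
    },
]

for row in rows:
    failures = failed_checks(row)
    if failures:
        print(f"{row['sensor_id']} @ {row['window_start']}: {failures}")
# prints: sensor_02 @ 2024-01-15 10:05:00: {'values_vary': 1}
```

Here only the frozen sensor is reported: its readings stay in range, but the distinct count of 1 fails the ``values_vary`` check.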
