Commit f98b798

docs: build alerts as code (#8588)
* adding alerts
* Update alerts.md
* Update alerts.md
* feedback loop
* links
* Update alerts.md
1 parent fbe8bf5 commit f98b798

5 files changed (+310, -2 lines)

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
```yaml
position: 44
label: Alerts
collapsible: true
collapsed: true
```

docs/docs/build/alerts/alerts.md

Lines changed: 298 additions & 0 deletions
@@ -0,0 +1,298 @@
---
title: Alerts
description: Define alerts as code for automated monitoring and notifications
sidebar_label: Alerts
sidebar_position: 0
---

## Overview

Alerts in Rill let you monitor your data and receive notifications when specific conditions are met. While alerts can be created through the UI, defining them as code in YAML files provides version control, reproducibility, and the ability to manage complex alerting logic programmatically.

When you create an alert via a YAML file, it appears in the UI marked as `Created through code`.

:::tip Using live connectors?

If you're using [live connectors](/build/connectors/olap) (ClickHouse, Druid, Pinot, StarRocks, etc.), **alerts are your primary tool for data quality monitoring**. Because live connectors don't create local models, [data quality tests](/build/models/data-quality-tests) won't run. Use alerts instead to validate your data on a schedule.

:::
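For example, a check that would otherwise be a data quality test can run as a scheduled alert. The sketch below looks for duplicate primary keys. The file name, the `orders` table, and its `order_id` column are hypothetical, and it assumes the alert's SQL runs against your project's live OLAP connector, so the query must be valid in that engine's SQL dialect.

```yaml
# alerts/duplicate_order_ids.yaml (hypothetical file name)
type: alert
display_name: Duplicate Order IDs
description: Alert when the orders table contains duplicate order_id values

refresh:
  cron: "0 * * * *" # Every hour

data:
  # Returns one row per duplicated key, so the alert fires only when duplicates exist
  sql: |
    SELECT order_id, COUNT(*) AS occurrences
    FROM orders
    GROUP BY order_id
    HAVING COUNT(*) > 1

notify:
  slack:
    channels:
      - "#data-alerts"
```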

## Alert Structure

An alert YAML file has the following core components:

```yaml
type: alert
display_name: My Alert Name
description: A brief description of what this alert monitors

# When to check the alert
refresh:
  cron: "0 * * * *" # Every hour

# What data to check
data:
  sql: SELECT * FROM my_model WHERE condition_is_bad

# Where to send notifications
notify:
  email:
    recipients:
      - "alerts@example.com"
```

## Scheduling Alerts

The [`refresh`](/reference/project-files/alerts#refresh) property defines when and how often the alert runs.

### Cron Schedule

Use a standard `cron` expression to define the schedule:

```yaml
refresh:
  cron: "0 * * * *" # Every hour
  time_zone: "America/New_York" # Optional time zone
```
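A cron expression has five fields: minute, hour, day of month, month, and day of week. As an illustrative variation (the schedule and time zone below are arbitrary choices), the following runs the alert at 9:00 AM on weekdays, assuming `time_zone` takes an IANA time zone name and controls how the cron expression is interpreted:

```yaml
refresh:
  # Fields: minute hour day-of-month month day-of-week
  cron: "0 9 * * 1-5" # 09:00, Monday through Friday
  time_zone: "Europe/London"
```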

### Interval-Based Monitoring

Use `intervals` when you need to check data across multiple time windows, such as validating metrics for each hour or day. This suits time-series monitoring where you want to ensure data quality across a rolling window of periods: unlike a plain cron schedule, each evaluation can cover multiple historical periods.

```yaml
refresh:
  cron: "5 * * * *" # 5 minutes past each hour

intervals:
  duration: PT1H # 1-hour intervals (ISO 8601 duration)
  limit: 24 # Check the last 24 intervals
  check_unclosed: false # Skip the current interval, which is still in progress
```

## Data Sources

The `data` property defines what the alert queries. Alerts support several data source types.

### SQL Query

Execute raw SQL against your models:

```yaml
data:
  sql: |
    SELECT *
    FROM orders
    WHERE created_at < NOW() - INTERVAL '24 hours'
      AND status = 'pending'
```

The alert triggers when the query returns **any rows**.
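One consequence of the "any rows" rule: an aggregate query such as `SELECT COUNT(*) FROM orders` always returns exactly one row, so it would trigger on every run. To alert on a threshold, filter the aggregate so that a row comes back only when the condition is bad. A sketch using the `orders` model from the example above and an arbitrary threshold of 100 pending orders:

```yaml
data:
  # The inner query always yields one row; the outer WHERE keeps it only above the threshold
  sql: |
    SELECT pending_orders
    FROM (
      SELECT COUNT(*) AS pending_orders
      FROM orders
      WHERE status = 'pending'
    ) AS counts
    WHERE pending_orders > 100
```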

### Metrics SQL

Use `metrics_sql` when you want to query a [metrics view](/build/metrics-view) using its defined dimensions and measures, rather than writing raw SQL against the underlying model. This approach applies the metrics view's security policies and lets you reference measures and dimensions by name. For details on the `metrics_sql` syntax, see [Custom APIs](/build/custom-apis#metrics-sql-api).

```yaml
data:
  metrics_sql: |
    SELECT *
    FROM sales_metrics
    WHERE total_revenue < 1000
```

### Custom API

Use a custom API when you want to reuse complex query logic that's already defined as a [Custom API](/build/custom-apis) in your project. This is useful for sharing validation logic between alerts and other integrations, or when you want to parameterize the query by passing arguments through `args`.

```yaml
data:
  api: my_custom_validation_api
  args:
    threshold: 100
    date_range: "7d"
```

### Resource Status

Use `resource_status` to monitor the health of your Rill resources, catching pipeline failures and reconciliation errors before they impact downstream processes.

```yaml
data:
  resource_status:
    where_error: true
```

This triggers when any resource in your project has a reconciliation error.

## Notification Configuration

Configure where and how you receive notifications when alerts trigger. You can send notifications via email, Slack, or both. Notifications are sent when the alert condition is met (the data query returns rows), and optionally when the alert recovers or encounters evaluation errors.

### Email Notifications

```yaml
notify:
  email:
    recipients:
      - "alerts@example.com"
      - "data-team@example.com"
```

### Slack Notifications

Before using Slack notifications, you must [configure the Slack integration](/build/connectors/data-source/slack) for your project.

```yaml
notify:
  slack:
    channels:
      - "#data-alerts"
      - "#engineering"
    users:
      - "U1234567890" # Slack user IDs
    webhooks:
      - "https://hooks.slack.com/services/..."
```

### Combined Notifications

Send to multiple destinations:

```yaml
notify:
  email:
    recipients:
      - "alerts@example.com"
  slack:
    channels:
      - "#alerts"
```

## Alert Behavior

### Recovery Notifications

Control which alert state changes send a notification. Use `on_recover` to get confirmation that an issue has been resolved, and `on_error` to catch alert evaluation failures (e.g., query syntax errors) that prevent the alert from running properly.

```yaml
on_recover: true # Notify when alert recovers
on_fail: true # Notify when alert triggers (default)
on_error: false # Notify on evaluation errors
```
183+
184+
### Re-notification (Snooze)
185+
186+
Control how often you're notified for ongoing issues. This prevents alert fatigue while ensuring ongoing issues aren't forgotten. Instead of receiving notifications on every evaluation cycle, you'll only be re-notified after the specified duration if the alert is still failing.
187+
188+
```yaml
189+
renotify: true
190+
renotify_after: "24h" # Re-notify every 24 hours if still failing
191+
```
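Re-notification interacts with the `refresh` schedule: the alert is still evaluated on every scheduled run, and `renotify_after` only limits how soon a repeat notification can be sent while the alert keeps failing. A sketch with arbitrary values, assuming the reminder goes out on the first failing evaluation after the duration has elapsed:

```yaml
refresh:
  cron: "*/15 * * * *" # Evaluate every 15 minutes

renotify: true
renotify_after: "2h" # While still failing, remind roughly every 2 hours rather than every 15 minutes
```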

## Working Examples

### Data Freshness Alert

This example checks data freshness: it queries the latest timestamp in an events model and triggers when that timestamp is more than 24 hours old. It notifies via both email and Slack, sends a recovery notification once fresh data arrives, and re-notifies every 6 hours while the data stays stale, which limits alert fatigue without losing track of the issue.

```yaml
# alerts/data_freshness.yaml
type: alert
display_name: Data Freshness Check
description: Alert when event data is stale

refresh:
  cron: "0 * * * *" # Check every hour

data:
  sql: |
    SELECT 'Data is stale' AS error_message
    FROM (
      SELECT MAX(event_timestamp) AS latest_event
      FROM events_model
    )
    WHERE latest_event < NOW() - INTERVAL '24 hours'

notify:
  email:
    recipients:
      - "alerts@example.com"
  slack:
    channels:
      - "#data-alerts"

on_recover: true
renotify: true
renotify_after: "6h"
```

### Project Health Monitor

This example monitors the overall health of your Rill project by checking for resource reconciliation errors. It runs every 10 minutes to detect pipeline failures quickly, uses the `resource_status` data source to detect errors across all resources, and notifies both Slack and email. Recovery notifications tell you when issues are resolved.

```yaml
# alerts/project_health.yaml
type: alert
display_name: Project Health Monitor
description: Alert when any resource has a reconciliation error

refresh:
  cron: "*/10 * * * *" # Every 10 minutes

data:
  resource_status:
    where_error: true

notify:
  slack:
    channels:
      - "#rill-alerts"
  email:
    recipients:
      - "alerts@example.com"

on_recover: true
```

### Interval-Based Monitoring Example

This example uses interval-based monitoring to validate metrics across multiple time periods. It checks hourly aggregates for the last 24 hours, looking for hours with zero events. The alert runs 5 minutes past each hour so that the previous hour's data is complete, and uses the `intervals` configuration to check each hour in the rolling window. This pattern fits time-series data quality monitoring where you need to validate multiple periods on each evaluation.

```yaml
# alerts/hourly_metrics.yaml
type: alert
display_name: Hourly Metrics Check
description: Validate metrics for each hour

refresh:
  cron: "5 * * * *" # 5 minutes past each hour

intervals:
  duration: PT1H # 1-hour intervals
  limit: 24 # Check the last 24 intervals
  check_unclosed: false # Skip the current interval, which is still in progress

data:
  sql: |
    SELECT *
    FROM hourly_aggregates
    WHERE hour_start = DATE_TRUNC('hour', NOW() - INTERVAL '1 hour')
      AND event_count = 0

notify:
  slack:
    channels:
      - "#monitoring"
```

## Reference

For the complete specification of all available properties, see the [Alert YAML Reference](/reference/project-files/alerts).

:::note Advanced Properties

For advanced properties like `glob`, `for`, `watermark`, and `timeout`, see the [Alert YAML Reference](/reference/project-files/alerts).

:::

docs/docs/build/models/data-quality-tests.md

Lines changed: 6 additions & 0 deletions
```diff
@@ -12,6 +12,12 @@ Tests are defined in your model's YAML file using the `tests:` property. Each te
 
 ## When to Use Data Quality Tests
 
+:::tip Using live connectors? Use alerts instead
+
+Data quality tests run when models refresh, which means they only work with models that Rill manages. If you're using [live connectors](/build/connectors/olap) (ClickHouse, Druid, Pinot, StarRocks, etc.) where data lives in external systems, use [alerts](/build/alerts) to monitor data quality on a schedule instead.
+
+:::
+
 Data quality tests are useful for:
 
 - **Data Quality Checks** - Verify that your data meets business rules and constraints
```

docs/docs/reference/project-files/alerts.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -142,7 +142,7 @@ _[object]_ - Notification configuration _(required)_
 
 - **`users`** - _[array of string]_ - An array of Slack user IDs to notify.
 
-- **`channels`** - _[array of string]_ - An array of Slack channel IDs to notify.
+- **`channels`** - _[array of string]_ - An array of Slack channel names to notify.
 
 - **`webhooks`** - _[array of string]_ - An array of Slack webhook URLs to send notifications to.
 
```

runtime/parser/schema/project.schema.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -3263,7 +3263,7 @@ definitions:
     minItems: 1
   channels:
     type: array
-    description: An array of Slack channel IDs to notify.
+    description: An array of Slack channel names to notify.
     items:
       type: string
     minItems: 1
```
