docs/use-cases/observability/clickstack/alerts.md
Lines changed: 156 additions & 1 deletion
@@ -20,6 +20,8 @@ import add_webhook_dialog from '@site/static/images/use-cases/observability/add_
20
20
import manage_alerts from '@site/static/images/use-cases/observability/manage_alerts.png';
21
21
import alerts_view from '@site/static/images/use-cases/observability/alerts_view.png';
22
22
import multiple_search_alerts from '@site/static/images/use-cases/observability/multiple_search_alerts.png';
23
+
import add_raw_sql_alert from '@site/static/images/use-cases/observability/add_raw_sql_alert.png';
24
+
import open_sql_chart_mode from '@site/static/images/use-cases/observability/open_sql_chart_mode.png';
23
25
import remove_chart_alert from '@site/static/images/use-cases/observability/remove_chart_alert.png';
24
26
import Tabs from '@theme/Tabs';
25
27
import TabItem from '@theme/TabItem';
@@ -103,7 +105,7 @@ Select **Add Alert**.
103
105
104
106
#### Define the alert conditions {#define-alert-conditions}
105
107
106
-
Define the condition (`>=`, `<`), threshold, duration, and webhook. The duration here will also dictate how often the alert is triggered.
108
+
Define the condition (`>=`, `>`, `<=`, `<`, `=`, `!=`, `<= x >=`, `> or <`), threshold, duration, and webhook. The duration also determines how often the alert is evaluated.
107
109
108
110
<Image img={create_chart_alert} alt="Create alert for chart" size="lg"/>
109
111
@@ -189,6 +191,159 @@ In the example below, the `Remove Alert` button will remove the alert from the c
SQL-based chart alerts let you write arbitrary ClickHouse SQL to define alert conditions. This gives you full control over filtering, aggregation, and math — anything you can express in SQL can become an alert.
SQL-based alerts are supported on three chart display types:
201
+
| Chart type | Behavior |
|---|---|
| **Line** | Time-series alert. The query must produce time-bucketed rows. Each bucket is evaluated independently against the threshold. |
| **Stacked Bar** | Time-series alert. Same behavior as Line. |
| **Number** | Single-value alert. The query returns a single numeric result which is compared against the threshold once per evaluation. |
207
+
208
+
Other SQL-based chart types (Table, Pie, Heatmap, etc.) do not support alerts.
209
+
210
+
### Creating a SQL alert {#create-sql-based-alert}
211
+
212
+
To create an alert on a SQL-based chart:
213
+
214
+
<VerticalStepper headerLevel="h4">
215
+
216
+
#### Create or open a SQL-based chart on a dashboard {#open-sql-chart}
217
+
218
+
From a saved dashboard, either [create a new chart with the **SQL** chart mode](./dashboards/sql-visualizations.md), or open an existing SQL-based chart for editing.
219
+
220
+
Choose **Line**, **Stacked Bar**, or **Number** as the display type.
Select **Add Alert** from the alert section of the chart editor. Configure:
227
+
228
+
- **Threshold type**: `>=` (greater than or equal), `>` (greater than), `<=` (less than or equal), `<` (less than), `=` (equal), `!=` (not equal), `<= x >=` (between), or `> or <` (outside)
- **Threshold value**: The numeric value to compare against
- **Interval**: How often the alert is evaluated (1m, 5m, 15m, 30m, 1h, 6h, 12h, or 1d). This also defines the time window for each evaluation.
- **Webhook**: The notification channel to use when the alert fires. See [Adding a webhook](#add-webhook).
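
The threshold types in the list above can be read as plain comparisons. The Python sketch below only illustrates that reading and is not ClickStack's implementation. The function name `breaches`, the `upper` argument for the two range types, and the choice of inclusive bounds for "between" and exclusive bounds for "outside" are assumptions inferred from the symbols.

```python
import operator

# Single-bound threshold types, keyed by the symbols shown in the alert editor.
SIMPLE_OPS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    "=": operator.eq,
    "!=": operator.ne,
}

RANGE_TYPES = ("<= x >=", "> or <")


def breaches(value, threshold_type, threshold, upper=None):
    """Return True if `value` satisfies the alert condition.

    `upper` is a hypothetical second bound used only by the range types.
    """
    if threshold_type in SIMPLE_OPS:
        return SIMPLE_OPS[threshold_type](value, threshold)
    if threshold_type not in RANGE_TYPES:
        raise ValueError(f"unknown threshold type: {threshold_type}")
    if upper is None:
        raise ValueError("range threshold types need an upper bound")
    if threshold_type == "<= x >=":
        # "between": assumed to include both bounds
        return threshold <= value <= upper
    # "outside": assumed to exclude both bounds
    return value < threshold or value > upper
```

For example, `breaches(5, ">=", 5)` is true while `breaches(5, ">", 5)` is false, which is the difference between "reaches" and "exceeds" a threshold.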
Typically, alert queries are executed once per interval. However, if one or more intervals are skipped due to errors or slow queries, the following execution will use a time range that includes the missed intervals. In this case, the query's interval parameters would still be set to the alert's configured period, but the time range parameters would reflect the longer time range.
237
+
:::
238
+
239
+
#### Save the dashboard {#save-sql-dashboard}
240
+
241
+
Save the dashboard to activate the alert. The alert will begin evaluating on the configured interval.
242
+
243
+
</VerticalStepper>
244
+
245
+
### How query results are interpreted {#sql-result-interpretation}
246
+
247
+
The alert system inspects the columns returned by your SQL query to determine what to compare against the threshold.
248
+
249
+
- **Value column**: The **last numeric column** in your `SELECT` clause is used as the alert value. If your query returns multiple numeric columns (e.g., `count, avg_latency, p99_latency`), only the last one (`p99_latency`) is compared to the threshold.
- **Timestamp column**: For time-series charts (Line and Stacked Bar), the system identifies the Date/DateTime column in your results as the time bucket (i.e., the x-axis on a time-series chart). The value column for each time bucket is evaluated against the threshold independently, and if the value for any time bucket breaches the configured threshold, the alert will trigger.
- **Group columns**: Any non-numeric, non-timestamp columns (e.g., `ServiceName`, `Environment`) are treated as grouping dimensions. When groups are present, each unique combination of group values is tracked and alerted on separately. ClickStack will send an alert for each group with a value that breaches the configured threshold. Groups are only available for time-series charts.
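
The three column roles above can be modelled in a few lines of Python. This is a sketch of the rules as described, not the code ClickStack runs: the helper names `classify_columns` and `breached_groups`, the sample rows, and the use of Python value types in place of ClickHouse column types are all assumptions made for illustration.

```python
from datetime import date, datetime
from numbers import Number


def classify_columns(row):
    """Split one result row (column name -> value, in SELECT order) into roles."""
    value_col, ts_col, group_cols = None, None, []
    for name, val in row.items():
        if isinstance(val, (date, datetime)):
            ts_col = name
        elif isinstance(val, Number):
            value_col = name  # a later numeric column replaces an earlier one
        else:
            group_cols.append(name)
    return value_col, ts_col, group_cols


def breached_groups(rows, threshold):
    """Return the group keys that have at least one bucket with value >= threshold."""
    breached = set()
    for row in rows:
        value_col, _, group_cols = classify_columns(row)
        if row[value_col] >= threshold:
            breached.add(tuple(row[c] for c in group_cols))
    return breached


# Hypothetical query result: two buckets for "api", one for "web".
rows = [
    {"ts": datetime(2024, 1, 1, 0, 0), "ServiceName": "api",
     "request_count": 120, "error_percent": 7.5},
    {"ts": datetime(2024, 1, 1, 0, 5), "ServiceName": "api",
     "request_count": 90, "error_percent": 1.0},
    {"ts": datetime(2024, 1, 1, 0, 0), "ServiceName": "web",
     "request_count": 300, "error_percent": 2.0},
]
roles = classify_columns(rows[0])
firing = breached_groups(rows, 5)
```

Here `request_count` is numeric but is not the last numeric column, so only `error_percent` is compared with the threshold, and only the `api` group breaches it.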
252
+
253
+
### Query parameters and macros {#query-params}
254
+
255
+
SQL alert queries support template parameters and macros that are automatically replaced at evaluation time. These are the same parameters and macros available when [building a SQL-based chart](./dashboards/sql-visualizations.md).
256
+
257
+
#### Required and recommended parameters {#required-alert-parameters}
258
+
259
+
Queries used for line or stacked bar chart alerts **must** include an interval parameter or macro (`{intervalSeconds:Int64}`, `{intervalMilliseconds:Int64}`, `$__timeInterval(col)`, or `$__timeInterval_ms(col)`). During alert execution, it will be replaced with the alert's configured period.
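
Conceptually, the interval parameter is what turns raw rows into time buckets. The Python sketch below models that bucketing on Unix timestamps in seconds. It is an assumption about the general idea only: the SQL that the macros actually expand to is not shown in this document, and `bucket_start` and `count_per_bucket` are hypothetical names.

```python
from collections import Counter


def bucket_start(ts_seconds, interval_seconds):
    """Floor a Unix timestamp (in seconds) to the start of its interval bucket."""
    return ts_seconds - ts_seconds % interval_seconds


def count_per_bucket(timestamps, interval_seconds):
    """Count events per bucket, similar in spirit to GROUP BY on a bucketed timestamp."""
    return Counter(bucket_start(t, interval_seconds) for t in timestamps)


# Five events bucketed with a 5-minute (300-second) alert interval.
counts = count_per_bucket([0, 10, 299, 300, 601], 300)
```

With a 300-second interval, the first three events share one bucket and the other two each fall in their own, so the threshold is checked against three separate values.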
260
+
261
+
Queries used for alerts **should** include a time range filter (`{startDateMilliseconds:Int64}` and `{endDateMilliseconds:Int64}`, or `$__timeFilter(col)`, etc.). Regardless of whether a time range filter is present in the query, the alert query will run on the alert's configured period. If there is no time range filter, then the query will read the entire time range available in the source table during each execution.
262
+
263
+
:::warning Alert Time Range
264
+
Typically, alert queries are executed once per interval. However, if one or more intervals are skipped due to errors or slow queries, the following execution will use a time range that includes the missed intervals. In this case, the query's interval parameters would still be set to the alert's configured period, but the time range parameters would reflect the longer time range.
265
+
:::
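
The catch-up behavior described in the warning can be sketched as follows. This is an illustrative model, not ClickStack's scheduler: the function `evaluation_params` and the idea of tracking the end of the last evaluated window are assumptions. The dictionary keys reuse the parameter names from this section.

```python
from datetime import datetime, timedelta, timezone


def evaluation_params(last_evaluated_end, now, interval):
    """Compute the query parameters for the next alert execution.

    The time range grows to cover any skipped intervals, while the
    interval parameter stays at the alert's configured period.
    """
    start = min(last_evaluated_end, now - interval)
    return {
        "startDateMilliseconds": int(start.timestamp() * 1000),
        "endDateMilliseconds": int(now.timestamp() * 1000),
        "intervalSeconds": int(interval.total_seconds()),
    }


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
five_min = timedelta(minutes=5)

# Normal case: the previous execution finished one interval ago.
on_time = evaluation_params(now - five_min, now, five_min)

# Two executions were skipped, so the range covers three intervals.
catch_up = evaluation_params(now - 3 * five_min, now, five_min)
```

In the catch-up case the time range spans 15 minutes but `intervalSeconds` is still 300, so a time-series query returns three 5-minute buckets, each checked against the threshold.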
266
+
267
+
### Example alert queries {#example-queries}
268
+
269
+
#### Error rate per service (time-series) {#example-error-rate}
270
+
271
+
Alert when any service has an error rate of 5% or higher, counting only services with more than 10 requests in the alert period to avoid noisy alerts on low-traffic services.
272
+
273
+
```sql
WITH error_rates AS (
    SELECT
        $__timeInterval(Timestamp) as ts,
        ServiceName,
        countIf(SpanKind = 'Server') as request_count,
        countIf(
            SpanKind = 'Server'
            and StatusCode = 'Error'
        ) as error_count,
        error_count / request_count * 100 AS error_percent
    FROM $__sourceTable
    WHERE $__timeFilter(Timestamp)
    GROUP BY ts, ServiceName
)
SELECT ts, ServiceName, error_percent
FROM error_rates
WHERE request_count > 10
```
292
+
293
+
**Display type**: Line or Stacked Bar
294
+
**Threshold**: `>= 5` (fires when error rate reaches 5%)
295
+
296
+
In this query, `ServiceName` is a non-numeric, non-timestamp column, so each service is tracked as a separate alert group. The alert fires independently per service.
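
To make the arithmetic concrete, here is a small Python walk-through of what the query computes for one time bucket. The service names and counts are made up for illustration, and `error_percent_rows` is a hypothetical helper that mirrors the final `SELECT` and `WHERE`.

```python
def error_percent_rows(buckets, min_requests=10):
    """Mirror the query: keep buckets with more than `min_requests` requests.

    `buckets` is an iterable of (ts, service, request_count, error_count).
    """
    rows = []
    for ts, service, request_count, error_count in buckets:
        if request_count > min_requests:  # WHERE request_count > 10
            rows.append((ts, service, error_count / request_count * 100))
    return rows


# Hypothetical per-service aggregates for a single time bucket.
buckets = [
    ("00:00", "checkout", 200, 25),  # 12.5% errors
    ("00:00", "search", 8, 4),       # 50% errors, but too little traffic
    ("00:00", "cart", 400, 4),       # 1% errors
]
rows = error_percent_rows(buckets)

# Threshold `>= 5`: each service is its own alert group.
firing = sorted({service for _, service, pct in rows if pct >= 5})
```

Only `checkout` fires. `search` has the highest error rate but is filtered out by the request-count guard, which is what the guard is for.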
297
+
298
+
#### Anomaly detection with lagging average (time-series) {#example-anomaly-detection}
299
+
300
+
Alert on excess error counts that exceed a rolling average by more than two standard deviations. This catches spikes relative to recent baseline behavior rather than a fixed threshold.
301
+
302
+
```sql
WITH buckets AS (
    SELECT
        $__timeInterval(Timestamp) AS ts,
        count() AS bucket_count
    FROM $__sourceTable
    WHERE TimestampTime >= fromUnixTimestamp64Milli({startDateMilliseconds:Int64})
            - toIntervalSecond($__interval_s * 30) -- Fetch 30 intervals back
        AND TimestampTime < fromUnixTimestamp64Milli({endDateMilliseconds:Int64})
        AND SeverityText = 'error'
    GROUP BY ts
    ORDER BY ts
    WITH FILL
        FROM toDateTime(fromUnixTimestamp64Milli({startDateMilliseconds:Int64}))
        TO toDateTime(fromUnixTimestamp64Milli({endDateMilliseconds:Int64}))
        STEP toIntervalSecond($__interval_s)
),

anomaly_detection AS (
    SELECT
        ts,
        bucket_count,
        avg(bucket_count) OVER (
            ORDER BY ts ROWS BETWEEN 30 PRECEDING AND 1 PRECEDING
        ) AS previous_30_avg,
        stddevPop(bucket_count) OVER (
            ORDER BY ts ROWS BETWEEN 30 PRECEDING AND 1 PRECEDING
-- ... lines 329-337 of the file are not shown in this diff ...
WHERE ts >= fromUnixTimestamp64Milli({startDateMilliseconds:Int64})
    AND ts < fromUnixTimestamp64Milli({endDateMilliseconds:Int64})
```
341
+
342
+
**Display type**: Line
343
+
**Threshold**: `> 0` (fires when excess errors above the rolling baseline are detected)
344
+
345
+
Note that the query fetches 30 intervals *before* the start of the date range to seed the rolling window calculations, then filters the final output to only the evaluation window.
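
The rolling-baseline idea can be sketched in Python as below. Part of the query is not visible in this diff, so the exact output expression is an assumption drawn from the prose: the sketch takes "excess" to be the amount by which a bucket's count exceeds the mean plus two population standard deviations of up to 30 preceding buckets, and zero otherwise. `excess_over_baseline` is a hypothetical name.

```python
from statistics import mean, pstdev


def excess_over_baseline(counts, window=30, num_std=2):
    """Return, per bucket, how far its count exceeds the rolling baseline.

    The baseline for bucket i uses up to `window` buckets before i and
    excludes bucket i itself (ROWS BETWEEN 30 PRECEDING AND 1 PRECEDING).
    """
    excess = []
    for i, count in enumerate(counts):
        history = counts[max(0, i - window):i]
        if not history:
            excess.append(0.0)  # no baseline yet for the first bucket
            continue
        # pstdev is the population standard deviation, like stddevPop
        limit = mean(history) + num_std * pstdev(history)
        excess.append(max(0.0, count - limit))
    return excess


# 31 quiet buckets of 10 errors each, then a spike of 50.
result = excess_over_baseline([10] * 31 + [50])
```

With a flat history the standard deviation is zero, so the spike yields an excess of 40 and every other bucket yields 0. A `> 0` threshold would fire only on the spike.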
346
+
192
347
## Common alert scenarios {#common-alert-scenarios}
193
348
194
349
Here are a few common alert scenarios you can use HyperDX for: