What happened:
I connected clickhouse that contains Terrabytes of logs which has JSON column.
When i used grafana, clickhouse cpu was heavily used.
So when i looked at the clickhouse's slow query log, i noticed grafana is requesting this query.
SELECT arrayJoin(distinctJSONPathsAndTypes(LogAttributes)) FROM "otel"."otel_logs" WHERE SETTINGS max_execution_time=10
This query scans all rows of otel_logs table. (for my case, it timed out because of 10s of max_execution_time)
This is problem because it uses cpu of clickhouse heavily.
What you expected to happen:
The following code determines the available JSON paths:
|
async fetchPathsForJSONColumns( |
|
database: string | undefined, |
|
table: string, |
|
jsonColumnName: string |
|
): Promise<TableColumn[]> { |
|
const prefix = Boolean(database) ? `"${database}".` : ''; |
|
const rawSql = `SELECT arrayJoin(distinctJSONPathsAndTypes(${jsonColumnName})) FROM ${prefix}"${table}" SETTINGS max_execution_time=10`; |
|
const frame = await this.runQuery({ rawSql }); |
|
if (frame.fields?.length === 0) { |
|
return []; |
I think the plugin should provide a setting that allows users to configure how JSON paths are retrieved. We need a discussion about the best way to make this behavior configurable.
Here are some possible options I’ve considered:
- Add a configurable LIMIT
SELECT arrayJoin(distinctJSONPathsAndTypes(LogAttributes)) FROM (
SELECT LogAttributes from "otel"."otel_logs" limit
1000 -- configurable limit
) SETTINGS max_execution_time=10
- Add a configurable WHERE condition
SELECT arrayJoin(distinctJSONPathsAndTypes(LogAttributes)) FROM (
SELECT LogAttributes from "otel"."otel_logs" where
Timestamp > now() - INTERVAL 1 hour -- configurable where statement
) SETTINGS max_execution_time=10
This is more flexible, but more complicated since the timestamp column name may differ between tables.
- Simple on/off
If disabled, the plugin won’t query for JSON paths at all.
Environment:
- Grafana version: 12.2.1
- Plugin version: 4.11.2
- OS Grafana is installed on:
- User OS & Browser:
- Others:
What happened:
I connected clickhouse that contains Terrabytes of logs which has JSON column.
When i used grafana, clickhouse cpu was heavily used.
So when i looked at the clickhouse's slow query log, i noticed grafana is requesting this query.
This query scans all rows of otel_logs table. (for my case, it timed out because of 10s of max_execution_time)
This is problem because it uses cpu of clickhouse heavily.
What you expected to happen:
The following code determines the available JSON paths:
clickhouse-datasource/src/data/CHDatasource.ts
Lines 591 to 600 in 3e357e9
I think the plugin should provide a setting that allows users to configure how JSON paths are retrieved. We need a discussion about the best way to make this behavior configurable.
Here are some possible options I’ve considered:
This is more flexible, but more complicated since the timestamp column name may differ between tables.
If disabled, the plugin won’t query for JSON paths at all.
Environment: