Skip to content
Merged
Show file tree
Hide file tree
Changes from 10 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
66 changes: 60 additions & 6 deletions scripts/optimization/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -336,14 +336,13 @@ and the pricing for each region found [here](https://cloud.google.com/bigquery/p

## Queries grouped by hash

The [queries_grouped_by_hash.sql](queries_grouped_by_hash.sql) script creates a
The [queries_grouped_by_hash_project.sql](queries_grouped_by_hash_project.sql) script creates a
table named,
`queries_grouped_by_hash`. This table groups queries by their normalized query
`queries_grouped_by_hash_project`. This table groups queries by their normalized query
pattern, which ignores
comments, parameter values, UDFs, and literals in the query text.
This allows us to group queries that are logically the same, but
have different literals. The `queries_grouped_by_hash` table does not expose the
raw SQL text of the queries.
have different literals.

The [viewable_queries_grouped_by_hash.sql](viewable_queries_grouped_by_hash.sql)
script creates a table named,
Expand All @@ -355,6 +354,11 @@ in execution than the `queries_grouped_by_hash.sql` script because it has to
loop over all projects and for each
project query the `INFORMATION_SCHEMA.JOBS_BY_PROJECT` view.

Both the `queries_grouped_by_hash` (Org and Project level) tables include duration percentiles (`median_time_ms`, `p75_time_ms`, `p90_time_ms`, etc.) calculated from `creation_time`. These metrics help identify query performance stability:
- **Median**: If median is high, it indicates that the query is taking a long time to complete. Prioritize optimizing queries with high median duration. (filter earlier, check joins).
- **Median vs p99**: A large gap indicates unstable performance (e.g., occasional slot contention or data skew).
- **p95/p99**: Useful for tracking SLA violations and "worst-case" user experience.

For example, the following queries would be grouped together because the date
literal filters are ignored:

Expand All @@ -372,16 +376,25 @@ Running the `run_anti_pattern_tool.sh` bash script will build and run the Anti-P

```sql
SELECT *
FROM optimization_workshop.queries_grouped_by_hash
FROM optimization_workshop.queries_grouped_by_hash_org
ORDER BY total_gigabytes_processed DESC
LIMIT 100
```

* Top 200 queries with the highest total slot hours

```sql
SELECT *
FROM optimization_workshop.queries_grouped_by_hash_project
ORDER BY total_slot_hours DESC
LIMIT 200
```

* Top 100 recurring queries with the highest slot hours consumed

```sql
SELECT *
FROM optimization_workshop.queries_grouped_by_hash
FROM optimization_workshop.queries_grouped_by_hash_org
ORDER BY total_slot_hours * days_active * job_count DESC
LIMIT 100
```
Expand Down Expand Up @@ -487,6 +500,45 @@ generated for them in the past 30 days.

</details>

<details><summary><b>&#128269; BI Engine Mode Duration </b></summary>

## BI Engine Mode Duration

The [bi_engine_mode_duration](bi_engine_mode_duration.sql)
script creates a table named, `bi_engine_mode_duration`. This table
groups queries by their BI Engine mode and then shows for every day timeslice how long queries took for each mode.

### Examples of querying script results

* Order by day and BI Engine mode

```sql
SELECT *
FROM optimization_workshop.bi_engine_mode_duration
ORDER BY day, bi_engine_mode ASC
```

</details>

<details><summary><b>&#128269; BI Engine Disabled Reasons</b></summary>

## BI Engine Disabled Reasons

The [bi_engine_disabled_reasons](bi_engine_disabled_reasons.sql)
script creates a table named, `bi_engine_disabled_reasons`. This table groups queries by their BI Engine Disabled reason and counts them by reason.

### Examples of querying script results

* Order by reasons count descending

```sql
SELECT *
FROM optimization_workshop.bi_engine_disabled_reasons
ORDER BY count DESC
```

</details>

# Workload Analysis

<details><summary><b>&#128269; Hourly slot consumption by query hash</b></summary>
Expand Down Expand Up @@ -534,3 +586,5 @@ of that hour's slots each grouping of labels consumed.
```

</details>


Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
* Copyright 2023 Google LLC
* Copyright 2026 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
Expand Down
36 changes: 36 additions & 0 deletions scripts/optimization/bi_engine_disabled_reasons.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
/*
* Copyright 2026 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

/*
* This script creates a table named, bi_engine_disabled_reasons,
* which groups queries by their BI Engine Disabled reason and counts them by reason.
* This table helps identify the most common reasons why BI Engine is disabled
* for your queries so that you can tune your queries to be BI Engine friendly.
*
* 30 days is the default timeframe, but you can change this by setting the
* num_days_to_scan variable to a different value.
*/


DECLARE num_days_to_scan INT64 DEFAULT 30;

CREATE SCHEMA IF NOT EXISTS optimization_workshop;
CREATE OR REPLACE TABLE optimization_workshop.bi_engine_disabled_reasons AS
SELECT reasons.code, count(*)
FROM `region-us`.INFORMATION_SCHEMA.JOBS as jbo, UNNEST(bi_engine_statistics.bi_engine_reasons) AS reasons
WHERE DATE(jbo.creation_time) >= CURRENT_DATE - num_days_to_scan
AND bi_engine_statistics.bi_engine_mode = 'DISABLED'
GROUP BY reasons.code;
62 changes: 62 additions & 0 deletions scripts/optimization/bi_engine_mode_duration.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
/*
* Copyright 2026 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

DECLARE num_days_to_scan INT64 DEFAULT 30;

CREATE SCHEMA IF NOT EXISTS optimization_workshop;
CREATE OR REPLACE TABLE optimization_workshop.bi_engine_mode_duration AS
SELECT
day,
bi_engine_mode,
COUNT(*) as job_count,
AVG(time_ms) avg_time_ms,
MAX(median_time_ms) median_time_ms,
MAX(p75_time_ms) p75_time_ms,
MAX(p80_time_ms) p80_time_ms,
MAX(p90_time_ms) p90_time_ms,
MAX(p95_time_ms) p95_time_ms,
MAX(p99_time_ms) p99_time_ms,
FROM
(
SELECT
day,
bi_engine_mode,
time_ms,
PERCENTILE_CONT(time_ms, 0.5) OVER (PARTITION BY day, bi_engine_mode) as median_time_ms,
PERCENTILE_CONT(time_ms, 0.75) OVER (PARTITION BY day, bi_engine_mode) as p75_time_ms,
PERCENTILE_CONT(time_ms, 0.8) OVER (PARTITION BY day, bi_engine_mode) as p80_time_ms,
PERCENTILE_CONT(time_ms, 0.90) OVER (PARTITION BY day, bi_engine_mode) as p90_time_ms,
PERCENTILE_CONT(time_ms, 0.95) OVER (PARTITION BY day, bi_engine_mode) as p95_time_ms,
PERCENTILE_CONT(time_ms, 0.99) OVER (PARTITION BY day, bi_engine_mode) as p99_time_ms,
FROM
(
SELECT
DATE(jbo.creation_time) AS day,
bi_engine_statistics.bi_engine_mode as bi_engine_mode,
job_id,
TIMESTAMP_DIFF(jbo.end_time, jbo.creation_time, MILLISECOND) time_ms
FROM
FROM `region-us`.INFORMATION_SCHEMA.JOBS jbo
WHERE
DATE(creation_time) >= CURRENT_DATE - num_days_to_scan
AND jbo.end_time > jbo.start_time
AND jbo.error_result IS NULL
AND jbo.statement_type != 'SCRIPT'
)
)
GROUP BY
1,
2
2 changes: 1 addition & 1 deletion scripts/optimization/daily_project_analysis.sql
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
* Copyright 2023 Google LLC
* Copyright 2026 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
Expand Down
2 changes: 1 addition & 1 deletion scripts/optimization/daily_reservation_analysis.sql
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
* Copyright 2023 Google LLC
* Copyright 2026 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
Expand Down
2 changes: 1 addition & 1 deletion scripts/optimization/enable_recommender_api.sh
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
#!/bin/bash

# Copyright 2023 Google LLC
# Copyright 2026 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
Expand Down
2 changes: 1 addition & 1 deletion scripts/optimization/frequent_daily_table_dml.sql
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
* Copyright 2023 Google LLC
* Copyright 2026 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
Expand Down
16 changes: 16 additions & 0 deletions scripts/optimization/hourly_slot_consumption_by_labels.sql
Original file line number Diff line number Diff line change
@@ -1,3 +1,19 @@
/*
* Copyright 2026 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

DECLARE num_days_to_scan INT64 DEFAULT 30;
DECLARE my_reservation_id STRING DEFAULT "your_reservation_id";

Expand Down
16 changes: 16 additions & 0 deletions scripts/optimization/hourly_slot_consumption_by_query_hash.sql
Original file line number Diff line number Diff line change
@@ -1,3 +1,19 @@
/*
* Copyright 2026 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

DECLARE num_days_to_scan INT64 DEFAULT 30;
DECLARE my_reservation_id STRING DEFAULT "your_reservation_id";

Expand Down
12 changes: 9 additions & 3 deletions scripts/optimization/queries_grouped_by_hash_org.sql
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
* Copyright 2023 Google LLC
* Copyright 2026 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
Expand All @@ -15,8 +15,8 @@
*/

/*
* This script creates a table named, top_bytes_scanning_queries_by_hash,
* which contains the top 200 most expensive queries by total bytes scanned
* This script creates a table named, queries_grouped_by_hash_org,
* which contains the top 200 most expensive queries by total slot hours
* within the past 30 days.
* 30 days is the default timeframe, but you can change this by setting the
* num_days_to_scan variable to a different value.
Expand Down Expand Up @@ -74,6 +74,12 @@ SELECT
|| '.' || ref_table.table_id
FROM UNNEST(referenced_tables) ref_table
)) AS referenced_tables,
APPROX_QUANTILES(TIMESTAMP_DIFF(end_time, creation_time, MILLISECOND), 100)[OFFSET(50)] AS median_time_ms,
APPROX_QUANTILES(TIMESTAMP_DIFF(end_time, creation_time, MILLISECOND), 100)[OFFSET(75)] AS p75_time_ms,
APPROX_QUANTILES(TIMESTAMP_DIFF(end_time, creation_time, MILLISECOND), 100)[OFFSET(80)] AS p80_time_ms,
APPROX_QUANTILES(TIMESTAMP_DIFF(end_time, creation_time, MILLISECOND), 100)[OFFSET(90)] AS p90_time_ms,
APPROX_QUANTILES(TIMESTAMP_DIFF(end_time, creation_time, MILLISECOND), 100)[OFFSET(95)] AS p95_time_ms,
APPROX_QUANTILES(TIMESTAMP_DIFF(end_time, creation_time, MILLISECOND), 100)[OFFSET(99)] AS p99_time_ms
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION
WHERE
DATE(creation_time) >= CURRENT_DATE - num_days_to_scan
Expand Down
17 changes: 12 additions & 5 deletions scripts/optimization/queries_grouped_by_hash_project.sql
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
* Copyright 2023 Google LLC
* Copyright 2026 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
Expand All @@ -15,8 +15,8 @@
*/

/*
* This script creates a table named, top_bytes_scanning_queries_by_hash,
* which contains the top 200 most expensive queries by total bytes scanned
* This script creates a table named queries_grouped_by_hash_project,
* which contains the top 200 most expensive queries by total slot hours
* within the past 30 days.
* 30 days is the default timeframe, but you can change this by setting the
* num_days_to_scan variable to a different value.
Expand Down Expand Up @@ -75,11 +75,18 @@ SELECT
|| '.' || ref_table.table_id
FROM UNNEST(referenced_tables) ref_table
)) AS referenced_tables,
FROM `region-us`.INFORMATION_SCHEMA.JOBS
APPROX_QUANTILES(TIMESTAMP_DIFF(end_time, creation_time, MILLISECOND), 100)[OFFSET(50)] AS median_time_ms,
APPROX_QUANTILES(TIMESTAMP_DIFF(end_time, creation_time, MILLISECOND), 100)[OFFSET(75)] AS p75_time_ms,
APPROX_QUANTILES(TIMESTAMP_DIFF(end_time, creation_time, MILLISECOND), 100)[OFFSET(80)] AS p80_time_ms,
APPROX_QUANTILES(TIMESTAMP_DIFF(end_time, creation_time, MILLISECOND), 100)[OFFSET(90)] AS p90_time_ms,
APPROX_QUANTILES(TIMESTAMP_DIFF(end_time, creation_time, MILLISECOND), 100)[OFFSET(95)] AS p95_time_ms,
APPROX_QUANTILES(TIMESTAMP_DIFF(end_time, creation_time, MILLISECOND), 100)[OFFSET(99)] AS p99_time_ms
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION
WHERE
DATE(creation_time) >= CURRENT_DATE - num_days_to_scan
AND state = 'DONE'
AND error_result IS NULL
AND job_type = 'QUERY'
AND statement_type != 'SCRIPT'
GROUP BY statement_type, query_hash;
GROUP BY statement_type, query_hash

17 changes: 17 additions & 0 deletions scripts/optimization/queries_grouped_by_labels.sql
Original file line number Diff line number Diff line change
@@ -1,3 +1,20 @@
/*
* Copyright 2026 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/


DECLARE num_days_to_scan INT64 DEFAULT 30;

CREATE TEMP FUNCTION num_stages_with_perf_insights(query_info ANY TYPE) AS (
Expand Down
2 changes: 1 addition & 1 deletion scripts/optimization/queries_grouped_by_script.sql
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
* Copyright 2023 Google LLC
* Copyright 2026 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
Expand Down
2 changes: 1 addition & 1 deletion scripts/optimization/query_performance_insights.sql
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
* Copyright 2023 Google LLC
* Copyright 2026 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
Expand Down
Loading
Loading