You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I am running dbt on top of AWS Aurora Postgres and I am running into all sorts of performance issues. One that could be easily fixed is the query for the unique tests.
The current query uses an unnecessary "with" that causes (at least) Postgrest to create a temp table with all data as part of the query. And any indexes existing on the original table can't be utilized.
The with block could be removed and a simpler subquery used instead which would eliminate the need for this temp table and increase performance
Current query:
select count(*) as failures, count(*) != 0 as should_warn, count(*) != 0 as should_error from ( with validation_errors as ( select <unique column>, count(*) as row_count from <table> group by <unique column> having count(*) > 1 or <unique column> is null ) select * from validation_errors ) dbt_internal_test
This can be rewritten to
select count(*) as failures, count(*) != 0 as should_warn, count(*) != 0 as should_error from ( select <unique column>, count(*) as row_count from <table> group by <unique column> having count(*) > 1 or <unique column> is null )
which would eliminate creating the temp table, and enable using any existing indexes on the original table
reacted with thumbs up emoji reacted with thumbs down emoji reacted with laugh emoji reacted with hooray emoji reacted with confused emoji reacted with heart emoji reacted with rocket emoji reacted with eyes emoji
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
I am running dbt on top of AWS Aurora Postgres and I am running into all sorts of performance issues. One that could be easily fixed is the query for the unique tests.
The current query uses an unnecessary "with" that causes (at least) Postgrest to create a temp table with all data as part of the query. And any indexes existing on the original table can't be utilized.
The with block could be removed and a simpler subquery used instead which would eliminate the need for this temp table and increase performance
Current query:
select count(*) as failures, count(*) != 0 as should_warn, count(*) != 0 as should_error from ( with validation_errors as ( select <unique column>, count(*) as row_count from <table> group by <unique column> having count(*) > 1 or <unique column> is null ) select * from validation_errors ) dbt_internal_test
This can be rewritten to
select count(*) as failures, count(*) != 0 as should_warn, count(*) != 0 as should_error from ( select <unique column>, count(*) as row_count from <table> group by <unique column> having count(*) > 1 or <unique column> is null )
which would eliminate creating the temp table, and enable using any existing indexes on the original table
Beta Was this translation helpful? Give feedback.
All reactions