Replace def _limit_clause_sql with sqlglot #2689
base: devel
Conversation
✅ Deploy Preview for dlt-hub-docs ready!
Just a note: things that could additionally be replaced are the SQL code in "row_counts" and where we get state and schema from the destination.
Force-pushed from b341c18 to 3fe5923, then from 3fe5923 to dd22801.
Thanks for the work on this, it's a good first start :) There has been new work on the transformations that makes this much easier. I also think the ultimate goal of this work is to remove as much handwritten SQL from our codebase as possible (we have a lot of it in the sql_jobs file, which handles merging, replacing, and appending), but that is for future tickets.
First, I think you have to wait just a little longer until the upcoming transformations PR is merged, as there are some more changes in there. After that you can do the following:
- Have a look at the normalize_query function that @rudolfix wrote. It converts a query with identifiers in the "dlt schema" space into a query with the correct fully qualified identifiers of the destination. It needs the sql_client and the sqlglot schema for this, which you have in all the places where you are writing these queries. You can also look at this on the transformations decoupling branch, which is more up to date.
- Update your queries to not add catalogs, databases, or any aliases, but instead run them through this normalize function. They should become much simpler, and all of the destination-specific code (clickhouse) should go away.
- I think we could add a dedicated file at dlt/destinations/queries.py where all these queries live and can be unit tested independently of any actual loading happening! The normalize_query function can go there too, imho.
- Add unit tests for the queries. We can talk about what exactly needs to be tested here.
@@ -187,6 +187,3 @@ def _make_database_exception(cls, ex: Exception) -> Exception:
     @staticmethod
     def is_dbapi_exception(ex: Exception) -> bool:
         return isinstance(ex, pyodbc.Error)
-
-    def _limit_clause_sql(self, limit: int) -> Tuple[str, str]:
Good riddance :)
dataset_name = self.sql_client.dataset_name
catalog_name = self.sql_client.catalog_name(escape=False)

if self.config.destination_type == "clickhouse":
Just a general note: destination-specific implementations should never leak into the base class. The right way to do it would be to put the correct implementation in the subclass. But with the changes I propose below, this problem should be gone in any case.
🫡
table_expr = sqlglot.exp.Table(
    this=sqlglot.exp.to_identifier(table_name, quoted=True),
    db=sqlglot.exp.to_identifier(dataset_name, quoted=True),
If we use this new function to normalize queries (explained above), we will not have to add db and catalog everywhere.
-table_name = self.sql_client.make_qualified_table_name(
-    dataset_schema.naming.normalize_path(self._table_name)
+table_name = dataset_schema.naming.normalize_tables_path(self._table_name)
There is a change now on devel that runs the result of the query method through a qualifying and normalization step. Have a look at what is there now, and you will see that you will not have to do any destination-specific work or add database and catalog information here.
Description
This PR replaces the _limit_clause_sql function with sqlglot. It also aims to adjust the following functions that involve manual query building:
- def row_counts() in dlt/destinations/dataset/dataset.py
- def query() in dlt/destinations/dataset/relation.py
- def get_stored_schema() in dlt/destinations/job_client_impl.py
- def get_stored_schema_by_hash() in dlt/destinations/job_client_impl.py
- def get_stored_state() in dlt/destinations/job_client_impl.py