feat: incremental @dlt.transformation
#2716
base: devel
This does something that I expected we'd do as a next step - to enable Incremental models:

I think our (or my) approach is currently incorrect. Incremental is a property of the model, not of the input tables, so we need to implement incremental on top of the user query. We do not need incremental/non-incremental tables or relations. This is what sqlmesh does: https://sqlmesh.readthedocs.io/en/stable/examples/incremental_time_full_walkthrough/#setup

The concept code:

```python
# customers table has `_dlt_load_id` column
incremental = dlt.sources.incremental("_dlt_load_id")
# I assume that query() returns sqlglot query (this is concept code)
sqlglot_query = dataset.query("SELECT customers.*, SUM(orders.amount) FROM customers AS c JOIN orders AS o ON c.id = o.order_id GROUP BY ").query()
incremental_sqlglot_query = sqlglot_query.filter(incremental.to_sqlglot_filter())
```

Here we just apply a filter to the results of the user's query to make it incremental. `_dlt_load_id` must obviously be present in the top-level SELECT, and the query optimizer (sqlglot has a good one) will push the filter down to the customers table.
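To make the "filter on top of the user query" idea concrete, here is a minimal runnable sketch. It uses the standard-library `sqlite3` module instead of dlt or sqlglot, so everything except the `_dlt_load_id` column name is an assumption: the tables, the sample rows, and the helpers `make_incremental` and `run_model` are hypothetical and exist only for illustration. The point is that the user's query stays untouched and the incremental predicate is applied to its result.

```python
import sqlite3

# Hypothetical stand-in for the destination dataset. Table layout and rows are
# made up; only the `_dlt_load_id` column name comes from the discussion.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT, _dlt_load_id TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'alice', '1700000000.1'), (2, 'bob', '1700000500.2');
    INSERT INTO orders VALUES (1, 10.0), (1, 5.0), (2, 7.5);
""")

# The user's model query, written with no knowledge of incremental loading.
# The cursor column must appear in the top-level SELECT so it can be filtered.
USER_QUERY = """
    SELECT c.id, c.name, c._dlt_load_id AS _dlt_load_id, SUM(o.amount) AS total_amount
    FROM customers AS c
    JOIN orders AS o ON c.id = o.customer_id
    GROUP BY c.id, c.name, c._dlt_load_id
"""


def make_incremental(user_query: str, cursor_column: str) -> str:
    """Wrap the model query as a subquery and filter its result on the cursor column."""
    return f"SELECT * FROM ({user_query}) AS model WHERE {cursor_column} > ?"


def run_model(last_value: str) -> list:
    """Run the wrapped model, returning only rows newer than `last_value`."""
    sql = make_incremental(USER_QUERY, "_dlt_load_id")
    return conn.execute(sql, (last_value,)).fetchall()


rows_all = run_model("")              # first run: empty lower bound, every row passes
rows_new = run_model("1700000000.1")  # later run: only rows loaded after that load id
```

In this sketch the filter is applied naively to the outer query; whether and how it is pushed down to `customers` is left to the engine or, in the approach described above, to the query optimizer.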
So it seems we do not need to extend dataset/relation etc. for that to work. NOTE:
Updating incremental state
Your code is very close to what we need, but we have a dedicated place for that: we need to implement incremental for the SqlModel. This is what we have now (in
So we implement and in there (call method)
sqlmesh always passes a time range to the incremental models, so they do not need to compute max (corresponds to setting explicit Incremental on _dlt_load_id:
I think I wrote how to do that without any race conditions etc. If we want to use
Coming back to incremental on the particular table or subquery
I need to think about it. IMO the "SqlMesh" way is pretty limiting, and being able to apply incremental on any subquery within the model and still track the ranges would be pretty cool.
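The contrast drawn earlier between a caller-supplied time range (the sqlmesh style) and a stored cursor that has to compute a max can be sketched as follows. This is an illustration under stated assumptions, not dlt or sqlmesh code: it uses `sqlite3` from the standard library, and the table contents, the `state` dict, and the functions `run_with_range` and `run_with_cursor` are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, _dlt_load_id TEXT);
    INSERT INTO customers VALUES
        (1, '1700000000.1'), (2, '1700000500.2'), (3, '1700000900.3');
""")

# Hypothetical model query; the cursor column is in the top-level SELECT.
MODEL = "SELECT id, _dlt_load_id FROM customers"


def run_with_range(start: str, end: str) -> list:
    """Range style: the caller supplies both bounds, so no MAX() is needed afterwards."""
    sql = (
        f"SELECT * FROM ({MODEL}) AS m "
        "WHERE _dlt_load_id > ? AND _dlt_load_id <= ? ORDER BY id"
    )
    return conn.execute(sql, (start, end)).fetchall()


def run_with_cursor(state: dict) -> list:
    """Cursor style: only a lower bound is stored; the next bound is the MAX() of this run."""
    last = state.get("last_value", "")
    filtered = f"SELECT * FROM ({MODEL}) AS m WHERE _dlt_load_id > ?"
    rows = conn.execute(filtered + " ORDER BY id", (last,)).fetchall()
    new_max = conn.execute(
        f"SELECT MAX(_dlt_load_id) FROM ({filtered}) AS f", (last,)
    ).fetchone()[0]
    if new_max is not None:  # leave the state untouched when nothing new arrived
        state["last_value"] = new_max
    return rows


state = {}
first = run_with_cursor(state)  # all three rows; cursor advances to the newest load id
conn.execute("INSERT INTO customers VALUES (4, '1700001200.4')")
second = run_with_cursor(state)  # only the row added after the first run
ranged = run_with_range("1700000000.1", "1700000900.3")  # explicit window, no state
```

Note that the cursor variant issues two separate queries (rows, then max), which is not atomic in this sketch; a real implementation would need to close that gap, which is the kind of race condition mentioned above.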
Description
This is a work in progress.
Follows iterations: #2612 #2386