I discussed this idea at some length with @roeap in our weekly Delta Lake Live stream. I think this is a good idea, and would make our operations a lot more visible and consistent. @roeap compared the

    let mut dt = open_table(&some_url).await?;
    dt = dt.optimize().with_superpowers().etc().await?;
    dt = dt.vacuum().await?;

I think the more we want to push with
-
While migrating our internals to delta-kernel-rs, we saw that there might be some room for improvement in how we structure our main structs/APIs. Certain feature implementations we struggle with IMO stem from some conceptual misalignment.
Leaving aside the two separate snapshots for now, I believe DeltaTable and DeltaOps could, or even should, be consolidated into DeltaTable. The main reason is that we have gotten into a place where we are practically using sessions without acknowledging it (at least not always). In the abstract, the session is responsible for the resources required to execute a query. For DataFusion-based operations we create DataFusion sessions ad hoc, while in other cases LogStore is effectively the session. DeltaOps just wraps a pub DeltaTable to create the builders. Consolidating the two would reduce the number of exposed structs and hopefully improve discoverability of operations. We could then also handle session management more flexibly, e.g. by allowing users to provide a DataFusion session that we use to execute operations. We may even one day want to expose dataframe-like APIs on the Delta table.
So in short, I believe this would simplify our APIs as well as allow for easier future optimizations.
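To make the proposal concrete, here is a minimal, dependency-free sketch of what a consolidated API could look like. Nothing here is the actual delta-rs API: `Session`, `OptimizeBuilder`, `VacuumBuilder`, `with_session`, `with_target_size`, `with_dry_run` and `execute` are hypothetical names, the "table" only tracks a version number, and everything is synchronous rather than async to keep it self-contained. The point is only the shape: the table itself hands out the operation builders and carries a user-replaceable session.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the resources needed to execute an operation
/// (the role a DataFusion session or the LogStore plays in the discussion).
#[derive(Debug)]
pub struct Session {
    pub label: String,
}

/// Toy table: tracks only a version and the session it executes with.
#[derive(Debug, Clone)]
pub struct DeltaTable {
    version: u64,
    session: Arc<Session>,
}

impl DeltaTable {
    /// Creates a table with an internally managed default session.
    pub fn new() -> Self {
        let session = Arc::new(Session { label: "default".to_string() });
        Self { version: 0, session }
    }

    /// Lets the caller supply the session used to execute operations.
    pub fn with_session(mut self, session: Arc<Session>) -> Self {
        self.session = session;
        self
    }

    pub fn version(&self) -> u64 {
        self.version
    }

    pub fn session_label(&self) -> &str {
        &self.session.label
    }

    /// Operation builders hang directly off the table; no wrapper struct.
    pub fn optimize(self) -> OptimizeBuilder {
        OptimizeBuilder { table: self, target_size: None }
    }

    pub fn vacuum(self) -> VacuumBuilder {
        VacuumBuilder { table: self, dry_run: false }
    }
}

pub struct OptimizeBuilder {
    table: DeltaTable,
    target_size: Option<u64>,
}

impl OptimizeBuilder {
    pub fn with_target_size(mut self, bytes: u64) -> Self {
        self.target_size = Some(bytes);
        self
    }

    /// Toy "commit": validates options, bumps the version, returns the table.
    pub fn execute(mut self) -> Result<DeltaTable, String> {
        if self.target_size == Some(0) {
            return Err("target size must be non-zero".to_string());
        }
        self.table.version += 1;
        Ok(self.table)
    }
}

pub struct VacuumBuilder {
    table: DeltaTable,
    dry_run: bool,
}

impl VacuumBuilder {
    pub fn with_dry_run(mut self, dry_run: bool) -> Self {
        self.dry_run = dry_run;
        self
    }

    /// In this toy model a dry run leaves the table version untouched.
    pub fn execute(mut self) -> Result<DeltaTable, String> {
        if !self.dry_run {
            self.table.version += 1;
        }
        Ok(self.table)
    }
}
```

With this shape, a caller would chain operations on a single handle, e.g. `dt = dt.optimize().with_target_size(1024).execute()?;` followed by `dt = dt.vacuum().execute()?;`, and could swap in their own session once up front via `with_session`.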