Rewrite _delta_log history
#3376
-
Description
Enable users to modify the log entry when writing to a Delta table.

```python
from datetime import datetime

from deltalake import DeltaTable, write_deltalake

today: datetime
yesterday: datetime
# today > yesterday
uri = ...
storage_options = ...
data = ...  # the batch being appended

dtable = DeltaTable(uri, storage_options=storage_options)
dtable.history()[0]  # {"timestamp": today, "version": 0, ...}

write_deltalake(uri, data, mode="append", metadata={"timestamp": yesterday}, storage_options=storage_options)

# the latest version keeps `today` as its timestamp, since `today` > `yesterday`
dtable = DeltaTable(uri, storage_options=storage_options)  # reload to see version 1
dtable.history()[0]  # {"timestamp": today, "version": 1, ...}
```

Use Case
In data engineering projects it is common to backfill data that was published in the past and then keep collecting it as new data is published. The backfill takes far less time than the period the data covers (e.g. five years of weekly data collected in a single afternoon). The fact that Delta stores the operation timestamp and not the curve date makes the
What is the best practice for not having to make this distinction? Should I keep a separate table that maps each version to the date the data refers to, rather than to when the write was executed?

Related Issue(s)
I was not able to find anything related to this request, and I am not sure whether this would be good practice at all; I lean towards "not".
Replies: 2 comments
-
Rewriting the log would be quite an anti-pattern. Yes, you can keep a separate lookup table, or you can add the real timestamp to your data, partition on it, and then load just that partition.
-
I had completely missed the partition solution! That can definitely solve it for me, thanks! The discussion can stay, since