How to optimize reading of many versions of delta table #3807
processadd asked this question in Q&A (unanswered)
Hi,
I have a raw Delta table, `raw`, loaded from Kinesis by a Structured Streaming job. That job appends to `raw` every few seconds, and each append creates a new table version.
I also have a second Structured Streaming job with trigger=availableNow that reads from `raw` and is triggered once a day, so each run may see thousands of versions of `raw`. It uses foreachBatch to MERGE into a target Delta table.
The MERGE itself seems fast (less than 1 minute), but the query progress reports long times under durationMs.
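For reference, the setup described above might look roughly like the sketch below. The paths, the join key, and the helper names (`apply_options`, `make_upsert`, `start_daily_merge`) are hypothetical. `maxFilesPerTrigger` and `maxBytesPerTrigger` are existing Delta streaming source options, but the values are placeholders, and whether capping batch size shortens `latestOffset` or `addBatch` in this case is an assumption to test, not an established fix.

```python
from typing import Callable, Dict

# Rate-limit options of the Delta streaming source.
# The values are placeholders, not recommendations.
SOURCE_OPTIONS: Dict[str, str] = {
    "maxFilesPerTrigger": "1000",
    "maxBytesPerTrigger": "1g",
}


def apply_options(reader, options: Dict[str, str]):
    """Apply each option to a DataStreamReader-like object and return it."""
    for name, value in options.items():
        reader = reader.option(name, value)
    return reader


def make_upsert(target_path: str, key: str) -> Callable:
    """Build a foreachBatch function that MERGEs a micro-batch into the target."""

    def upsert(batch_df, batch_id):
        # Imported lazily so this module loads without Delta installed.
        from delta.tables import DeltaTable

        # batch_df.sparkSession is available in PySpark 3.3 and later.
        target = DeltaTable.forPath(batch_df.sparkSession, target_path)
        (
            target.alias("t")
            .merge(batch_df.alias("s"), f"t.{key} = s.{key}")
            .whenMatchedUpdateAll()
            .whenNotMatchedInsertAll()
            .execute()
        )

    return upsert


def start_daily_merge(spark, raw_path, target_path, checkpoint, key):
    """Start the availableNow stream that reads `raw` and merges into the target."""
    reader = apply_options(spark.readStream.format("delta"), SOURCE_OPTIONS)
    return (
        reader.load(raw_path)
        .writeStream
        .foreachBatch(make_upsert(target_path, key))
        .option("checkpointLocation", checkpoint)
        .trigger(availableNow=True)  # requires Spark 3.3 or later
        .start()
    )
```

With rate limits set, an availableNow run is expected to work through the backlog in several micro-batches rather than one, so progress is reported per batch.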
Why do latestOffset and addBatch take so long?
Thanks
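To compare phases across batches, a small helper like the one below can rank the entries of `durationMs` from a progress report. This is a minimal sketch: `rank_phases` is a hypothetical name, it assumes the progress is available as a plain dict (as `query.lastProgress` is in PySpark 3.x), and the sample numbers are made up for illustration, not taken from the job in question.

```python
from typing import Dict, List, Tuple


def rank_phases(progress: Dict) -> List[Tuple[str, int]]:
    """Return durationMs phases sorted from slowest to fastest.

    triggerExecution is skipped because it is the total for the whole trigger.
    """
    durations = progress.get("durationMs", {})
    phases = {k: v for k, v in durations.items() if k != "triggerExecution"}
    return sorted(phases.items(), key=lambda kv: kv[1], reverse=True)


# Made-up progress report, for illustration only.
sample_progress = {
    "batchId": 0,
    "durationMs": {
        "latestOffset": 5000,
        "addBatch": 9000,
        "queryPlanning": 100,
        "triggerExecution": 14200,
    },
}

print(rank_phases(sample_progress))
# [('addBatch', 9000), ('latestOffset', 5000), ('queryPlanning', 100)]
```

On a real query this could be called as `rank_phases(query.lastProgress)`, or over each entry of `query.recentProgress` when a run produces several micro-batches. Timing the MERGE separately inside the foreachBatch function would show how much of `addBatch` it accounts for.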