Open
Discussed in #2555
Originally posted by VLomonovskis May 30, 2024
Hello.
I have several processes that use the same code and append data to the same Delta table. These processes run in parallel. I append data using write_deltalake and use the Rust engine to merge the schema.
As the processes add data, performance degrades and each upload takes longer. As I understand it, this happens because the number of transaction log files keeps growing. However, when I create a checkpoint (using delta_table.checkpoint()), performance does not improve, and it looks like write_deltalake still reads all the log files from before the checkpoint. Can this behaviour be changed?
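To make the expectation concrete, here is a minimal sketch (standard library only, not part of deltalake) of what a checkpoint should save. It assumes the table is on a local filesystem and follows the Delta protocol layout: commits are 20-digit zero-padded `.json` files in `_delta_log`, and `_delta_log/_last_checkpoint` is a JSON file with a `version` field. The helper name `commits_after_last_checkpoint` is hypothetical.

```python
import json
import re
from pathlib import Path

# Commit files are named like 00000000000000000012.json (20 digits).
COMMIT_RE = re.compile(r"^(\d{20})\.json$")


def commits_after_last_checkpoint(table_path):
    """Return (checkpoint_version, commit versions newer than the checkpoint).

    Hypothetical diagnostic helper: a reader that honours the checkpoint
    should only need to replay the returned commit versions.
    """
    log_dir = Path(table_path) / "_delta_log"
    pointer = log_dir / "_last_checkpoint"

    checkpoint_version = None
    if pointer.exists():
        # _last_checkpoint is a small JSON document with a "version" field.
        checkpoint_version = json.loads(pointer.read_text())["version"]

    versions = []
    for entry in log_dir.iterdir():
        match = COMMIT_RE.match(entry.name)
        if match:
            versions.append(int(match.group(1)))
    versions.sort()

    if checkpoint_version is not None:
        versions = [v for v in versions if v > checkpoint_version]
    return checkpoint_version, versions
```

If this reports only a handful of commits after the checkpoint and appends are still slow, that would support my impression that the writer does not start from the checkpoint.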
I did see discussions about checkpoints, but they were about checkpoint creation. In my case, checkpoints are not used even after they have been created.