Description
Environment
Delta-rs version: v0.23.0
Binding:
Environment:
- Cloud provider:
- OS: CentOS 8
- Other:
Bug
We are encountering performance issues due to the large number of .json files generated in the _delta_log directory when using Delta Lake with delta-rs. These transaction-log files have accumulated significantly over time in our setup.
The excessive number of small .json files causes the following problems:
- Increased overhead in file system operations (e.g., scanning and metadata retrieval).
- Slower table initialization and query performance, especially when the transaction logs are not cleaned up regularly.
- Significant performance degradation on object stores and distributed file systems (e.g., S3, HDFS, OSS).
We are looking for an efficient way to delete or clean up these .json files quickly and improve the overall performance of the Delta table.
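As a rough way to quantify the backlog, here is a minimal sketch using only the standard library (the directory path is hypothetical and assumes a locally mounted table):

```rust
use std::fs;

// Count the commit .json files under a table's _delta_log directory.
fn count_commit_files(delta_log_dir: &str) -> std::io::Result<usize> {
    let count = fs::read_dir(delta_log_dir)?
        .filter_map(|entry| entry.ok())
        .filter(|entry| {
            entry
                .path()
                .extension()
                .map_or(false, |ext| ext == "json")
        })
        .count();
    Ok(count)
}
```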
I tried to vacuum them with the following code:
```rust
use deltalake::operations::transaction::CommitProperties;
use deltalake::operations::vacuum::VacuumBuilder;

async fn vacuum(delta_table: deltalake::DeltaTable) {
    let snapshot = delta_table.snapshot().unwrap();
    // Vacuum with a zero retention period (retention enforcement disabled)
    // and ask the commit to also clean up expired log files.
    match VacuumBuilder::new(delta_table.log_store(), snapshot.clone())
        .with_retention_period(chrono::Duration::zero())
        .with_enforce_retention_duration(false)
        .with_commit_properties(
            CommitProperties::default().with_cleanup_expired_logs(Some(true)),
        )
        .await
    {
        Ok((_table, metrics)) => println!("vacuum metrics: {:?}", metrics),
        Err(e) => println!("vacuum error: {:?}", e),
    }
}
```
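For completeness, a minimal sketch of the call site, assuming a tokio runtime (with the macros feature) and a placeholder table URI:

```rust
#[tokio::main]
async fn main() -> Result<(), deltalake::DeltaTableError> {
    // Placeholder URI; substitute the real table location (e.g., an S3 path).
    let table = deltalake::open_table("/path/to/table").await?;
    vacuum(table).await;
    Ok(())
}
```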
What happened:
The .json files are still present in _delta_log after the vacuum completes.
What you expected to happen:
delete some .json file if checkponit exists
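For reference, explicitly creating a checkpoint and then cleaning up expired logs can be sketched as follows (assuming the deltalake::checkpoints module; exact signatures may differ between delta-rs versions, and log cleanup honors the table's delta.logRetentionDuration property, 30 days by default, which may explain why recent commits are retained):

```rust
use deltalake::checkpoints::{cleanup_metadata, create_checkpoint};

async fn checkpoint_and_clean(
    table: &deltalake::DeltaTable,
) -> Result<(), Box<dyn std::error::Error>> {
    // Write a checkpoint so earlier commit .json files become eligible for cleanup.
    create_checkpoint(table).await?;
    // Delete expired log entries; only entries older than the table's
    // delta.logRetentionDuration are removed.
    let deleted = cleanup_metadata(table).await?;
    println!("deleted {} expired log files", deleted);
    Ok(())
}
```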
How to reproduce it:
More details: