vovacf201
Collaborator

modified_file_threshold_seconds - skips compaction for files older than the specified threshold (in seconds)

iceberg-datafusion = { workspace = true }
itertools = "0.13.0"
mixtrics = "0.2.0"
object_store = { version = "0.11", features = ["aws"] }
Collaborator

One alternative is OpenDAL. I call it out because it's already used by the iceberg crate. It abstracts over different object stores (including S3 compatible stores), but object_store does that as well.

OpenDAL's Operator has a stat() function that returns Metadata, which exposes last_modified.

Comment on lines +144 to +147
/// Threshold in seconds for file modification time filtering (default: 0, means include all files)
/// Only files modified within this many seconds from now will be included in compaction
#[builder(default = "DEFAULT_MODIFIED_FILE_THRESHOLD_SECONDS")]
pub modified_file_threshold_seconds: u64,
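
For reference, the filtering rule described in the doc comment above could be sketched roughly as follows. This is only an illustration using the standard library: the helper name `is_within_threshold` is hypothetical and not taken from the PR, and both the inclusive boundary and the handling of timestamps in the future (clock skew) are assumptions.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical helper (not from the PR): decides whether a file should be
/// included in compaction based on its last-modified time.
///
/// A threshold of 0 disables the filter, so every file is included.
fn is_within_threshold(
    last_modified: SystemTime,
    now: SystemTime,
    threshold_seconds: u64,
) -> bool {
    if threshold_seconds == 0 {
        return true;
    }
    match now.duration_since(last_modified) {
        // Assumption: a file exactly `threshold_seconds` old is still included.
        Ok(age) => age <= Duration::from_secs(threshold_seconds),
        // Assumption: `last_modified` is later than `now` (clock skew),
        // so the file is treated as recently modified.
        Err(_) => true,
    }
}
```

With a threshold of 60, a file modified 30 seconds ago would be included and one modified 120 seconds ago would be skipped. Taking `now` as a parameter rather than calling `SystemTime::now()` inside keeps the check deterministic and easy to test.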
Collaborator

Maybe we should call out here that this is currently only supported for tables on object stores with S3-compatible APIs.
