Skip to content

Add rewrite_manifests EAR-1667#1002

Merged
dcherian merged 8 commits intomainfrom
empty-commits
Jun 26, 2025
Merged

Add rewrite_manifests EAR-1667#1002
dcherian merged 8 commits intomainfrom
empty-commits

Conversation

@dcherian
Copy link
Copy Markdown
Collaborator

@dcherian dcherian commented Jun 24, 2025

Closes #986

Updated docs here: https://icechunk--1002.org.readthedocs.build/en/1002/icechunk-python/performance/#splitting-manifests

Questions:

  1. Should we allow filtering by a prefix?
  2. Shall we automatically add the splitting config to the commit's metadata? and make sure it's easy to use for creating a new repo?

Decisions:

  1. I punted on (1)
  2. I've done (2) just to record the config, but didn't add the ability to easily create a ManifestSplittingConfig from the info in metadata.

@dcherian dcherian changed the base branch from main to manifest-split-write June 24, 2025 19:04
Ok(session)
}

pub async fn rewrite_manifests(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other options for consideration: optimize_manifests, reshard_manifests

Base automatically changed from manifest-split-write to main June 24, 2025 20:09
@dcherian dcherian changed the title Add rewrite_manifests Add rewrite_manifests EAER-1667 Jun 24, 2025
@dcherian dcherian changed the title Add rewrite_manifests EAER-1667 Add rewrite_manifests EAR-1667 Jun 24, 2025
@dcherian dcherian force-pushed the empty-commits branch 4 times, most recently from 4bbf424 to e285309 Compare June 24, 2025 22:22
@dcherian dcherian force-pushed the empty-commits branch 3 times, most recently from 97209a1 to 80b5b28 Compare June 24, 2025 22:48
self.store = self.repo.writable_session("main").store

@precondition(lambda self: not self.store.session.has_uncommitted_changes)
@rule(data=st.data())
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lovely

@dcherian dcherian requested a review from paraseba June 26, 2025 19:47
Copy link
Copy Markdown
Collaborator

@paraseba paraseba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a test to prove ourselves that manifest can also be fused back into fewer ones by calling rewrite_manifests? Or are covered?

@dcherian
Copy link
Copy Markdown
Collaborator Author

we need a test to prove ourselves that manifest can also be fused back into fewer ones by calling rewrite_manifests? Or are covered?

That's covered. For completeness, I just added one for the case where we split existing multiple manifests in to smaller manifests too.

dcherian added 3 commits June 26, 2025 14:46
* main:
  Error fast when can_write is false and calling gc + expiration (#1014)
  Document cold buckets and GCS 429 problem (#1009)
  Add initial commit id as constant (#1008)
  Fix typo (#1010)
@dcherian dcherian enabled auto-merge (squash) June 26, 2025 21:55
@dcherian dcherian merged commit 4439888 into main Jun 26, 2025
10 of 11 checks passed
@dcherian dcherian deleted the empty-commits branch June 26, 2025 22:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Allow users to experiment with different splitting configurations

2 participants