
Finish manifest splitting #948

Merged
dcherian merged 44 commits into main from manifest-split-write on Jun 24, 2025


@dcherian (Collaborator) commented May 13, 2025

Closes #604
Closes #332

TODO:

  • add splitting config to stateful test?
  • test for changing the splitting config
  • error when merging incompatible splits

}

impl ManifestSplittingConfig {
pub fn get_split_sizes(&self, node: &NodeSnapshot) -> SessionResult<ManifestSplits> {
Collaborator Author

Only the indentation has changed.

@dcherian force-pushed the manifest-split-write branch from d5398aa to e5f3d9e on May 13, 2025 21:44
@dcherian force-pushed the manifest-split-write branch from e5f3d9e to 52a6e4c on June 6, 2025 22:48
&self,
node_id: &NodeId,
node_path: &Path,
extent: Option<ManifestExtents>,
Collaborator Author, Jun 6, 2025

I will need to learn more about lifetimes to make this Option<&ManifestExtents>.

EDIT: I haven't tackled this yet.

change_set: ChangeSet,
default_commit_metadata: SnapshotProperties,
// This is an optimization so that we needn't figure out the split sizes on every set.
splits: HashMap<NodeId, ManifestSplits>,
Collaborator Author

This is stored on Session because we need to look up splits for arrays that were created in a previous session but are being modified in this one, and the change_set doesn't have enough information to compute them (it receives a NodeId, not a NodeSnapshot, IIRC).
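A minimal sketch of that compute-once caching, assuming simplified stand-ins: NodeId is a u64 and ManifestSplits wraps a Vec<Range<u32>>. The method name splits_for and its compute closure (standing in for deriving splits from the NodeSnapshot's shape and dimension names) are hypothetical, not icechunk's API.

```rust
use std::collections::HashMap;
use std::ops::Range;

// Hypothetical stand-ins for icechunk's NodeId and ManifestSplits types.
pub type NodeId = u64;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ManifestSplits(pub Vec<Range<u32>>);

#[derive(Default)]
pub struct Session {
    // Splits are computed at most once per node per session, then reused on every set.
    splits: HashMap<NodeId, ManifestSplits>,
}

impl Session {
    /// Return the cached splits for `node`, running `compute` only on the first call.
    /// `compute` stands in for deriving splits from the node's snapshot,
    /// which the change set alone cannot do because it only sees a NodeId.
    pub fn splits_for(
        &mut self,
        node: NodeId,
        compute: impl FnOnce() -> ManifestSplits,
    ) -> &ManifestSplits {
        self.splits.entry(node).or_insert_with(compute)
    }
}
```

The entry API does the lookup and the insert with a single hash, and `compute` never runs for a node that already has cached splits.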

// Q: What happens if we set a chunk, then change a dimension name, so
// that the split changes.
// A: We ignore it. splits are set once for a node in a session, and are never changed.
// self.cache_splits(&node.id, path, &shape, &dimension_names);
Collaborator Author

We could be stricter and error here if the splits change.
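A sketch of that stricter option, assuming the same simplified stand-ins (NodeId as u64, ManifestSplits wrapping ranges). The function check_splits and the SplitError::SplitsChanged variant are illustrative names only; icechunk's real error kinds may differ.

```rust
use std::collections::HashMap;
use std::ops::Range;

pub type NodeId = u64; // hypothetical stand-in

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ManifestSplits(pub Vec<Range<u32>>);

#[derive(Debug, PartialEq, Eq)]
pub enum SplitError {
    // Hypothetical error variant for "splits changed mid-session".
    SplitsChanged { node: NodeId },
}

/// Stricter alternative to "set once, then ignore": if the splits recomputed after
/// a metadata change (e.g. a dimension rename) differ from the cached ones, error out.
pub fn check_splits(
    cache: &mut HashMap<NodeId, ManifestSplits>,
    node: NodeId,
    recomputed: ManifestSplits,
) -> Result<(), SplitError> {
    match cache.get(&node) {
        Some(existing) if *existing != recomputed => Err(SplitError::SplitsChanged { node }),
        Some(_) => Ok(()),
        None => {
            // First time we see this node in the session: remember its splits.
            cache.insert(node, recomputed);
            Ok(())
        }
    }
}
```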

snapshot_id: &'a SnapshotId,
change_set: &'a ChangeSet,
node: NodeSnapshot,
extent: Option<ManifestExtents>,
Collaborator Author

Important: all iterators have gained an Option<ManifestExtents> argument so they can filter chunks to the given extents. This is where I'll need to add lifetimes to switch to Option<&ManifestExtents>.
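A minimal sketch of what the borrowed version could look like, assuming simplified stand-ins for ChunkIndices (a Vec<u32>) and ManifestExtents (one half-open Range<u32> per dimension). The function name chunks_in_extent is hypothetical; the point is that a single lifetime `'a` ties the returned iterator to both the chunk slice and the borrowed extent, so the extent no longer has to be cloned into each iterator.

```rust
use std::ops::Range;

// Hypothetical stand-ins for the real icechunk types.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ChunkIndices(pub Vec<u32>);

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ManifestExtents(pub Vec<Range<u32>>);

impl ManifestExtents {
    /// True if every coordinate falls inside the half-open range for its axis.
    pub fn contains(&self, coord: &[u32]) -> bool {
        self.0.len() == coord.len()
            && self.0.iter().zip(coord).all(|(range, c)| range.contains(c))
    }
}

/// Filter chunk coordinates by an optional, *borrowed* extent.
/// `None` means "no filtering"; the iterator cannot outlive `chunks` or `extent`.
pub fn chunks_in_extent<'a>(
    chunks: &'a [ChunkIndices],
    extent: Option<&'a ManifestExtents>,
) -> impl Iterator<Item = &'a ChunkIndices> + 'a {
    chunks
        .iter()
        .filter(move |ci| extent.map_or(true, |e| e.contains(&ci.0)))
}
```

Because `Option<&ManifestExtents>` is `Copy`, the `move` closure captures it cheaply.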

Collaborator

I wonder if, for all these cases, it would be easier to introduce a (constant) match-all ManifestExtents, ManifestExtents::all, instead of using Option.
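One way the match-all idea could be sketched, assuming ManifestExtents is one half-open Range<u32> per dimension. Because the number of dimensions varies per array, this uses a hypothetical constructor all(ndim) rather than a literal constant; count_in_extent is likewise an illustrative name.

```rust
use std::ops::Range;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ManifestExtents(pub Vec<Range<u32>>);

impl ManifestExtents {
    /// Hypothetical "match-all" extent for an array with `ndim` dimensions.
    /// Caveat: half-open ranges cannot include the index u32::MAX itself.
    pub fn all(ndim: usize) -> Self {
        Self(vec![0..u32::MAX; ndim])
    }

    pub fn contains(&self, coord: &[u32]) -> bool {
        self.0.len() == coord.len()
            && self.0.iter().zip(coord).all(|(range, c)| range.contains(c))
    }
}

/// With a match-all value available, filtering code takes a plain `&ManifestExtents`
/// instead of an `Option`, and the "no filter" case needs no special branch.
pub fn count_in_extent(coords: &[Vec<u32>], extent: &ManifestExtents) -> usize {
    coords.iter().filter(|c| extent.contains(c)).count()
}
```

The trade-off versus `Option` is that "no filter" still pays for a per-chunk range check.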

) -> SessionResult<Option<ManifestRef>> {
let mut from = vec![];
let mut to = vec![];
let chunks = aggregate_extents(&mut from, &mut to, chunks, |ci| &ci.coord);
Collaborator Author

I had to bring this back.

AFAICT there's no way to update the extents when chunks are deleted without scanning all the chunks. To avoid that, we would need to keep track of "how many chunks have chunk index 'i'", decrement that counter when deleting chunks, and then construct a bounding box from that histogram when needed. This would be a format change.
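A sketch of the histogram bookkeeping described above, assuming the counts are kept separately for each dimension (our interpretation) and that extents are half-open ranges. ExtentHistogram is a hypothetical type, not icechunk's on-disk format.

```rust
use std::collections::BTreeMap;
use std::ops::Range;

/// Hypothetical per-dimension histogram: for each axis, how many chunks have index `i`.
pub struct ExtentHistogram {
    counts: Vec<BTreeMap<u32, usize>>,
}

impl ExtentHistogram {
    pub fn new(ndim: usize) -> Self {
        Self { counts: vec![BTreeMap::new(); ndim] }
    }

    /// Increment the counter for this chunk's index along every axis.
    pub fn add(&mut self, coord: &[u32]) {
        for (axis, &i) in self.counts.iter_mut().zip(coord) {
            *axis.entry(i).or_insert(0) += 1;
        }
    }

    /// Decrement counters on delete; drop an index once no chunk uses it.
    pub fn remove(&mut self, coord: &[u32]) {
        for (axis, &i) in self.counts.iter_mut().zip(coord) {
            if let Some(n) = axis.get_mut(&i) {
                *n -= 1;
                if *n == 0 {
                    axis.remove(&i);
                }
            }
        }
    }

    /// Bounding box as half-open ranges, or None if no chunks remain.
    pub fn extents(&self) -> Option<Vec<Range<u32>>> {
        self.counts
            .iter()
            .map(|axis| {
                let (&lo, _) = axis.first_key_value()?;
                let (&hi, _) = axis.last_key_value()?;
                Some(lo..hi + 1)
            })
            .collect()
    }
}
```

Returning `None` once every counter reaches zero would also answer "did this split become empty?" without a scan.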

#[allow(clippy::expect_used)]
let split_sizes = self
.split_sizes
pub fn get_split_sizes(
Collaborator Author

no logic changes, just the signature. Arguably, it should've been this way earlier 🤦🏾‍♂️

Ok(asset_manager.fetch_manifest(manifest_id, manifest_info.size_bytes).await?)
}

/// Map the iterator to accumulate the extents of the chunks traversed
Collaborator Author

no logic changes
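For readers unfamiliar with the helper, here is a sketch of an iterator adaptor that accumulates extents as chunks stream past. It is modelled on the call `aggregate_extents(&mut from, &mut to, chunks, |ci| &ci.coord)` in the diff, but the body, the ChunkInfo stand-in, and the exclusive upper bound in `to` are assumptions.

```rust
/// Hypothetical stand-in for a chunk entry with a coordinate.
#[derive(Debug, Clone)]
pub struct ChunkInfo {
    pub coord: Vec<u32>,
}

/// Wrap an iterator so that, as items are consumed, `from`/`to` track the
/// per-dimension minimum and exclusive maximum chunk index seen.
pub fn aggregate_extents<'a, T, I, F>(
    from: &'a mut Vec<u32>,
    to: &'a mut Vec<u32>,
    it: I,
    coord_of: F,
) -> impl Iterator<Item = T> + 'a
where
    I: IntoIterator<Item = T>,
    I::IntoIter: 'a,
    F: Fn(&T) -> &Vec<u32> + 'a,
    T: 'a,
{
    it.into_iter().inspect(move |item| {
        let coord = coord_of(item);
        if from.is_empty() {
            // First item seen: initialize the bounding box from it.
            from.extend(coord.iter().copied());
            to.extend(coord.iter().map(|c| c + 1));
        } else {
            for ((lo, hi), &c) in from.iter_mut().zip(to.iter_mut()).zip(coord) {
                *lo = (*lo).min(c);
                *hi = (*hi).max(c + 1);
            }
        }
    })
}
```

The extents are only complete after the iterator has been fully consumed, which is why the caller keeps `from` and `to` outside the iterator.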

@paraseba (Collaborator) left a comment

I made a first pass over the easy parts. I still need to look into the manifest generation, but maybe we can chat about the easy parts first.

It looks great; I made a few minor comments.

One thing I still need to understand is how you will know whether a manifest split has become empty after chunk deletes. That's probably in the parts I haven't read yet.

let ranges = std::iter::zip(self.iter(), other.iter())
.map(|(a, b)| max(a.start, b.start)..min(a.end, b.end))
.collect::<Vec<_>>();
if any(ranges.iter(), |r| r.end < r.start) { None } else { Some(Self(ranges)) }
Collaborator

Should the condition be r.end <= r.start? Aren't these half-open [start, end) intervals? A single empty range already makes the intersection empty.

Collaborator Author

Should we disallow empty ManifestExtents? I thought we were doing that, but apparently not.

Collaborator

I think it would save us from confusion, yes.
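A sketch combining both suggestions from this thread, assuming ManifestExtents is one half-open Range<u32> per dimension: the intersection uses `<=` so a zero-width result counts as empty, and a hypothetical constructor ManifestExtents::new refuses empty extents. The diff's `any(...)` helper is replaced here by std's Iterator::any.

```rust
use std::cmp::{max, min};
use std::ops::Range;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ManifestExtents(Vec<Range<u32>>);

impl ManifestExtents {
    /// Hypothetical constructor: returns None if any range is empty,
    /// so an empty ManifestExtents can never be built.
    pub fn new(ranges: Vec<Range<u32>>) -> Option<Self> {
        if ranges.iter().any(|r| r.is_empty()) { None } else { Some(Self(ranges)) }
    }

    /// Intersection of two half-open [start, end) boxes.
    /// `r.end <= r.start` (not `<`) catches touching boxes, e.g. 0..5 and 5..9 give 5..5.
    pub fn intersection(&self, other: &Self) -> Option<Self> {
        let ranges: Vec<Range<u32>> = self
            .0
            .iter()
            .zip(other.0.iter())
            .map(|(a, b)| max(a.start, b.start)..min(a.end, b.end))
            .collect();
        if ranges.iter().any(|r| r.end <= r.start) { None } else { Some(Self(ranges)) }
    }
}
```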

splits: &ManifestSplits,
) {
#[allow(clippy::expect_used)]
let (_, extent) = splits.which_extent_and_index(&coord).expect("logic bug. Trying to set chunk ref but can't find the appropriate split manifest.");
Collaborator

I think this is what I meant in my previous comment: isn't the index enough if you store the split manifests in a Vec instead of a HashMap?
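A sketch of the Vec-based layout suggested here, assuming simplified stand-ins. ManifestSplits holds the extents in a fixed order, the hypothetical method which returns a position, and that position indexes per-split buckets directly, so no HashMap keyed by extents is needed.

```rust
use std::ops::Range;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ManifestExtents(pub Vec<Range<u32>>);

impl ManifestExtents {
    pub fn contains(&self, coord: &[u32]) -> bool {
        self.0.len() == coord.len()
            && self.0.iter().zip(coord).all(|(range, c)| range.contains(c))
    }
}

/// Hypothetical: splits kept in a Vec, so each split's position is a stable key.
pub struct ManifestSplits(pub Vec<ManifestExtents>);

impl ManifestSplits {
    /// Index of the split containing `coord` (a linear scan over the splits).
    pub fn which(&self, coord: &[u32]) -> Option<usize> {
        self.0.iter().position(|extent| extent.contains(coord))
    }
}

/// Group chunk coordinates into per-split buckets indexed by that same position.
pub fn bucket_chunks(splits: &ManifestSplits, coords: &[Vec<u32>]) -> Vec<Vec<Vec<u32>>> {
    let mut buckets = vec![Vec::new(); splits.0.len()];
    for coord in coords {
        if let Some(i) = splits.which(coord) {
            buckets[i].push(coord.clone());
        }
    }
    buckets
}
```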

None,
}

/// Important: this is not symmetric.
Collaborator

Maybe adding it to the impl under a different name would make the asymmetry clearer?

Collaborator Author

Now moved to ManifestExtents.overlap_with(other: &ManifestExtents)
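The thread does not show what overlap_with returns, so the following is only a hypothetical illustration of how an overlap check becomes asymmetric. Here it reports how much of `self` is covered by `other`; the Overlap enum and its variants are assumptions.

```rust
use std::ops::Range;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ManifestExtents(pub Vec<Range<u32>>);

/// Hypothetical result type; the real overlap_with may return something else.
#[derive(Debug, PartialEq, Eq)]
pub enum Overlap {
    Complete,
    Partial,
    Disjoint,
}

impl ManifestExtents {
    /// How much of `self` is covered by `other`. Not symmetric:
    /// `small.overlap_with(&big)` is Complete, `big.overlap_with(&small)` is Partial.
    pub fn overlap_with(&self, other: &ManifestExtents) -> Overlap {
        let mut complete = true;
        for (a, b) in self.0.iter().zip(&other.0) {
            let lo = a.start.max(b.start);
            let hi = a.end.min(b.end);
            if hi <= lo {
                return Overlap::Disjoint; // empty along this axis, so empty overall
            }
            if lo != a.start || hi != a.end {
                complete = false; // `other` cuts off part of `self` on this axis
            }
        }
        if complete { Overlap::Complete } else { Overlap::Partial }
    }
}
```

Putting the method on the type with a directional name reads naturally at the call site: `a.overlap_with(&b)` asks about `a`, not `b`.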

// test_cases: increasing size of (multiple) dimensions
// decreasing size of (multiple) dimensions
//
new_splits
Collaborator Author

Yeah, this is ugly. Since we track extents on the change set, we need to update them on every resize 🤦🏾‍♂️
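To see why a resize touches the tracked extents, here is a sketch of tiling a chunk grid into split extents. It assumes `shape` is measured in chunks per dimension and `split` is the maximum number of chunks per split along each dimension; split_extents is a hypothetical helper, not icechunk's get_split_sizes.

```rust
use std::ops::Range;

/// Hypothetical: tile a chunk grid of `shape` into split extents of at most
/// `split` chunks per dimension. Growing or shrinking `shape` changes the result.
pub fn split_extents(shape: &[u32], split: &[u32]) -> Vec<Vec<Range<u32>>> {
    // Start with one empty "prefix" and extend it one dimension at a time.
    let mut out: Vec<Vec<Range<u32>>> = vec![Vec::new()];
    for (&n, &s) in shape.iter().zip(split) {
        assert!(s > 0, "split size must be positive");
        // Ranges of at most `s` chunks covering 0..n along this dimension.
        let axis: Vec<Range<u32>> = (0..n)
            .step_by(s as usize)
            .map(|start| start..(start + s).min(n))
            .collect();
        // Cartesian product: every existing prefix combined with every range on this axis.
        out = out
            .into_iter()
            .flat_map(|prefix| {
                axis.iter().map(move |r| {
                    let mut extent = prefix.clone();
                    extent.push(r.clone());
                    extent
                })
            })
            .collect();
    }
    out
}
```

For the 1-D benchmark repo described later in this thread (500_000 chunk refs, split size 10_000), this tiling yields 50 extents, matching the 50 manifests reported there.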

@dcherian force-pushed the manifest-split-write branch 3 times, most recently from 6fe675b to 6d52d2c, on June 10, 2025 23:29
)
});
for (extent, their_manifest) in other_splits {
manifests.entry(extent).or_default().extend(their_manifest)
Collaborator Author

This is readable, but it seems wasteful in the default case, where we could just insert their_manifest.
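A sketch of the alternative using the std HashMap entry API: move `their_manifest` in wholesale when the extent is new, and only extend when both sides have chunks for the same extent. The Extent and Manifest type aliases and merge_splits are simplified, hypothetical stand-ins.

```rust
use std::collections::hash_map::{Entry, HashMap};
use std::ops::Range;

// Hypothetical stand-ins: an extent is one range per dimension,
// a manifest is a list of (chunk coordinate, payload reference) pairs.
pub type Extent = Vec<Range<u32>>;
pub type Manifest = Vec<(Vec<u32>, String)>;

/// Merge `other` into `manifests`, avoiding a copy in the common no-conflict case.
pub fn merge_splits(manifests: &mut HashMap<Extent, Manifest>, other: HashMap<Extent, Manifest>) {
    for (extent, their_manifest) in other {
        match manifests.entry(extent) {
            // New extent: move the whole Vec in, no element-by-element copy.
            Entry::Vacant(slot) => {
                slot.insert(their_manifest);
            }
            // Both sides have chunks for this extent: append theirs to ours.
            Entry::Occupied(mut slot) => slot.get_mut().extend(their_manifest),
        }
    }
}
```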

@dcherian force-pushed the manifest-split-write branch 2 times, most recently from c775d21 to ce67942, on June 11, 2025 20:26
// This map keeps track of any chunk deletes that are
// outside the domain of the current array shape. This is needed to handle
// the very unlikely case of multiple resizes in the same session.
deleted_chunks_outside_bounds: BTreeMap<NodeId, HashSet<ChunkIndices>>,
Collaborator Author

This seems to be the cleanest solution to me.

Commit 5af247e tried tracking all deleted chunks separately, but it was very messy.
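A sketch of how a map like deleted_chunks_outside_bounds might be maintained, assuming coordinates and shape are both measured in chunks. Only the field's type comes from the diff; the ChangeSet stand-in, the methods, and the interpretation (remembering out-of-bounds deletes so they stay deleted if a later resize in the same session grows the array again) are assumptions.

```rust
use std::collections::{BTreeMap, HashSet};

pub type NodeId = u64; // hypothetical stand-in
pub type ChunkIndices = Vec<u32>; // hypothetical stand-in

#[derive(Default)]
pub struct ChangeSet {
    // Deletes at coordinates outside the array's current shape, per node.
    deleted_chunks_outside_bounds: BTreeMap<NodeId, HashSet<ChunkIndices>>,
}

impl ChangeSet {
    /// Record a delete only if `coord` lies outside the current `shape`.
    /// In-bounds deletes would go through the regular per-split bookkeeping (omitted).
    pub fn delete_chunk(&mut self, node: NodeId, coord: ChunkIndices, shape: &[u32]) {
        let outside = coord.iter().zip(shape).any(|(c, n)| c >= n);
        if outside {
            self.deleted_chunks_outside_bounds
                .entry(node)
                .or_default()
                .insert(coord);
        }
    }

    /// After a later resize grows the array again, this says the chunk must stay deleted.
    pub fn is_deleted_outside_bounds(&self, node: NodeId, coord: &ChunkIndices) -> bool {
        self.deleted_chunks_outside_bounds
            .get(&node)
            .is_some_and(|set| set.contains(coord))
    }
}
```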

@dcherian force-pushed the manifest-split-write branch from ce67942 to cd750e6 on June 11, 2025 20:35
- 1 * ((ax > 0) as usize);
dbg!(&ax, &add);
total_manifests += add;
total_manifests +=
Collaborator Author

number went down!

@dcherian force-pushed the manifest-split-write branch from 50f2a27 to 46d13b6 on June 17, 2025 20:21
@dcherian force-pushed the manifest-split-write branch from 900f798 to b7fb385 on June 17, 2025 21:14
import functools


def with_frequency(frequency):
Collaborator Author

This was written by Claude. Looks good!

@dcherian force-pushed the manifest-split-write branch from 4de248c to 1020ec1 on June 18, 2025 23:32
@dcherian force-pushed the manifest-split-write branch from 1020ec1 to b179dbc on June 20, 2025 20:29
@dcherian force-pushed the manifest-split-write branch from 2a681ce to 4e68231 on June 21, 2025 23:47
@dcherian (Collaborator Author) commented Jun 23, 2025

Some benchmarks for a repo with 500_000 chunk refs, written to 50 manifests with 10_000 refs each.

  1. Appends are much faster, as expected. Each append adds 50_000 chunk refs, i.e. 5 new manifest files.
  2. We see a slowdown for full rewrites. This is expected, since we write each split serially: this is slower but reduces memory consumption. As an experiment, I wrote a crude async implementation of per-node manifest writes; as expected, it gives a nice speedup.
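A sketch of the serial-versus-concurrent trade-off described above. The experiment in this PR used async writes; to stay dependency-free, this sketch swaps in std::thread::scope, and write_split is a hypothetical stand-in that returns (split id, number of refs) instead of uploading a manifest.

```rust
use std::thread;

/// Hypothetical stand-in for serializing and uploading one split manifest.
fn write_split(split_id: usize, refs: &[u64]) -> (usize, usize) {
    (split_id, refs.len())
}

/// Serial: one split at a time, so at most one serialized manifest is held in memory.
pub fn write_serial(splits: &[Vec<u64>]) -> Vec<(usize, usize)> {
    splits.iter().enumerate().map(|(i, s)| write_split(i, s)).collect()
}

/// Concurrent: all splits in flight at once; faster on high-latency object stores,
/// but every in-flight manifest occupies memory at the same time.
pub fn write_concurrent(splits: &[Vec<u64>]) -> Vec<(usize, usize)> {
    thread::scope(|scope| {
        let handles: Vec<_> = splits
            .iter()
            .enumerate()
            .map(|(i, s)| scope.spawn(move || write_split(i, s)))
            .collect();
        // Join in spawn order so the results line up with the serial version.
        handles
            .into_iter()
            .map(|h| h.join().expect("writer thread panicked"))
            .collect()
    })
}
```

A middle ground would be a bounded number of concurrent writers, which caps peak memory while still overlapping upload latency.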

After discussion with Seba, I have reverted that async write commit, and this PR should be good to go. I'll address #986 in a new PR.

Without async writes

------------------ benchmark 'refs-write test_write_split_manifest_refs_append': 4 tests ------------------
Name (time in ms)                                                                          Median
-----------------------------------------------------------------------------------------------------------
test_write_split_manifest_refs_append[no-splitting-large-1d] (p/s3_e184e2998)            792.6242 (1.64)
test_write_split_manifest_refs_append[no-splitting-large-1d] (p/s3_main_7e03)            815.8424 (1.69)
test_write_split_manifest_refs_append[split-size-10_000-large-1d] (p/s3_e184e2998)       483.8746 (1.0)
test_write_split_manifest_refs_append[split-size-10_000-large-1d] (p/s3_main_7e03)     1,987.4977 (4.11)
-----------------------------------------------------------------------------------------------------------

---------------- benchmark 'refs-write test_write_split_manifest_refs_full_rewrite': 4 tests ----------------
Name (time in s)                                                                             Median
-------------------------------------------------------------------------------------------------------------
test_write_split_manifest_refs_full_rewrite[no-splitting-large-1d] (p/s3_e184e2998)          1.7160 (1.04)
test_write_split_manifest_refs_full_rewrite[no-splitting-large-1d] (p/s3_main_7e03)          1.6557 (1.0)
test_write_split_manifest_refs_full_rewrite[split-size-10_000-large-1d] (p/s3_e184e2998)     3.9933 (2.41)
test_write_split_manifest_refs_full_rewrite[split-size-10_000-large-1d] (p/s3_main_7e03)     4.2187 (2.55)
-------------------------------------------------------------------------------------------------------------

With async writes

------------------ benchmark 'refs-write test_write_split_manifest_refs_append': 4 tests ------------------
Name (time in ms)                                                                          Median
-----------------------------------------------------------------------------------------------------------
test_write_split_manifest_refs_append[no-splitting-large-1d] (m/s3_HEAD_2a68)            709.7038 (2.97)
test_write_split_manifest_refs_append[no-splitting-large-1d] (m/s3_main_7e03)            764.4788 (3.20)
test_write_split_manifest_refs_append[split-size-10_000-large-1d] (m/s3_HEAD_2a68)       238.7356 (1.0)
test_write_split_manifest_refs_append[split-size-10_000-large-1d] (m/s3_main_7e03)     1,726.8433 (7.23)
-----------------------------------------------------------------------------------------------------------

---------------- benchmark 'refs-write test_write_split_manifest_refs_full_rewrite': 4 tests ----------------
Name (time in s)                                                                             Median
-------------------------------------------------------------------------------------------------------------
test_write_split_manifest_refs_full_rewrite[no-splitting-large-1d] (m/s3_HEAD_2a68)          1.6306 (1.22)
test_write_split_manifest_refs_full_rewrite[no-splitting-large-1d] (m/s3_main_7e03)          1.5534 (1.16)
test_write_split_manifest_refs_full_rewrite[split-size-10_000-large-1d] (m/s3_HEAD_2a68)     1.3361 (1.0)
test_write_split_manifest_refs_full_rewrite[split-size-10_000-large-1d] (m/s3_main_7e03)     3.6363 (2.72)
-------------------------------------------------------------------------------------------------------------

@dcherian requested a review from paraseba on June 23, 2025 18:08
@dcherian merged commit 0f24418 into main on Jun 24, 2025
11 checks passed
@dcherian deleted the manifest-split-write branch on June 24, 2025 20:09

Development

Successfully merging this pull request may close these issues:

  • When array metadata changes reducing size, we should rewrite the manifest
  • Support multiple manifests for a single array

2 participants