Conversation

@passaro (Contributor) commented Aug 22, 2025

Set the part size on GetObject requests to the length of the requested range, if it is smaller than the client read part size. Since in this case the request will not be split into parts, the change only affects how the CRT internally reserves memory for the data to return.

Addresses an issue where the Prefetcher could cause higher peak memory usage with the unified memory pool than with the previous approach (internal CRT pool + copy into a new buffer), since small requests would consume part-size buffers for longer.
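In outline, the change caps the part size at the requested range length. A minimal sketch of that logic (illustrative only, not the actual diff; the `read_part_size` name and the range shape follow the hunks quoted later in this thread):

```rust
use std::ops::Range;

// Illustrative helper: cap the part size at the length of the requested range,
// so a small ranged GetObject reserves a buffer sized to the range instead of
// a full part-size buffer. `read_part_size` follows the hunks quoted below.
fn effective_part_size(read_part_size: u64, range: Option<&Range<u64>>) -> u64 {
    match range.map(|r| r.end - r.start) {
        Some(len) if len < read_part_size => len,
        _ => read_part_size,
    }
}

fn main() {
    // A 1 MiB ranged read against an 8 MiB part size reserves only 1 MiB.
    let range = 0u64..(1 << 20);
    assert_eq!(effective_part_size(8 << 20, Some(&range)), 1 << 20);
}
```

Such a request fits in a single part, so capping the part size changes only the size of the buffer the CRT reserves, not what is downloaded.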

Does this change impact existing behavior?

No functional changes.

Does this change need a changelog entry? Does it require a version change?

TODO: update client and fs


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).

```rust
params
    .range
    .as_ref()
    .map(|range| range.end - range.start)
```
Contributor:

This means the first_request_stream will always request a part size of min(INITIAL_READ_WINDOW_SIZE, object_size), right?

A few questions:

  • Is it the right understanding that the new logic also makes the splitting of metarequests more sensible, given that we always start the second metarequest with range [INITIAL_READ_WINDOW_SIZE, end of object] (and not [PART_SIZE, end of object]), even though the first request has caused the CRT to return up to part size?
  • I believe this should also reduce the TTFB for the first read?
  • Do we have this risk of blocked memory in writes too, and do we intend to make any changes there to reduce the default part size, etc.?

Contributor Author:

> This means the first_request_stream will always request a part size of min(INITIAL_READ_WINDOW_SIZE, object_size), right?

Right.

> A few questions:
>
> • Is it the right understanding that the new logic also makes the splitting of metarequests more sensible, given that we always start the second metarequest with range [INITIAL_READ_WINDOW_SIZE, end of object] (and not [PART_SIZE, end of object]), even though the first request has caused the CRT to return up to part size?

That is not changed. We are still requesting INITIAL_READ_WINDOW_SIZE bytes as before; the only difference is that we also tell the CRT to use INITIAL_READ_WINDOW_SIZE as the part size.

> • I believe this should also reduce the TTFB for the first read?

No changes either.

> • Do we have this risk of blocked memory in writes too, and do we intend to make any changes there to reduce the default part size, etc.?

That has always been the case, but for writes we do not know the size in advance, so there is no trivial fix there.

Contributor:

> That is not changed. We are still requesting INITIAL_READ_WINDOW_SIZE bytes as before; the only difference is that we also tell the CRT to use INITIAL_READ_WINDOW_SIZE as the part size.

Right! But won't this also change the S3 request range (and hence, size) now that we're requesting a 1.125M part instead of an 8M part? Previously we would receive data up to the 8M offset but then re-request the 1.125M-8M range in the second meta request. Now we only receive 1.125M in the first go because it's the part size. So that is a change with this fix, no?

(My comment above about reduced TTFB was also based on this understanding, that a smaller part GET => smaller S3 latency.)

Contributor Author:

We were already requesting only INITIAL_READ_WINDOW_SIZE (1.125 MiB) before the change. read_part_size only comes into play for the buffer reservation, not for the data that is downloaded:

```rust
let mut options = message.into_options(S3Operation::GetObject);
options.part_size(self.inner.read_part_size as u64);
```

```rust
// Use the client read part size, or the requested range if smaller.
```
@mansi153 (Contributor) commented Aug 22, 2025:

Can we please add a slightly more elaborate explanation of why we're doing this, for future context? The PR description is sufficient, but that sometimes gets lost when we refactor code for unrelated reasons :(
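For future readers, a hypothetical expanded comment along those lines (wording illustrative, condensed from the PR description rather than taken from the diff):

```rust
// Cap the part size at the requested range length when the range is smaller
// than the client read part size. Such a request is never split into parts,
// so this only changes how much memory the CRT reserves up front: without the
// cap, a small ranged GetObject would pin a full part-size buffer from the
// unified memory pool for its whole lifetime, inflating peak memory.
```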

```rust
// Set up a paged memory pool
let pool = PagedPool::new_with_candidate_sizes([
    args.cache_block_size_in_bytes() as usize,
    mountpoint_s3_fs::s3::config::INITIAL_READ_WINDOW_SIZE,
    // ...
]);
```
Contributor:

For my understanding, this implies that for objects with INITIAL_READ_WINDOW_SIZE < object_size < read_part_size, we will still be using a peak memory of INITIAL_READ_WINDOW_SIZE + part_size, right (because we advance the CRT backpressure window halfway through the first metarequest being read)?

Contributor Author:

Only if 2 * INITIAL_READ_WINDOW_SIZE <= object_size < read_part_size. What matters is the range of the second request, which will be object_size - INITIAL_READ_WINDOW_SIZE. If that is greater than INITIAL_READ_WINDOW_SIZE, it will require a full read_part_size buffer.
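A quick sketch of that arithmetic (the page-selection model and the 8 MiB read part size are assumptions for this sketch; 1.125 MiB matches the figure quoted earlier in the thread):

```rust
// Illustrative model of the paged pool picking the smallest page that fits a
// reservation. Candidate sizes mirror the hunk above; values are assumed.
const MIB: u64 = 1024 * 1024;
const INITIAL_READ_WINDOW_SIZE: u64 = MIB + 128 * 1024; // 1.125 MiB, per this thread
const READ_PART_SIZE: u64 = 8 * MIB; // assumed client read part size

/// Smallest candidate page size that fits the reservation.
fn page_for(reservation: u64) -> u64 {
    [INITIAL_READ_WINDOW_SIZE, READ_PART_SIZE]
        .into_iter()
        .find(|&page| reservation <= page)
        .unwrap_or(READ_PART_SIZE)
}

fn main() {
    // 2 * INITIAL_READ_WINDOW_SIZE <= object_size < READ_PART_SIZE:
    let object_size = 3 * MIB;
    let second_range = object_size - INITIAL_READ_WINDOW_SIZE; // 1.875 MiB
    // The remainder exceeds INITIAL_READ_WINDOW_SIZE, so it lands in a full
    // READ_PART_SIZE page: peak memory is INITIAL_READ_WINDOW_SIZE + READ_PART_SIZE.
    assert_eq!(page_for(second_range), READ_PART_SIZE);
}
```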

Contributor:

Oh right, yes, makes sense.

@passaro (Contributor Author) commented Sep 23, 2025

Closing this PR. The issue will be addressed on the CRT side (PR under review awslabs/aws-c-s3#563).

@passaro closed this Sep 23, 2025