Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flexible MultiRead demonstration #13331

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

archang19
Copy link
Contributor

@archang19 archang19 commented Jan 24, 2025

We want to enable FilePrefetchBuffer to make read requests that are block aligned whenever possible. This means rounding up the read requests to the next block boundary.

There are quite a few different options we can go down, and I have written some analysis on their tradeoffs.

Working through a sample prototype / hacky implementation can make the decision more obvious, and in this case I think that was very true.

It looks like the "flexible reads" approach is the most attractive. However, we will be using the MultiRead API, rather than the Read API, since we want to take advantage of the file system buffer reuse optimization, which only MultiRead supports currently. We can easily add new fields to FSReadRequest to support our use case, while maintaining backwards compatibility.

I really think that our final approach can be very, very simple. I put one approach in this PR where we only need to:

  1. Expose the block size in FSRandomAccessFile and RandomAccessFileReader
  2. Add a new field inside FSReadRequest
  3. Adjust the rate limiting logic to take into account the potential additional readahead

Minor decisions to be made:

  1. Whether to add a new option inside IOOptions to gate the feature. Right now we can just rely on for_compaction which is given to us in TryReadFromCache. In the future we want to expand this to low-priority reads, but right now IOOptions's priority field is marked as deprecated.
  2. Whether we can even get away without adding a new block size interface to FSRandomAccessFile / RandomAccessFileReader. We still would want to specify some max read size for rate limiting purposes, but this option may work if we allow more flexibility into how much more data can be returned. As long as our max readahead size covers the block boundary, we would benefit from the optimization. Personally I don't think adding the block size interface is too bad, since it is a somewhat general concept.

I think this approach is very promising. We can hide almost all of the complexity on the RocksDB end, maintain backwards compatibility / safety, and still make the optimally sized read requests.

@archang19 archang19 force-pushed the demonstrate-flexible-multi-read branch from 990857a to f2de8e4 Compare January 24, 2025 19:28
@archang19 archang19 force-pushed the demonstrate-flexible-multi-read branch from f2de8e4 to 1e10aef Compare January 24, 2025 21:47
@archang19 archang19 changed the title Demonstrate how flexible read could look like Flexible MultiRead demonstration Jan 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants