
Commit b5e4162

archang19 authored and facebook-github-bot committed
Unify compaction prefetching logic (#13187)
Summary:
In #13177, I discussed an unsigned integer overflow issue that affects compaction reads inside `FilePrefetchBuffer` when we attempt to enable the file system buffer reuse optimization. In that PR, I disabled the optimization whenever `for_compaction` was `true` to eliminate the source of the bug.

**This PR safely re-enables the optimization when `for_compaction` is `true`.** We need to properly set up the overlap buffer through `PrefetchInternal` rather than simply calling `Prefetch`. `Prefetch` assumes `num_buffers_` is 1 (i.e. async IO is disabled), so historically it did not have any overlap buffer logic. What ended up happening (with the old bug) is that, when we tried to reuse the file system provided buffer inside `Prefetch`, we read the remaining missing data; however, since we do not call `RefitTail` when `use_fs_buffer` is true, we would normally rely on copying the partially relevant data into an overlap buffer. That overlap buffer logic was missing, so the final main buffer ended up storing data starting from an offset greater than the requested offset, and we effectively "threw away" part of the requested data (a minimal sketch of the missing copy step follows this description).

**This PR also unifies the prefetching logic for compaction and non-compaction reads:**

- The same readahead size is used. Previously, we read only `std::max(n, readahead_size_)` bytes for compaction reads, rather than `n + readahead_size_` bytes.
- The stats for `PREFETCH_HITS` and `PREFETCH_BYTES_USEFUL` are tracked for both. Previously, they were only tracked for non-compaction reads.

These two small changes should reduce the cognitive load required to understand the codebase, and they made the test suite easier to maintain. We could not come up with good reasons why the readahead size and stats logic should differ for compaction reads.

Pull Request resolved: #13187

Test Plan:
I removed the temporary test case from #13200 and incorporated the same test cases into my updated parameterized test, which covers the valid combinations of `use_async_prefetch` and `for_compaction`. I went further and added a randomized test case that simply tries to trigger `assert`ion failures and catch any gaps in the logic. I also added a test case for compaction reads _without_ the file system buffer reuse optimization.

It may be valuable to make a future PR that unifies these prefetch tests and parameterizes as many of them as possible. This way we can avoid writing duplicate tests and just loop over the different parameters for async IO, direct IO, file system buffer reuse, and `for_compaction`.

Reviewed By: anand1976

Differential Revision: D66903373

Pulled By: archang19

fbshipit-source-id: 351b56abea2f0ec146b83e3d8065ccc69d40405d
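
To make the missing overlap-buffer step concrete, here is a minimal sketch of the copy that has to happen when a buffer only partially covers a request. The `DemoBuffer` type and `CopyOverlap` function are invented for illustration and are not the actual `FilePrefetchBuffer` internals.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <vector>

// Illustrative stand-in for a prefetch buffer: holds the bytes of the file
// range [offset, offset + data.size()).
struct DemoBuffer {
  uint64_t offset = 0;
  std::vector<char> data;
};

// Copy whatever part of `src` overlaps the request [req_offset, req_offset+n)
// into `overlap`, which covers exactly the requested range. Without this
// step, a main buffer whose offset is greater than req_offset silently drops
// the head of the requested data, which is the bug described above.
void CopyOverlap(const DemoBuffer& src, uint64_t req_offset, size_t n,
                 DemoBuffer* overlap) {
  overlap->offset = req_offset;
  overlap->data.resize(n);
  uint64_t begin = std::max(src.offset, req_offset);
  uint64_t end = std::min(src.offset + src.data.size(),
                          req_offset + static_cast<uint64_t>(n));
  if (begin < end) {
    std::memcpy(overlap->data.data() + (begin - req_offset),
                src.data.data() + (begin - src.offset),
                static_cast<size_t>(end - begin));
  }
}
```

In the actual fix, compaction reads now flow through `PrefetchInternal`, which performs this kind of assembly into `overlap_buf_` instead of returning a main buffer that starts past the requested offset.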
1 parent d4bd67f commit b5e4162

2 files changed: +188 -86 lines changed

file/file_prefetch_buffer.cc (+21 -18)

@@ -771,6 +771,12 @@ bool FilePrefetchBuffer::TryReadFromCache(const IOOptions& opts,
 bool FilePrefetchBuffer::TryReadFromCacheUntracked(
     const IOOptions& opts, RandomAccessFileReader* reader, uint64_t offset,
     size_t n, Slice* result, Status* status, bool for_compaction) {
+  // We disallow async IO for compaction reads since they are performed in
+  // the background anyways and are less latency sensitive compared to
+  // user-initiated reads
+  (void)for_compaction;
+  assert(!for_compaction || num_buffers_ == 1);
+
   if (track_min_offset_ && offset < min_offset_read_) {
     min_offset_read_ = static_cast<size_t>(offset);
   }
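
A likely reason for the `(void)for_compaction;` cast above: it is the common C++ idiom for keeping a parameter referenced when a build configuration may otherwise consume it only via `assert`, which compiles away under `NDEBUG` and can leave the compiler warning about an unused parameter. A minimal standalone illustration (not RocksDB code):

```cpp
#include <cassert>

// With -DNDEBUG, assert(flag) expands to nothing, so without the (void)
// cast the compiler could emit an unused-parameter warning.
void CheckInvariant(bool flag) {
  (void)flag;    // keeps release (NDEBUG) builds warning-free
  assert(flag);  // enforced only in debug builds
}
```
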
@@ -819,27 +825,22 @@ bool FilePrefetchBuffer::TryReadFromCacheUntracked(
   assert(reader != nullptr);
   assert(max_readahead_size_ >= readahead_size_);
 
-  if (for_compaction) {
-    s = Prefetch(opts, reader, offset, std::max(n, readahead_size_));
-  } else {
-    if (implicit_auto_readahead_) {
-      if (!IsEligibleForPrefetch(offset, n)) {
-        // Ignore status as Prefetch is not called.
-        s.PermitUncheckedError();
-        return false;
-      }
+  if (implicit_auto_readahead_) {
+    if (!IsEligibleForPrefetch(offset, n)) {
+      // Ignore status as Prefetch is not called.
+      s.PermitUncheckedError();
+      return false;
     }
-
-    // Prefetch n + readahead_size_/2 synchronously as remaining
-    // readahead_size_/2 will be prefetched asynchronously if num_buffers_
-    // > 1.
-    s = PrefetchInternal(
-        opts, reader, offset, n,
-        (num_buffers_ > 1 ? readahead_size_ / 2 : readahead_size_),
-        copy_to_overlap_buffer);
-    explicit_prefetch_submitted_ = false;
   }
 
+  // Prefetch n + readahead_size_/2 synchronously as remaining
+  // readahead_size_/2 will be prefetched asynchronously if num_buffers_
+  // > 1.
+  s = PrefetchInternal(
+      opts, reader, offset, n,
+      (num_buffers_ > 1 ? readahead_size_ / 2 : readahead_size_),
+      copy_to_overlap_buffer);
+  explicit_prefetch_submitted_ = false;
   if (!s.ok()) {
     if (status) {
       *status = s;
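
The removed branch above sized compaction reads at only `std::max(n, readahead_size_)` bytes, while the unified path reads `n + readahead_size_` bytes (with half of the readahead issued asynchronously when `num_buffers_ > 1`). A small illustration of the difference; the helper names here are invented for this sketch and are not RocksDB API:

```cpp
#include <algorithm>
#include <cstddef>

// Old compaction-only sizing: request max(n, readahead_size) bytes in total.
size_t OldCompactionReadSize(size_t n, size_t readahead_size) {
  return std::max(n, readahead_size);
}

// Unified sizing: request n + readahead_size bytes in total.
size_t UnifiedReadSize(size_t n, size_t readahead_size) {
  return n + readahead_size;
}

// Example with n = 48 KiB and readahead_size = 64 KiB:
//   Old:     max(48K, 64K) = 64 KiB,  only 16 KiB of readahead beyond n.
//   Unified: 48K + 64K     = 112 KiB, the full 64 KiB of readahead beyond n.
```
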
@@ -854,6 +855,7 @@ bool FilePrefetchBuffer::TryReadFromCacheUntracked(
       return false;
     }
   } else if (!for_compaction) {
+    // These stats are meant to track prefetch effectiveness for user reads only
     UpdateStats(/*found_in_buffer=*/true, n);
   }
 
@@ -864,6 +866,7 @@ bool FilePrefetchBuffer::TryReadFromCacheUntracked(
     buf = overlap_buf_;
   }
   assert(buf->offset_ <= offset);
+  assert(buf->IsDataBlockInBuffer(offset, n));
   uint64_t offset_in_buffer = offset - buf->offset_;
   *result = Slice(buf->buffer_.BufferStart() + offset_in_buffer, n);
   if (prefetched) {
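
The new assertion above strengthens the existing `buf->offset_ <= offset` check: the whole requested range, not just its start, must lie inside the chosen buffer before the `Slice` is built. A hedged sketch of what such a containment check amounts to (written as a free function for illustration; the actual `IsDataBlockInBuffer` is a method on the buffer):

```cpp
#include <cstddef>
#include <cstdint>

// True iff the requested range [req_offset, req_offset + n) lies entirely
// within a buffer covering [buf_offset, buf_offset + buf_len).
bool RangeInBuffer(uint64_t buf_offset, size_t buf_len, uint64_t req_offset,
                   size_t n) {
  return buf_offset <= req_offset &&
         req_offset + n <= buf_offset + buf_len;
}
```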
