-
Notifications
You must be signed in to change notification settings - Fork 6.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Verify compaction output record count #13455
Conversation
7bb81cd
to
17be7df
Compare
@jaykorean has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@jaykorean Can we add new UT like in #11571 for corruption in output files preferably containing range deletion? |
9b102cd
to
1ba7887
Compare
@jaykorean has updated the pull request. You must reimport the pull request before landing. |
1 similar comment
@jaykorean has updated the pull request. You must reimport the pull request before landing. |
@jaykorean has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - thanks for working on this!
d08862a
to
79ca3af
Compare
@jaykorean has updated the pull request. You must reimport the pull request before landing. |
@jaykorean has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@jaykorean merged this pull request in 0a43d8a. |
Summary: ## Issue Thanks to PRs #13455 and #13464 , we were able to find another issue with compaction stats. When there are multiple sub-compactions and they are processed remotely, some compaction stats are not collected correctly. Here's an example of how `num_input_records` can be double-counted during a compaction with multiple sub-compactions executed remotely. Please note that this problem is not limited to `num_input_records`. Input File: 1 SST file with 100 keys. - Key 1~50 are in one sub compaction - Key 51~100 in another sub compaction `UpdateOutputLevelCompactionStats()` currently retrieves the total number of entries from the input files and sets `num_input_records` in the internal_stats to 100. In `CompactionJob::Run()`, this method is called once after all sub-compactions have finished. However, during remote compaction, `UpdateOutputLevelCompactionStats()` is called for each offloaded sub-compaction on the remote side and then aggregated on the primary host. The internal_stats for the first sub-compaction will have 100 `num_input_records`, and the second sub-compaction will have another 100 `num_input_records`. We end up having 200 `num_input_records` in the aggregated internal_stats. There was another issue that `num_input_record` was not properly excluding `num_input_range_del` in `UpdateCompactionJobStats()`. `job_stats_->num_input_record` originally has correct value set by compaction iterator, but then later overwritten in `UpdateCompactionJobStats()`. `UpdateCompactionJobStats()` was called during `CompactionJob::Install()`, so not caught by `VerifyInputRecordCount()`. ## Refactor and other changes before the fixes * Renamed `UpdateOutputLevelCompactionStats()` to `BuildStatsFromInputTableProperties()` to make the function more descriptive. `BuildStatsFromInputTableProperties()` builds input stats by scanning through entries from TableProperties in the Input Files and it's at the top compaction level, not at the sub-compaction level. (It also updates a couple of non-input stats, `bytes_read_blob` and `num_dropped_records`, but will be refactored in a later PR.) * `UpdateCompactionJobStats()` was moved from `CompactionJob::Install()` to `CompactionJob::Run()` and separated into `UpdateCompactionJobInputStats()` and `UpdateCompactionJobOutputStats()`. ## Fixes * Remote Compaction no longer updates the subcompaction-job-level input stats from InputTableProperties to avoid double-counted stats in case of multiple sub-compactions. Subcompaction-job-level input stats are aggregated to the compaction-job-level input stats in the primary host after all sub-compactions are finished. * Remote Compaction now only calls `UpdateCompactionJobOutputStats()` to update the job-level output stats by copying from internal stats. * `UpdateCompactionJobInputStats()` now takes `num_input_range_del` and properly subtracts it from the input record count. `VerifyInputRecordCount()` expected `job_stats.num_input_records` to be equal to `internal_stats_.output_level_stats.num_input_records - num_input_range_del`. However, when updating the job-level stats, we were taking the entire `internal_stats_.output_level_stats.num_input_records` after verification. Pull Request resolved: #13470 Test Plan: Local Compaction ``` ./db_compaction_test -- --gtest_filter="*DBCompactionTest.VerifyRecordCount*" ``` Remote Compaction ``` ./compaction_service_test --gtest_filter="*CompactionServiceTest.VerifyInputRecordCount*" ``` Reviewed By: pdillinger Differential Revision: D71566149 Pulled By: jaykorean fbshipit-source-id: c8aafcde701dec8901fd5e5a9ec186e26b896c19
Summary
Continuing @cbi42 's work in 602cc0f.
In this PR, we are adding record count verification for each compaction by comparing number of entries summed from Table Properties with the number of output records from the compaction stats.
If the count does not match,
Status::Corruption(msg)
is returned with detailed message including the actual number (from table property) and the expected number (from compaction stats)Test Plan
New UT added
The check had to be disabled for some of the existing tests using MockTable/MockTableFactory, because TableProperties aren't populated properly for the MockTables.