Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Verify compaction output record count #13455

Closed
wants to merge 6 commits into from

Conversation

jaykorean
Copy link
Contributor

@jaykorean jaykorean commented Mar 10, 2025

Summary

Continuing @cbi42 's work in 602cc0f.

In this PR, we are adding record count verification for each compaction by comparing number of entries summed from Table Properties with the number of output records from the compaction stats.

If the count does not match, Status::Corruption(msg) is returned with detailed message including the actual number (from table property) and the expected number (from compaction stats)

Test Plan

New UT added

./db_compaction_test -- --gtest_filter="*Verify*"

The check had to be disabled for some of the existing tests using MockTable/MockTableFactory, because TableProperties aren't populated properly for the MockTables.

@jaykorean jaykorean added the WIP Work in progress label Mar 10, 2025
@jaykorean jaykorean force-pushed the verfy_compaction_output branch 4 times, most recently from 7bb81cd to 17be7df Compare March 14, 2025 22:19
@jaykorean jaykorean marked this pull request as ready for review March 15, 2025 00:31
@jaykorean jaykorean requested review from hx235 and cbi42 March 15, 2025 00:31
@facebook-github-bot
Copy link
Contributor

@jaykorean has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@hx235
Copy link
Contributor

hx235 commented Mar 17, 2025

@jaykorean Can we add new UT like in #11571 for corruption in output files preferably containing range deletion?

@jaykorean jaykorean removed the WIP Work in progress label Mar 17, 2025
@jaykorean jaykorean force-pushed the verfy_compaction_output branch from 9b102cd to 1ba7887 Compare March 17, 2025 23:47
@facebook-github-bot
Copy link
Contributor

@jaykorean has updated the pull request. You must reimport the pull request before landing.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@jaykorean has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@jaykorean has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@jaykorean jaykorean requested a review from hx235 March 18, 2025 16:41
Copy link
Contributor

@hx235 hx235 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thanks for working on this!

@jaykorean jaykorean force-pushed the verfy_compaction_output branch from d08862a to 79ca3af Compare March 18, 2025 23:48
@facebook-github-bot
Copy link
Contributor

@jaykorean has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@jaykorean has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@jaykorean merged this pull request in 0a43d8a.

facebook-github-bot pushed a commit that referenced this pull request Mar 20, 2025
Summary:
## Issue

Thanks to PRs #13455 and #13464 , we were able to find another issue with compaction stats.

When there are multiple sub-compactions and they are processed remotely, some compaction stats are not collected correctly.

Here's an example of how `num_input_records` can be double-counted during a compaction with multiple sub-compactions executed remotely. Please note that this problem is not limited to `num_input_records`.

Input File: 1 SST file with 100 keys.

- Key 1~50 are in one sub compaction
- Key 51~100 in another sub compaction

`UpdateOutputLevelCompactionStats()` currently retrieves the total number of entries from the input files and sets `num_input_records` in the internal_stats to 100. In `CompactionJob::Run()`, this method is called once after all sub-compactions have finished. However, during remote compaction, `UpdateOutputLevelCompactionStats()` is called for each offloaded sub-compaction on the remote side and then aggregated on the primary host. The internal_stats for the first sub-compaction will have 100 `num_input_records`, and the second sub-compaction will have another 100 `num_input_records`. We end up having 200 `num_input_records` in the aggregated internal_stats.

There was another issue that `num_input_record` was not properly excluding `num_input_range_del` in `UpdateCompactionJobStats()`. `job_stats_->num_input_record` originally has correct value set by compaction iterator, but then later overwritten in `UpdateCompactionJobStats()`. `UpdateCompactionJobStats()` was called during `CompactionJob::Install()`, so not caught by `VerifyInputRecordCount()`.

## Refactor and other changes before the fixes
* Renamed `UpdateOutputLevelCompactionStats()` to `BuildStatsFromInputTableProperties()` to make the function more descriptive. `BuildStatsFromInputTableProperties()` builds input stats by scanning through entries from TableProperties in the Input Files and it's at the top compaction level, not at the sub-compaction level. (It also updates a couple of non-input stats, `bytes_read_blob` and `num_dropped_records`, but will be refactored in a later PR.)
* `UpdateCompactionJobStats()` was moved from `CompactionJob::Install()` to `CompactionJob::Run()` and separated into `UpdateCompactionJobInputStats()` and `UpdateCompactionJobOutputStats()`.

## Fixes
* Remote Compaction no longer updates the subcompaction-job-level input stats from InputTableProperties to avoid double-counted stats in case of multiple sub-compactions. Subcompaction-job-level input stats are aggregated to the compaction-job-level input stats in the primary host after all sub-compactions are finished.
* Remote Compaction now only calls `UpdateCompactionJobOutputStats()` to update the job-level output stats by copying from internal stats.
* `UpdateCompactionJobInputStats()` now takes `num_input_range_del` and properly subtracts it from the input record count. `VerifyInputRecordCount()` expected `job_stats.num_input_records` to be equal to `internal_stats_.output_level_stats.num_input_records - num_input_range_del`. However, when updating the job-level stats, we were taking the entire `internal_stats_.output_level_stats.num_input_records` after verification.

Pull Request resolved: #13470

Test Plan:
Local Compaction
```
./db_compaction_test -- --gtest_filter="*DBCompactionTest.VerifyRecordCount*"
```
Remote Compaction
```
./compaction_service_test --gtest_filter="*CompactionServiceTest.VerifyInputRecordCount*"
```

Reviewed By: pdillinger

Differential Revision: D71566149

Pulled By: jaykorean

fbshipit-source-id: c8aafcde701dec8901fd5e5a9ec186e26b896c19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants