Avoid writing additional data to nearly full tlogs#12809
Avoid writing additional data to nearly full tlogs#12809tclinkenbeard-oai wants to merge 7 commits intoapple:mainfrom
Conversation
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-macos on macOS Ventura 13.x
|
Result of foundationdb-pr on Linux RHEL 9
|
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
| init( TLOG_SPILL_THRESHOLD, 1500e6 ); if( smallTlogTarget ) TLOG_SPILL_THRESHOLD = 1500e3; if( randomize && BUGGIFY ) TLOG_SPILL_THRESHOLD = 0; | ||
| init( REFERENCE_SPILL_UPDATE_STORAGE_BYTE_LIMIT, 20e6 ); if( (randomize && BUGGIFY) || smallTlogTarget ) REFERENCE_SPILL_UPDATE_STORAGE_BYTE_LIMIT = 1e6; | ||
| init( TLOG_HARD_LIMIT_BYTES, 3000e6 ); if( smallTlogTarget ) TLOG_HARD_LIMIT_BYTES = 30e6; | ||
| init( TLOG_MIN_AVAILABLE_SPACE_RATIO, 0.0 ); if( randomize && BUGGIFY ) TLOG_MIN_AVAILABLE_SPACE_RATIO = 0.5; |
There was a problem hiding this comment.
Should we have like 0.2 for our case?
There was a problem hiding this comment.
Yeah, I think we'll want to set this knob to something higher. 0.05 should be sufficient, because tlog disk usage will stop growing once the threshold is hit
| auto ratio = [](StorageBytes const& storageBytes) -> double { | ||
| return storageBytes.total > 0 ? double(storageBytes.available) / storageBytes.total : 1.0; | ||
| }; | ||
| return std::min(ratio(kvStoreBytes), ratio(queueBytes)); |
There was a problem hiding this comment.
Should we have max of (kvStoreBytes, queueBytes), and then apply ratio? I think it would be simpler to understand.
There was a problem hiding this comment.
That could be simpler, but in theory kvStoreBytes.total and queueBytes.total could be different. They usually aren't, but there's a valid use case to use different disks for spilling and the disk queue, and the current logic handles that case
fdbserver/TLogServer.actor.cpp
Outdated
| while (!logData->stopped()) { | ||
| StorageBytes kvStoreBytes = self->persistentData->getStorageBytes(); | ||
| StorageBytes queueBytes = self->rawPersistentQueue->getStorageBytes(); | ||
| if (self->shouldAcceptNewData(kvStoreBytes, queueBytes, SERVER_KNOBS->TLOG_MIN_AVAILABLE_SPACE_RATIO)) { |
There was a problem hiding this comment.
Do you think it would be better to introduce hysteresis here - and start accepting if it's > TLOG_MIN_AVAILABLE_SPACE_RATIO.
There was a problem hiding this comment.
I pushed some commits to now fail the tlog role when the tlog runs out of space, so there's no coroutine waiting on disk space to recover anymore
Result of foundationdb-pr-clang-ide on Linux RHEL 9
|
Result of foundationdb-pr-clang-arm on Linux CentOS 7
|
Result of foundationdb-pr-clang on Linux RHEL 9
|
Result of foundationdb-pr on Linux RHEL 9
|
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
|
Primary and satellite tlogs rely on ratekeeper to throttle once available disk space drops below
MIN_AVAILABLE_SPACE_RATIO(by default 5%). However, remote tlogs don't have a similar safeguard. It makes sense for ratekeeper not to throttle on remote tlog disk space, because remote tlogs shouldn't directly affect availability. However, this can lead to a situation where remote tlogs completely run out of space and are unable to restart in order to serve peek and pop requests from previous log generations. This causes remote storage servers' lag to grow indefinitely.This PR introduces a
TLOG_MIN_AVAILABLE_SPACE_RATIOknob that tlogs can use to protect themselves, without relying on ratekeeper. By default the knob is set to 0, effectively disabling this feature. However, if set, tlogs will stop accepting commits or pulling async data once their available space ratio reaches the configured threshold. If remote tlogs hit this situation, they will lag behind, but they will still be available to serve peek and pop requests from remote storage servers.Code-Reviewer Section
The general pull request guidelines can be found here.
Please check each of the following things and check all boxes before accepting a PR.
For Release-Branches
If this PR is made against a release-branch, please also check the following:
release-branchormainif this is the youngest branch)