Avoid writing additional data to nearly full tlogs #12809

Open
tclinkenbeard-oai wants to merge 7 commits into apple:main from tclinkenbeard-oai:dev/tclinkenbeard/tlog-stop-accepting

@tclinkenbeard-oai (Collaborator) commented Mar 19, 2026

Primary and satellite tlogs rely on ratekeeper to throttle once available disk space drops below MIN_AVAILABLE_SPACE_RATIO (by default 5%). However, remote tlogs don't have a similar safeguard. It makes sense for ratekeeper not to throttle on remote tlog disk space, because remote tlogs shouldn't directly affect availability. However, this can lead to a situation where remote tlogs completely run out of space and are unable to restart in order to serve peek and pop requests from previous log generations. This causes remote storage servers' lag to grow indefinitely.

This PR introduces a TLOG_MIN_AVAILABLE_SPACE_RATIO knob that tlogs can use to protect themselves, without relying on ratekeeper. By default the knob is set to 0, effectively disabling this feature. However, if set, tlogs will stop accepting commits or pulling async data once their available space ratio reaches the configured threshold. If remote tlogs hit this situation, they will lag behind, but they will still be available to serve peek and pop requests from remote storage servers.
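
The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `StorageBytesSketch`, `availableRatio`, and `shouldAcceptNewDataSketch` are hypothetical names, and the exact comparison (strictly greater than the threshold) is an assumption.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical, simplified stand-in for a disk usage report:
// only the two fields needed for this sketch.
struct StorageBytesSketch {
    int64_t available; // bytes still free on the disk
    int64_t total;     // total capacity of the disk in bytes
};

// Fraction of the disk that is still available. A report with total == 0
// is treated as "plenty of space" so that it never blocks writes.
double availableRatio(const StorageBytesSketch& b) {
    return b.total > 0 ? static_cast<double>(b.available) / static_cast<double>(b.total) : 1.0;
}

// Assumed policy: a threshold of 0 disables the check entirely; otherwise
// new data is accepted only while the fuller of the two disks still has
// more than minRatio of its space available.
bool shouldAcceptNewDataSketch(const StorageBytesSketch& kvStore,
                               const StorageBytesSketch& queue,
                               double minRatio) {
    if (minRatio <= 0.0) {
        return true; // feature disabled, matching the default knob value of 0
    }
    return std::min(availableRatio(kvStore), availableRatio(queue)) > minRatio;
}
```

With a threshold of 0.05, a tlog whose spill store has 4% of its disk free would stop accepting data in this sketch, even if the disk queue's disk is half empty.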

Code-Reviewer Section

The general pull request guidelines can be found here.

Please check each of the following things and check all boxes before accepting a PR.

  • The PR has a description, explaining both the problem and the solution.
  • The description mentions which forms of testing were done and the testing seems reasonable.
  • Every function/class/actor that was touched is reasonably well documented.

For Release-Branches

If this PR is made against a release-branch, please also check the following:

  • This change/bugfix is a cherry-pick from the next younger branch (younger release-branch or main if this is the youngest branch)
  • There is a good reason why this PR needs to go into a release branch and this reason is documented (either in the description above or in a linked GitHub issue)

@tclinkenbeard-oai marked this pull request as draft March 19, 2026 08:36
@foundationdb-ci

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: 6c723f9
  • Duration 0:25:47
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 6c723f9
  • Duration 0:32:12
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 6c723f9
  • Duration 0:42:19
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 6c723f9
  • Duration 0:48:39
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 6c723f9
  • Duration 0:55:36
  • Result: ❌ FAILED
  • Error: Error while executing command: if python3 -m joshua.joshua list --stopped | grep ${ENSEMBLE_ID} | grep -q 'pass=10[0-9][0-9][0-9]'; then echo PASS; else echo FAIL && exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 6c723f9
  • Duration 0:56:54
  • Result: ❌ FAILED
  • Error: Error while executing command: if python3 -m joshua.joshua list --stopped | grep ${ENSEMBLE_ID} | grep -q 'pass=10[0-9][0-9][0-9]'; then echo PASS; else echo FAIL && exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 6c723f9
  • Duration 2:09:35
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

init( TLOG_SPILL_THRESHOLD, 1500e6 ); if( smallTlogTarget ) TLOG_SPILL_THRESHOLD = 1500e3; if( randomize && BUGGIFY ) TLOG_SPILL_THRESHOLD = 0;
init( REFERENCE_SPILL_UPDATE_STORAGE_BYTE_LIMIT, 20e6 ); if( (randomize && BUGGIFY) || smallTlogTarget ) REFERENCE_SPILL_UPDATE_STORAGE_BYTE_LIMIT = 1e6;
init( TLOG_HARD_LIMIT_BYTES, 3000e6 ); if( smallTlogTarget ) TLOG_HARD_LIMIT_BYTES = 30e6;
init( TLOG_MIN_AVAILABLE_SPACE_RATIO, 0.0 ); if( randomize && BUGGIFY ) TLOG_MIN_AVAILABLE_SPACE_RATIO = 0.5;

Should we have like 0.2 for our case?

@tclinkenbeard-oai (Collaborator, Author) replied:

Yeah, I think we'll want to set this knob to something higher than the default of 0. A value of 0.05 should be sufficient, because tlog disk usage will stop growing once the threshold is hit.

auto ratio = [](StorageBytes const& storageBytes) -> double {
    return storageBytes.total > 0 ? double(storageBytes.available) / storageBytes.total : 1.0;
};
return std::min(ratio(kvStoreBytes), ratio(queueBytes));

Should we have max of (kvStoreBytes, queueBytes), and then apply ratio? I think it would be simpler to understand.

@tclinkenbeard-oai (Collaborator, Author) replied:

That could be simpler, but in theory kvStoreBytes.total and queueBytes.total could be different. They usually aren't, but using different disks for spilling and the disk queue is a valid use case, and the current logic handles it.
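
To make the different-disks case concrete, here is a small sketch comparing the per-disk minimum with a single pooled ratio. The pooled variant is only one hypothetical way to combine the two reports before applying a ratio, not necessarily what the reviewer had in mind. `DiskUsageSketch` and all function names are assumptions for illustration.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical, simplified disk usage report.
struct DiskUsageSketch {
    int64_t available; // free space, arbitrary units
    int64_t total;     // capacity, same units
};

// Available fraction of one disk; an empty report counts as fully available.
double availableRatio(const DiskUsageSketch& d) {
    return d.total > 0 ? static_cast<double>(d.available) / static_cast<double>(d.total) : 1.0;
}

// Evaluate each disk separately and keep the worse (smaller) ratio.
double minOfRatios(const DiskUsageSketch& kvStore, const DiskUsageSketch& queue) {
    return std::min(availableRatio(kvStore), availableRatio(queue));
}

// Hypothetical alternative: pool both disks and compute one ratio.
// A large, mostly empty disk can hide a small, nearly full one.
double pooledRatio(const DiskUsageSketch& kvStore, const DiskUsageSketch& queue) {
    const int64_t total = kvStore.total + queue.total;
    const int64_t available = kvStore.available + queue.available;
    return total > 0 ? static_cast<double>(available) / static_cast<double>(total) : 1.0;
}
```

For example, with a spill disk of `{400, 1000}` and a queue disk of `{3, 100}`, `minOfRatios` yields 0.03 while `pooledRatio` yields about 0.37, so only the per-disk minimum would trip a 0.05 threshold.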

while (!logData->stopped()) {
    StorageBytes kvStoreBytes = self->persistentData->getStorageBytes();
    StorageBytes queueBytes = self->rawPersistentQueue->getStorageBytes();
    if (self->shouldAcceptNewData(kvStoreBytes, queueBytes, SERVER_KNOBS->TLOG_MIN_AVAILABLE_SPACE_RATIO)) {

Do you think it would be better to introduce hysteresis here, and only start accepting again once the ratio is > TLOG_MIN_AVAILABLE_SPACE_RATIO?

@tclinkenbeard-oai (Collaborator, Author) replied:

I pushed some commits that now fail the tlog role when the tlog runs out of space, so there's no longer a coroutine waiting for disk space to recover.
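
For readers unfamiliar with the hysteresis idea raised in this thread, a minimal sketch follows. It is not part of the PR, which fails the tlog role instead of waiting. The class name and the separate `resumeAbove` threshold are hypothetical.

```cpp
// Hypothetical two-threshold gate: stop accepting when the available ratio
// drops to stopAtOrBelow, and resume only after it climbs above resumeAbove.
// Choosing resumeAbove > stopAtOrBelow avoids rapid flapping near one threshold.
class AcceptGateSketch {
public:
    AcceptGateSketch(double stopAtOrBelow, double resumeAbove)
      : stopAtOrBelow_(stopAtOrBelow), resumeAbove_(resumeAbove) {}

    // Feed the latest available-space ratio; returns whether to accept data.
    bool update(double availableRatio) {
        if (accepting_ && availableRatio <= stopAtOrBelow_) {
            accepting_ = false; // crossed the lower threshold: stop
        } else if (!accepting_ && availableRatio > resumeAbove_) {
            accepting_ = true; // recovered past the upper threshold: resume
        }
        return accepting_;
    }

private:
    double stopAtOrBelow_;
    double resumeAbove_;
    bool accepting_ = true;
};
```

With thresholds of 0.05 and 0.10, a ratio that dips to 0.04 and then recovers to 0.07 keeps the gate closed; it reopens only once the ratio exceeds 0.10.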

@tclinkenbeard-oai marked this pull request as ready for review March 19, 2026 21:10
@foundationdb-ci

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: a064b13
  • Duration 0:31:16
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: a064b13
  • Duration 0:43:13
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: a064b13
  • Duration 0:50:35
  • Result: ❌ FAILED
  • Error: Error while executing command: if python3 -m joshua.joshua list --stopped | grep ${ENSEMBLE_ID} | grep -q 'pass=10[0-9][0-9][0-9]'; then echo PASS; else echo FAIL && exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: a064b13
  • Duration 0:54:30
  • Result: ❌ FAILED
  • Error: Error while executing command: if python3 -m joshua.joshua list --stopped | grep ${ENSEMBLE_ID} | grep -q 'pass=10[0-9][0-9][0-9]'; then echo PASS; else echo FAIL && exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: a064b13
  • Duration 2:14:19
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

3 participants