
Add replication policy doc #12179


Open
wants to merge 1 commit into main

Conversation

dlambrig
Contributor

@dlambrig dlambrig commented Jun 3, 2025

Describe the replication policy.

When reviewing in GitHub, click on "rich diff" to see the markup rendered.

Code-Reviewer Section

The general pull request guidelines can be found here.

Please check each of the following things and check all boxes before accepting a PR.

  • The PR has a description, explaining both the problem and the solution.
  • The description mentions which forms of testing were done and the testing seems reasonable.
  • Every function/class/actor that was touched is reasonably well documented.

For Release-Branches

If this PR is made against a release-branch, please also check the following:

  • This change/bugfix is a cherry-pick from the next younger branch (younger release-branch or main if this is the youngest branch)
  • There is a good reason why this PR needs to go into a release branch and this reason is documented (either in the description above or in a linked GitHub issue)

@dlambrig dlambrig requested review from jzhou77 and sbodagala June 3, 2025 15:45
@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: 59f0b11
  • Duration 0:24:18
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 59f0b11
  • Duration 0:36:48
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 59f0b11
  • Duration 0:49:27
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 59f0b11
  • Duration 0:58:09
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 59f0b11
  • Duration 0:59:46
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 59f0b11
  • Duration 2:21:14
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 59f0b11
  • Duration 2:22:31
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@kakaiu kakaiu self-requested a review June 3, 2025 18:33
The commit proxy uses the shard map to derive which log servers to send committed data to. Each SS has a single corresponding log server, its "buddy". The
buddy is found by taking the SS ID modulo the number of tLogs in the DC.
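
As a rough sketch of that buddy calculation (the struct and function names below are illustrative, not the actual FDB code):

```cpp
#include <cstdint>

// Illustrative sketch of the "buddy" mapping described above: each storage
// server (SS) is statically paired with one tLog in its DC by taking the
// SS id modulo the number of tLogs. Names here are hypothetical.
struct StorageServer {
    uint64_t id;  // stand-in for the SS's UID
};

int buddyTLogIndex(const StorageServer& ss, int numTLogsInDC) {
    // SS id modded by the number of tLogs in the DC.
    return static_cast<int>(ss.id % numTLogsInDC);
}
```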

When a mutation is committed, the key(s) it modifies logically reside within one or more shards, which are physically replicated across the SSes in their team. Each such SS has a buddy
Member

For 'more shards', clearRange is the only case. Am I understanding correctly? Thanks!

Contributor Author
@dlambrig dlambrig Jun 3, 2025

This paragraph should be modified so it talks about a "batch" of mutations. For each mutation in the batch we find the set of replicas, and then we take the union of all the sets to get the final list of destination servers. For a single mutation I believe you are right.
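
A minimal sketch of that union-of-replica-sets step (the shard-map representation and names here are hypothetical stand-ins for the commit proxy's in-memory structures):

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical shard map: start key of each shard -> the storage servers
// (by id) holding that shard's team. The real shard map is a range map keyed
// by key range; a std::map over shard begin keys approximates that here.
using ShardMap = std::map<std::string, std::vector<int>>;

// For each mutation's key, look up its shard's team, then union all teams to
// get the destination SS set for the whole commit batch.
std::set<int> destinationServers(const ShardMap& shards,
                                 const std::vector<std::string>& mutatedKeys) {
    std::set<int> dests;
    for (const auto& key : mutatedKeys) {
        // Find the shard whose begin key is the greatest one <= key.
        auto it = shards.upper_bound(key);
        if (it == shards.begin()) continue;  // key precedes all shards
        --it;
        dests.insert(it->second.begin(), it->second.end());
    }
    return dests;
}
```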

- Shards move between SS as data is distributed.
- New shards can be created.

This means the destination team must be recalculated on each commit.
Member
@kakaiu kakaiu Jun 3, 2025

To be specific, does the "dest team" here mean the SS team that the mutation goes to at a version? If yes, does the "recalculation" mean that, given a key/range, we find the source server and the dest server (if there is a data move) at that version? This is achieved by looking up the in-memory shard map at the commit proxy.

Contributor Author
@dlambrig dlambrig Jun 3, 2025

Here, I just mean FDB has to calculate a set of destination log servers on every commit. Every commit will contain a batch of mutations with keys written, and we need to enumerate them to get the modified keys. I'll clarify the sentence.
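
Continuing the earlier sketches, this is roughly how the destination SS set for a commit could be turned into a set of destination log servers through the buddy mapping (names are illustrative; the real commit proxy must also add servers when the buddies alone do not satisfy the replication policy, as discussed below):

```cpp
#include <set>

// Given the destination storage servers for a commit batch and the number of
// tLogs in the DC, collect each SS's buddy tLog. Two SSes can map to the same
// buddy, so the result is a set; the real code then checks the resulting tLog
// set against the replication policy and adds more servers if needed.
std::set<int> destinationTLogs(const std::set<int>& destStorageServers, int numTLogsInDC) {
    std::set<int> tlogs;
    for (int ssId : destStorageServers) {
        tlogs.insert(ssId % numTLogsInDC);  // the buddy mapping
    }
    return tlogs;
}
```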

- There are code paths where the shard map is not consulted, for example, when a tag is added to a transaction as a consequence of a metadata mutation.
- Two SS may have the same buddy, in which case a random server must be chosen.

Though users interact with only a few predefined redundancy modes (double, triple, etc.), FoundationDB internally supports a **rich set of compositional
Member
@kakaiu kakaiu Jun 3, 2025

Nice to see this description! For a better understanding of the policy, can you explain what happens in Across(3, zone_id, One()).selectReplicas() when the input alsoServers contains two tLog IDs that are in the same zone? Is this case possible? If yes, what will selectReplicas() do to meet the policy? Thanks!

Contributor Author

In this case the "also" servers would not meet the policy. The code would have to search the set of "available" servers to find a combination that works. If more than one server from the available list meets the requirements, a random one from that set would be selected.
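
A small self-contained sketch of that behavior under a simplified model of Across(3, zone_id, One()) (this is not the real IReplicationPolicy::selectReplicas implementation, just an illustration of "keep the also servers, then add random available servers until three distinct zones are covered"):

```cpp
#include <algorithm>
#include <random>
#include <set>
#include <string>
#include <vector>

struct Server {
    int id;
    std::string zoneId;
};

// Simplified model of Across(3, zone_id, One()): the selection must cover at
// least three distinct zones, one server per zone. If the "also" servers
// (e.g. two tLogs in the same zone) do not already satisfy this, pick random
// servers from the available set that add new zones until the policy is met.
bool selectReplicasSketch(const std::vector<Server>& available,
                          const std::vector<Server>& alsoServers,
                          std::vector<Server>& results) {
    std::set<std::string> zones;
    for (const auto& s : alsoServers) zones.insert(s.zoneId);

    std::vector<Server> candidates = available;
    std::shuffle(candidates.begin(), candidates.end(), std::mt19937{std::random_device{}()});

    for (const auto& s : candidates) {
        if (zones.size() >= 3) break;
        if (zones.insert(s.zoneId).second) results.push_back(s);  // server adds a new zone
    }
    return zones.size() >= 3;  // false if the policy cannot be satisfied
}
```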

Contributor

This description of policies is super useful!


TLogs: **0** 1 **2** **3**

The "buddy" associations between the SS and LS are dynamic.
Contributor

I don't think this statement is completely true: the buddy association between a storage server and a log server is static, as it is decided by the mod function; the association between a shard and a log server is dynamic though.

replication policy. If they do not, the function will select additional servers at random out of the available servers in order to meet the policy. Because the
shard map has replicas for all key ranges, the set of "also" servers passed to selectReplicas usually already satisfies the policy, but there are exceptions.

- There are code paths where the shard map is not consulted, for example, when a tag is added to a transaction as a consequence of a metadata mutation.
Contributor

Just so I fully understand: what type of metadata mutation are you referring to here? Thanks!

Contributor Author

Look for the pattern of addTag() followed by writeMutation() in ApplyMetadataMutation.cpp, such as in checkSetServerKeysPrefix(), checkSetServerTagsPrefix(), checkSetChangeFeedPrefix(), etc. In those cases we obtain a tag without consulting the shard map, so we do not have replicas.
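
For illustration only, here is a stripped-down sketch of the shape of that code path; the types, key prefix, and helper names below are hypothetical and do not match the real ApplyMetadataMutation.cpp signatures:

```cpp
#include <iostream>
#include <set>
#include <string>

// Hypothetical types standing in for the real ones.
struct Mutation { std::string key, value; };
using Tag = int;

Tag decodeTagSketch(const std::string& value) {
    // Stand-in for decoding a server tag out of the mutation's value.
    return static_cast<Tag>(value.size());
}

void writeMutationSketch(const Mutation& m) {
    // Stand-in for writeMutation(): the mutation goes to the tagged tLog(s).
    std::cout << "write " << m.key << "\n";
}

// When a metadata mutation touches a special key prefix (e.g. a server-tag
// keyspace), a tag is added directly (the addTag() step) and the mutation is
// written (the writeMutation() step) without consulting the shard map, so no
// shard-map-derived replica set exists on this path.
void applyMetadataMutationSketch(const Mutation& m, std::set<Tag>& txnTags) {
    const std::string serverTagPrefix = "\xff/serverTag/";  // illustrative prefix
    if (m.key.compare(0, serverTagPrefix.size(), serverTagPrefix) == 0) {
        txnTags.insert(decodeTagSketch(m.value));  // addTag()-style step
        writeMutationSketch(m);                    // writeMutation()-style step
    }
}
```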
