
Add replication policy doc #12179


Open
wants to merge 1 commit into main

Conversation

dlambrig
Contributor

@dlambrig dlambrig commented Jun 3, 2025

Describe the replication policy.

When reviewing in GitHub, click on "rich diff" to see the markup rendered.

Code-Reviewer Section

The general pull request guidelines can be found here.

Please check each of the following things and check all boxes before accepting a PR.

  • The PR has a description, explaining both the problem and the solution.
  • The description mentions which forms of testing were done and the testing seems reasonable.
  • Every function/class/actor that was touched is reasonably well documented.

For Release-Branches

If this PR is made against a release-branch, please also check the following:

  • This change/bugfix is a cherry-pick from the next younger branch (younger release-branch or main if this is the youngest branch)
  • There is a good reason why this PR needs to go into a release branch and this reason is documented (either in the description above or in a linked GitHub issue)

@dlambrig dlambrig requested review from jzhou77 and sbodagala June 3, 2025 15:45
@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-ide on Linux RHEL 9

  • Commit ID: 59f0b11
  • Duration 0:24:18
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 59f0b11
  • Duration 0:36:48
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 59f0b11
  • Duration 0:49:27
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 59f0b11
  • Duration 0:58:09
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 59f0b11
  • Duration 0:59:46
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 59f0b11
  • Duration 2:21:14
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 59f0b11
  • Duration 2:22:31
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@kakaiu kakaiu self-requested a review June 3, 2025 18:33
The commit proxy uses the shard map to derive which log servers to send committed data to. Each SS has a single corresponding log server, its "buddy". The
buddy is found by taking the SS ID modulo the number of tLogs in the DC.
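
As a rough sketch of that buddy calculation (the struct and function names below are illustrative, not the actual FDB code):

```cpp
#include <cstdint>

// Illustrative sketch of the "buddy" mapping described above: each storage
// server (SS) is statically paired with one tLog in its DC by taking the
// SS id modulo the number of tLogs. Names here are hypothetical.
struct StorageServer {
    uint64_t id;  // stand-in for the SS's UID
};

int buddyTLogIndex(const StorageServer& ss, int numTLogsInDC) {
    // SS id modded by the number of tLogs in the DC.
    return static_cast<int>(ss.id % numTLogsInDC);
}
```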

When a mutation is committed, the key(s) it modifies logically reside within one or more shards, which are physically replicated across the SSes in their team. Each such SS has a buddy
Member

For 'more shards', clearRange is the only case. Am I understanding correctly? Thanks!

Contributor Author
@dlambrig dlambrig Jun 3, 2025

This paragraph should be modified so it talks about a "batch" of mutations. For each mutation in the batch we find the set of replicas, and then we take the union of all the sets to get the final list of destination servers. For a single mutation I believe you are right.
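
A minimal sketch of that union-of-replica-sets step (the shard-map representation and names here are hypothetical stand-ins for the commit proxy's in-memory structures):

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical shard map: start key of each shard -> the storage servers
// (by id) holding that shard's team. The real shard map is a range map keyed
// by key range; a std::map over shard begin keys approximates that here.
using ShardMap = std::map<std::string, std::vector<int>>;

// For each mutation's key, look up its shard's team, then union all teams to
// get the destination SS set for the whole commit batch.
std::set<int> destinationServers(const ShardMap& shards,
                                 const std::vector<std::string>& mutatedKeys) {
    std::set<int> dests;
    for (const auto& key : mutatedKeys) {
        // Find the shard whose begin key is the greatest one <= key.
        auto it = shards.upper_bound(key);
        if (it == shards.begin()) continue;  // key precedes all shards
        --it;
        dests.insert(it->second.begin(), it->second.end());
    }
    return dests;
}
```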

- Shards move between SS as data is distributed.
- New shards can be created.

This means the destination team must be recalculated on each commit.
Member
@kakaiu kakaiu Jun 3, 2025

To be specific, does the "dest team" here mean the SS team that the mutation goes to at a version? If yes, does the "recalculation" mean that, given a key/range, we find the source server and the dest server (if there is a data move) at that version? This is achieved by looking up the in-memory shard map at the commit proxy.

Contributor Author
@dlambrig dlambrig Jun 3, 2025

Here, I just mean FDB has to calculate a set of destination log servers on every commit. Every commit will contain a batch of mutations with keys written, and we need to enumerate them to get the modified keys. I'll clarify the sentence.
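
Continuing the earlier sketches, this is roughly how the destination SS set for a commit could be turned into a set of destination log servers through the buddy mapping (names are illustrative; the real commit proxy must also add servers when the buddies alone do not satisfy the replication policy, as discussed below):

```cpp
#include <set>

// Given the destination storage servers for a commit batch and the number of
// tLogs in the DC, collect each SS's buddy tLog. Two SSes can map to the same
// buddy, so the result is a set; the real code then checks the resulting tLog
// set against the replication policy and adds more servers if needed.
std::set<int> destinationTLogs(const std::set<int>& destStorageServers, int numTLogsInDC) {
    std::set<int> tlogs;
    for (int ssId : destStorageServers) {
        tlogs.insert(ssId % numTLogsInDC);  // the buddy mapping
    }
    return tlogs;
}
```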

- There are code paths where the shard map is not consulted, for example, when a tag is added to a transaction as a consequence of a metadata mutation.
- Two SS may have the same buddy, in which case a random server must be chosen.

Though users interact with only a few predefined redundancy modes (double, triple, etc.), FoundationDB internally supports a **rich set of compositional
Member
@kakaiu kakaiu Jun 3, 2025

Nice to see this description! For a better understanding of the policy, can you explain what happens in Across(3, zone_id, One()).selectReplicas() when the input alsoServers contains two tLog IDs that are in the same zone? Is this case possible? If yes, what will selectReplicas() do to meet the policy? Thanks!

Contributor Author

In this case the "also" servers would not meet the policy. The code would have to search the set of "available" servers to find a combination that works. If more than one server from the available list meets the requirements, a random one from that set would be selected.
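
A small self-contained sketch of that behavior under a simplified model of Across(3, zone_id, One()) (this is not the real IReplicationPolicy::selectReplicas implementation, just an illustration of "keep the also servers, then add random available servers until three distinct zones are covered"):

```cpp
#include <algorithm>
#include <random>
#include <set>
#include <string>
#include <vector>

struct Server {
    int id;
    std::string zoneId;
};

// Simplified model of Across(3, zone_id, One()): the selection must cover at
// least three distinct zones, one server per zone. If the "also" servers
// (e.g. two tLogs in the same zone) do not already satisfy this, pick random
// servers from the available set that add new zones until the policy is met.
bool selectReplicasSketch(const std::vector<Server>& available,
                          const std::vector<Server>& alsoServers,
                          std::vector<Server>& results) {
    std::set<std::string> zones;
    for (const auto& s : alsoServers) zones.insert(s.zoneId);

    std::vector<Server> candidates = available;
    std::shuffle(candidates.begin(), candidates.end(), std::mt19937{std::random_device{}()});

    for (const auto& s : candidates) {
        if (zones.size() >= 3) break;
        if (zones.insert(s.zoneId).second) results.push_back(s);  // server adds a new zone
    }
    return zones.size() >= 3;  // false if the policy cannot be satisfied
}
```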

Contributor

This description of policies is super useful!


TLogs: **0** 1 **2** **3**

The "buddy" associations between the SS and LS are dynamic.
Contributor

I don't think this statement is completely true: the buddy association between a storage server and a log server is static, as it is decided by the mod function; the association between a shard and a log server is dynamic though.

replication policy. If they do not, the function will select additional servers at random out of the available servers in order to meet the policy. Because the
shard map has replicas for all key ranges, the set of "also" servers passed to selectReplicas usually already satisfies the policy, but there are exceptions.

- There are code paths where the shard map is not consulted, for example, when a tag is added to a transaction as a consequence of a metadata mutation.
Contributor

Just so I fully understand: what type of metadata mutation are you referring to here? Thanks!

Contributor Author

Look for the pattern of addTag() followed by writeMutation() in ApplyMetadataMutation.cpp, such as in checkSetServerKeysPrefix(), checkSetServerTagsPrefix(), checkSetChangeFeedPrefix(), etc. In those cases we obtain a tag without consulting the shard map, so we do not have replicas.
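
For illustration only, here is a stripped-down sketch of the shape of that code path; the types, key prefix, and helper names below are hypothetical and do not match the real ApplyMetadataMutation.cpp signatures:

```cpp
#include <iostream>
#include <set>
#include <string>

// Hypothetical types standing in for the real ones.
struct Mutation { std::string key, value; };
using Tag = int;

Tag decodeTagSketch(const std::string& value) {
    // Stand-in for decoding a server tag out of the mutation's value.
    return static_cast<Tag>(value.size());
}

void writeMutationSketch(const Mutation& m) {
    // Stand-in for writeMutation(): the mutation goes to the tagged tLog(s).
    std::cout << "write " << m.key << "\n";
}

// When a metadata mutation touches a special key prefix (e.g. a server-tag
// keyspace), a tag is added directly (the addTag() step) and the mutation is
// written (the writeMutation() step) without consulting the shard map, so no
// shard-map-derived replica set exists on this path.
void applyMetadataMutationSketch(const Mutation& m, std::set<Tag>& txnTags) {
    const std::string serverTagPrefix = "\xff/serverTag/";  // illustrative prefix
    if (m.key.compare(0, serverTagPrefix.size(), serverTagPrefix) == 0) {
        txnTags.insert(decodeTagSketch(m.value));  // addTag()-style step
        writeMutationSketch(m);                    // writeMutation()-style step
    }
}
```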
