Separate backup and restore into two workloads [release-7.4] #12172

Draft
wants to merge 15 commits into base: release-7.4

Conversation

jzhou77

@jzhou77 jzhou77 commented May 29, 2025

Cherry-pick of #12019.

500k tests: 20250529-164118-jzhou-f9ad7b51ff17829d

Code-Reviewer Section

The general pull request guidelines can be found here.

Please check each of the following things and check all boxes before accepting a PR.

  • The PR has a description, explaining both the problem and the solution.
  • The description mentions which forms of testing were done and the testing seems reasonable.
  • Every function/class/actor that was touched is reasonably well documented.

For Release-Branches

If this PR is made against a release-branch, please also check the following:

  • This change/bugfix is a cherry-pick from the next younger branch (younger release-branch or main if this is the youngest branch)
  • There is a good reason why this PR needs to go into a release branch and this reason is documented (either in the description above or in a linked GitHub issue)

@foundationdb-ci

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 26e6cb7
  • Duration 0:50:19
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 26e6cb7
  • Duration 1:01:32
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 26e6cb7
  • Duration 1:16:15
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 26e6cb7
  • Duration 1:16:34
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@@ -755,7 +755,7 @@ For **RHEL/CentOS**, perform the upgrade using the rpm command:
user@host$ sudo rpm -Uvh |package-rpm-clients| \\
|package-rpm-server|

-The ``foundationdb-clients`` package also installs the :doc:`C <api-c>` API. If your clients use :doc:`Ruby <api-ruby>`, :doc:`Python <api-python>`, `Java <javadoc/index.html>`_, or `Go <https://godoc.org/github.com/apple/foundationdb/bindings/go/src/fdb>`_, follow the instructions in the corresponding language documentation to install the APIs.
+The ``foundationdb-clients`` package also installs the :doc:`C <api-c>` API. If your clients use :doc:`Ruby <api-ruby>`, :doc:`Python <api-python>`, `Java <https://www.javadoc.io/doc/org.foundationdb/fdb-java/latest/index.html>`_, or `Go <https://godoc.org/github.com/apple/foundationdb/bindings/go/src/fdb>`_, follow the instructions in the corresponding language documentation to install the APIs.

Is this change supposed to be in here? Is it from another PR?

No. This PR is built on top of the other PR; I will rebase once that one is merged.

@saintstack saintstack left a comment

LGTM

jzhou77 added 15 commits May 29, 2025 17:14
This allows more flexible testing as well as cleaner code.
Otherwise, the Restore workload has no agents and thus cannot make progress.

100k partitioned restore tests, i.e., BackupAndRestore.toml and
BackupCorrectnessPartitioned.toml:
  20250311-200411-jzhou-9d34d22d5225d6fe
Instead, infer the flag from the backup description.
This allows us to specify whether the old or new style of backup is used. Added two
tests that switch between the two styles and randomly choose one backup to restore.

20250314-034057-jzhou-c27ca23b6c69cecf
20250314-034541-jzhou-13ed090d0b111474
Because we separate backup and restore into two workloads, they may not choose
the same encryption option, i.e., one encrypted and the other unencrypted.

20250320-013757-jzhou-12b4c8e4504ffd96
20250321-222324-jzhou-fdcd6f145f3ac0f8
This can cause subsequent backup and restore workloads to fail.

20250322-040333-jzhou-15c32299d18f4456

100k backup tests:
20250322-040453-jzhou-2bdb4e0ddc265632
If done version by version, it is inefficient and causes the task to be
interrupted in simulation, so RestoreLogDataPartitionedTaskFunc never finishes.

20250324-040712-jzhou-cd8501d3890a6b56

100k backup tests:
20250324-040751-jzhou-8cec93182e6d3acb
20250326-182125-jzhou-375d243c097c3b5a
20250325-221139-jzhou-5dac71c4525d414c

100k backup tests:
20250326-162525-jzhou-f08e3fc12887a3e9
Do these checks in the last [[test]] specified in the TOML file.
Currently, submitting a partitioned backup enables backup workers.
However, the backup worker setting was not cleared when no partitioned backup was active,
so backup workers were recruited but did nothing.

This PR changes the behavior so that when submitting, aborting, or discontinuing
backups, backup workers are disabled if there are no active partitioned backup jobs.
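
The rule described above can be sketched as a small Python model. This is an illustration only, not FoundationDB code: `BackupJob`, `Cluster`, `update_backup_workers`, `submit`, and `stop` are hypothetical names, and treating abort and discontinue identically is a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BackupJob:
    """Hypothetical stand-in for one backup job."""
    tag: str
    partitioned: bool
    active: bool = True


@dataclass
class Cluster:
    """Hypothetical stand-in for cluster-wide backup state."""
    jobs: List[BackupJob] = field(default_factory=list)
    backup_workers_enabled: bool = False


def update_backup_workers(cluster: Cluster) -> None:
    # Workers are only useful while at least one partitioned backup is active.
    cluster.backup_workers_enabled = any(
        job.active and job.partitioned for job in cluster.jobs
    )


def submit(cluster: Cluster, job: BackupJob) -> None:
    cluster.jobs.append(job)
    update_backup_workers(cluster)


def stop(cluster: Cluster, tag: str) -> None:
    # Models abort; discontinue is treated the same way here (an assumption).
    for job in cluster.jobs:
        if job.tag == tag:
            job.active = False
    update_backup_workers(cluster)
```

In this model, submitting a non-partitioned backup leaves workers disabled, and stopping the last active partitioned backup disables them again even if other backups are still running.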

20250517-162642-jzhou-4966348e89f1794d
20250517-044345-jzhou-f02f7defca3ea010
The continuous log end version can be less than the min restorable version when
the snapshot is a single version, i.e., when the min and max restorable versions
are the same.
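
A minimal sketch of the situation described above, assuming a simplified model in which a snapshot taken at a single version is restorable at that version without any log data. `restorable_range` and its parameters are hypothetical names for illustration, not FoundationDB APIs, and the half-open log range is also an assumption.

```python
from typing import Optional, Tuple


def restorable_range(snapshot_begin: int, snapshot_end: int,
                     log_begin: int, continuous_log_end: int
                     ) -> Optional[Tuple[int, int]]:
    """Return (min, max) restorable versions, or None if nothing is restorable.

    Hypothetical model: logs cover the half-open range
    [log_begin, continuous_log_end).
    """
    logs_cover_snapshot = (log_begin <= snapshot_begin
                           and continuous_log_end > snapshot_end)
    if snapshot_begin == snapshot_end:
        # Single-version snapshot: assumed restorable at that version alone.
        lo = hi = snapshot_end
        if logs_cover_snapshot:
            hi = continuous_log_end - 1
        return lo, hi
    if logs_cover_snapshot:
        # Multi-version snapshot: logs must span the whole snapshot range.
        return snapshot_end, continuous_log_end - 1
    return None
```

With a single-version snapshot at version 100 and a continuous log end of 90, this model reports min and max restorable versions of 100, so the continuous log end is below the min restorable version.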

20250519-042009-jzhou-88a2e0c67e8bed92

100k backup tests: 20250519-155545-jzhou-c0aeaaf4a933cff9
Also consolidate key updates for pausing backups.

20250520-175022-jzhou-10106ade7e0ad74f

100k backup tests: 20250520-175208-jzhou-15255334e0c57eec
@foundationdb-ci

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 58a42d7
  • Duration 0:48:52
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 58a42d7
  • Duration 0:52:14
  • Result: ❌ FAILED
  • Error: Error while executing command: if python3 -m joshua.joshua list --stopped | grep ${ENSEMBLE_ID} | grep -q 'pass=10[0-9][0-9][0-9]'; then echo PASS; else echo FAIL && exit 1; fi. Reason: exit status 1
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
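
For reference, the pattern used by the failing check can be reproduced in Python. This sketch only illustrates what the regular expression `pass=10[0-9][0-9][0-9]` matches; the `check` helper is a hypothetical name and the sample strings in the usage note are made up, not real joshua output.

```python
import re

# Same pattern as the grep in the CI command. grep matches anywhere in a line,
# so re.search (not re.fullmatch) is the closest equivalent.
PASS_PATTERN = re.compile(r"pass=10[0-9][0-9][0-9]")


def check(line: str) -> str:
    """Mirror the shell check: PASS if the pattern occurs in the line."""
    return "PASS" if PASS_PATTERN.search(line) else "FAIL"
```

Under this pattern, a line containing `pass=10000` or `pass=10999` yields PASS, while `pass=9998` or `pass=11000` yields FAIL. Because the pattern is unanchored, a longer number that begins with `10` followed by three digits would also match.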

@foundationdb-ci

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 58a42d7
  • Duration 0:57:38
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 58a42d7
  • Duration 1:00:19
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

3 participants