docs: add a section on high availability and disaster recovery #1089




Open

davidrichards-da wants to merge 4 commits into main from docs-add-a-section-on-high-availability-and-disaster-recovery

Contributor

davidrichards-da commented Jan 6, 2026

No description provided.


          docs: add a section on high availability and disaster recovery

41432d0

Signed-off-by: davidrichards-da <89472028+davidrichards-da@users.noreply.github.com>

davidrichards-da requested review from a team as code owners

January 6, 2026 16:10


          Fixing a title overline too short error

dcc2da8

Signed-off-by: davidrichards-da <89472028+davidrichards-da@users.noreply.github.com>

PHOL-DA reviewed


Contributor

PHOL-DA left a comment

I'm hesitant about adding "HA and DR" to the wallet integration section. This seems more like something that belongs in the Operate sections (and we already cover most of the cases in the exchange integration guide, under different naming).

docs/wallet-integration-guide/src/high-availability-and-disaster-recovery/index.rst Outdated

+              Recommended Architecture
+              ~~~~~~~~~~~~~~~~~~~~~~~~
+              * **Redundant Validators**: Run 2 validators behind a single gateway.

Contributor

PHOL-DA Jan 7, 2026

do you mean a wallet gateway or a load balancer?

Contributor Author

davidrichards-da Jan 7, 2026

Load balancer. I'll change it.

docs/wallet-integration-guide/src/high-availability-and-disaster-recovery/index.rst

+              * **Redundant Validators**: Run 2 validators behind a single gateway.
+              * **Confirming Rights**: Host parties on both validators with confirming rights.
+              * **Threshold Configuration**: Implement a confirming threshold of 1/2. This ensures that if one validator goes offline, the remaining node can still authorize transactions.

Contributor

PHOL-DA Jan 7, 2026

could you add a code snippet showcasing how to do this using the Wallet SDK ?

Contributor Author

davidrichards-da Jan 7, 2026

I can't. From what I can see we don't set a confirming threshold in our multi-hosting parties example either.
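The availability argument behind the "confirming threshold of 1/2" bullet in the diff can be illustrated with a small model. This is a minimal sketch, assuming that a transaction for a multi-hosted party can be confirmed once at least `threshold` of its confirming validators are online. `HostingConfig`, `can_confirm`, and `tolerated_failures` are hypothetical names; this is not the Wallet SDK or Canton API, and it does not show how the threshold is actually configured.

```python
# Hypothetical model of confirmation availability for a multi-hosted party.
# Assumption: the party can confirm when at least `threshold` of its
# confirming validators are online. Illustrates the arithmetic only;
# this is NOT the Wallet SDK or Canton API.
from dataclasses import dataclass


@dataclass(frozen=True)
class HostingConfig:
    confirming_validators: tuple[str, ...]
    threshold: int

    def __post_init__(self) -> None:
        # A threshold outside 1..N can never (or trivially) be met.
        n = len(self.confirming_validators)
        if not 1 <= self.threshold <= n:
            raise ValueError(f"threshold must be in 1..{n}, got {self.threshold}")


def can_confirm(config: HostingConfig, online: set[str]) -> bool:
    """True if enough confirming validators are online to meet the threshold."""
    available = sum(1 for v in config.confirming_validators if v in online)
    return available >= config.threshold


def tolerated_failures(config: HostingConfig) -> int:
    """How many confirming validators may be offline while confirmation still works."""
    return len(config.confirming_validators) - config.threshold
```

With two confirming validators and a threshold of 1, `tolerated_failures` is 1, so either node alone can still confirm. Raising the threshold to 2 removes that tolerance. The "Node Scaling" bullet follows from the same arithmetic: more confirming validators with a low threshold tolerate more simultaneous outages.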

docs/wallet-integration-guide/src/high-availability-and-disaster-recovery/index.rst

+              * **Node Scaling**: Host more validators with a low confirming threshold.
+              ----------------------
+              Disaster Recovery (DR)

Contributor

PHOL-DA Jan 7, 2026

would it not make more sense to simply link to:
https://docs.dev.sync.global/validator_operator/validator_disaster_recovery.html

you seem to be linking to it in all three subsections anyway.

Contributor Author

davidrichards-da Jan 7, 2026

Yes, but that doesn't put it into perspective as to what can be recovered

docs/wallet-integration-guide/src/high-availability-and-disaster-recovery/index.rst Outdated

		@@ -0,0 +1,73 @@
		=======================================
		High Availability and Disaster Recovery

Contributor

PHOL-DA Jan 7, 2026

You could make a strong argument that this does not fit well within the Integrate sections and should rather be in the Operate sections of the docs.

Contributor Author

davidrichards-da Jan 7, 2026

Could, but as discussed in the meeting, won't

davidrichards-da added 2 commits

January 7, 2026 11:27


          Replacing high-availability with resilience

5277fff

Signed-off-by: davidrichards-da <89472028+davidrichards-da@users.noreply.github.com>


          replacing gateway with load balancer

cfad9d6

Signed-off-by: davidrichards-da <89472028+davidrichards-da@users.noreply.github.com>

bame-da reviewed


docs/wallet-integration-guide/src/high-availability-and-disaster-recovery/index.rst

+              Recommended Architecture
+              ~~~~~~~~~~~~~~~~~~~~~~~~
+              * **Redundant Validators**: Run 2 validators behind a load balancer.

bame-da Jan 7, 2026

This doesn't work because ledger offsets are node-local. Failover needs some client-side handling.
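One way to picture the client-side handling this comment calls for is a minimal sketch of per-node checkpointing with de-duplication. It rests on two assumptions that do not come from this thread: each validator exposes a stream of `(offset, update_id)` pairs, and the same update carries the same `update_id` on every node. `FailoverReader` and the fetch callables are hypothetical, not a real Ledger API client.

```python
# Hypothetical client-side failover between validators with node-local offsets.
# Assumptions (not from the source): each node yields (offset, update_id)
# pairs, and an update has the same update_id on every node, so it can be
# used to de-duplicate after switching nodes. Not a real Ledger API client.
from dataclasses import dataclass, field
from typing import Callable, Iterable

Update = tuple[int, str]  # (node-local offset, update_id)


@dataclass
class FailoverReader:
    # Node name -> function returning that node's updates after a given offset.
    nodes: dict[str, Callable[[int], Iterable[Update]]]
    # Last offset consumed, tracked separately per node because an offset
    # from one node is meaningless on another.
    checkpoints: dict[str, int] = field(default_factory=dict)
    # update_ids already delivered, regardless of which node delivered them.
    seen: set[str] = field(default_factory=set)

    def read(self, node: str) -> list[str]:
        """Read new updates from `node`, skipping any already seen via another node."""
        start = self.checkpoints.get(node, 0)
        fresh: list[str] = []
        for offset, update_id in self.nodes[node](start):
            self.checkpoints[node] = offset
            if update_id not in self.seen:
                self.seen.add(update_id)
                fresh.append(update_id)
        return fresh
```

The key design point is that the reader never reuses validator A's offset on validator B. After a failover it resumes from B's own checkpoint and drops updates it already processed through A. The sketch simplifies two things: it replays a node's full history the first time that node is used, and the `seen` set grows without bound. A real client would need a bounded window or a better starting point.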

bame-da reviewed


docs/wallet-integration-guide/src/high-availability-and-disaster-recovery/index.rst

+              ~~~~~~~~~~~~~~~~~~~~~~~~
+              * **Redundant Validators**: Run 2 validators behind a load balancer.
+              * **Confirming Rights**: Host parties on both validators with confirming rights.

bame-da Jan 7, 2026

This probably refers to the hosted parties, but there's also the matter of the provider party for CC preapprovals. Unfortunately the renewal automation only works if the preapproval provider is a node admin party. And if the node admin party is down, the preapproval doesn't work anymore so incoming transfers time out.

So the node admin party of one node has to be replicated to another node in confirming mode, which is a thus far undocumented procedure.

bame-da reviewed


docs/wallet-integration-guide/src/high-availability-and-disaster-recovery/index.rst


		Disaster Recovery is the process of recovering from a scenario where a validator is lost completely and its immediate failovers are unavailable.

		Backup Best Practices

bame-da Jan 7, 2026

Isn't this a replica of https://docs.digitalasset.com/integrate/devnet/exchange-integration/disaster-recovery.html?


Labels

None yet