docs: add a section on high availability and disaster recovery #1089
Signed-off-by: davidrichards-da <89472028+davidrichards-da@users.noreply.github.com>
PHOL-DA left a comment
I'm hesitant about adding "HA and DR" to the wallet integration section. This seems more like something that belongs in the Operate sections, and we already cover most of these cases in the exchange integration, just under different naming.
Recommended Architecture
~~~~~~~~~~~~~~~~~~~~~~~~

* **Redundant Validators**: Run 2 validators behind a single gateway.
Do you mean a wallet gateway or a load balancer?
Load balancer. I'll change it.
* **Confirming Rights**: Host parties on both validators with confirming rights.
* **Threshold Configuration**: Set a confirming threshold of 1 out of 2. This ensures that if one validator goes offline, the remaining node can still authorize transactions.
Could you add a code snippet showing how to do this using the Wallet SDK?
I can't. From what I can see, we don't set a confirming threshold in our multi-hosted parties example either.
* **Node Scaling**: Host more validators with a low confirming threshold.
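Since the Wallet SDK example does not set a confirming threshold, here is a minimal, hypothetical TypeScript sketch of the arithmetic behind the Threshold Configuration and Node Scaling items. `HostingConfig`, `canConfirm`, and `toleratedFailures` are illustrative names, not Wallet SDK or Canton APIs. The model only counts confirmations from online validators. It ignores the node-local offset and preapproval-provider issues raised later in this review.

```typescript
// Hypothetical model for illustration only: NOT a Wallet SDK or Canton API.
// A multi-hosted party's transactions can be confirmed while at least
// `threshold` of its confirming validators are online.
interface HostingConfig {
  confirmingValidators: string[]; // validators hosting the party with confirming rights
  threshold: number;              // confirmations required per transaction
}

// True if enough confirming validators are online to meet the threshold.
function canConfirm(config: HostingConfig, online: ReadonlySet<string>): boolean {
  const available = config.confirmingValidators.filter((v) => online.has(v)).length;
  return available >= config.threshold;
}

// How many confirming validators can be offline at the same time
// without blocking confirmation.
function toleratedFailures(config: HostingConfig): number {
  return Math.max(0, config.confirmingValidators.length - config.threshold);
}

// Example: two validators with a threshold of 1 (the "1 out of 2" setup).
const twoNode: HostingConfig = {
  confirmingValidators: ["validator-a", "validator-b"],
  threshold: 1,
};

console.log(canConfirm(twoNode, new Set(["validator-b"]))); // true: one node down
console.log(canConfirm(twoNode, new Set<string>()));        // false: both nodes down
console.log(toleratedFailures(twoNode));                     // 1
```

Under the same model, three validators with a threshold of 1 tolerate two simultaneous failures, which is the trade-off the Node Scaling item describes.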
----------------------
Disaster Recovery (DR)
Would it not make more sense to simply link to:
https://docs.dev.sync.global/validator_operator/validator_disaster_recovery.html
You seem to be linking to it in all three subsections anyway.
Yes, but the link alone doesn't put into perspective what can be recovered.
@@ -0,0 +1,73 @@
=======================================
High Availability and Disaster Recovery
You could make a strong argument that this does not fit well within the Integrate sections and should rather be in the Operate sections of the docs.
It could, but as discussed in the meeting, it won't.
Recommended Architecture
~~~~~~~~~~~~~~~~~~~~~~~~

* **Redundant Validators**: Run 2 validators behind a load balancer.
This doesn't work because of node-local offsets. Failover needs some client-side handling.
* **Confirming Rights**: Host parties on both validators with confirming rights.
This probably refers to the hosted parties, but there's also the matter of the provider party for CC preapprovals. Unfortunately, the renewal automation only works if the preapproval provider is a node admin party, and if the node admin party is down, the preapproval no longer works, so incoming transfers time out.
So the node admin party of one node has to be replicated to another node in confirming mode, a procedure that is so far undocumented.
Disaster Recovery is the process of recovering from a scenario where a validator is lost completely and its immediate failovers are unavailable.
Backup Best Practices
Isn't this a replica of https://docs.digitalasset.com/integrate/devnet/exchange-integration/disaster-recovery.html?