Description
Is your feature request related to a problem? Please describe.
We're starting to accumulate quite a bit of "oral knowledge" on how the O selection mechanism works, the overall reliability strategy, and the justifications for that strategy. It would be good to document those.
The documentation should address two audiences: developers working on the goclient, and users of the goclient (B/O/T operators).
This probably means separate sections or separate pages; it would be good to clearly identify the target audience when writing.
Describe the solution you'd like
A docs/reliability.md page describing our accumulated knowledge and strategy.
Describe alternatives you've considered
Separate pages, one for developers and another for users. Whichever is more appropriate probably depends on the depth of each.
It may be enough to begin with a high-level description suitable for users, then drill down into details suitable for developers.
Additional context
Some things to describe, off the top of my head:
- Reliability and load control: round robin, retries
- Backpressure: One segment in flight
- Load balancing: -maxSessions
- BroadcasterManager list refresh intervals (see the comment on #806, "Adding broadcast session manager methods to process segment")
- Front-running prevention: External storage prefix for multi-O
- Justification for each of these mechanisms, the problem(s) they solve and how
(this list is not exhaustive)