Improved codex/raw/codex-block-exchange.md file #215
Conversation
| including service interface, peer discovery, and network protocol interactions. |
| #### Complete Block Request Flow |
The diagram shows WantBlock sent to multiple peers in parallel. While this is valid behaviour, it has significant implications. Two strategies exist:
Strategy 1: Parallel WantBlock (shown in diagram)
- Pro: Lower latency (first responder wins)
- Pro: Redundancy - if one peer fails/is slow, others may succeed faster
- Con: Bandwidth waste - multiple peers send full block (potentially 100 MiB each), only first arrival used
- Con: Network congestion - multiple large transfers compete for bandwidth simultaneously
- Con: Resource consumption - receiver must buffer/process multiple blocks in parallel
- Con: Unfair to providers - peers waste bandwidth on cancelled transfers
- Con: Complexity - cancellation logic, race conditions, and handling partial transfers
Strategy 2: Two-phase discovery (currently used in Codex)
- Phase 1 (Discovery): Send
WantHaveto multiple peers, collectBlockPresenceresponses, selectONEpeer using scoring algorithm - Phase 2 (Delivery): Send
WantBlockonly to selected peer - Pro: Bandwidth efficient - only one block transfer
- Pro: Better peer selection - can evaluate price, latency, reputation before committing
- Pro: Predictable resource usage - single transfer, no cancellation overhead
- Pro: Fairer to providers - no wasted bandwidth
- Con: Additional latency - extra discovery round-trip before delivery
- Con: Single point of failure - if selected peer fails, requires retry with new discovery
Could we include both diagrams in the RFC? Presenting both options and their trade-offs would let implementers choose the strategy that best fits specific scenarios (e.g., latency-critical vs. bandwidth-constrained environments).
Thanks @cnanakos - added the diagrams, let me know if they need to be altered.
| - If all peers fail, propagate error to caller | ||
| - Maintain request queue for later retry | ||
| - Clean up resources (memory, pending requests) on unrecoverable failures | ||
If I understand correctly, two different strategies are proposed here at the same time, which creates a contradiction:
- If all peers fail, propagate error to caller
- Maintain request queue for later retry
- Fail fast: If all peers fail, we immediately propagate the error to the caller, and the caller decides whether to retry.
- Queued retry: If all peers fail, we keep the request in a pending queue and retry automatically, only returning an error after timeout/max retries.
These behaviours are orthogonal - one assumes the caller owns retry, the other assumes the system owns retry. I believe we need more clarity here.
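One way to resolve the ambiguity is to layer the two retry owners, as sketched below. Function and exception names here are illustrative assumptions, not Codex APIs: the system rotates through peers transparently, and only once every peer has failed does the error propagate so the caller can decide whether to retry.

```python
class AllPeersExhausted(Exception):
    """Raised after the system has rotated through every known peer."""

def fetch_block(cid: str, peers: list[str], fetch_from) -> bytes:
    """System-level retry: try each peer in turn, transparently to the caller."""
    errors: dict[str, Exception] = {}
    for peer in peers:
        try:
            return fetch_from(peer, cid)
        except Exception as exc:
            # A failed peer triggers rotation to the next peer, not an error.
            errors[peer] = exc
    # All peers failed: now (and only now) the error reaches the caller.
    raise AllPeersExhausted(f"no peer served {cid}: {errors}")

def fetch_with_caller_retry(cid, discover_peers, fetch_from, max_attempts=3):
    """Caller-level retry: the caller owns the decision to re-run discovery."""
    for _ in range(max_attempts):
        try:
            return fetch_block(cid, discover_peers(cid), fetch_from)
        except AllPeersExhausted:
            continue  # caller's policy: retry with a fresh peer set
    raise AllPeersExhausted(f"gave up on {cid} after {max_attempts} attempts")
```

Under this split, "maintain request queue for later retry" belongs to the caller-level policy (or a scheduler above it), while peer rotation stays inside the system, so the two bullets no longer contradict each other.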
Clarified this part a bit - let me know if it works.
Added two strategy diagrams with trade-offs:
- Parallel Request: low latency, bandwidth-wasteful
- Two-Phase Discovery: bandwidth-efficient, higher latency
- Hybrid Approach flowchart for strategy selection

Added clear separation between:
- System-level retry: automatic peer rotation, transparent to caller
- Caller-level retry: when all peers are exhausted, the caller decides

Addresses reviewer comment about the fail-fast vs queued-retry contradiction.
b12d10c to 02fbb64
fbarbu15
left a comment
LGTM, thanks
cnanakos
left a comment
LGTM
Improved codex-block-exchange.md file in codex/raw folder