Skip to content

Conversation

@Cofson
Copy link
Contributor

@Cofson Cofson commented Nov 21, 2025

Improved codex-block-exchange.md file in codex/raw folder

including service interface, peer discovery, and network protocol interactions.

#### Complete Block Request Flow

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diagram shows WantBlock sent to multiple peers in parallel. While this is valid behaviour, it has significant implications. Two strategies exist:

Strategy 1: Parallel WantBlock (shown in diagram)

  • Pro: Lower latency (first responder wins)
  • Pro: Redundancy - if one peer fails/is slow, others may succeed faster
  • Con: Bandwidth waste - multiple peers send full block (potentially 100 MiB each), only first arrival used
  • Con: Network congestion - multiple large transfers compete for bandwidth simultaneously
  • Con: Resource consumption - receiver must buffer/process multiple blocks in parallel
  • Con: Unfair to providers - peers waste bandwidth on cancelled transfers
  • Con: Complexity - cancellation logic, race conditions, and handling partial transfers

Strategy 2: Two-phase discovery (currently used in Codex)

  • Phase 1 (Discovery): Send WantHave to multiple peers, collect BlockPresence responses, select ONE peer using scoring algorithm
  • Phase 2 (Delivery): Send WantBlock only to selected peer
  • Pro: Bandwidth efficient - only one block transfer
  • Pro: Better peer selection - can evaluate price, latency, reputation before committing
  • Pro: Predictable resource usage - single transfer, no cancellation overhead
  • Pro: Fairer to providers - no wasted bandwidth
  • Con: Additional latency - extra discovery round-trip before delivery
  • Con: Single point of failure - if selected peer fails, requires retry with new discovery

Could we include both diagrams in the RFC? This would present both options and their tradeoffs, allowing to choose the strategy that best fits specific scenarios (e.g., latency-critical vs bandwidth-constrained environments).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cnanakos - added the diagrams, let me know if they need to be altered.

- If all peers fail, propagate error to caller
- Maintain request queue for later retry
- Clean up resources (memory, pending requests) on unrecoverable failures

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, two different strategies are proposed here at the same time, which creates a contradiction:

  • If all peers fail, propagate error to caller
  • Maintain request queue for later retry
  1. Fail fast: If all peers fail, we immediately propagate the error to the caller, and the caller decides whether to retry.
  2. Queued retry: If all peers fail, we keep the request in a pending queue and retry automatically, only returning an error after timeout/max retries.

These behaviours are orthogonal - one assumes the caller owns retry, the other assumes the system owns retry. I believe we need more clarity here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarified this part a bit - let me know if it works.

Added two strategy diagrams with trade-offs:
- Parallel Request: low latency, bandwidth-wasteful
- Two-Phase Discovery: bandwidth-efficient, higher latency
- Hybrid Approach flowchart for strategy selection
Added clear separation between:
- System-level retry: automatic peer rotation, transparent to caller
- Caller-level retry: when all peers exhausted, caller decides

Addresses reviewer comment about fail-fast vs queued-retry contradiction.
@Cofson Cofson force-pushed the codex/raw/codex-block-exchange-raw branch from b12d10c to 02fbb64 Compare December 5, 2025 10:19
Copy link

@fbarbu15 fbarbu15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks

@Cofson Cofson requested review from cnanakos and jimstir December 10, 2025 10:00
Copy link

@cnanakos cnanakos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Cofson Cofson merged commit b2f3564 into main Dec 12, 2025
2 checks passed
@Cofson Cofson deleted the codex/raw/codex-block-exchange-raw branch December 12, 2025 11:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants