Distribute Gather Across All Ranks

# Distribute Gather Across Ranks & Make `gather_result` Consumers Partial-Aware

## Summary

Distribute the gather step across all ranks, and update every consumer of `gather_result` to handle both per-rank partial results and a merged aggregate. Keep the current output schema and downstream semantics unchanged.

## Objectives

* Distribute peer fetching and validation evenly across ranks using deterministic partitioning without duplicates.
* Produce per-rank partial results and a canonical merged aggregate for global metrics and artifact emission.
* Enable outer update to process partial results incrementally under a bounded memory budget.
* Ensure all consumer paths operate correctly with either partials or an aggregate.
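
As a rough illustration of the first objective, here is a minimal sketch of deterministic partitioning. The function name `partition_peers` and the strided-slice scheme are assumptions made for illustration, not an existing API in the codebase. Every rank computes its own shard locally from the same peer list, so no coordination is needed and no UID lands on two ranks.

```python
from typing import Sequence


def partition_peers(peers: Sequence[int], rank: int, world_size: int) -> list[int]:
    """Return the peers owned by `rank` (hypothetical helper).

    Sorting and de-duplicating first makes the result independent of the
    order in which each rank happened to receive the peer list.
    """
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} outside [0, {world_size})")
    canonical = sorted(set(peers))
    # Strided slice: rank r takes elements r, r + world_size, r + 2 * world_size, ...
    return canonical[rank::world_size]


# Example: 7 entries with one repeated UID, split across 3 ranks.
peers = [42, 7, 19, 7, 3, 88, 51]
shards = [partition_peers(peers, r, 3) for r in range(3)]
print(shards)  # [[3, 42], [7, 51], [19, 88]]
```

Shard sizes differ by at most one. Reserve and retry handling is not covered by this sketch and would need its own guardrail.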

## Scope

* Gather execution: partitioning, per-rank fetch/validate, partial outputs, merge/reduce, artifact emission on a single rank.
* Consumers: outer update, index-overlap checks, per-param norms, quality metrics (intended vs actual, success rate, skipped), logging.
* Dedupe policy for repeated UIDs across partitions or retries.
* Synchronization and small control flags for readiness and skip decisions.

## Deliverables

1. **Partitioning**

   * Deterministic mapping from peer list to ranks.
   * Guardrails preventing duplicate downloads when using reserves or retries.

2. **Per-Rank Partial Gather**

   * Fetch and validate assigned peers.
   * Emit partial `gather_result` with `uids`, `skipped_uids`, `success_rate_part`, and per-param payloads.
   * Compute lightweight per-rank index-overlap signatures.

3. **Merge & Global Metrics**

   * Collect the partials from all ranks (all-gather) and merge them on one rank.
   * Compute global `uids`, `skipped_uids`, success rate, intended vs actual, and overlap candidates.
   * Emit/upload the canonical aggregate artifact from a single rank.

4. **Partial-Aware Consumers**

   * Outer update accepts a sequence of partial results and applies them incrementally in a deterministic order with a configurable memory budget.
   * Index-overlap detection reduces per-rank signatures to global findings.
   * Norms and quality metrics reduce across partials and match single-rank semantics.

5. **Synchronization & Control**

   * Barriers at gather completion and pre-update.
   * Minimal readiness/skip flags broadcast to all ranks.
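
One possible shape for the merge step in deliverable 3, as a sketch only. `PartialGather` and `merge_partials` are hypothetical names, and two policies are assumed here rather than taken from the codebase: on a duplicate UID the lowest rank wins, and a UID that succeeded on any rank is not reported as skipped. The global success rate is recomputed from merged counts instead of averaging the per-rank `success_rate_part` values, since an average would drift when shards are uneven.

```python
from dataclasses import dataclass, field


@dataclass
class PartialGather:
    """Hypothetical per-rank result; field names follow the issue text."""
    rank: int
    uids: list[int]
    skipped_uids: list[int]
    payloads: dict[int, dict[str, list[float]]] = field(default_factory=dict)


def merge_partials(partials: list[PartialGather]) -> dict:
    """Merge per-rank partials into one aggregate (lowest rank wins on duplicates)."""
    uids: list[int] = []
    payloads: dict[int, dict[str, list[float]]] = {}
    seen: set[int] = set()
    skipped: set[int] = set()
    # Visit partials in rank order so the merge is deterministic.
    for part in sorted(partials, key=lambda p: p.rank):
        for uid in part.uids:
            if uid in seen:
                continue  # duplicate UID from a retry or overlapping shard
            seen.add(uid)
            uids.append(uid)
            if uid in part.payloads:
                payloads[uid] = part.payloads[uid]
        skipped.update(part.skipped_uids)
    # Assumption: a success on any rank overrides a skip recorded elsewhere.
    skipped -= seen
    attempted = len(uids) + len(skipped)
    return {
        "uids": uids,
        "skipped_uids": sorted(skipped),
        "success_rate": len(uids) / attempted if attempted else 0.0,
        "payloads": payloads,
    }


agg = merge_partials([
    PartialGather(1, [7, 51], [19], {7: {"w": [9.0]}, 51: {"w": [2.0]}}),
    PartialGather(0, [3, 7], [42], {3: {"w": [0.0]}, 7: {"w": [1.0]}}),
])
print(agg["uids"], agg["skipped_uids"], agg["success_rate"])
```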

## Risks

* Duplicate work without strict partitioning and reserve handling.
* Metric drift if reductions do not union sets consistently.
* Double application if the same UID appears in multiple partials.
* Memory pressure during concurrent decompress/validate without chunking.
* Non-deterministic application order causing minor numerical divergence.
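
The double-application and ordering risks can be illustrated with a small sketch. All names here are hypothetical, and the update rule (subtracting a scaled delta) is a stand-in for the real outer step, not the project's actual optimizer. Partials are modeled as `{rank: {uid: {param_name: delta}}}`, and the memory budget is approximated by a cap on how many payloads are held at once.

```python
from typing import Iterator

Payload = dict[str, list[float]]


def iter_unique_payloads(partials: dict[int, dict[int, Payload]]) -> Iterator[tuple[int, Payload]]:
    """Yield each UID's payload once, in (rank, uid) order."""
    seen: set[int] = set()
    for rank in sorted(partials):
        for uid in sorted(partials[rank]):
            if uid not in seen:
                seen.add(uid)
                yield uid, partials[rank][uid]


def apply_incrementally(params: dict[str, list[float]],
                        partials: dict[int, dict[int, Payload]],
                        lr: float,
                        chunk_size: int) -> list[int]:
    """Apply payloads in chunks of at most `chunk_size`; return applied UIDs in order."""
    applied: list[int] = []
    chunk: list[tuple[int, Payload]] = []

    def flush() -> None:
        for uid, payload in chunk:
            for name, delta in payload.items():
                # Stand-in update rule: p <- p - lr * delta
                params[name] = [p - lr * d for p, d in zip(params[name], delta)]
            applied.append(uid)
        chunk.clear()

    for item in iter_unique_payloads(partials):
        chunk.append(item)
        if len(chunk) >= chunk_size:
            flush()
    flush()  # apply any remainder smaller than chunk_size
    return applied


params = {"w": [1.0, 2.0]}
partials = {
    1: {7: {"w": [0.5, 0.5]}, 51: {"w": [1.0, 0.0]}},
    0: {3: {"w": [0.0, 1.0]}, 7: {"w": [0.5, 0.5]}},
}
applied = apply_incrementally(params, partials, lr=1.0, chunk_size=2)
print(applied, params)  # [3, 7, 51] {'w': [-0.5, 0.5]}
```

UID 7 appears in both partials but is applied once, and the application order does not depend on the order in which the partials arrived.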

## Acceptance Criteria

* Gather + validate wall-clock time decreases as the number of ranks increases, until network limits dominate.
* Peak memory during outer update remains within the configured budget and does not scale with peer count.
* For the same peer set, distributed flow produces outputs equivalent to the single-rank baseline for:

  * `uids`, `skipped_uids`, success rate, intended vs actual
  * Per-param norms and index-overlap results
  * Applied model update (within expected floating-point tolerance)
* No duplicate downloads or double applications; reserve backfills behave correctly.
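
A sketch of how the equivalence criterion might be checked in a test. The result layout (keys `uids`, `skipped_uids`, `success_rate`, `norms`) and the tolerance values are assumptions for illustration. UID lists are compared as sets, on the assumption that ordering across ranks is not part of the contract.

```python
import math


def results_equivalent(baseline: dict, distributed: dict,
                       rel_tol: float = 1e-6, abs_tol: float = 1e-9) -> bool:
    """Compare a single-rank baseline against a distributed aggregate."""
    # Set-valued fields must match exactly, ignoring order.
    for key in ("uids", "skipped_uids"):
        if set(baseline[key]) != set(distributed[key]):
            return False
    if not math.isclose(baseline["success_rate"], distributed["success_rate"],
                        rel_tol=rel_tol, abs_tol=abs_tol):
        return False
    # Per-param norms must cover the same params and agree within tolerance.
    b_norms, d_norms = baseline["norms"], distributed["norms"]
    if b_norms.keys() != d_norms.keys():
        return False
    return all(
        math.isclose(b_norms[name], d_norms[name], rel_tol=rel_tol, abs_tol=abs_tol)
        for name in b_norms
    )


baseline = {"uids": [3, 7, 51], "skipped_uids": [19, 42],
            "success_rate": 0.6, "norms": {"w": 1.0}}
distributed = {"uids": [7, 3, 51], "skipped_uids": [42, 19],
               "success_rate": 0.6, "norms": {"w": 1.0 + 1e-9}}
drifted = {**distributed, "norms": {"w": 1.01}}

print(results_equivalent(baseline, distributed))  # True
print(results_equivalent(baseline, drifted))      # False
```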


Distribute Gather Across All Ranks #598
