Make data shard downloads more robust #600

@joellidin

Description

The current shard download implementation is not fully reliable: downloads sometimes fail or stall, and retries do not always recover. Because shards are swapped only occasionally, one alternative is to await the download of a new shard directly before proceeding, rather than relying on the current background/preparation logic.

Tasks

  • Investigate current shard download logic and failure modes.
  • Improve robustness of shard downloads:
    • Add stronger retry and validation handling, or
    • Switch to an approach where the system awaits the shard download directly when needed.
  • Ensure that failures are logged clearly and that fallback behavior (e.g., retry, skip, or fail fast) is consistent.
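
As a starting point for the "stronger retry and validation handling" option, here is a minimal sketch in Python. Everything named in it is hypothetical and not taken from the existing codebase: `download_shard_with_retry`, `ShardDownloadError`, the `fetch` callable, and the optional `expected_sha256` check. It assumes a shard can be fetched as bytes by an async callable and that a SHA-256 digest may be available for validation. Each attempt has its own timeout so a stalled download cannot hang forever, and every failure is logged before backing off.

```python
import asyncio
import hashlib
import logging
from typing import Awaitable, Callable, Optional

logger = logging.getLogger("shard_download")


class ShardDownloadError(RuntimeError):
    """Hypothetical error raised once all download attempts are exhausted."""


async def download_shard_with_retry(
    fetch: Callable[[], Awaitable[bytes]],  # hypothetical: performs one download
    expected_sha256: Optional[str] = None,  # assumption: a digest may be known
    max_attempts: int = 3,
    attempt_timeout: float = 30.0,
    base_delay: float = 1.0,
) -> bytes:
    """Download a shard, retrying on timeouts, I/O errors, and bad payloads."""
    last_error: Optional[BaseException] = None
    for attempt in range(1, max_attempts + 1):
        try:
            # Bound each attempt so a stalled transfer counts as a failure.
            data = await asyncio.wait_for(fetch(), timeout=attempt_timeout)
            if not data:
                raise ValueError("empty shard payload")
            if expected_sha256 is not None:
                digest = hashlib.sha256(data).hexdigest()
                if digest != expected_sha256:
                    raise ValueError(f"checksum mismatch: got {digest}")
            return data
        except (asyncio.TimeoutError, OSError, ValueError) as exc:
            last_error = exc
            logger.warning(
                "shard download attempt %d/%d failed: %r",
                attempt, max_attempts, exc,
            )
            if attempt < max_attempts:
                # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
                await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    raise ShardDownloadError(
        f"shard download failed after {max_attempts} attempts"
    ) from last_error
```

Callers can catch `ShardDownloadError` in one place and decide whether to retry later, skip the shard, or fail fast, which keeps the fallback behavior consistent.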

Acceptance Criteria

  • Shard downloads succeed reliably without leaving the system in a broken state.
  • If a shard cannot be downloaded within a reasonable timeframe, the system fails gracefully with clear logs.
  • Optionally support an await-based direct download mode for shards, given the low frequency of shard swaps.
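
The await-based direct download mode could look roughly like the sketch below. It is illustrative only: `ShardSwapper`, `swap_to`, the `download` callable, and the 300-second default deadline are all assumptions, not existing code. The swap awaits the download under an overall deadline and only replaces the active shard once the new one has fully arrived, so a failed download is logged and leaves the previous shard in place.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional

logger = logging.getLogger("shard_swap")


class ShardSwapper:
    """Hypothetical holder for the active shard with an awaited swap."""

    def __init__(
        self,
        download: Callable[[int], Awaitable[bytes]],  # hypothetical downloader
        deadline: float = 300.0,  # assumed "reasonable timeframe" in seconds
    ) -> None:
        self._download = download
        self._deadline = deadline
        self.current_index: Optional[int] = None
        self.current_data: Optional[bytes] = None

    async def swap_to(self, index: int) -> bool:
        """Await the new shard; return False and keep the old one on failure."""
        try:
            data = await asyncio.wait_for(
                self._download(index), timeout=self._deadline
            )
        except asyncio.TimeoutError:
            logger.error(
                "shard %d not downloaded within %.0fs; keeping shard %s",
                index, self._deadline, self.current_index,
            )
            return False
        except OSError as exc:
            logger.error(
                "shard %d download failed: %r; keeping shard %s",
                index, exc, self.current_index,
            )
            return False
        # Only replace state after the download has fully succeeded.
        self.current_index, self.current_data = index, data
        return True
```

Because swaps are infrequent, blocking the caller on `await swapper.swap_to(next_index)` costs little and avoids a separate background preparation path; the boolean result lets the caller choose between retrying and failing fast.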
