
Proposal: High-performance S3 range reader for NWB/LINDI (coalesced + parallel reads) #118

@davidparks21

I’m opening this issue to discuss and track work on a higher-performance remote reader for NWB/LINDI over S3-compatible storage (Ceph/S3 in our case), and to gather feedback on the approach.

This follows discussions with Ben Dichter and Ryan Ly.

I’d like to implement and benchmark this in lindi. I’ve built similar high-performance readers before (different format, similar constraints), e.g., in this project: https://www.nature.com/articles/s41593-024-01715-2.

Use case

For ML workloads, we repeatedly request time windows from very large ephys files (e.g., a ~7 TB NWB file with 1024 channels). HDF5-style access patterns in the current reader can trigger many small underlying reads, and on our Ceph/S3 system each range request carries a high fixed latency (~0.3 s), so the number of requests, rather than the bytes transferred, dominates runtime.

Goal

For a single logical request (a time window over one or more channels), reduce the number of S3 range GETs from many small requests to:

  • ~1 request when required bytes are contiguous
  • a small number of requests when bytes are split into a few contiguous runs
  • parallel requests when multiple discontiguous channel runs are required
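To make the request-count argument concrete, here is a rough first-order cost model. It is a sketch, not a measurement: it assumes every range GET pays the ~0.3 s fixed latency quoted above, that payload bytes stream at an assumed aggregate bandwidth (100 MB/s is a placeholder, not a measured figure), and that up to `concurrency` requests overlap perfectly. The function name `estimate_wall_time` is hypothetical.

```python
import math


def estimate_wall_time(num_requests: int, total_bytes: int,
                       latency_s: float = 0.3,
                       bandwidth_bytes_per_s: float = 100e6,
                       concurrency: int = 1) -> float:
    """Hypothetical first-order model of remote read time (not a measurement).

    Each "round" of up to `concurrency` overlapping range GETs pays the fixed
    per-request latency once; payload bytes stream at an assumed aggregate
    bandwidth regardless of how they are split across requests.
    """
    if num_requests < 1 or concurrency < 1:
        raise ValueError("num_requests and concurrency must be >= 1")
    rounds = math.ceil(num_requests / concurrency)
    return rounds * latency_s + total_bytes / bandwidth_bytes_per_s


# Assumed example: a 1 s window at 30 kHz over 1024 int16 channels.
window_bytes = 30_000 * 1024 * 2  # 61,440,000 bytes
print(estimate_wall_time(200, window_bytes))               # many small GETs
print(estimate_wall_time(1, window_bytes))                 # one contiguous GET
print(estimate_wall_time(8, window_bytes, concurrency=8))  # 8 runs in parallel
```

Under these assumptions, 200 small requests come to about 60.6 s, while a single contiguous request, or 8 discontiguous runs issued 8-wide, come to about 0.9 s; the transfer term is identical in all three cases, so the difference is entirely per-request latency.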

Requirements

  • Minimize number of underlying S3 range requests per query.
  • Coalesce adjacent/near-adjacent byte ranges when beneficial.
  • Parallelize independent range reads (e.g., discontiguous channel runs).
  • Return data identical to the current reader’s output.
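One possible coalescing policy is sketched below under stated assumptions: ranges are half-open `(start, end)` byte offsets, `max_gap_bytes` caps any single gap that may be bridged, and `max_overread_bytes` caps the total wasted bytes inside one merged request. These semantics are our own reading of the `max_gap_bytes` / `max_overread_bytes` parameters named in the deliverables, not an existing lindi API, and `coalesce_ranges` is a hypothetical name.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (start, end) byte offsets, end exclusive


def coalesce_ranges(ranges: List[Range], max_gap_bytes: int,
                    max_overread_bytes: int) -> List[Tuple[Range, List[Range]]]:
    """Merge sorted byte ranges separated by small gaps into fewer requests.

    Returns (merged_range, member_ranges) pairs so the caller can slice each
    originally requested range back out of the merged payload.
    """
    if not ranges:
        return []
    ordered = sorted(ranges)
    groups: List[Tuple[Range, List[Range]]] = []
    cur_start, cur_end = ordered[0]
    members = [ordered[0]]
    overread = 0  # bytes fetched only to bridge gaps in the current group
    for start, end in ordered[1:]:
        gap = max(start - cur_end, 0)  # overlapping/adjacent ranges cost nothing
        if gap <= max_gap_bytes and overread + gap <= max_overread_bytes:
            overread += gap
            cur_end = max(cur_end, end)
            members.append((start, end))
        else:
            groups.append(((cur_start, cur_end), members))
            cur_start, cur_end = start, end
            members = [(start, end)]
            overread = 0
    groups.append(((cur_start, cur_end), members))
    return groups


groups = coalesce_ranges([(0, 100), (100, 200), (250, 300), (10_000, 10_100)],
                         max_gap_bytes=64, max_overread_bytes=1_000)
print([merged for merged, _ in groups])  # [(0, 300), (10000, 10100)]
```

Four requested ranges become two GETs here; each member range `(s, e)` is recovered from its merged payload at `[s - merged_start : e - merged_start]`. Anything the policy refuses to merge is left as a separate range, which can then be fetched in parallel.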

Assumptions

  • Dataset chunk/layout is configured to support efficient sequential access for target query patterns.
  • Some multi-channel requests may be discontiguous in file layout; these should be parallelized rather than forced
    into large over-reads.
  • For .lindi.json generation, we may need explicit per-chunk refs (i.e., avoid _EXTERNAL_ARRAY_LINK for target
    datasets).
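For discontiguous runs that should be parallelized rather than over-read, a bounded-concurrency fetch could look like the sketch below. `fetch_parallel` and the `fetch_range(start, end) -> bytes` callable are hypothetical; in practice `fetch_range` might wrap boto3’s `get_object(Bucket=..., Key=..., Range=f"bytes={start}-{end - 1}")` (HTTP byte ranges are inclusive) or an equivalent HTTP client. Threads are a reasonable fit because range GETs are I/O-bound.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Range = Tuple[int, int]  # (start, end) byte offsets, end exclusive


def fetch_parallel(ranges: List[Range],
                   fetch_range: Callable[[int, int], bytes],
                   max_concurrency: int = 8) -> Dict[Range, bytes]:
    """Fetch independent byte ranges concurrently, at most max_concurrency at once."""
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be >= 1")
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # pool.map yields results in input order, so zip re-pairs them with ranges.
        payloads = list(pool.map(lambda r: fetch_range(r[0], r[1]), ranges))
    for (start, end), data in zip(ranges, payloads):
        # Guard against truncated responses so errors surface instead of bad data.
        if len(data) != end - start:
            raise IOError(f"short read for bytes {start}-{end - 1}: got {len(data)}")
    return dict(zip(ranges, payloads))
```

Bounding concurrency keeps the number of in-flight requests predictable for the Ceph/S3 endpoint; the right limit is something the benchmarks would need to determine.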

Non-goals

  • Optimizing arbitrary, highly strided access patterns that would require excessive over-read.
  • Changing NWB schema semantics.

Proposed deliverables

  1. Range planner: map a dataset selection to exact byte ranges.
  2. Coalescing policy: merge ranges using configurable max_gap_bytes / max_overread_bytes.
  3. Parallel fetch executor with bounded concurrency.
  4. Benchmarks + instrumentation:
    • request count
    • total bytes fetched
    • wall time / throughput
  5. Regression tests for correctness and performance-sensitive paths.
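As a rough illustration of the range planner, the sketch below maps a (time window, channel list) selection on a 2-D `(time, channel)` dataset to the byte ranges of every chunk it touches. It assumes Zarr-style chunk keys `"i.j"` and a hypothetical `chunk_refs` dict mapping each key to `(byte_offset, byte_size)` in the remote file, i.e. the kind of explicit per-chunk refs mentioned under Assumptions; the real .lindi.json structure may differ, and `plan_chunk_ranges` is a hypothetical name. Whole chunks are planned because a compressed chunk generally has to be read in full.

```python
from typing import Dict, List, Sequence, Tuple

Range = Tuple[int, int]  # (start, end) byte offsets, end exclusive


def plan_chunk_ranges(t_start: int, t_stop: int, channels: Sequence[int],
                      chunk_shape: Tuple[int, int],
                      chunk_refs: Dict[str, Tuple[int, int]]) -> List[Range]:
    """Return sorted, de-duplicated byte ranges of every chunk the selection touches.

    Selects samples [t_start, t_stop) on the given channel indices of a 2-D
    (time, channel) array whose chunk "i.j" lives at chunk_refs["i.j"].
    """
    if t_stop <= t_start or not channels:
        return []
    ct, cc = chunk_shape
    time_chunks = range(t_start // ct, (t_stop - 1) // ct + 1)
    chan_chunks = sorted({c // cc for c in channels})
    ranges = set()
    for i in time_chunks:
        for j in chan_chunks:
            offset, size = chunk_refs[f"{i}.{j}"]
            ranges.add((offset, offset + size))
    return sorted(ranges)
```

For example, with `(1000, 64)` chunks stored in key order, samples 1500 to 2499 on channels 0 and 5 touch chunks `1.0` and `2.0`, which are not adjacent on disk and therefore come out as two separate ranges; adding channel 100 also pulls in `1.1` and `2.1`, and the four ranges become contiguous, so a coalescer can serve them with a single GET.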

Acceptance criteria (initial)

  • Demonstrate substantial reduction in S3 request count for representative ephys window reads.
  • Show end-to-end latency improvement on remote object storage.
  • Preserve exact returned data vs baseline reader.
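For the benchmarks and the exact-data criterion, a thin wrapper that counts requests and bytes around whatever range-fetch callable the reader uses may be enough to start. `InstrumentedFetcher` is a hypothetical name, the in-memory `blob` stands in for a remote object, and wall time would be measured around the whole query (e.g. with `time.perf_counter()`) rather than summed per request, since concurrent requests overlap.

```python
import threading
from typing import Callable


class InstrumentedFetcher:
    """Wrap a range-fetch callable and record request count and bytes fetched."""

    def __init__(self, fetch_range: Callable[[int, int], bytes]):
        self._fetch = fetch_range
        self._lock = threading.Lock()
        self.requests = 0
        self.bytes_fetched = 0

    def __call__(self, start: int, end: int) -> bytes:
        data = self._fetch(start, end)
        with self._lock:  # counters stay correct when shared across worker threads
            self.requests += 1
            self.bytes_fetched += len(data)
        return data


if __name__ == "__main__":
    blob = bytes(range(256)) * 4096  # stand-in for a remote object
    wanted = [(0, 100), (100, 200), (200, 300)]

    baseline = InstrumentedFetcher(lambda s, e: blob[s:e])
    expected = b"".join(baseline(s, e) for s, e in wanted)

    coalesced = InstrumentedFetcher(lambda s, e: blob[s:e])
    merged = coalesced(0, 300)  # the three adjacent ranges as one GET
    got = b"".join(merged[s:e] for s, e in wanted)

    assert got == expected  # identical bytes to the baseline path
    print(baseline.requests, coalesced.requests)  # 3 1
```

The same wrapper can feed the request-count and bytes-fetched metrics, while the byte-for-byte comparison against the baseline path doubles as a regression test.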

Comments on API placement are welcome:

  • extend LindiRemfile, or
  • add a new specialized high-performance reader path.
