
☂️ Two extension points: extendStore and extendArray #394

@manzt

Umbrella for the plan and stacked PRs that close out #349.

#349 asked for a blessed way to compose stores. Surveying the real use cases, zarrita actually needs two composition points, not one, and the current AsyncReadable<Options> generic is a symptom of having only the wrong half.

This is the plan:

  • zarr.extendStore for the transport layer (bytes in, bytes out)
  • zarr.extendArray for the data layer (coords in, chunks out)

import * as zarr from "zarrita";

// Transport layer: wrap a store with caching, batching, consolidated metadata.
let store = await zarr.extendStore(
  new zarr.FetchStore("https://example.com/data.zarr"),
  (s) => zarr.withConsolidation(s),
  (s) => zarr.withRangeBatching(s, { cacheSize: 512 }),
);

// Data layer: wrap an array with chunk caching, prefetch, observability.
let arr = await zarr.extendArray(
  await zarr.open(store, { kind: "array" }),
  (a) => withChunkCache(a, { cache: new Map() }),
);

await zarr.get(arr, [null, zarr.slice(0, 10)]);

Both extension points are built from the same factory primitive:

const withChunkCache = zarr.defineArrayMiddleware(
  (array, opts: { cache: Map<string, zarr.Chunk<zarr.DataType>> }) => ({
    async getChunk(coords, options) {
      let key = coords.join(",");
      let hit = opts.cache.get(key);
      if (hit) return hit;
      let chunk = await array.getChunk(coords, options);
      opts.cache.set(key, chunk);
      return chunk;
    },
  }),
);
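One plausible shape for that shared primitive (a sketch only; `extendSketch`, `Middleware`, and `ToyArray` are hypothetical names, not zarrita internals): each layer returns a partial override of the wrapped object, and any member a layer does not override falls through to the inner layer via prototype delegation.

```typescript
// Hypothetical sketch of the factory primitive's delegation. None of these
// names are zarrita's actual internals; extendSketch stands in for the
// machinery shared by extendStore and extendArray.
type Middleware<T extends object> = (inner: T) => Partial<T>;

function extendSketch<T extends object>(base: T, ...layers: Middleware<T>[]): T {
  // Each layer's overrides shadow the inner object; anything a layer does
  // not override falls through along the prototype chain.
  return layers.reduce<T>(
    (inner, layer) => Object.assign(Object.create(inner), layer(inner)) as T,
    base,
  );
}

// Toy "array" (synchronous for brevity; the real getChunk is async).
interface ToyArray {
  shape: number[];
  getChunk(coords: number[]): number[];
}

let underlyingCalls = 0;
const base: ToyArray = {
  shape: [4, 4],
  getChunk(coords) {
    underlyingCalls++;
    return coords.map((c) => c * 2);
  },
};

// A caching layer overrides only getChunk; shape falls through untouched.
const cached = extendSketch(base, (inner) => {
  const cache = new Map<string, number[]>();
  return {
    getChunk(coords: number[]) {
      const key = coords.join(",");
      let hit = cache.get(key);
      if (hit === undefined) {
        hit = inner.getChunk(coords);
        cache.set(key, hit);
      }
      return hit;
    },
  };
});
```

The same reduce works for stores, since a store middleware is also just "partial override of the inner object"; only the interface being wrapped changes.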

Why two layers

Every custom store, wrapper, cache, and hack in the zarrita ecosystem falls into one of two buckets.

Transport concerns operate on (key, range) -> Uint8Array and don't care about zarr's logical model:

  • Auth, presigning, request transformation. Covered at the call site by the
    custom fetch option from #388 ("Add custom fetch option to FetchStore").
  • Status code remapping (e.g. S3 403 -> 404 on private buckets).
  • Range batching and coalescing.
  • Consolidated metadata short-circuiting.
  • Byte caching, e.g. vizarr's lru(store) wrapper.
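To make the status-remapping bucket concrete, it can be written as a plain AsyncReadable wrapper with no zarr knowledge at all. This is a sketch: the error shape the inner store throws on a 403 is an assumption here, not FetchStore's documented behavior.

```typescript
// Hypothetical transport-layer wrapper: remap S3's 403-on-missing-key to
// "not found". Assumes the inner store throws an Error whose message
// contains the status code; the real FetchStore error shape may differ.
type AbsolutePath = `/${string}`;

interface AsyncReadable {
  get(key: AbsolutePath, opts?: { signal?: AbortSignal }): Promise<Uint8Array | undefined>;
}

function withForbiddenAsMissing(inner: AsyncReadable): AsyncReadable {
  return {
    async get(key, opts) {
      try {
        return await inner.get(key, opts);
      } catch (err) {
        // Private S3 buckets answer 403 for missing keys; zarrita treats an
        // undefined result as "key not found".
        if (err instanceof Error && /\b403\b/.test(err.message)) return undefined;
        throw err;
      }
    },
  };
}
```

Note that the wrapper only sees `(key, opts)` and bytes; it never needs to know whether the key is metadata or a chunk, which is exactly what makes it a transport concern.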

Data concerns operate on (chunkCoords) -> Chunk<T> and don't care about paths or bytes: chunk caching, prefetching, and per-chunk observability.

Before this refactor, zarrita only had a transport extension point, and even that was unofficial (subclass FetchStore or hand-roll an AsyncReadable). The data layer had no extension point at all, so chunk-level concerns were smuggled through AsyncReadable<Options>, a generic that threaded opaque per-call state from the call site all the way down to the store.

Once the data layer has its own extension point, the Options generic has no job. It goes away, and signal (the one thing anyone ever actually threaded) becomes a plain field:

// Before
interface AsyncReadable<Options = unknown> {
  get(key: AbsolutePath, opts?: Options): Promise<Uint8Array | undefined>;
}
await zarr.get(arr, null, { opts: { signal: ctl.signal } });

// After
interface AsyncReadable {
  get(key: AbsolutePath, opts?: { signal?: AbortSignal }): Promise<Uint8Array | undefined>;
}
await zarr.get(arr, null, { signal: ctl.signal });
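To see why a plain field is enough: any store that forwards opts.signal to its transport gets cancellation end to end, with nothing threaded through generics. A toy store standing in for FetchStore (which would forward the signal to fetch() instead):

```typescript
// Post-refactor interface, as above. The store here is a toy that honors
// the signal directly; it is not zarrita's FetchStore.
interface AsyncReadable {
  get(key: string, opts?: { signal?: AbortSignal }): Promise<Uint8Array | undefined>;
}

const slowStore: AsyncReadable = {
  get(_key, opts) {
    return new Promise((resolve, reject) => {
      const timer = setTimeout(() => resolve(new Uint8Array([1, 2, 3])), 1000);
      opts?.signal?.addEventListener("abort", () => {
        clearTimeout(timer);
        reject(new DOMException("The operation was aborted.", "AbortError"));
      });
    });
  },
};

const ctl = new AbortController();
const pending = slowStore.get("/c/0/0", { signal: ctl.signal });
ctl.abort(); // the signal reaches the transport with no Options generic in sight
```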

Stacked PRs

What this unblocks
