ODD PI 26.? Objective ?: WASM chunk codec research and PoC #331

@jkeifer

Motivation

The vision behind this work can be summarized as making data processing work like the web: portable, sandboxed, and trustless.

This PoC aims to prove that chunk decoding can be unified across chunked data formats using a codec VM architecture. Just as web browsers execute untrusted code safely via sandboxing, datasets will be able to specify their own decoders (including novel or proprietary codecs) that consumers can execute without installation, compatibility concerns, or trust barriers. This foundational shift enables:

  • Cross-format tooling that works universally: as Zarr has worked to unify array formats, we can go a step further and support formats beyond arrays
  • Frictionless sharing of experimental/proprietary encodings: as data volumes grow exponentially, being able to design and seamlessly adopt more efficient compression techniques becomes imperative
  • The prerequisite infrastructure for format-agnostic access protocols like CCRP

Description

For this initial PoC we aim to build a codec virtual machine (VM) that can decode chunks from multiple data formats through a unified interface. Specifically, we have identified the following deliverables:

  • Inventory of common codecs across common formats (Zarr, COG, TIFF, HDF5, Parquet)
  • Basic open source codec runner with hybrid execution model (native + WebAssembly VM)
  • Tests showing successful decoding of real array chunks
  • A definition of the minimal chunk metadata content required to drive this initial codec executor
  • An ADR-style document detailing the vision for this new codec ecosystem
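
To make the hybrid execution model and the minimal chunk metadata more concrete, here is a rough Python sketch of how a runner might dispatch each decode step. It is only an illustration under stated assumptions: the names `CodecStep`, `ChunkMetadata`, `CodecRunner`, and the `wasm_module` field are hypothetical, and the WASM backend is reduced to a pluggable callable rather than a real sandboxed runtime.

```python
import zlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical signatures, not an agreed interface.
Decoder = Callable[[bytes, dict], bytes]            # (encoded, config) -> decoded
WasmBackend = Callable[[str, bytes, dict], bytes]   # (module ref, encoded, config) -> decoded


@dataclass
class CodecStep:
    codec_id: str                        # illustrative identifier, e.g. "zlib"
    config: dict = field(default_factory=dict)
    wasm_module: Optional[str] = None    # assumed reference to a decoder shipped with the dataset


@dataclass
class ChunkMetadata:
    steps: List[CodecStep]               # listed in the order applied at encode time


class CodecRunner:
    """Prefer a native decoder when one is registered, otherwise fall back to WASM."""

    def __init__(self, wasm_backend: Optional[WasmBackend] = None) -> None:
        self._native: Dict[str, Decoder] = {}
        self._wasm_backend = wasm_backend

    def register_native(self, codec_id: str, decoder: Decoder) -> None:
        self._native[codec_id] = decoder

    def decode(self, data: bytes, meta: ChunkMetadata) -> bytes:
        # Decoding undoes the encode pipeline, so walk the steps in reverse.
        for step in reversed(meta.steps):
            native = self._native.get(step.codec_id)
            if native is not None:
                data = native(data, step.config)
            elif step.wasm_module is not None and self._wasm_backend is not None:
                data = self._wasm_backend(step.wasm_module, data, step.config)
            else:
                raise LookupError(f"no decoder available for codec {step.codec_id!r}")
        return data


# Usage: a chunk compressed with zlib, decoded through the native path.
runner = CodecRunner()
runner.register_native("zlib", lambda data, config: zlib.decompress(data))
meta = ChunkMetadata(steps=[CodecStep("zlib")])
decoded = runner.decode(zlib.compress(b"chunk bytes"), meta)
```

In this sketch a dataset that ships its own decoder would set `wasm_module` on the relevant step, and a consumer without a native implementation would route that step through whatever sandboxed runtime backs `wasm_backend`. Whether native decoders should always take precedence is one of the design questions the PoC would need to answer.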

Acceptance Criteria

We have each of the above deliverables and are ready to start engaging the rest of the ODD team and other community stakeholders in the discussion around the future of this work.

Sub-tasks

  • Create codec inventory
  • Create WASM codec implementation(s)
  • Create WASM codec runner with tests
  • Create demo(s) showing how this all works
  • Write ADR outlining the ecosystem vision
