Motivation
The vision behind this work can be summarized as making data processing work like the web: portable, sandboxed, and trustless.
This PoC aims to prove that chunk decoding can be unified across chunked data formats using a codec VM architecture. Just as web browsers execute untrusted code safely via sandboxing, datasets will be able to specify their own decoders (including novel or proprietary codecs) that consumers can execute without installation, compatibility concerns, or trust barriers. This foundational shift enables:
- Cross-format tooling that works universally: as Zarr has worked to unify array formats, we can go a step further and support formats beyond arrays
- Frictionless sharing of experimental/proprietary encodings: as data volumes grow rapidly, being able to design and seamlessly adopt more efficient compression techniques becomes imperative
- The prerequisite infrastructure for format-agnostic access protocols like CCRP
Description
For this initial PoC we aim to build a codec virtual machine (VM) that can decode chunks from multiple data formats through a unified interface. Specifically, we have identified the following deliverables:
- Inventory of common codecs across common formats (Zarr, COG, TIFF, HDF5, Parquet)
- Basic open source codec runner with hybrid execution model (native + WebAssembly VM)
- Tests showing successful decoding of real array chunks
- A definition of the minimal chunk metadata content required to drive this initial codec executor
- An ADR-style document detailing the vision for this new codec ecosystem
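To make the hybrid execution model and the minimal chunk metadata more concrete, here is a rough sketch of how the pieces might fit together. It is an illustration only, not a proposed design: every name in it (`CodecStep`, `ChunkMetadata`, `NATIVE_DECODERS`, `run_wasm_decoder`, `decode_chunk`) is hypothetical, the metadata fields are assumptions, and the WebAssembly path is deliberately left as a stub.

```python
import zlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class CodecStep:
    """One stage of a chunk's encoding pipeline (hypothetical structure)."""
    codec_id: str                      # e.g. "zlib"
    wasm_module: Optional[str] = None  # assumed reference to a decoder shipped with the dataset
    params: dict = field(default_factory=dict)


@dataclass
class ChunkMetadata:
    """Assumed minimal metadata needed to drive the runner."""
    dtype: str
    shape: tuple
    pipeline: List[CodecStep]  # listed in encode order; decoding runs in reverse


Decoder = Callable[[bytes, dict], bytes]

# Native fast path: codecs the runner already knows how to decode.
NATIVE_DECODERS: Dict[str, Decoder] = {
    "zlib": lambda data, params: zlib.decompress(data),
}


def run_wasm_decoder(module_ref: str, data: bytes, params: dict) -> bytes:
    """Placeholder for sandboxed WebAssembly execution (not implemented here)."""
    raise NotImplementedError("WebAssembly execution is out of scope for this sketch")


def decode_chunk(raw: bytes, meta: ChunkMetadata,
                 wasm_runner: Callable[[str, bytes, dict], bytes] = run_wasm_decoder) -> bytes:
    """Undo each pipeline step, preferring native decoders over the VM."""
    data = raw
    for step in reversed(meta.pipeline):
        native = NATIVE_DECODERS.get(step.codec_id)
        if native is not None:
            data = native(data, step.params)
        elif step.wasm_module is not None:
            data = wasm_runner(step.wasm_module, data, step.params)
        else:
            raise LookupError(f"no decoder available for codec {step.codec_id!r}")
    return data
```

In this sketch a chunk compressed with `zlib` is decoded natively, while a step that names an unknown codec and carries a `wasm_module` reference is handed to whatever sandboxed runner is plugged in. Whether native decoders should always take precedence over dataset-supplied ones is an open design question the ADR would need to address.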
Acceptance Criteria
We have each of the above deliverables and are ready to start engaging the rest of the ODD team and other community stakeholders in the discussion around the future of this work.
Sub-tasks