feat: vortex row crate by joseph-isaacs · Pull Request #8056 · vortex-data/vortex

joseph-isaacs · 2026-05-22T09:49:07Z

Add an empty vortex-row crate with a minimal initialize stub so the
following commits can layer in the row-encoder, codec, scalar functions,
and per-encoding kernels without touching the workspace skeleton each
time. The crate is wired into the workspace members list and workspace
dependency table; public-api.lock is generated against the stub.

Signed-off-by: Claude <noreply@anthropic.com>

Summary

Closes: #000

Testing

Add an empty `vortex-row` crate with a minimal `initialize` stub so the following commits can layer in the row-encoder, codec, scalar functions, and per-encoding kernels without touching the workspace skeleton each time. The crate is wired into the workspace members list and workspace dependency table; `public-api.lock` is generated against the stub. Signed-off-by: Claude <noreply@anthropic.com>

Introduce the per-column sort-field options and the variadic-function options struct used by the upcoming RowSize / RowEncode scalar functions. `RowEncodeOptions::fields` uses a `SmallVec<[SortField; 4]>` so typical 1-4 column keys avoid a heap allocation. Includes a compact serialize / deserialize helper used later by the scalar-function metadata round-trip. Signed-off-by: Claude <noreply@anthropic.com>

Add the byte-encoding kernels for the fixed-width portion of the row encoder: Null, Bool, Primitive (12 PTypes), and Decimal (i8..i128). Each encoder writes a 1-byte sentinel followed by the value's row-comparable bytes (sign-flipped big-endian for signed ints, sign-aware mask for floats, etc.). The size pass is a constant `width-per-row` add for these types; the encode pass walks rows and writes into the shared output buffer at `offsets[i] + cursors[i]`. `row_width_for_dtype` classifies the column based purely on its DType. Scalar-level encoders (`encode_scalar_primitive` / `encode_scalar_bool` / `encode_scalar_null` / `encode_scalar` / `encoded_size_for_scalar`) are included for the same fixed-width subset; varlen and nested canonical variants bail with a clear "not yet supported" error and land in follow-up commits. The implementation is deliberately the simplest correct version: bounds-checked array indexing, no `copy_nonoverlapping`, no validity fast-path helper. Subsequent PRs evolve this toward the optimized form. Signed-off-by: Claude <noreply@anthropic.com>
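The "sentinel plus row-comparable bytes" layout described above can be sketched as follows. This is an illustration only, not the crate's kernels: the function names and the sentinel values (`0x00` for null, `0x01` for valid, i.e. nulls sorting first in ascending order) are assumptions, and descending order is not covered.

```rust
// Assumed sentinel values for this sketch (nulls sort first).
const NULL_SENTINEL: u8 = 0x00;
const VALID_SENTINEL: u8 = 0x01;

/// Encode an optional i64 as 1 sentinel byte + 8 sign-flipped big-endian bytes.
fn encode_i64_sketch(value: Option<i64>) -> [u8; 9] {
    let mut out = [0u8; 9];
    match value {
        None => out[0] = NULL_SENTINEL,
        Some(v) => {
            out[0] = VALID_SENTINEL;
            // Flipping the sign bit maps i64::MIN..=i64::MAX onto 0..=u64::MAX,
            // so unsigned byte-wise comparison matches signed comparison.
            let flipped = (v as u64) ^ (1u64 << 63);
            out[1..].copy_from_slice(&flipped.to_be_bytes());
        }
    }
    out
}

/// Encode an optional f64 as 1 sentinel byte + 8 masked big-endian bytes.
fn encode_f64_sketch(value: Option<f64>) -> [u8; 9] {
    let mut out = [0u8; 9];
    match value {
        None => out[0] = NULL_SENTINEL,
        Some(v) => {
            out[0] = VALID_SENTINEL;
            let bits = v.to_bits();
            // Sign-aware mask: negative values flip every bit (reversing their
            // order), non-negative values flip only the sign bit.
            let mask = if bits >> 63 == 1 { u64::MAX } else { 1u64 << 63 };
            out[1..].copy_from_slice(&(bits ^ mask).to_be_bytes());
        }
    }
    out
}
```

Because the output is a plain byte array, comparing two encodings with `<` is a lexicographic byte comparison, which is what a row-format consumer would rely on.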

Extend the codec to handle Utf8/Binary via VarBinView arrays. Each value encodes as a 1-byte sentinel followed by 32-byte chunks: every full chunk has a 0xFF continuation marker; the final partial chunk pads with zeros and writes the partial length (1..=32) as its trailing byte. `encode_varlen_value` uses the simple byte-at-a-time XOR loop here; a faster `copy_nonoverlapping` + stamped continuation version replaces it in PR 2. `encode_varbinview` uses `arr.with_iterator(...)` for both the nullable and non-nullable branches; a direct view walk for the no-nulls branch lands in PR 2 too. `row_width_for_dtype` now returns `Variable` for Utf8/Binary; the size pass and encode dispatchers route through `add_size_varbinview` / `encode_varbinview` correspondingly. The scalar encoder gains `encode_scalar_varlen` and the matching Utf8/Binary arms. Signed-off-by: Claude <noreply@anthropic.com>
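A minimal sketch of the 32-byte block layout described above, for ascending order only. The sentinel values and the choice to encode an empty value as a lone sentinel byte are assumptions made for this example (the commit message does not specify them), and the function names are hypothetical.

```rust
const BLOCK: usize = 32;
const CONTINUATION: u8 = 0xFF;
// Assumed sentinel values; the commit message does not state them.
const NULL_SENTINEL: u8 = 0x00;
const EMPTY_SENTINEL: u8 = 0x01;
const NON_EMPTY_SENTINEL: u8 = 0x02;

/// Number of bytes `encode_varlen_sketch` writes for a value.
fn varlen_size_sketch(value: Option<&[u8]>) -> usize {
    match value {
        None => 1,
        // Each block is 32 data bytes plus 1 trailing marker byte.
        Some(v) => 1 + ((v.len() + BLOCK - 1) / BLOCK) * (BLOCK + 1),
    }
}

/// Append the block-encoded form of `value` to `out`.
fn encode_varlen_sketch(value: Option<&[u8]>, out: &mut Vec<u8>) {
    let bytes = match value {
        None => {
            out.push(NULL_SENTINEL);
            return;
        }
        Some(b) => b,
    };
    if bytes.is_empty() {
        out.push(EMPTY_SENTINEL);
        return;
    }
    out.push(NON_EMPTY_SENTINEL);
    let mut chunks = bytes.chunks(BLOCK).peekable();
    while let Some(chunk) = chunks.next() {
        out.extend_from_slice(chunk);
        if chunks.peek().is_some() {
            // More data follows, so this chunk is full: mark continuation.
            out.push(CONTINUATION);
        } else {
            // Final chunk: zero-pad to 32 bytes, then store its length (1..=32).
            out.resize(out.len() + (BLOCK - chunk.len()), 0);
            out.push(chunk.len() as u8);
        }
    }
}
```

With this layout a 2-byte value occupies 34 bytes and a 33-byte value occupies 67, and a value that is a strict prefix of another sorts before it because the padding zeros and the smaller trailing length compare lower.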

Extend the codec to handle Struct, FixedSizeList, and Extension canonical variants. Each nested row encodes as `outer_sentinel | child bytes...`; for null rows the child bytes are zero-filled after the recursive encoders run so two null rows compare equal regardless of which non-null values would have been written by the children. `row_width_for_dtype` recurses through Struct fields and FSL elements to return `Fixed(w)` when every leaf is fixed; otherwise `Variable`. Extension delegates to its storage dtype. List remains `Variable` and ListView still bails (the row encoder's output is itself a ListView, so nested ListView isn't a near-term use case). Variant and Union bail explicitly. Signed-off-by: Claude <noreply@anthropic.com>
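The recursive width classification can be illustrated with a toy type enum. Everything here is hypothetical (`ToyType`, `RowWidth`, `row_width_sketch`, and the concrete widths of 2 bytes for a bool and 9 for an i64, each counted as sentinel plus value); it only mirrors the rule that a nested type is fixed-width when every leaf is fixed-width, with one extra byte for the outer sentinel.

```rust
#[derive(Debug, PartialEq)]
enum RowWidth {
    Fixed(u32),
    Variable,
}

/// Toy stand-in for a dtype; not Vortex's `DType`.
enum ToyType {
    Bool,
    I64,
    Utf8,
    Struct(Vec<ToyType>),
    FixedSizeList(Box<ToyType>, u32),
}

fn row_width_sketch(dtype: &ToyType) -> RowWidth {
    match dtype {
        // Assumed leaf widths: 1 sentinel byte + value bytes.
        ToyType::Bool => RowWidth::Fixed(2),
        ToyType::I64 => RowWidth::Fixed(9),
        ToyType::Utf8 => RowWidth::Variable,
        ToyType::Struct(fields) => {
            // Start at 1 for the outer sentinel, then add each child width.
            let mut total = 1u32;
            for field in fields {
                match row_width_sketch(field) {
                    RowWidth::Fixed(w) => total += w,
                    // A single variable-width leaf makes the whole struct variable.
                    RowWidth::Variable => return RowWidth::Variable,
                }
            }
            RowWidth::Fixed(total)
        }
        ToyType::FixedSizeList(element, len) => match row_width_sketch(element) {
            // Outer sentinel plus `len` copies of the element width.
            RowWidth::Fixed(w) => RowWidth::Fixed(1 + w * *len),
            RowWidth::Variable => RowWidth::Variable,
        },
    }
}
```

A fixed result matters downstream because a fixed-width nested column can contribute to the constant per-row prefix instead of requiring per-row lengths.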

Add the size-pass machinery used by both RowSize and the upcoming RowEncode pipeline. `compute_sizes` walks the N input columns once, classifying each via `row_width_for_dtype` and accumulating fixed-width-prefix sums in `fixed_per_row` while pushing per-row sums of variable-length columns into a lazily allocated `var_lengths` vec. The classification result (`ColKind` + `SizePassResult`) is private to the crate; RowEncode consumes it in a later commit to choose between the arithmetic and cursor encode paths.

`RowSize` returns a `Struct { fixed: U32, var: U32 }` so callers can read the per-row width without realizing the constant `fixed` slot as a per-row buffer (it's a `ConstantArray`); the `var` slot is a `ConstantArray(0)` when no varlen column is present.

`dispatch_size` is the fallback-only path for PR 1 (canonicalize, then codec::field_size). The `RowSizeKernel` trait exists but is unused; per-encoding fast paths and the inventory registry arrive in PR 3. `initialize()` does NOT register RowSize yet - that lands once RowEncode is in place, so the session-registered pair appears together. Signed-off-by: Claude <noreply@anthropic.com>
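The shape of the size pass can be sketched as below. This is not the crate's `compute_sizes`: the types `ColumnSizes` and `SizePassSketch` are hypothetical, and the per-row lengths of variable columns are passed in precomputed rather than derived from arrays. It shows only the accumulation pattern: a single running `fixed_per_row` sum, and a per-row vector that is allocated the first time a variable-length column appears.

```rust
/// Hypothetical per-column size information; not the crate's `ColKind`.
enum ColumnSizes {
    /// Every row of this column encodes to `width` bytes.
    Fixed { width: u32 },
    /// Row `i` of this column encodes to `lengths[i]` bytes.
    Variable { lengths: Vec<u32> },
}

struct SizePassSketch {
    fixed_per_row: u32,
    /// `None` when no variable-length column was seen.
    var_lengths: Option<Vec<u32>>,
}

fn size_pass_sketch(columns: &[ColumnSizes], num_rows: usize) -> SizePassSketch {
    let mut fixed_per_row = 0u32;
    let mut var_lengths: Option<Vec<u32>> = None;
    for column in columns {
        match column {
            ColumnSizes::Fixed { width } => fixed_per_row += *width,
            ColumnSizes::Variable { lengths } => {
                assert_eq!(lengths.len(), num_rows);
                // Allocate the per-row accumulator lazily, on first use.
                let totals = var_lengths.get_or_insert_with(|| vec![0u32; num_rows]);
                for (total, len) in totals.iter_mut().zip(lengths) {
                    *total += *len;
                }
            }
        }
    }
    SizePassSketch { fixed_per_row, var_lengths }
}
```

When every column is fixed-width the result carries no per-row vector at all, which is what allows the constant `fixed` slot to stay a single value.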

Add the RowEncode variadic scalar function: encode N input columns into a single ListView<u8> in a five-phase pipeline.

- Phase 1: size pass via `compute_sizes`.
- Phase 2: allocate a zero-initialized output buffer sized to fit every row's encoded bytes; bail if the total exceeds u32::MAX.
- Phase 3: build per-row `listview_offsets`: i * fixed_per_row for the pure-fixed case, or i * fixed_per_row + exclusive cumsum of varlen lengths otherwise. Uses the simple `Vec::push` + `checked_add` loop.
- Phase 4: walk columns left-to-right and call `dispatch_encode` for every column (cursor path for all). Each call writes its per-row bytes at `offsets[i] + cursors[i]` and advances the cursor.
- Phase 5: build the ListView<u8> via the validating `try_new` constructor.

`dispatch_encode` is the canonicalize-then-`codec::field_encode` fallback; in-crate kernel arms and the inventory registry land in PR 3. The `RowEncodeKernel` trait is defined but unused. PR 2 will iterate on this pipeline (skip zero-init, skip ListView validation, auto-vectorize the offsets loop, etc.). Signed-off-by: Claude <noreply@anthropic.com>
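The offsets computation in Phase 3 can be sketched as a push loop with `checked_add`. The function name and signature are hypothetical; the sketch returns `None` on `u32` overflow, standing in for the "bail" behaviour described in Phase 2.

```rust
/// Build per-row start offsets and the total encoded size.
///
/// Row `i` starts at `i * fixed_per_row` plus the exclusive cumulative sum of
/// the variable-length sizes of rows `0..i`. Returns `None` if any running
/// total would exceed `u32::MAX`.
fn build_offsets_sketch(
    num_rows: usize,
    fixed_per_row: u32,
    var_lengths: Option<&[u32]>,
) -> Option<(Vec<u32>, u32)> {
    let mut offsets = Vec::with_capacity(num_rows);
    let mut next = 0u32;
    for i in 0..num_rows {
        // The offset of row i is the total size of all earlier rows.
        offsets.push(next);
        let var = var_lengths.map_or(0, |lengths| lengths[i]);
        next = next.checked_add(fixed_per_row)?.checked_add(var)?;
    }
    Some((offsets, next))
}
```

For example, three rows with `fixed_per_row = 11` and variable lengths `[34, 1, 67]` start at offsets 0, 45, and 57, and the output buffer needs 135 bytes in total.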

Wire the RowSize/RowEncode scalar functions to the user-facing API:

- `convert_columns` accepts a slice of input arrays and per-column SortFields, constructs `RowEncodeOptions` + `VecExecutionArgs`, and returns the encoded `ListViewArray<u8>`.
- `compute_row_sizes` returns just the per-row sizes (the `Struct { fixed: u32, var: u32 }` output of `RowSize`).
- `initialize()` now registers `RowSize` and `RowEncode` on the given session so they are reachable via the expression layer.

Tests cover sort-order round-trips for bool, primitive (i64 asc/desc, u32, f64), utf8, multi-column, nulls_first/last, struct sort-order, the single-buffer invariant of the ListView output, and the structural shape of `RowSize`. Tests that exercise per-encoding fast paths (`constant_path_matches_canonical`, `dict_path_matches_canonical`) land together with their respective kernels in PR 3.

The bench file uses divan + mimalloc and reports throughput in GB/s of encoded output bytes for primitive_i64, utf8, and struct_mixed. Each has an `arrow_row` baseline and a `vortex` measurement. Per-encoding fast-path scenarios (constant/dict/patched/bitpacked/for/delta) gain their triplets in PR 3.

Baseline measurements at this commit (sample-count=10):

- primitive_i64_vortex ~1.97 GB/s (vs arrow-row 4.12 GB/s)
- utf8_vortex ~0.87 GB/s (vs arrow-row 1.56 GB/s)
- struct_mixed_vortex ~0.95 GB/s (vs arrow-row 1.19 GB/s)

PR 2 closes most of the gap by replacing the validating `ListViewArray::try_new` with `new_unchecked`, skipping the buffer zero-init, auto-vectorizing the offsets and varlen-block paths, etc. Signed-off-by: Claude <noreply@anthropic.com>

codspeed-hq · 2026-05-22T09:50:19Z

Merging this PR will improve performance by 19.8%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 1 improved benchmark
✅ 1250 untouched benchmarks

Performance Changes

| | Mode | Benchmark | `BASE` | `HEAD` | Efficiency |
|---|---|---|---|---|---|
| ⚡ | Simulation | `chunked_varbinview_opt_canonical_into[(1000, 10)]` | 225.2 µs | 188 µs | +19.8% |

Comparing `ji/row-pr1-base` (48f59ce) with `develop` (1241e14)

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

claude added 8 commits May 17, 2026 22:00

joseph-isaacs added 2 commits May 22, 2026 12:49

t

74d89f1

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

t

48f59ce

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

joseph-isaacs wants to merge 10 commits into `develop` from `ji/row-pr1-base`
