feat: vortex row crate #8056
Draft
joseph-isaacs wants to merge 10 commits into
Add an empty `vortex-row` crate with a minimal `initialize` stub so the following commits can layer in the row-encoder, codec, scalar functions, and per-encoding kernels without touching the workspace skeleton each time. The crate is wired into the workspace members list and workspace dependency table; `public-api.lock` is generated against the stub.
Signed-off-by: Claude <noreply@anthropic.com>
Introduce the per-column sort-field options and the variadic-function options struct used by the upcoming RowSize / RowEncode scalar functions. `RowEncodeOptions::fields` uses a `SmallVec<[SortField; 4]>` so typical 1-4 column keys avoid a heap allocation. Includes a compact serialize / deserialize helper used later by the scalar-function metadata round-trip.
Signed-off-by: Claude <noreply@anthropic.com>
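As a rough illustration of the options described above, here is a minimal sketch. The field names `descending` and `nulls_first`, the one-byte-per-field wire layout, and the use of `Vec` in place of `SmallVec<[SortField; 4]>` are all assumptions made for the example, not the crate's actual definitions.

```rust
/// Hypothetical per-column sort options; the field names are assumptions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SortField {
    pub descending: bool,
    pub nulls_first: bool,
}

/// Hypothetical options struct. The commit uses `SmallVec<[SortField; 4]>`;
/// a plain `Vec` keeps this sketch dependency-free.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct RowEncodeOptions {
    pub fields: Vec<SortField>,
}

impl RowEncodeOptions {
    /// Assumed compact layout: one byte per field,
    /// bit 0 = descending, bit 1 = nulls_first.
    pub fn serialize(&self) -> Vec<u8> {
        self.fields
            .iter()
            .map(|f| (f.descending as u8) | ((f.nulls_first as u8) << 1))
            .collect()
    }

    /// Inverse of `serialize`; rejects bytes with unknown bits set.
    pub fn deserialize(bytes: &[u8]) -> Option<Self> {
        let mut fields = Vec::with_capacity(bytes.len());
        for &b in bytes {
            if b & !0b11 != 0 {
                return None;
            }
            fields.push(SortField {
                descending: b & 0b01 != 0,
                nulls_first: b & 0b10 != 0,
            });
        }
        Some(Self { fields })
    }
}
```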
Add the byte-encoding kernels for the fixed-width portion of the row encoder: Null, Bool, Primitive (12 PTypes), and Decimal (i8..i128). Each encoder writes a 1-byte sentinel followed by the value's row-comparable bytes (sign-flipped big-endian for signed ints, sign-aware mask for floats, etc.). The size pass is a constant `width-per-row` add for these types; the encode pass walks rows and writes into the shared output buffer at `offsets[i] + cursors[i]`.
`row_width_for_dtype` classifies the column based purely on its DType.
Scalar-level encoders (`encode_scalar_primitive` / `encode_scalar_bool` / `encode_scalar_null` / `encode_scalar` / `encoded_size_for_scalar`) are included for the same fixed-width subset; varlen and nested canonical variants bail with a clear "not yet supported" error and land in follow-up commits.
The implementation is deliberately the simplest correct version: bounds-checked array indexing, no `copy_nonoverlapping`, no validity fast-path helper. Subsequent PRs evolve this toward the optimized form.
Signed-off-by: Claude <noreply@anthropic.com>
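The "sign-flipped big-endian" and "sign-aware mask" transforms can be sketched as below for `i64` and `f64`, ascending order only. The sentinel value `1` for non-null and the function names are assumptions for the example; the crate's own constants and signatures may differ.

```rust
/// Assumed sentinel for a non-null value (hypothetical constant).
const NON_NULL_SENTINEL: u8 = 1;

/// Encode an i64 so that bytewise comparison matches numeric order:
/// flip the sign bit, then write big-endian.
pub fn encode_i64(v: i64, out: &mut Vec<u8>) {
    out.push(NON_NULL_SENTINEL);
    let bits = (v as u64) ^ (1u64 << 63);
    out.extend_from_slice(&bits.to_be_bytes());
}

/// Encode an f64 with a sign-aware mask: negative values have every bit
/// flipped (reversing their order), non-negative values only the sign bit.
pub fn encode_f64(v: f64, out: &mut Vec<u8>) {
    out.push(NON_NULL_SENTINEL);
    let bits = v.to_bits();
    let mask = if bits >> 63 == 1 { u64::MAX } else { 1u64 << 63 };
    out.extend_from_slice(&(bits ^ mask).to_be_bytes());
}
```

Each value occupies 9 bytes here (1 sentinel + 8 payload), which is why the size pass for such a column can be a constant per-row add.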
Extend the codec to handle Utf8/Binary via VarBinView arrays. Each value encodes as a 1-byte sentinel followed by 32-byte chunks: every full chunk has a 0xFF continuation marker; the final partial chunk pads with zeros and writes the partial length (1..=32) as its trailing byte.
`encode_varlen_value` uses the simple byte-at-a-time XOR loop here; a faster `copy_nonoverlapping` + stamped continuation version replaces it in PR 2. `encode_varbinview` uses `arr.with_iterator(...)` for both the nullable and non-nullable branches; a direct view walk for the no-nulls branch lands in PR 2 too.
`row_width_for_dtype` now returns `Variable` for Utf8/Binary; the size pass and encode dispatchers route through `add_size_varbinview` / `encode_varbinview` correspondingly. The scalar encoder gains `encode_scalar_varlen` and the matching Utf8/Binary arms.
Signed-off-by: Claude <noreply@anthropic.com>
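A minimal sketch of the chunked layout described above, ascending order only (the descending XOR is omitted). The sentinel values and the treatment of empty values as a sentinel-only encoding are assumptions borrowed from common row-format conventions; the commit message does not spell them out.

```rust
const BLOCK: usize = 32;
const CONTINUATION: u8 = 0xFF;
// Assumed sentinel values (hypothetical): null < empty < non-empty.
const NULL_SENTINEL: u8 = 0;
const EMPTY_SENTINEL: u8 = 1;
const NON_EMPTY_SENTINEL: u8 = 2;

/// Append the row encoding of one Utf8/Binary value to `out`.
pub fn encode_varlen(value: Option<&[u8]>, out: &mut Vec<u8>) {
    match value {
        None => out.push(NULL_SENTINEL),
        Some(v) if v.is_empty() => out.push(EMPTY_SENTINEL),
        Some(v) => {
            out.push(NON_EMPTY_SENTINEL);
            let mut chunks = v.chunks(BLOCK).peekable();
            while let Some(chunk) = chunks.next() {
                out.extend_from_slice(chunk);
                if chunks.peek().is_some() {
                    // A full chunk followed by more data: continuation marker.
                    out.push(CONTINUATION);
                } else {
                    // Final chunk: zero-pad to 32 bytes, then its length (1..=32).
                    out.resize(out.len() + (BLOCK - chunk.len()), 0);
                    out.push(chunk.len() as u8);
                }
            }
        }
    }
}

/// Size-pass counterpart: bytes `encode_varlen` will write for `value`.
pub fn encoded_len(value: Option<&[u8]>) -> usize {
    match value {
        Some(v) if !v.is_empty() => 1 + ((v.len() + BLOCK - 1) / BLOCK) * (BLOCK + 1),
        _ => 1,
    }
}
```

Because the trailing length byte is at most 32 and the continuation marker is 0xFF, a value that ends in a given chunk always sorts before any longer value sharing the same prefix.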
Extend the codec to handle Struct, FixedSizeList, and Extension canonical variants. Each nested row encodes as `outer_sentinel | child bytes...`; for null rows the child bytes are zero-filled after the recursive encoders run so two null rows compare equal regardless of which non-null values would have been written by the children.
`row_width_for_dtype` recurses through Struct fields and FSL elements to return `Fixed(w)` when every leaf is fixed; otherwise `Variable`. Extension delegates to its storage dtype. List remains `Variable` and ListView still bails (the row encoder's output is itself a ListView, so nested ListView isn't a near-term use case). Variant and Union bail explicitly.
Signed-off-by: Claude <noreply@anthropic.com>
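The recursive width classification can be sketched as follows. The `DType` enum here is a deliberately simplified stand-in, not Vortex's real `DType`, and the per-type widths (1 byte for Null, 1 sentinel plus payload elsewhere) are assumptions inferred from the "1-byte sentinel followed by the value's bytes" description.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RowWidth {
    Fixed(u32),
    Variable,
}

/// Hypothetical, simplified dtype tree for illustration only.
pub enum DType {
    Null,
    Bool,
    Primitive { byte_width: u32 },
    Utf8,
    Binary,
    Struct(Vec<DType>),
    FixedSizeList(Box<DType>, u32),
    List(Box<DType>),
}

/// Classify a column purely from its dtype: `Fixed(w)` only when every
/// leaf is fixed-width, otherwise `Variable`.
pub fn row_width_for_dtype(dtype: &DType) -> RowWidth {
    match dtype {
        DType::Null => RowWidth::Fixed(1),
        DType::Bool => RowWidth::Fixed(2),
        DType::Primitive { byte_width } => RowWidth::Fixed(1 + byte_width),
        DType::Utf8 | DType::Binary | DType::List(_) => RowWidth::Variable,
        DType::Struct(fields) => {
            // Outer sentinel plus the sum of the child widths.
            let mut total = 1u32;
            for field in fields {
                match row_width_for_dtype(field) {
                    RowWidth::Fixed(w) => total += w,
                    RowWidth::Variable => return RowWidth::Variable,
                }
            }
            RowWidth::Fixed(total)
        }
        DType::FixedSizeList(elem, n) => match row_width_for_dtype(elem) {
            RowWidth::Fixed(w) => RowWidth::Fixed(1 + w * n),
            RowWidth::Variable => RowWidth::Variable,
        },
    }
}
```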
Add the size-pass machinery used by both RowSize and the upcoming
RowEncode pipeline. `compute_sizes` walks the N input columns once,
classifying each via `row_width_for_dtype` and accumulating
fixed-width-prefix sums in `fixed_per_row` while pushing per-row sums
of variable-length columns into a lazily allocated `var_lengths` vec.
The classification result (`ColKind` + `SizePassResult`) is private to
the crate; RowEncode consumes it in a later commit to choose between
the arithmetic and cursor encode paths.
`RowSize` returns a `Struct { fixed: U32, var: U32 }` so callers can
read the per-row width without realizing the constant `fixed` slot as
a per-row buffer (it's a `ConstantArray`); the `var` slot is a
`ConstantArray(0)` when no varlen column is present.
`dispatch_size` is the fallback-only path for PR 1 (canonicalize, then
codec::field_size). The `RowSizeKernel` trait exists but is unused; per-
encoding fast paths and the inventory registry arrive in PR 3.
`initialize()` does NOT register RowSize yet - that lands once
RowEncode is in place, so the session-registered pair appears together.
Signed-off-by: Claude <noreply@anthropic.com>
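The size pass described in this commit can be sketched roughly as below. The `Column` enum and `SizePass` struct are hypothetical simplifications (the real code classifies arrays via `row_width_for_dtype` and keeps `ColKind` / `SizePassResult` private); the point is the single walk over columns and the lazily allocated `var_lengths`.

```rust
/// Hypothetical, pre-classified column for illustration.
pub enum Column {
    /// Every row of this column encodes to `width` bytes.
    Fixed { width: u32 },
    /// Per-row encoded byte lengths of a variable-width column.
    Variable { lengths: Vec<u32> },
}

/// Hypothetical result of the size pass.
#[derive(Debug, PartialEq, Eq)]
pub struct SizePass {
    pub fixed_per_row: u32,
    /// `None` when no variable-width column is present.
    pub var_lengths: Option<Vec<u32>>,
}

pub fn compute_sizes(columns: &[Column], num_rows: usize) -> SizePass {
    let mut fixed_per_row = 0u32;
    let mut var_lengths: Option<Vec<u32>> = None;
    for col in columns {
        match col {
            Column::Fixed { width } => fixed_per_row += *width,
            Column::Variable { lengths } => {
                assert_eq!(lengths.len(), num_rows);
                // Allocate the per-row accumulator only on first use.
                let acc = var_lengths.get_or_insert_with(|| vec![0u32; num_rows]);
                for (a, l) in acc.iter_mut().zip(lengths) {
                    *a += *l;
                }
            }
        }
    }
    SizePass { fixed_per_row, var_lengths }
}
```

With only fixed-width inputs the result carries a single constant and no per-row buffer, which mirrors the `ConstantArray` slots described above.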
Add the RowEncode variadic scalar function: encode N input columns into
a single ListView<u8> in a five-phase pipeline.
Phase 1: size pass via `compute_sizes`.
Phase 2: allocate a zero-initialized output buffer sized to fit every
  row's encoded bytes; bail if the total exceeds `u32::MAX`.
Phase 3: build per-row `listview_offsets`: `i * fixed_per_row` for the
  pure-fixed case, or `i * fixed_per_row` + exclusive cumsum of
  varlen lengths otherwise. Uses the simple `Vec::push` +
  `checked_add` loop.
Phase 4: walk columns left-to-right and call `dispatch_encode` for
  every column (cursor path for all). Each call writes its
  per-row bytes at `offsets[i] + cursors[i]` and advances the
  cursor.
Phase 5: build the ListView<u8> via the validating `try_new`
  constructor.
`dispatch_encode` is the canonicalize-then-`codec::field_encode`
fallback; in-crate kernel arms and the inventory registry land in PR 3.
The `RowEncodeKernel` trait is defined but unused. PR 2 will iterate
on this pipeline (skip zero-init, skip ListView validation, auto-
vectorize the offsets loop, etc.).
Signed-off-by: Claude <noreply@anthropic.com>
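Phases 3 and 4 can be sketched as below. The function names, the `Vec<Vec<u8>>` stand-in for a pre-encoded column, and the `Option` return for the overflow bail are assumptions for the example; only the `Vec::push` + `checked_add` offsets loop and the `offsets[i] + cursors[i]` write position come from the description above.

```rust
/// Phase 3 sketch: row `i` starts at `i * fixed_per_row` plus the exclusive
/// cumulative sum of the variable lengths. Returns `(offsets, total_bytes)`,
/// or `None` if the running total would exceed `u32::MAX`.
pub fn build_offsets(
    fixed_per_row: u32,
    var_lengths: Option<&[u32]>,
    num_rows: usize,
) -> Option<(Vec<u32>, u32)> {
    let mut offsets = Vec::with_capacity(num_rows);
    let mut next = 0u32;
    for i in 0..num_rows {
        offsets.push(next);
        let var = var_lengths.map_or(0, |v| v[i]);
        next = next.checked_add(fixed_per_row)?.checked_add(var)?;
    }
    Some((offsets, next))
}

/// Phase 4 sketch (cursor path): write one column's already-encoded per-row
/// bytes at `offsets[i] + cursors[i]`, then advance that row's cursor.
pub fn write_column(out: &mut [u8], offsets: &[u32], cursors: &mut [u32], rows: &[Vec<u8>]) {
    for (i, bytes) in rows.iter().enumerate() {
        let start = (offsets[i] + cursors[i]) as usize;
        out[start..start + bytes.len()].copy_from_slice(bytes);
        cursors[i] += bytes.len() as u32;
    }
}
```

Because every column appends at its row's cursor, columns can be processed one at a time, left to right, while each row's bytes still end up contiguous in the single output buffer.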
Wire the RowSize/RowEncode scalar functions to the user-facing API:
- `convert_columns` accepts a slice of input arrays and per-column
  SortFields, constructs `RowEncodeOptions` + `VecExecutionArgs`, and
  returns the encoded `ListViewArray<u8>`.
- `compute_row_sizes` returns just the per-row sizes (the
  `Struct { fixed: u32, var: u32 }` output of `RowSize`).
- `initialize()` now registers `RowSize` and `RowEncode` on the given
  session so they are reachable via the expression layer.
Tests cover sort-order round-trips for bool, primitive (i64 asc/desc,
u32, f64), utf8, multi-column, nulls_first/last, struct sort-order, the
single-buffer invariant of the ListView output, and the structural
shape of `RowSize`. Tests that exercise per-encoding fast paths
(`constant_path_matches_canonical`, `dict_path_matches_canonical`) land
together with their respective kernels in PR 3.
The bench file uses divan + mimalloc and reports throughput in GB/s of
encoded output bytes for primitive_i64, utf8, and struct_mixed. Each
has an `arrow_row` baseline and a `vortex` measurement. Per-encoding
fast-path scenarios (constant/dict/patched/bitpacked/for/delta) gain
their triplets in PR 3.
Baseline measurements at this commit (sample-count=10):
primitive_i64_vortex ~1.97 GB/s (vs arrow-row 4.12 GB/s)
utf8_vortex ~0.87 GB/s (vs arrow-row 1.56 GB/s)
struct_mixed_vortex ~0.95 GB/s (vs arrow-row 1.19 GB/s)
PR 2 closes most of the gap by replacing the validating
`ListViewArray::try_new` with `new_unchecked`, skipping the buffer
zero-init, auto-vectorizing the offsets and varlen-block paths, etc.
Signed-off-by: Claude <noreply@anthropic.com>
Merging this PR will improve performance by 19.8%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ⚡ | Simulation | chunked_varbinview_opt_canonical_into[(1000, 10)] | 225.2 µs | 188 µs | +19.8% |
Comparing ji/row-pr1-base (48f59ce) with develop (1241e14)