Skip to content

fix: keep Arrow device schemas aligned with exports#8360

Merged
robert3005 merged 1 commit into
developfrom
fix/arrow-device-ffi-error-out-dict-schema
Jun 11, 2026
Merged

fix: keep Arrow device schemas aligned with exports#8360
robert3005 merged 1 commit into
developfrom
fix/arrow-device-ffi-error-out-dict-schema

Conversation

@0ax1

@0ax1 0ax1 commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Route export_device_array_with_schema through the exporter so schema derivation sees the same rebuilt host layout that gets exported. This fixes host ListView fallback cases where the Arrow C schema could describe a different child layout than the emitted array.

@0ax1 0ax1 requested a review from a team June 11, 2026 13:46
@0ax1

0ax1 commented Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

@claude review

@github-actions

github-actions Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Claude finished @0ax1's task in 2m 36s —— View job


Claude PR Review

  • Read both changed files and surrounding context
  • Verify FFI null-tolerance fix against the documented C contract
  • Verify dict-preserving host list-view rebuild and its test
  • Post review

Reviewed vortex-ffi/src/error.rs and vortex-cuda/src/arrow/canonical.rs. Both fixes are correct, well-scoped, and backed by good tests. Summary below.

Fix 1 — null-tolerant error_out (vortex-ffi/src/error.rs)

The header contract is confirmed: vortex_cuda.h:46-47 documents "when error_out is non-NULL, writes a vx_error", while the old code did assert!(!error.is_null()) in write_error and unconditionally error_out.write(...) on success — both UB/panic across the FFI boundary for the legal null case. The fix correctly guards every write:

  • write_error now early-returns on null instead of asserting. ✅
  • The new private clear_error helper centralizes the null-checked success write, used by both try_or and try_or_default. ✅
  • Removing #[allow(clippy::not_unsafe_ptr_arg_deref)] from try_or is safe: the raw-pointer deref now lives in the private clear_error/write_error helpers, and that lint only targets public functions, so it won't re-fire.

The three new unit tests cover both branches with a null error_out plus the write-then-clear sequence — exactly the previously-UB paths. Nice.

Fix 2 — dict-preserving host ListView export (vortex-cuda/src/arrow/canonical.rs)

The bug is real: export_list_view's host fallback fed the whole ListView through list_from_list_view, whose list-by-list strategy (avg size ≥ 128) canonicalizes Dict elements away, contradicting the already-emitted schema that promises an Arrow Dictionary child — silent data corruption for the consumer.

host_list_from_list_view fixes this correctly:

  • Rebuilds only over parts.codes (same length as the original elements, so the original offsets/sizes/listview_validity remain valid indices). ✅
  • Re-wraps the compacted codes with the original parts.values via DictArray::try_new. Since codes are just indices, compaction/reordering preserves the mapping. ✅
  • Outer list validity flows through codes_list.list_validity(); dict-element nulls ride along in the codes' own validity through the rebuild. ✅
  • Falls back to the generic path for non-Dict elements. ✅

This mirrors the GPU RebuiltListViewChild branch (lines 907-912) that special-cases Dict, so host and device paths now agree. The regression test deliberately uses 128-element lists (triggering the list-by-list strategy) and null codes (i % 5 == 0 → None), and asserts the exported child stays a non-null Dictionary of length 256 with 2 buffers — it would fail on the old code path.

Minor / optional

  • host_list_from_list_view discards the downcast error with Err(_) =>. That's fine here (pure type-probe), just flagging it's intentional.
  • Doc comment on export_list_view (line 795) still reads "Export a standard Vortex list as Arrow List" but the fn handles ListViewArray; pre-existing, not introduced here.

Verdict

No correctness, soundness, or leak concerns. The success-path clear_error does not free a prior *error_out value, but that matches the unchanged caller contract (uninitialized out-param), so it's not a regression. I could not run cargo clippy/nextest in this environment (no network to fetch develop; CUDA crate also needs a toolchain), so please rely on CI for the build/lint/test gates — the diff itself looks clean to me.

LGTM. ✅
· branch fix/arrow-device-ffi-error-out-dict-schema

@0ax1 0ax1 added the changelog/fix A bug fix label Jun 11, 2026
@0ax1 0ax1 marked this pull request as draft June 11, 2026 13:47
@0ax1

0ax1 commented Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

@claude fix the build / lint issues

@github-actions

github-actions Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Claude finished @0ax1's task in 5m 41s —— View job


Build / lint status

Heads up: I'm the read-only PR review assistant here, so I can't push commits to your branch or modify files directly — I can only inspect and report. If you want the actual fix applied, see the Claude Code link at the bottom, or run the commands locally.

I also can't run cargo in this environment (sandbox blocks it, and vortex-cuda needs a GPU toolchain), so I went by CI + close code inspection.

What CI shows right now

Check Result
Rust Instrumented ✅ pass
Linux musl ✅ pass
Publish Dry Runs ✅ pass
Codspeed Benchmarking ✅ pass
Typo check / REUSE / Vortex Web ✅ pass
PR Labels (run 27351374166) ❌ fail
PR Labels (re-run 27351437961) ✅ pass
CUDA ⏳ still running
Linters and Tests (clippy/fmt) ⏳ still running

The only red check is PR Labels, which is a label-policy workflow — not a build or lint failure. A duplicate run of the same check already passed, so it looks transient/dup rather than something in your diff. That one isn't fixable by a code change; it's resolved by applying the required PR label (and isn't something this code-review bot can act on).

The two checks that actually gate build/lint for your changed crates — CUDA and Linters and Tests — are still in progress and have not reported any error yet.

By-inspection review of the diff (clean so far)

I checked the usual suspects against the workspace's strict [workspace.lints.clippy] (all/cargo = deny, plus clone_on_ref_ptr, cast_possible_truncation, redundant_clone, etc.):

  • Imports resolve — the added ExecutionCtx, IntoArray, ListArrayExt, ListViewArrayExt are all used (.list_validity(), .listview_validity(), .into_array()).
  • clone_on_ref_ptr does not fire on listview.elements().clone() / .offsets().clone() etc. — ArrayRef is a newtype (pub struct ArrayRef(Arc<…>) in vortex-array/src/array/erased.rs:83), not a bare Arc.
  • cast_possible_truncation on (i % 3) as u8 in the new test is modulo-bounded (0..=2), which current clippy does not flag.
  • All referenced test helpers existrelease_exported_array, private_data_buffer_i32_values, from_option_iter, Field::LIST_FIELD_DEFAULT_NAME, export_device_array_with_schema.
  • Removing #[allow(clippy::not_unsafe_ptr_arg_deref)] from try_or is safe — the raw-pointer writes now live in the private clear_error/pub(crate) write_error helpers, and that lint only targets exported public fns.

I could not find a build or lint problem in the changed lines. If CUDA / Linters and Tests come back green, there's nothing to fix. If one of them fails, reply with the failing log (or @claude again once it's red) and I'll pinpoint the exact line.

To run the gates yourself:

cargo +nightly fmt --all
cargo clippy --all-targets --all-features        # workspace (non-CUDA)
cargo clippy -p vortex-cuda --all-targets         # needs CUDA toolchain

Fix any lint issues →


  • Read the full diff for both changed files
  • Inspect for build/lint problems by hand
  • Check CI for concrete build/lint failures
  • Report findings
    · branch fix/arrow-device-ffi-error-out-dict-schema

@codspeed-hq

codspeed-hq Bot commented Jun 11, 2026

Copy link
Copy Markdown

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 5 improved benchmarks
❌ 1 regressed benchmark
✅ 1526 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Simulation chunked_varbinview_canonical_into[(1000, 10)] 161.8 µs 198 µs -18.25%
Simulation encode_varbin[(1000, 8)] 164.1 µs 142.9 µs +14.91%
Simulation encode_varbin[(1000, 4)] 163.1 µs 142.2 µs +14.75%
Simulation encode_varbin[(1000, 32)] 169 µs 147.7 µs +14.41%
Simulation bitwise_not_vortex_buffer_mut[128] 244.4 ns 215.3 ns +13.55%
Simulation bitwise_not_vortex_buffer_mut[1024] 304.7 ns 275.6 ns +10.58%

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing fix/arrow-device-ffi-error-out-dict-schema (54dc191) with develop (eda4dd0)

Open in CodSpeed

@0ax1 0ax1 force-pushed the fix/arrow-device-ffi-error-out-dict-schema branch from 395da92 to 601bc2b Compare June 11, 2026 14:58
Route export_device_array_with_schema through the exporter so schema derivation sees the same rebuilt host layout that gets exported. This fixes host ListView fallback cases where the Arrow C schema could describe a different child layout than the emitted array.

Also tolerate null FFI error_out pointers per the C API contract and cover both regressions with tests.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
@0ax1 0ax1 force-pushed the fix/arrow-device-ffi-error-out-dict-schema branch from 601bc2b to 54dc191 Compare June 11, 2026 14:59
@0ax1 0ax1 changed the title fix: null-tolerant FFI error_out and dict-preserving host list-view export fix: keep Arrow device schemas aligned with exports Jun 11, 2026
@0ax1 0ax1 marked this pull request as ready for review June 11, 2026 15:02
@0ax1 0ax1 requested review from onursatici and robert3005 June 11, 2026 15:02
@robert3005 robert3005 merged commit 8acef3a into develop Jun 11, 2026
73 of 75 checks passed
@robert3005 robert3005 deleted the fix/arrow-device-ffi-error-out-dict-schema branch June 11, 2026 17:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog/fix A bug fix

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants