Skip to content

Conversation

ethe
Copy link
Member

@ethe ethe commented Aug 21, 2025

Phase 1 of #468

  • Typed adapters in place for compile-time records

    • #[tonbo::typed::record] is the single annotation; macro injects #[derive(typed_arrow::Record)] and encodes PK user indices under tonbo.primary_key_user_indices.
    • Legacy #[derive(tonbo_macros::Record)] path has been removed to simplify the surface (no backward-compat required for this branch).
    • Core adapters implemented: TonboTypedSchema<R, K>, TonboTypedBuilders<R>, and TonboTypedArrays<R, K>.
  • Semantics parity for tombstones and projection

    • Tombstone rows preserve primary key value(s) in PK column(s) while _null = true indicates deletion.
    • Column projection applied correctly at read and in builder finish() paths, honoring USER_COLUMN_OFFSET.
    • Parquet metadata written: tonbo.primary_key_user_indices and tonbo.sorting_columns (order: _ts desc, PK asc, nulls first).
  • Accounting and tests

    • written_size now includes _null bits, _ts values, and accumulated user payload bytes.
    • Re-enabled/adjusted tests for projection and size accounting; all tests green locally for the current workspace.
  • Safety and hygiene

    • Replaced static mut + OnceCell patterns with safe OnceCell statics; removed several clippy warnings and unused macro helpers.
    • Remaining unsafes (e.g., a transmute used to shape record refs) are earmarked for removal in a follow-up.
  • What’s next

    • Phase 3: Implement the runtime adapter on top of typed-arrow-dyn, replacing custom dynamic arrays while preserving Tonbo sentinels, PK rewrite on tombstones, sorting, and projection.
    • Composite PKs: extend adapters and macro metadata to support multi-column primary keys end-to-end (PK paths, sorting, rewrite).
    • Remove remaining unsafe in ref access paths by tightening lifetimes and using typed-arrow query helpers.
    • Add Parquet/DataFusion conformance tests for sorting and pushdown, and microbenches to validate perf parity.

@ethe ethe force-pushed the refactor/typed-arrow branch from 4e7f4ed to e0c622f Compare August 21, 2025 15:16
Copy link

codecov bot commented Aug 21, 2025

Codecov Report

❌ Patch coverage is 77.45840% with 149 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/record/typed.rs 76.11% 91 Missing ⚠️
tonbo_macros/src/typed_codegen.rs 83.22% 26 Missing ⚠️
tonbo_macros/src/data_type.rs 11.11% 24 Missing ⚠️
tonbo_macros/src/lib.rs 88.88% 8 Missing ⚠️

📢 Thoughts on this report? Let us know!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be removed

let full_idx = pk_full_indices[0];
let col = batch.column(full_idx).clone();
let replaced = K::rewrite_pk_column(&col, nulls, &self.tombstones, &self.keys);
cols[full_idx] = replaced;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this rewrite necessary? IMO, we don't need to care about the actual data if it is null.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants