
[pull] trunk from spiceai:trunk #856

Merged
pull[bot] merged 2 commits into TheRakeshPurohit:trunk from spiceai:trunk
May 22, 2026

pull[bot] commented May 22, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)


lukekim and others added 2 commits May 21, 2026 19:00
* perf(cayenne): reduce allocation overheads in hot paths

* refactor(cayenne): optimize SQL statement construction for insert and delete operations

* perf(cayenne): optimize hashmap usage for insert-records and improve statistics retrieval

* perf(cayenne): algorithmic wins across compaction picker, deletion writer, scan, and metastore

- compaction_picker_pick_candidates: replaced full O(N log N) sort with
  O(N) select_nth_unstable_by_key + size_hint-based bucket pre-sizing.
  Measured 2.1-3.6x speedup at 100/1000/10000 file counts.
- position_delete writer: added pre_sorted flag + new_position_based_sorted
  constructor so RoaringBitmap-derived row_ids skip the writer's redundant
  sort+dedup. UInt64Array::from_iter_values replaces row_ids.to_vec() for
  one fewer full O(K+N) copy per commit.
- deletion filter exec (Int64 + KeyBased): per-batch keep_mask now uses
  BooleanBufferBuilder (1 bit/row packed) instead of Vec<bool> (1 byte/row)
  and skips the BooleanArray repack pass.
- protected_snapshots field: Arc<RwLock<HashMap>> -> Arc<ArcSwap<HashMap>>.
  Scan-side reads are now wait-free Arc::clone with no HashMap clone;
  writes use rcu() for atomic CoW publish. Touches 12 sites across
  provider/table.rs + provider/delete/sink/file_based.rs.
- DeletionIndex / KeyDeletionIndex extend_max: pre-size new_keys/new_hashes
  from iterator size_hint. Measured 1.04-1.09x on small-batch CDC ingest.
- scan_file_for_key_matches (position-based deletion): cache key_indices
  on first chunk instead of re-resolving per chunk (large files paid
  index_of K times per chunk).
- metastore conversion (sqlite + turso): convert_*_value and to_*_value
  now consume MetastoreValue/TursoValue instead of borrowing, eliminating
  per-param + per-row String/Vec<u8> clone in execute/query hot paths.
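
The compaction-picker change in the first bullet can be illustrated with a minimal sketch using only the standard library. `FileMeta` and `pick_smallest` are hypothetical names for illustration; the actual picker's types and selection key are not shown in this PR. The idea is that when only the `k` best candidates are needed, a selection partition avoids fully ordering the list:

```rust
/// Hypothetical file descriptor; the real picker's types are not shown in the PR.
#[derive(Debug, Clone, PartialEq, Eq)]
struct FileMeta {
    id: u32,
    size_bytes: u64,
}

/// Return the `k` smallest files by size, in unspecified order.
///
/// Assumption: the caller only needs the set of candidates, not a fully
/// sorted list, so a selection partition can stand in for a full sort.
fn pick_smallest(mut files: Vec<FileMeta>, k: usize) -> Vec<FileMeta> {
    if k == 0 {
        return Vec::new();
    }
    if k < files.len() {
        // After this call, files[k - 1] is in its final sorted position and
        // every element before it has a key <= its key, so files[..k] holds
        // the k smallest entries (unordered among themselves).
        files.select_nth_unstable_by_key(k - 1, |f| f.size_bytes);
        files.truncate(k);
    }
    // When k >= files.len(), every file is a candidate and is returned as-is.
    files
}
```

Callers that need the candidates in order can sort just the returned `k` elements, which is cheaper than sorting all `N` when `k` is small.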

All wins independently verified by cargo nextest -p cayenne --lib (273/273
passing through every iteration).
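
The metastore conversion change (consuming values instead of borrowing them) follows a general Rust pattern that can be sketched as below. `Value`, `DriverParam`, `convert_borrowed`, and `convert_owned` are hypothetical stand-ins; the real `MetastoreValue`/`TursoValue` definitions and function signatures are not shown in this PR.

```rust
/// Hypothetical stand-in for a metastore value (not the real MetastoreValue).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Integer(i64),
    Text(String),
    Blob(Vec<u8>),
}

/// Hypothetical driver-side parameter type.
#[derive(Debug, PartialEq)]
enum DriverParam {
    Null,
    Integer(i64),
    Text(String),
    Blob(Vec<u8>),
}

/// Borrowing conversion: heap-backed payloads must be cloned.
fn convert_borrowed(v: &Value) -> DriverParam {
    match v {
        Value::Null => DriverParam::Null,
        Value::Integer(i) => DriverParam::Integer(*i),
        Value::Text(s) => DriverParam::Text(s.clone()), // allocates a new String
        Value::Blob(b) => DriverParam::Blob(b.clone()), // allocates a new Vec<u8>
    }
}

/// Consuming conversion: the String / Vec<u8> buffers are moved, not copied.
fn convert_owned(v: Value) -> DriverParam {
    match v {
        Value::Null => DriverParam::Null,
        Value::Integer(i) => DriverParam::Integer(i),
        Value::Text(s) => DriverParam::Text(s),
        Value::Blob(b) => DriverParam::Blob(b),
    }
}
```

Taking the value by ownership only pays off when the caller no longer needs it after conversion, which is assumed here to be the case for per-param and per-row values in the execute/query paths.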

* Improve

---------

Co-authored-by: Sergei Grebnov <sergei.grebnov@gmail.com>
Co-authored-by: Jeadie <jeadie@users.noreply.github.com>
pull[bot] locked and limited conversation to collaborators May 22, 2026
pull[bot] added the ⤵️ pull label May 22, 2026
pull[bot] merged commit b989e53 into TheRakeshPurohit:trunk May 22, 2026