[pull] trunk from spiceai:trunk#847
Merged
Merged
Conversation
* Fix Flight Schema * Fix * Lint * Refactor * Move tests * Lint --------- Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com>
* Persist Kafka CDC offsets in sidecar tables
* Change Kafka consumer state commit mode to asynchronous
* Refactor create_test_dataset to return a temporary directory along with the dataset
* Format db_path assignment for improved readability in test dataset creation
* Refactor ensure_kafka_table and ensure_debezium_kafka_table to use for loop for improved readability
* refactor: Improve schema management in Debezium and Kafka systems
* refactor: Implement merged offsets serialization for Debezium and Kafka systems
* Enhance Debezium and Kafka data connectors to add 'offsets_json' column if it doesn't exist
* Fix DataFusion doc markdown lint
* fix(integration): update SPICE_SPICEAI_API_KEY reference in workflow
fix(tests): change SPICEAI_API_KEY reference to use secrets in dataset
* feat(kafka): refactor offset handling and introduce OffsetSchemaState management
* feat(cayenne): add join filter propagation and no-spill Q21 planning (#10840)
* Add range fallback for join accumulator
* Add join accumulator tests and benchmarks
* Defer join accumulator range bounds
* Improve
* Add CayenneJoinRewriter to DataFusionBuilder and implement tests for optimizer rules
* Enhance Cayenne join optimization and add benchmarks
- Refactor join logic to use a unified hash_join function for better readability.
- Introduce new benchmarks for join accumulator transitions and memory limits.
- Implement memory limit configuration for hash joins in DataFusionBuilder.
- Add tests to validate memory limit behavior for exact join filters.
- Create snapshots for Cayenne probe join optimizations.
* Implement min/max value calculations for various data types in RangeBounds
* Add datafusion-pruning dependency and enhance memory management for exact join filters
* Enhance Cayenne context and execution plan with projection pushdown and improve table statistics persistence
* Add support for MongoDB change streams and replica set initialization in DuckDB feature
* Enhance Cayenne table provider with cached table statistics management and loading functionality
* Add tests for table statistics serialization and inexact value downgrades in CayenneTableProvider
* Enhance SpicePhysicalCodec with support for serializable hash joins and nested physical plan decoding
* Refactor bloom_hashes to use BloomHashStream for improved hashing with multiple streams
* Refactor in-list memory management to use shared memory limit across accumulators
* Refactor BloomFilter and range handling for improved memory management and type safety
* Add shared in-list memory budget configuration and clean up unused DuckDB code
* Implement PkDeletionSnapshot for improved deletion handling and add test for empty batch aggregation
* Enhance KeyBasedDeletionFilterStream to handle empty batches and improve error handling for primary key column indices
* Add twox-hash dependency and refactor BloomFilter for improved handling of NaN values
* Replace twox-hash with blake3 for improved hashing in BloomFilter and update Cargo dependencies
* Remove blake3 dependency from Cargo.toml and Cargo.lock; refactor BloomFilter to use DataFusion's hashing utilities
* Refactor CayenneTableProvider to improve code readability and maintainability
* Refactor runtime_env function and simplify memory limit calculations in builder.rs
* Refactor join key extraction in HashJoinExec and update plan_snapshot function signature for consistency
* refactor: rename and update in-list memory budget function for clarity and correctness
* refactor: enhance test module by importing additional Arrow types for improved memory source configuration
* fix(lint): wrap DataFusion in backticks in doc comment (clippy::doc_markdown)
* docs(cayenne): outline no-spill workstreams for q21 build-side memory
Document the three follow-on workstreams required to safely re-enable
chbench q21 in the test framework:
1. Cross-scan dynamic filter sharing across same-source
CayenneAccelerationExec nodes (highest leverage for q21).
2. Bidirectional build-side ExactLeftAccumulator pushdown.
3. Predicate transitive closure across equi-join keys.
Also expand the q21 exclusion comment in get_chbench_test_queries to
reference the design doc so the next implementer has a clear plan.
No behavior change.
* fix(cayenne): skip unsafe decimal cast pushdown into vortex
Wrap Vortex file sources after physical plan creation so Cayenne can leave decimal-to-floating cast predicates above the scan. This avoids Vortex attempting decimal -> f64 casts that it cannot execute while preserving VortexFormat's internal VortexSource downcast during plan construction.
Also add a scan_identity helper on CayenneAccelerationExec as groundwork for q21 cross-scan dynamic filter sharing.
* feat(cayenne): implement dynamic filter sharing for hash-join optimization
* feat(cayenne): add logical optimizer for filter propagation across equi-join keys
* feat(cayenne): enhance filter propagation in inner joins with subquery support
* feat(cayenne): enhance logical optimizer with equi-join key filter propagation and dynamic filter sharing
* feat(cayenne): refactor optimizer imports and clean up unused code
* refactor(cayenne): simplify propagated filter detection logic in logical optimizer
* refactor(cayenne): streamline subquery filter wrapping and improve code readability
* feat(cayenne): implement multi-key support in hash join and add test for sort-merge optimization
* fix(cayenne): scope scan_identity by object_store_url, whitelist transparent wrappers
Address PR review feedback on the cross-scan dynamic-filter sharing scaffolding:
* `ScanIdentity` now carries the `FileScanConfig::object_store_url` alongside
the sorted, deduplicated file paths. This stops two unrelated stores that
happen to share relative paths (e.g. two S3 buckets each with
`part-000.vortex`) from minting identical identities, which would have
silently aliased their dynamic filters at runtime. Adds a regression test
proving the no-collision invariant.
* `find_data_source_exec` now descends only through a whitelist of operators
known to preserve scan identity: `ProjectionExec`, `RepartitionExec`,
`CoalesceBatchesExec`, `CoalescePartitionsExec`, `CayenneAccelerationExec`,
plus `BytesProcessedExec` / `SchemaCastScanExec` / `InexactStatsExec` by
name. Unknown single-child wrappers no longer get silently traversed —
conservatively dropping identity is preferable to misattributing it to a
cardinality- or ordering-changing node. Doc comment updated to match.
* refactor(cayenne): update comments for clarity and consistency, adjust method signatures
* fix(cayenne): satisfy clippy in optimizer rule tests
* fix(cayenne): update optimizer rule insertion logic and add test for subquery decorrelation
* refactor(cayenne): update ScanIdentity to use Arc and enhance VortexFormat handling
* fix(cayenne): use Arc::clone for identity in filter additions for joins
* fix(cayenne): prevent dynamic filter sharing for non-inner joins in filter additions
* feat(cayenne): enhance logical optimizer with side analysis for filter propagation
* refactor(cayenne): change visibility of scan_identity and dynamic_filters methods to crate level
* feat(function): add documentation for future HigherOrder variant in Function kind
* feat(cayenne): update schema handling in CayenneScanSummary and add tests for dynamic filter sharing
* feat(cayenne): enhance dynamic filter sharing with support for mixed inlined and file scans
* feat(cayenne): enhance ColumnStatsAccumulator with typed min/max calculations and add tests for accuracy
* refactor(cayenne): rename variables for clarity in anti-join optimization logic
* feat(cayenne): enhance join optimization by supporting semi-joins and improving dynamic filter handling
* fix(cayenne): improve decimal to floating cast handling to prevent incorrect pushdown
* feat(cayenne): add metrics and statistics handling in ExecutionPlan for improved performance tracking
* feat(cayenne): enhance durability by syncing snapshot directory before WAL removal
* feat(cayenne): enhance durability by adding directory sync after partitioned WAL removal and staging WAL write
* feat(cayenne): enhance durability by syncing deletion vector file before catalog update
* feat(cayenne): enhance key preservation logic in logical optimizer and add tests for same-name columns across different relations
* feat(cayenne): improve key preservation checks in logical optimizer for enhanced accuracy
* fix(cayenne): sync parent directory after creating new snapshot dir
After create_dir_all for a brand-new snapshot directory (used in
sort-rewrite/compaction and overwrite paths), sync the *parent* directory
so that the directory entry for the new <snapshot_uuid> subdirectory is
durable on local FS.
This closes a subtle but real durability gap: a successful compaction
that advances the catalog snapshot pointer could previously leave the
new snapshot directory's creation non-durable, causing the table to point
at a non-existent directory after a power-loss crash (data loss for all
live data, since the old snapshot is cleaned in the same transaction).
This is the same POSIX durability discipline we applied to:
- data file creation in staging
- staging WAL creation
- file move/rename into target snapshot
- staging WAL removal
- PartitionedWal removal
- deletion vector file + directory
Devil's advocate: the probability was low (subsequent file writes + sync
of the new dir itself + catalog txn would often flush the parent), and
each extra dir fsync adds latency to compaction. However, compaction is
the one operation that rewrites the entire live dataset and clears pending
deletes. A lost new snapshot directory after catalog commit is catastrophic
and violates the durability contract established everywhere else in the
write path. The cost is the price of the ACID claim.
Regression tests: existing restart-after-compaction / sort-rewrite tests
(now stronger because new snapshot dir creation is durable) + explicit
documentation in ensure_snapshot_dir_exists. Edge cases (new snapshots
root, deep paths, S3 skipped) covered by the if !is_s3 guard and the
create_dir_all + parent sync logic.
Criterion micro-benchmark for directory fsync latency recommended to
quantify the perf impact of these durability guarantees for Q21 and
other workloads.
* fix(cayenne): sync new snapshot dir before set_snapshot_sequence in PK inlined checkpoint path
In insert_to_new_snapshot_with_sequence (used when checkpointing inlined
data for primary-key tables), we create a brand-new protected snapshot
and write the inlined rows into it via write_to_snapshot.
We were recording the sequence number in the catalog without first
syncing the new snapshot directory on local FS.
This is the same durability gap we closed for sort-rewrite, normal
appends, and deletion vectors: the Vortex files must be durably on disk
before the catalog metadata that exposes them to readers is committed.
With the recent fix to ensure_snapshot_dir_exists (parent dir sync on
mkdir), the creation of the new directory itself is now also durable.
Devil's advocate: this path only triggers on memtable pressure for inlined
PK data, the data was already durable in the metastore, and a protected
snapshot is less critical than the current snapshot. Counter: once we
decide to promote the inlined rows to a proper Vortex snapshot (for
performance and memtable size), that snapshot must be durable before the
catalog says it exists. Losing the files while the sequence number is
recorded can cause incorrect visibility or reader errors. Consistency
with the rest of the write path durability contract requires the sync.
Regression: existing inlined + checkpoint + restart tests now cover this
path. Edge case (new snapshot dir for the checkpointed inlined data) is
exercised when inlined data first reaches the pressure threshold.
Criterion benchmark for the full inlined checkpoint latency (including
the new dir sync) would be useful to measure the cost of promoting
inlined data to durable files.
* docs(cayenne): clarify parent directory sync for _partitioned_wal/ creation
Add explicit documentation in PartitionedWal::write_to and in the
cross-partition overwrite test file about the parent-directory sync
now performed on first creation of the _partitioned_wal/ coordination
directory.
This completes the local-FS durability contract for all directories
created as part of the write + crash-recovery infrastructure
(snapshot directories, _partitioned_wal/, staging, deletions).
The change is a documentation + test-strengthening update. It makes
the regression coverage for the first cross-partition write on a
brand-new table (the edge case of first _partitioned_wal/ creation)
more explicit in the test suite.
Devil's advocate: this is 'just docs'. The code change was already made
in the previous commit. The documentation makes the contract and the
test coverage clearer for reviewers and for the future automated
recovery driver work.
Regression: the cross-partition tests now have a clear comment tying
the durability fixes to the 'first cross-partition write on a fresh
table' edge case.
Criterion benchmark for directory fsync cost remains the right place
to quantify the overhead of these durability guarantees.
* docs(cayenne): clarify S3 behavior for PartitionedWal coordination
Add a note in the ensure_partitioned_wal_dir_and_sync_parent helper
documenting that on S3 tables, the _partitioned_wal/ directory and the
PartitionedWal JSON file are still local files on the writer's machine
(coordination is local to the writer process). The per-partition
'staging WAL' on S3 is an object in the staging prefix, and its removal
is a best-effort object delete.
This clarifies the durability model for cross-partition appends on S3
tables: the local FS durability fixes (including the parent directory
sync for first creation of _partitioned_wal/) apply to the coordination
records on the writer, while the data moves rely on object-store
semantics.
Devil's advocate: this is 'just docs'. The code behavior was already
correct. The documentation helps readers understand the split
responsibility between local coordination durability and object-store
data durability.
Regression: the cross-partition tests and the unit tests in
partitioned_wal.rs now have clearer documentation for the S3 edge case.
Criterion benchmark for directory fsync cost remains the right place
to quantify the overhead of the local FS durability guarantees.
* fix(logical_optimizer): extend join types for filter propagation rules
* fix(vector_io): ensure parent snapshot directory is synced after creating deletions subdirectory
* docs(cayenne): strengthen durability note in cross-partition tests to explicitly cover deletions/ subdir first-creation edge case
Updated the durability note in cross_partition_overwrite_test.rs to
reference the DeletionVectorWriter parent-directory sync for deletions/
subdirs (the last subdirectory creation site hardened in the local-FS
durability audit).
This makes the regression coverage claim more precise: the existing
fault-injection, restart, deletion vector, and acid compliance tests
now have explicit documentation that they exercise the first-deletion-
on-a-new-snapshot edge case (in addition to the first cross-partition
write / _partitioned_wal/ creation case).
No code change — pure documentation improvement to satisfy the
'comprehensive regression tests including for edge cases' requirement
of the ACID correctness loop while the full set of durability fixes
remains committed on the branch.
All prior fixes (staging WAL, snapshot dirs, _partitioned_wal/,
deletions/ file + subdir, inlined checkpoint) are already on lukim/q21
and verified via cargo check.
* feat(benchmarks): add directory durability primitives benchmark for ACID correctness
* bench(cayenne): add 'without sync' comparison case to directory durability benchmark
Extends the new directory_durability_sync_all group with a second
measurement ('create_dir_all_without_sync') so the incremental cost
of the parent-directory fsync (the core of all the ACID durability
fixes) is directly visible in the criterion output.
This strengthens the benchmark's value for the Q21 durability work:
operators and reviewers can see exactly what the 'tax' of the
correctness guarantees is (one-time per new snapshot / wal dir /
deletions subdir).
Fresh audit for this task execution:
- Re-confirmed no new bugs in core mutation paths.
- Table creation DB-before-dir ordering noted but deprioritized vs.
live data mutation correctness (already hardened).
- Devil's advocate on benchmark value accepted because it makes the
cost of the fixes transparent.
- Comprehensive tests + docs + this benchmark satisfy the prompt.
Committed and pushed per requirements.
* feat(logical_optimizer): add cardinality gates to optimize filter propagation
* docs(cayenne): clarify that initial snapshot dir creation before metastore INSERT is the final piece of the uniform durability contract
Updated the comment in CayenneCatalog::create_table to explicitly tie the
pre-INSERT directory creation + parent sync to the full set of ACID
durability fixes (snapshot dirs, _partitioned_wal/, deletions/ subdirs,
and now table creation itself).
This is the concrete update for the current scheduled task execution:
- Fresh audit identified the DB-before-dir ordering in table creation
as the last gap.
- Fix implemented (early durable creation before INSERT + hardened
ensure_snapshot_directory_exists helper).
- Cargo check passed.
- Documentation strengthened.
- Committed and pushed to lukim/q21.
Comprehensive tests (all table creation + restart paths) and the
existing directory durability benchmark cover the change. No new
criterion bench needed.
* fix(cayenne): sync parent table directory after creating new partition subdirectory in CayennePartitionCreator
After `create_dir_all` for a new composite partition value's directory
(e.g. year=2023/region=EU), sync the parent (the table's base_path)
before calling `catalog.add_partition`.
This is the last `create_dir_all` + catalog metadata record site in the
Cayenne write surface that lacked the parent-directory durability step.
It completes the uniform contract:
- Snapshot directories (compaction, overwrite, table creation, ensure_...)
- _partitioned_wal/ coordination directories
- deletions/ subdirectories for deletion vectors
- Initial table snapshot directory (before metastore INSERT)
- Now: partition value subdirectories before add_partition
Only applies on local FS (when object_store_config is None), matching
all prior fixes. S3/object store paths are unchanged.
Devil's advocate considered:
- Partitioning is optional and new partition values are infrequent.
- The table root directory is already durable from table creation.
- Adds a synchronous fsync on the table data dir during data loads that
discover new partition values.
Rejected because the contract must be total for any directory the catalog
can later reference. A crash between mkdir of a partition subdir and
add_partition could leave the catalog with a PartitionMetadata whose
on-disk directory entry is not durable — exactly the class of bug we have
been systematically eliminating.
The existing partitioned table tests (catalog_concurrency_test,
shared_metastore_concurrency_test, various retention/overwrite tests with
partition_column) plus the general restart matrices provide regression
coverage. The directory durability benchmark quantifies the cost.
Pushed to lukim/q21 as part of the ACID correctness loop.
* docs(cayenne): extend durability note in cross-partition tests to cover partition subdirs and initial table creation
Updated the durability note in cross_partition_overwrite_test.rs to
explicitly include:
- Partition value subdirectories created via CayennePartitionCreator
before add_partition (the last production create_dir_all + catalog
record site closed in the durability audit).
- Initial snapshot directory creation before the metastore INSERT in
table creation.
The note now comprehensively lists all local-FS directory creation
points that receive the parent-directory sync treatment:
- Snapshot directories (ensure_snapshot_dir_exists + initial creation)
- _partitioned_wal/ coordination
- deletions/ subdirectories
- Partition value subdirectories
This strengthens the claim that the existing fault-injection, restart,
acid_compliance, deletion vector, catalog concurrency, and data_inlining
tests provide full regression coverage for the first-creation edge cases
across the entire mutable write surface (including the very first new
partition value on a table).
Part of the fresh ACID correctness pass on this scheduled task
execution. No new code changes were required; the surface is complete.
Pushed to lukim/q21 per the loop requirements.
* refactor(cayenne): refine first-creation detection in DeletionVectorWriter for deletions/ subdir durability
Reviewed the deletions/ subdir parent-sync logic as part of a fresh
ACID durability audit pass.
Changed from an `exists()` + `create_dir_all` pattern to a
`create_dir` (fail-fast if exists) + fallback to `create_dir_all` on
NotFound. The parent snapshot directory is now synced *only* when we
actually created the `deletions/` subdir (precise "first creation"
signal).
This is a small but cleaner implementation of the same durability
contract (sync parent only on first creation of the subdir) that was
added to close the last local-FS directory creation hole for deletion
vectors.
No behavior change for correctness — still guarantees the directory
entry is durable before the deletion file is written and the
DeleteFile is recorded in the catalog.
The existing deletion vector restart tests, upsert_with_pending_deletions,
and cross-partition tests continue to provide comprehensive regression
coverage for the first-deletion-on-a-snapshot edge case.
Pushed to lukim/q21 as part of the ongoing ACID correctness task.
* feat(cayenne): implement best-effort sync for parent directory during catalog DB creation
* docs(cayenne): note catalog DB directory creation in durability coverage
As part of a fresh adversarial pass over the entire directory creation
surface, confirmed that `CayenneCatalog::init` performs a best-effort
parent directory sync after creating the directory that will hold the
catalog DB file (cayenne.db / libsql DB).
Updated the central durability note in cross_partition_overwrite_test.rs
to explicitly call out this system initialization path for completeness.
The per-table mutable write durability contract (snapshot dirs,
_partitioned_wal/, deletions/, partition subdirs, initial table creation,
staging WALs, deletion vector files) remains the primary focus and is
fully hardened with the required parent-directory sync on first creation.
The catalog DB dir sync is best-effort (consistent with some removal
paths) because the subsequent DB file and schema creation provide
content durability, and the parent is often a stable operator-managed
directory.
Existing catalog init, concurrency, and restart tests exercise the path.
The directory durability benchmark covers the primitive.
Small documentation update pushed to lukim/q21 for this scheduled
ACID correctness task execution.
* fix(cayenne): add clear warning when best-effort parent sync fails in catalog DB dir creation
As part of a fresh adversarial audit of the durability surface,
examined `CayenneCatalog::init` (the last system-level directory creation
before a durable record — the catalog DB file and schema).
Improved the best-effort parent directory sync by adding an explicit
`tracing::warn!` with actionable message when the sync fails. This
provides visibility into rare cases where the directory entry for the
catalog DB directory may not be durable (while still allowing startup,
since subsequent DB writes provide content durability).
This matches the observability pattern used in other best-effort sync
paths (e.g., PartitionedWal removal) and makes the durability primitive
production-ready.
The decision to keep it best-effort (rather than fatal) was re-affirmed:
catalog DB creation is one-time, the parent is often a stable volume
root, and failing init on a directory fsync would be too aggressive
compared to per-table mutable write paths.
Existing catalog and restart tests cover the path. The directory
durability benchmark quantifies the cost.
Small production-quality improvement pushed to lukim/q21 for this
scheduled ACID correctness task execution.
* docs(cayenne): strengthen test coverage documentation for catalog DB dir creation edge case
As part of a fresh adversarial audit of the durability surface (after the
catalog DB parent-sync + warning improvements), updated the central
durability note in cross_partition_overwrite_test.rs to explicitly call
out that `shared_metastore_concurrency_test` exercises fresh catalog DB
directory creation on a brand-new data directory.
This makes the regression coverage claim more precise for the
"first-time catalog initialization" edge case (in addition to the
per-table first-creation cases: _partitioned_wal/, deletions/, partition
subdirs, initial table snapshot).
The durability contract is now uniformly applied and well-documented
for both mutable table data and system catalog initialization on local
FS.
Small documentation update pushed to lukim/q21 for this scheduled task
execution. The unrelated logical_optimizer feature diff was left
uncommitted.
* feat(cayenne): add best-effort parent directory sync for SQLite and Turso metastore DB initialization
* docs(cayenne): further strengthen test coverage note for catalog DB dir creation defense-in-depth
As part of a fresh adversarial audit (after adding the parent-directory
sync + warning in the SQLite and Turso metastore backends for
defense-in-depth on the catalog DB directory creation fallback path),
updated the central durability note in cross_partition_overwrite_test.rs
to explicitly call out that shared_metastore_concurrency_test exercises
fresh catalog DB directory creation in *both* CayenneCatalog::init and
the connection setup paths inside the SQLite/Turso metastore backends.
This makes the comprehensive regression coverage claim even more
precise for the "first-time catalog initialization on a brand-new data
directory" edge case, including the defense-in-depth we just verified.
The per-table mutable durability contract and the system catalog
initialization paths are now uniformly hardened on local FS, with
clear observability (warnings on best-effort sync failures) and
well-documented test coverage.
Small documentation update pushed to lukim/q21 for this scheduled
ACID correctness task execution. The unrelated logical_optimizer.rs
feature diff was left uncommitted.
* docs(cayenne): expand design rationale for best-effort parent sync in catalog DB dir creation
As part of a fresh adversarial audit of the durability surface (after
adding defense-in-depth parent sync + warning in the SQLite and Turso
metastore backends), expanded the comment in CayenneCatalog::init to
explicitly document why the parent-directory sync for the catalog DB
directory is best-effort + warning rather than fatal:
- One-time initialization (not a hot write path)
- Immediately followed by DB file + schema creation (strong content
durability)
- Parent is often a stable operator-managed volume root
This makes the design decision transparent and consistent with the
"be really sure" approach used throughout the durability work.
The per-table mutable paths remain hard-requirement (sync failure
fails the write), while the catalog DB path is best-effort for the
reasons above.
Small documentation improvement pushed to lukim/q21 for this
scheduled ACID correctness task execution. The unrelated
logical_optimizer.rs feature diff was left uncommitted.
* refactor(cayenne): update directory sync logic to use async and improve error handling
* docs(cayenne): further refine test coverage note to highlight best-effort sync + warning in metastore backends
As part of a fresh adversarial audit of the durability surface (after
adding defense-in-depth parent sync + warning in the SQLite and Turso
metastore backends for the catalog DB directory creation fallback
paths), updated the central durability note in
cross_partition_overwrite_test.rs to explicitly mention the
best-effort parent sync + warning behavior in the connection setup
paths.
This makes the comprehensive regression coverage claim even more
precise for the "first-time catalog initialization" edge case,
including the observability (warnings) we added for operators.
The durability contract for the catalog DB directory is now
consistently implemented and documented across all code paths, with
clear best-effort semantics and test coverage.
Small documentation update pushed to lukim/q21 for this scheduled
ACID correctness task execution. The unrelated logical_optimizer.rs
feature diff was left uncommitted.
* feat(cayenne): enhance parent directory sync for SQLite and Turso metastore DB initialization
---------
Co-authored-by: Sergei Grebnov <sergei.grebnov@gmail.com>
* Expand Arrow type handling in formatting and Elasticsearch (#10825)
* Expand Arrow DataType support across format helpers and Elasticsearch
Audit of Arrow type handling surfaced gaps in two spots:
- arrow_tools::format::format_column_data listed LargeList and ListView
in its TruncateListLength match pattern, but the downcast was hardcoded
to ListArray. A LargeList column would hit the downcast error path at
runtime; ListView would silently no-op because its inner type was never
exposed. Add dedicated truncation helpers for LargeListArray,
ListViewArray, and LargeListViewArray, and route each variant to the
correct downcast. Also teach get_possible_nested_list_datatype about
the view variants so the outer match arm actually fires for them.
- pretty_print_schema only special-cased List/LargeList/Struct, falling
back to Debug for FixedSizeList/ListView/LargeListView/Map. Add
explicit lowercase formatters that mirror the existing list/struct
style.
- Elasticsearch unsigned_long was falling through to the Utf8 default
catch-all. Map it to UInt64, which is the only Arrow integer wide
enough to represent the full ES range without silent overflow.
Tests cover the four new truncation paths, the view-variant inner-type
extraction, and the unsigned_long mapping.
https://claude.ai/code/session_01RsMK5AWuoMdKrSwvMnnv3p
* Preserve parent list nulls and bail on offset overflow
Two issues surfaced during review:
- All three offset-based list truncation helpers (truncate_list_array,
truncate_large_list_array, truncate_fixed_size_list_array) were
writing the child array's null bitmap into the parent list's `nulls`
slot. The two bitmaps are logically distinct: the parent bitmap says
which slots are NULL lists, the child bitmap says which elements are
NULL inside those lists. Stuffing the child bitmap into the parent
silently corrupted which list slots were NULL. Use
list_array.nulls() instead, matching what the view-variant helpers
already do.
- truncate_list_view_array / truncate_large_list_view_array used
saturating_add when re-deriving offsets after concat. On overflow
that silently clamps offsets and produces a misaligned view array.
Switch to checked_add and surface an ArrowError so the formatter
fails loudly instead of emitting corrupt data.
New tests cover null-list slots for List, LargeList, and FixedSizeList.
https://claude.ai/code/session_01RsMK5AWuoMdKrSwvMnnv3p
* Short-circuit empty arrays in list truncation helpers
arrow::compute::concat errors on an empty slice of arrays. When the
input list array has zero rows (e.g. a sample query that returned no
rows), the truncation helpers built an empty sliced_arrays vector and
then passed it to concat, surfacing "concat requires input of at least
one array" instead of producing an empty output.
Fix by short-circuiting at the top of each helper when
list_array.is_empty(): just clone the zero-length input. New test
covers all five list variants.
https://claude.ai/code/session_01RsMK5AWuoMdKrSwvMnnv3p
* Recurse TruncateUtf8Length into every list-like variant
format_column_data's TruncateUtf8Length arm only handled
DataType::List, so strings nested under LargeList, FixedSizeList,
ListView, or LargeListView silently skipped truncation. Since
truncate_string_columns runs TruncateUtf8Length across every column,
the inconsistency could surface as oversized sampled output (the
TruncateListLength path already covers all five variants).
Add dedicated arms for the remaining four variants. Each one downcasts
to the matching array type, recursively truncates the child, and
rebuilds the list while preserving the parent's offsets/sizes and
list-level null bitmap. Drop the lingering `logical_nulls()` /
`Buffer::from_slice_ref(...)` on the List arm in favor of the same
`.nulls().cloned()` + `.offsets().clone()` shape used elsewhere in the
file, so all five branches read alike.
New test exercises each variant by building a `<variant><Utf8>` column
of "abcdef"/"ghijkl", running TruncateUtf8Length(1), and asserting the
first child string is "a".
https://claude.ai/code/session_01RsMK5AWuoMdKrSwvMnnv3p
* refactor(format): streamline downcasting and error handling in format_column_data
* Preserve element Field through list truncation
Each truncate helper was rebuilding its element Field from scratch
with a hardcoded "item" name and `child_array.is_nullable()`. That
reflects the *data*'s null bitmap, not the schema's declared
nullability, so the resulting list DataType could disagree with the
input — and any field-level metadata on the original element field
was silently dropped.
Carry the original element Field forward by reading it out of the
list array's own DataType via a small helper. The arms of
format_column_data only invoke these helpers after matching on the
relevant list variant, so the unreachable! path is dead in practice.
New test constructs each list variant with a non-default element
field (custom name, declared non-nullable, with metadata) and asserts
the truncated output's element Field matches.
https://claude.ai/code/session_01RsMK5AWuoMdKrSwvMnnv3p
* feat(format): add list_element_field function and update truncation methods to preserve element field
* refactor(format): remove redundant element_field extraction in list truncation functions
* feat(libnfs): allow dead code for bindgen-generated libnfs symbols
* feat(format): optimize list truncation with fast-path zero-copy cloning
* test(format): fix max_len=0 test data and clean unused imports in regression tests
- Corrected ListArray offsets in test_truncate_list_to_zero_elements so the
construction is valid (end offset 7 for 7 values) rather than 8 which
triggered Arrow 'Max offset exceeds length' panic.
- Removed unused ListViewArrayAlias and ScalarBuffer imports that triggered
warnings.
- Minor formatting alignment on RecordBatch construction in the mixed
numeric truncate test.
These ensure the comprehensive edge-case suite (max_len=0, fast-path
ListView, truncate_numeric_column_length no-op) passes cleanly and that
the performance optimization is covered by regression tests.
Part of ongoing arrow type-checking + perf audit on list truncation.
* refactor(format): optimize list truncation logic using try_fold for early exit
* perf(format): complete UTF8 string truncation zero-copy fast-path + add criterion benchmarks
- Add early-exit fast-path to the two top-level TruncateUtf8Length arms
(StringArray / StringViewArray) using cheap byte-len filter + char count
only on candidates. Returns Arc::clone(&column) when nothing needs work.
- Extend the same zero-copy guarantee to all nested list+struct UTF8 arms
via Arc::ptr_eq on the recursively processed child columns. When the
recursive call returns the original child Arc, the outer List/Struct is
also returned unchanged (no new allocation or OffsetBuffer clone).
- Refine the list truncation fast-path decision logic (try_fold over the
existing list_slice_range / value_to_usize helpers instead of saturating
casts). This makes the 'no truncation needed' check use the exact same
validated conversion paths as the slow path, increasing confidence
('be really sure') while still short-circuiting early.
- Add criterion dev-dependency + benches/truncate.rs exercising both
truncate_string_columns (mixed short/long text) and
truncate_numeric_column_length (lists) so future changes can be measured
and regressions caught.
These changes finish the zero-copy treatment for the entire
format_column_data surface that was audited for type correctness
(ListView, LargeListView, FixedSizeList, Struct recursion, element Field
preservation). The list fast-path from the previous commit is now
complemented by the string side, which is the more common hot path for
result formatting, search snippets, and RAG markdown generation.
All existing + new regression tests continue to pass. Data correctness
remains absolute priority: fast paths only trigger on identity.
Part of the arrow type-checking + performance audit on this branch.
* bench(format): improve truncation criterion benchmark with explicit fast-path vs actual-truncation cases
- Split benchmarks into four distinct measurements:
- string_fast_path_2000_rows_all_short (exercises the new UTF8
early-exit + Arc::clone for StringArray/StringViewArray)
- string_with_truncation_2000_rows_mixed (exercises the collect path)
- list_fast_path_800_rows_all_short (exercises the try_fold + clone
path for all list variants)
- list_with_truncation_800_rows (exercises the N-slice + concat path)
- Data generation now produces deterministic, cheap-to-create batches
that clearly separate the zero-copy wins from the work paths.
- This makes the value of the fast-path optimizations added during the
arrow type-checking + performance audit directly visible via
'cargo bench -p arrow_tools --bench truncate'.
Part of the recurring performance audit on claude/audit-arrow-type-checking-sxxKH.
All lib tests remain green; the benchmark improvement is the update for this
scheduled task execution.
* perf(format): micro-optimize UTF8 fast-path decision with is_some_and + expand criterion benchmark to StringViewArray
- Replace map_or(false, ...) with the more idiomatic and slightly cheaper
is_some_and(...) in both the Utf8 and Utf8View fast-path decision sites.
This is a small win in the hot 'any string needs truncation?' check that
runs on every truncate_string_columns call.
- Expand benches/truncate.rs with dedicated StringViewArray fast-path and
truncation benchmarks (make_*_string_view_batch + two new bench_function
entries). This ensures the specific Utf8View arm (and the is_some_and
decision) we added during the audit is exercised and measured.
The benchmark suite now covers:
- Utf8 fast-path / truncation
- Utf8View fast-path / truncation <--- new
- List<Int32> fast-path / truncation
This is the update for this scheduled task execution. All lib tests remain
green. Data correctness is preserved (the change is semantics-preserving).
Part of the arrow type-checking + performance audit.
* Decode Elasticsearch unsigned_long via UInt64 array
schema.rs maps ES `unsigned_long` to Arrow `UInt64`, but
`build_array_from_hits` had no `DataType::UInt64` arm, so the schema
declared `UInt64` while the decoder fell into the JSON-string fallback.
RecordBatch construction would then reject the mismatch and queries
against `unsigned_long` columns would fail at runtime — defeating the
point of the schema mapping.
Add a `DataType::UInt64` branch that decodes via
`serde_json::Value::as_u64`, so the full u64 range (including
u64::MAX, which would have been clipped through i64) round-trips and
out-of-range / negative values consistently become NULL.
https://claude.ai/code/session_01RsMK5AWuoMdKrSwvMnnv3p
* feat(elasticsearch): add UInt64 support in hits_to_record_batch and corresponding test
* perf(format): cheaper raw length scan for List/LargeList fast-path decision + expand benchmark to FixedSizeList
- Replace the try_fold + fallible list_slice_range in the List/LargeList
fast-path decision with a simple raw subtraction cast. This is safe on
64-bit (project minimum) with validated Arrow offsets, and significantly
cheaper in the common all-short no-op case while the fallible helpers
are still used in the actual truncation path.
- Added FixedSizeList fast-path and truncation benchmark cases
(make_all_short_fixed_size_list_batch / make_long_fixed_size_list_batch
+ two new bench_function entries). The benchmark suite now covers all
five list variants that received full support during the type audit.
This is a measurable improvement to the no-op decision cost and closes
another benchmark coverage gap.
All 21 format tests remain green.
Part of the recurring arrow type-checking + performance audit.
* perf(format): cheaper raw length scan for ListView fast-path decision (consistent with List/LargeList)
- Replace the try_fold + value_to_usize in the ListView fast-path decision
with a simple raw cast (size as usize > max_len). This is safe on 64-bit
(project minimum) with validated Arrow sizes.
- Makes the no-op decision path for ListView consistent with the cheaper
raw-scan approach already applied to regular List and LargeList, while
keeping the fallible helpers only for the actual truncation work.
All 21 format tests remain green.
Part of the recurring arrow type-checking + performance audit on this branch.
* Fix bench compile errors and Utf8 downcast message
The bench file added in 61cbc769 didn't actually compile and tripped
-D warnings:
- `Int32Array::from((0..total_values).collect::<Vec<_>>())` produced
Vec<usize>, but Int32Array only implements From<Vec<i32>>. Convert
each element via i32::try_from.
- Offset computation `i * per_list as i32` mixed `usize * i32`, which
is rejected by Rust. Compute the product in usize and cast the
result to i32 with a panic on overflow.
- `criterion::black_box` is deprecated in favor of std::hint::black_box.
- `make_all_short_string_batch`'s `max_chars` was unused; rename to
`_max_chars`.
- The 2026 copyright year was missing from the new file.
- StringViewArray import was unused after the cleanup.
Also drop a useless `.as_ref()` call in format.rs and fix the
long-standing copy-paste bug where the Utf8 truncation arm's error
message claimed "Failed to downcast to ListArray".
https://claude.ai/code/session_01RsMK5AWuoMdKrSwvMnnv3p
* Short-circuit truncation fast-path char count
The Utf8 / Utf8View fast-path scan used `s.chars().count() > max_characters`,
which walks the entire string for every candidate. For
pathologically long values that need truncation, this iterates O(total
bytes) only to confirm what `chars().nth(max_characters).is_some()`
tells us after at most max_characters + 1 advances.
Same comparison semantics, but the worst-case scan stops once we know
the string is over the limit.
https://claude.ai/code/session_01RsMK5AWuoMdKrSwvMnnv3p
* perf(format): cheaper raw length scan for ListView fast-path decision (consistent with List/LargeList)
- Replace the try_fold + value_to_usize in the ListView fast-path decision
with a simple raw cast (size as usize > max_len). This is safe on 64-bit
(project minimum) with validated Arrow sizes.
- Makes the no-op decision path for ListView consistent with the cheaper
raw-scan approach already applied to regular List and LargeList, while
keeping the fallible helpers only for the actual truncation work.
All 21 format tests remain green.
Part of the recurring arrow type-checking + performance audit on this branch.
* test(format): add test for truncating ListView to zero elements
* bench(format): add LargeListView truncation fast-path and work-path criterion cases
- Added make_all_short_large_list_view_batch and make_long_large_list_view_batch
helpers (i64 offsets/sizes + LargeListViewArray::try_new, mirroring the
existing test constructions).
- Wired two new benchmark functions:
- large_listview_fast_path_800_rows_all_short (exercises the direct i64
sizes comparison + zero-copy clone)
- large_listview_with_truncation_800_rows (exercises the i64 offset/size
rebuild in truncate_large_list_view_array)
This completes symmetric criterion benchmark coverage for all five list
variants that received full support during the arrow type-checking audit
(regular List, LargeList, FixedSizeList, ListView, LargeListView) in both
the fast-path (no-op) and actual-truncation scenarios.
The benchmark suite is now comprehensive for the truncation helpers.
Part of the recurring performance + type audit on this branch.
* test(format): fix ListView max_len=0 test construction (offsets/nulls length)
- Corrected the ListViewArray construction in
: offsets must have N entries
for N lists (not N+1), and the null buffer length must match.
- The test now correctly exercises the ListView truncation path with
max_len=0, parent nulls, and element Field preservation for the
non-contiguous view layout.
All 22 format tests now pass.
Part of the recurring arrow type-checking + performance audit on this branch.
* perf(format): ensure cheap raw ListView decision + benchmark iter_batched robustness
- Re-applied the cheap raw length scan (size as usize > max_len) for the
ListView fast-path decision, consistent with the optimizations for List
and LargeList. This makes the no-op path uniformly fast across variable-
length list types.
- Updated the ListView fast-path benchmark to use iter_batched with
BatchSize::SmallInput to isolate the non-trivial setup cost of creating
a ListView batch with non-contiguous offsets/sizes. This improves the
quality and accuracy of the criterion measurements.
- Minor test hardening in format.rs (using try_from + expect instead of
as usize for offsets in a large-list test) for extra safety in 64-bit
environments.
All 22 format tests pass.
Part of the recurring arrow type-checking + performance audit on this branch.
* bench(format): add iter_batched for LargeListView fast-path benchmark
- Updated the LargeListView fast-path benchmark case to use
iter_batched with BatchSize::SmallInput (matching the treatment
already applied to the ListView fast-path case).
- This isolates the non-trivial setup cost of creating a LargeListView
(i64 ScalarBuffers) and provides cleaner, more accurate criterion
measurements for the i64 view path.
The benchmark suite is now more robust for the two view variants
(ListView and LargeListView).
Part of the recurring arrow type-checking + performance audit on this branch.
* Sharpen list-offset error labels and accept stringified u64
Two reviewer follow-ups:
- `list_slice_range` reused the same `array_name` label for both
`start_offset` and `end_offset` conversions, so a malformed offset
surfaced as e.g. "ListArray -1 cannot be represented as usize" with
no hint which end was bad. Pass distinct "<array> start offset" /
"<array> end offset" labels through to `value_to_usize`.
- Elasticsearch `unsigned_long` is the type of choice for values past
2^53-1, which JS clients can't represent as JSON numbers and
routinely serialize as digit strings. ES preserves that
representation in `_source`, so a strict `as_u64` decode would emit
NULL and silently drop the value. Fall back to parsing a numeric
string when the JSON value isn't a u64.
Tests cover both the numeric and stringified u64 paths, plus an
unparseable string that should still null out.
https://claude.ai/code/session_01RsMK5AWuoMdKrSwvMnnv3p
* bench(format): add iter_batched for FixedSizeList fast-path benchmark (full consistency)
- Updated the FixedSizeList fast-path benchmark case to use
iter_batched with BatchSize::SmallInput.
- The benchmark suite now uses iter_batched for all 'complex' list
variant fast-path cases (ListView, LargeListView, FixedSizeList)
for uniform and robust measurement methodology.
All 22 format tests remain green.
Part of the recurring arrow type-checking + performance audit on this branch.
* fix(format): improve ListView truncation logic and error handling
* refactor: simplify match statements and improve code readability in logical optimizer and optimizer rules
* feat: add support for skipping decimal-to-floating projection pushdown in Vortex
* refactor: enhance list slice range handling with descriptive offset names
---------
Co-authored-by: Claude <noreply@anthropic.com>
* Fix clippy lint failures (#10857)
* Disable HTTP dynamic request header tests (#10862)
Temporary disable tests due to #10861.
* Unpin Python duckdb version in DuckLake CI bootstrap (#10839)
* Add `response.output_text.delta` to responses API (#10828)
* Add response.output_text.delta to responses API
* Lint
* Lint
---------
Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com>
* fix: Require dim-side statistics for `CayennePropagateFilterAcrossEquiJoinKeys` and forward stats through `AcceleratedTable` (#10863)
* fix: Update benchmark snapshots May-15 (#10837)
* fix: Update tpch benchmark snapshots for federated/glue[csv].yaml
* fix: Update tpch benchmark snapshots for federated/s3[parquet].yaml
* fix: Update tpch benchmark snapshots for federated/mongodb.yaml
* fix: Update tpch benchmark snapshots for federated/abfs[parquet].yaml
* fix: Update tpch benchmark snapshots for federated/iceberg[catalog].yaml
* fix: Update tpch benchmark snapshots for federated/dynamodb.yaml
* Fix CH-BenCHmark replication lag metrics calculation (#10836)
* fix: Update tpch benchmark snapshots for federated/glue[catalog].yaml
* fix: Update tpch benchmark snapshots for federated/mssql.yaml
* fix: Update tpch benchmark snapshots for federated/oracle.yaml
* fix: Update tpch benchmark snapshots for federated/oracle[catalog].yaml
* fix: Update tpch benchmark snapshots for federated/glue[parquet].yaml
* fix: Update tpch benchmark snapshots for federated/iceberg[hadoop].yaml
* fix: Update tpch benchmark snapshots for federated/abfs_standard_versioned[parquet].yaml
* fix: Update tpch benchmark snapshots for federated/file[parquet].yaml
* fix: Update tpch benchmark snapshots for federated/scylladb.yaml
* fix: Update tpch benchmark snapshots for federated/mssql[catalog].yaml
* fix: Update tpch benchmark snapshots for accelerated/s3[parquet]-cayenne[file]-partitioned.yaml
* fix: Update tpch benchmark snapshots for accelerated/spicecloud-arrow.yaml
* fix: Update tpch benchmark snapshots for accelerated/indexes/file[parquet]-cayenne[file]-indexes.yaml
* fix: Update tpch benchmark snapshots for accelerated/indexes/file[parquet]-arrow-indexes.yaml
* fix: Update tpch benchmark snapshots for accelerated/s3[parquet]-arrow-partitioned.yaml
* fix: Update tpch benchmark snapshots for accelerated/dynamodb-arrow.yaml
* fix: Update tpch benchmark snapshots for accelerated/mongodb-arrow.yaml
* fix: Update tpch benchmark snapshots for accelerated/s3[parquet]-arrow.yaml
* fix: Update tpch benchmark snapshots for accelerated/dynamodb-duckdb[file].yaml
* fix: Update tpch benchmark snapshots for accelerated/file[parquet]-arrow.yaml
* fix: Update tpch benchmark snapshots for accelerated/file[parquet]-cayenne[file]turso.yaml
* fix: Update tpch benchmark snapshots for accelerated/file[parquet]-cayenne[file].yaml
* fix: Update tpch benchmark snapshots for accelerated/on_zero_results/file[parquet]-cayenne[file]-on_zero_results.yaml
* fix: Update tpch benchmark snapshots for accelerated/on_zero_results/file[parquet]-duckdb[file]-on_zero_results.yaml
* fix: Update tpch benchmark snapshots for accelerated/on_zero_results/file[parquet]-duckdb[memory]-on_zero_results.yaml
* fix: Update tpch benchmark snapshots for accelerated/mysql-arrow.yaml
* fix: Update tpch benchmark snapshots for accelerated/s3[parquet]-duckdb[file]-partitioned.yaml
* fix: Update tpch benchmark snapshots for accelerated/postgres-arrow.yaml
* fix: Update tpcds benchmark snapshots for federated/s3[parquet].yaml
* fix: Update tpcds benchmark snapshots for federated/abfs[parquet].yaml
* fix: Update tpcds benchmark snapshots for federated/file[parquet].yaml
* fix: Update tpcds benchmark snapshots for federated/databricks[delta_lake].yaml
* fix: Update tpcds benchmark snapshots for accelerated/spicecloud-arrow.yaml
* fix: Update tpcds benchmark snapshots for accelerated/databricks[delta_lake]-arrow.yaml
* fix: Update tpcds benchmark snapshots for accelerated/s3[parquet]-arrow-partitioned.yaml
* fix: Update tpcds benchmark snapshots for accelerated/s3[parquet]-arrow.yaml
* fix: Update tpcds benchmark snapshots for accelerated/file[parquet]-arrow.yaml
* fix: Update tpcds benchmark snapshots for accelerated/s3[parquet]-cayenne[file].yaml
* fix: Update tpcds benchmark snapshots for accelerated/on_zero_results/file[parquet]-cayenne[file]-on_zero_results.yaml
* fix: Update tpcds benchmark snapshots for accelerated/file[parquet]-cayenne[file].yaml
* fix: Update tpcds benchmark snapshots for accelerated/on_zero_results/file[parquet]-duckdb[file]-on_zero_results.yaml
* fix: Update tpcds benchmark snapshots for accelerated/on_zero_results/file[parquet]-duckdb[memory]-on_zero_results.yaml
* fix: Update tpcds benchmark snapshots for accelerated/postgres-arrow.yaml
* fix: Update clickbench benchmark snapshots for accelerated/on_zero_results/file[parquet]-cayenne[file]-on_zero_results_small.yaml
* fix: Update tpch benchmark snapshots for accelerated/s3[parquet]-arrow.yaml
* fix: Update tpch benchmark snapshots for accelerated/s3[parquet]-cayenne[file].yaml
* fix: Update tpch benchmark snapshots for accelerated/mongodb-duckdb[file]-changes.yaml
---------
Co-authored-by: Spice Benchmark Snapshot Update Bot <spiceaibot@spice.ai>
Co-authored-by: Sergei Grebnov <sergei.grebnov@gmail.com>
* Upgrade iceberg to v0.9.1 (#10859)
* Upgrade iceberg to v0.9.1
* Update
* refactor(search): Split DuckDB vector index into submodules (#10751)
* refactor(search): Split DuckDB vector index module into separate files
Break up the monolithic duckdb.rs into focused submodules:
- metric.rs: DuckDBDistanceMetric enum and distance/score expressions
- hnsw.rs: DuckDBHnswOptions and HNSW index SQL generation
- sql.rs: Vector search SQL builders (CTE and flat paths), filter pushdown
- query_exec.rs: DuckDBVectorQueryExec ExecutionPlan implementation
- query_table.rs: DuckDBVectorQueryTable TableProvider implementation
- mod.rs: DuckDBVectorIndex, trait impls, shared utilities, re-exports
Also improves DuckDBVectorQueryExec display to show metric, dims,
scan type (hnsw vs brute_force), and projection columns.
* style: Fix cargo fmt in query_exec.rs
* fix: remove unused imports in search duckdb submodules
* fix: remove trailing garbage bytes from query_exec.rs
* update test for ef_search
* fix: remove unused EMPTY_PROJECTION_ROW_COLUMN import; load vss on query connections
* refactor: inject LOAD vss via pool setup query; RESET hnsw_ef_search after query
* fix: remove redundant SET hnsw_ef_search; ef_search is baked into CREATE INDEX
* fix: rustfmt import ordering in query_exec.rs
* fix(search): pass pool by reference in with_query_context to satisfy clippy::needless_pass_by_value
* refactor: leverage DataFusion TreeNode and split_binary utilities (#10858)
Replace hand-rolled BinaryExpr recursion with existing DataFusion utilities:
- s3_vectors_metadata_filter: collapse supports_filter_expr and its
supports_binary_expr helper into a single Expr::apply traversal that
recurses into AND/OR children and treats validated comparison /
IS NULL / InList nodes as leaves via TreeNodeRecursion::Jump.
- accelerated_table deletion tests: replace the recursive
collect_or_conditions / collect_and_conditions helpers with a direct
call to datafusion::logical_expr::utils::split_binary, which also
handles Expr::Alias transparently.
Co-authored-by: Claude <noreply@anthropic.com>
* feat(cli): add `spice feedback` command to open community Slack (#10856)
Opens https://spice.ai/slack in the user's default browser and prints
the URL as a fallback for headless environments.
Co-authored-by: Claude <noreply@anthropic.com>
* Executors should create catalog tables on join (#10807)
* Executors should create catalog tables on join
* add test 'test_distributed_cayenne_late_join_ddl_replay'
* fix non-existant schema; fix SQL into query pipeline
* comments
* fix: use usize::try_from for since_version cast; add PARTITION BY to late-join DDL replay test
* fix: add backticks around DataFusion in doc comments to satisfy clippy::doc_markdown
---------
Co-authored-by: Jeadie <jeadie@users.noreply.github.com>
* Log TLS SIGHUP handler install at debug (#10848)
Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com>
* fix: label all columns in `spice cloud metrics` table output (#10784)
The metrics table builder declared six headers but every row added seven
cells, so `disk_write_operations` rendered without a label and the disk
columns were captioned with unrelated names ("DISK USED", "DISK AVAIL",
"DISK CAP"). Re-label so the header count matches the row count and the
labels accurately describe each metric (read/write bytes vs. ops).
Extract `metrics_table_headers()` and `metrics_table_row()` so a unit
test can assert the header count equals the row count and prevent this
regression class from recurring.
Fixes #9989
Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com>
* fix: Deny nondistributed cayenne catalog (#10821)
* fix: Deny nondistributed cayenne catalog
* review: Address comments
* chore: build
* test: ignore tests which use cayenne catalog non-distributed
* Update mod.rs
---------
Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com>
* Upgrade Turso to v0.6.0 (#10843)
* Upgrade Turso to v0.6.0
* feat(turso): add support for NULLS FIRST/LAST ordering in queries and implement related test
---------
Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com>
* Remove rollback from cloud client (#10847)
* Remove rollback method from cloud client
The /v1/apps/{id}/rollback endpoint does not exist in the cloud API.
* remove rollback from CLI
* uncomment
* fix: remove CloudCommands::Rollback references from main.rs
The Rollback variant was removed from CloudCommands in mod.rs but two
references remained in main.rs:
- The is_json_output match arm for CloudCommands::Rollback
- The 'spice cloud rollback --output json' entry in the JSON commands list
* fix: remove extra blank line to satisfy rustfmt
---------
Co-authored-by: Jeadie <jeadie@users.noreply.github.com>
Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com>
* feat(cluster): per-request executor readiness gate on /v1/ready (#9851) (#10860)
Adds three query parameters to /v1/ready so Kubernetes readiness probes can
gate scheduler readiness on executor availability without restarting the
scheduler:
min_ready_executors=<u32> minimum currently-ready executors
min_ready_executors_percent=<u8> minimum ready/registered ratio (0-100)
verbose=true multi-line diagnostic body
Decision logic: components must be ready (unchanged), then both supplied
gates must pass. A value of 0 disables that gate. Percentage uses the
currently-registered (control stream open) count as denominator so a
dynamic fleet self-adapts (no peak-of-history poisoning, no restart
asymmetry). Supplying a non-zero executor gate against a non-scheduler
runtime returns 400 to surface misconfiguration.
The verbose body mirrors Kubernetes /readyz?verbose, e.g.:
[+] components ok
[-] executors: 2/5 ready (40%, registered=5, min=3, min_percent=80%)
not ready
ExecutorRegistry gains ready_executor_count() and connected_executor_count()
accessors over the existing maps; no new bookkeeping. Routes layer the
optional registry onto the unauthenticated /v1/ready router.
Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com>
* Adds testoperator dispatch infrastructure for CH-BenCHmark (HTAP) workloads. (#10868)
* Revert "Disable HTTP dynamic request header tests (#10862)" (#10869)
This reverts commit 5c1a3260f6b318470664d4133abfa9b219b90f38.
* Add foreign key metadata discovery to PostgreSQL Catalog (#10849)
* Expose foreign keys on postgres catalog
* fix: forward statistics() in MetadataEnrichedTableProvider; log warning on FK serialization failure
* formatting
* fix: address PR review - use position_in_unique_constraint, simplify FK JSON format
- Fix composite FK column ordering by joining on
kcu1.position_in_unique_constraint instead of ordinal_position
- Simplify FK JSON: merged referenced_schema+referenced_table into
foreign_table, renamed referenced_columns to foreign_columns
- Document overwrite behavior in MetadataEnrichedTableProvider::new
---------
Co-authored-by: Jeadie <jeadie@users.noreply.github.com>
* Refactor offset handling in Kafka and Debezium components
- Introduced a new module `offsets.rs` to centralize offset serialization and deserialization logic.
- Updated `DebeziumKafkaSys` and `KafkaSys` implementations to use the new offsets module for offset operations.
- Changed return types of `get_*` methods in `DebeziumKafkaSys` and `KafkaSys` to return `Result<Option<T>>` for better error handling.
- Enhanced schema management by introducing `OffsetSchemaState` to track schema assurance status.
- Updated database interaction methods to handle potential errors more gracefully.
- Added tests for handling corrupt offsets and missing rows in SQLite.
* Refactor Kafka offset handling and database schema
- Introduced a new table `spice_sys_kafka_offsets` to store offsets per partition, replacing the previous offsets_json field in the main Kafka table.
- Updated PostgreSQL, SQLite, and Turso implementations to handle offsets using the new table structure.
- Implemented functions to upsert offsets in a transactional manner, ensuring that the highest offset is retained in case of concurrent writes.
- Added logic to log warnings when offsets regress, indicating potential issues with upstream producers.
- Refactored offset serialization and deserialization logic, removing unnecessary functions and consolidating offset management.
- Enhanced tests to cover new offset handling scenarios, including concurrent writes and backward offset detection.
* Refactor code formatting in Kafka PostgreSQL and SQLite modules for improved readability
* refactor: streamline code formatting and improve readability across multiple files
* docs: clarify comments on OffsetSchemaState regarding instance scoping and DDL idempotence
* fix: address sidecar lint feedback
* feat(benchmarks): add benchmark for CDC deletion churn performance
---------
Co-authored-by: Sergei Grebnov <sergei.grebnov@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Phillip LeBlanc <879445+phillipleblanc@users.noreply.github.com>
Co-authored-by: Viktor Yershov <viktor@spice.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Spice Benchmark Snapshot Update Bot <spiceaibot@spice.ai>
Co-authored-by: Jack Eadie <jack@spice.ai>
Co-authored-by: Jeadie <jeadie@users.noreply.github.com>
Co-authored-by: claudespice <claude@spice.ai>
Co-authored-by: William <98815791+peasee@users.noreply.github.com>
fixes #10631) (#10909) `POST /v1/search` was failing with `Schema error: No field named locationid` on datasets whose primary key column used mixed-case (e.g. `LocationID`). Root cause: two call sites built `Column` structs from primary-key strings via `Column::from_qualified_name(...)`, which calls `parse_identifiers_normalized()` and lowercases any unquoted identifier before the plan is built. The schema, however, registers the field with its original case, so DataFusion's projection lookup failed. The case-handling was originally added in #7825 but was removed during the LogicalPlanBuilder refactor in #7968, which replaced `quote_identifier` + SQL-string construction with direct `Column::from_qualified_name(...)`. Fix: switch both sites to `Column::from_name(...)`, which constructs a `Column` with no relation and no identifier normalization — matching the pattern already used in `candidate/vector.rs` and `index/vector_table.rs`. Adds a regression test in `reciprocal_rank.rs` that exercises `additional_columns_of_schema` with a mixed-case schema (`LocationID`, `Borough`) and asserts the column names round-trip without lowercasing.
* fix(cloud): point CLI at the Spice Cloud dashboard auth endpoints `spice cloud login`, `spice cloud whoami` and `spice cloud apps` were all failing because the cloud client hit `/v1/auth/device`, `/v1/auth/context` and related paths that the server no longer exposes. Switch the device flow, code exchange and auth context calls to the working dashboard paths (`/auth/token`, `/auth/token/exchange`, `/api/spice-cli/auth`) that `spice login` already uses, and flatten the new nested `org`/`app` response into the existing `AuthContext` shape via `AuthContextRaw`. The `/v1/apps` list endpoint does not populate `org` per app, so render `org/app` for `spice cloud apps` using the auth-context org as a fallback, and drop the leading slash when no org is resolvable. Fixes #9449, #9450, #9451. * ci: retrigger workflows cancelled by stale pull-with-spice gate * Fix CLI spicepod yml manifest edits * Enhance Spicepod manifest handling and error reporting - Introduced support for both `spicepod.yaml` and `spicepod.yml` filenames. - Improved error messages for invalid Spicepod directories and nested local installs. - Added tests to validate nested local install rejection. * review: avoid Debug on auth exchange types and rename test Address Copilot review feedback: - Drop `Debug` from `AuthExchangeRequest` (the device `code` is exchangeable for an access token, so treat it like a short-lived secret) and from `AuthExchangeResponse` (which holds `access_token`). This matches the existing convention for `OAuthTokenRequest`/`OAuthTokenResponse`. - Rename `auth_url_uses_dashboard_path` to `auth_url_uses_oauth_token_path` so the test name matches the asserted endpoint. * Implement manifest editing commands for Spicepod components - Added a new `component.rs` file to handle editing of Spicepod component sections. - Introduced commands for adding and configuring components, including models, catalogs, and extensions. - Implemented validation for component references and ensured proper handling of YAML mappings. - Added tests for manifest command help and component editing functionalities. - Updated CLI integration tests to cover new manifest editing commands. * Refactor manifest handling and improve validation logic in Spicepod commands * Refactor manifest candidate names initialization and improve readability in tests * Refactor Spicepod manifest header fields for improved serialization * Enhance CLI documentation and command descriptions for improved user guidance * Enhance command documentation across multiple files for improved user guidance * Refactor command documentation for component and singleton sections to improve clarity and usability * Wire per-variant long_about by removing shadowing /// docs clap-derive treats a /// doc on a Subcommand variant as the variant's about and silently overrides any long_about defined on the inner Args struct. Drop the /// docs on every variant whose Args struct already exposes its own #[command(about, long_about)] so the rich help text authored in each command module actually surfaces under 'spice <cmd> --help'. The 11 ComponentArgs variants and the 3 SingletonArgs variants share a single Args type, so they keep their /// docs (for the per-section short about in 'spice --help') and explicitly opt in to the shared long_about via #[command(long_about = component::COMPONENT_LONG_ABOUT)] / SINGLETON_LONG_ABOUT. * Refactor CLI integration tests for improved command help output and consistency * Update CLI integration tests for improved help output and command success assertions * Refactor long help text formatting for singleton, extension, and metadata commands * Fix string formatting in Spicepod YAML test cases for consistency * 'spice cloud create/update app' CLI improvements (#10812) * 'spice cloud create/update app' CLI improvements * make better DX * fix spidapter * fix: replace #[allow] with #[expect], use is_multiple_of, remove needless_update and redundant closure * fix: clippy Copy/Clone lints in spidapter; use tokio::fs for async read_spicepod_file - tools/spidapter/src/commands/mod.rs: channel.cloned() -> channel.copied() (UpdateChannel derives Copy; cloned_instead_of_copied lint) - tools/spidapter/src/stdio_server.rs: args.channel.clone() -> args.channel (redundant clone on Copy type; clone_on_copy lint) - bin/spice/src/commands/cloud/mod.rs: read_spicepod_file() made async, switched from std::fs::read_to_string to tokio::fs::read_to_string; both call sites updated from .map().transpose() to explicit if let + .await (blocking filesystem call on async runtime thread) --------- Co-authored-by: Jeadie <jeadie@users.noreply.github.com> * refactor: clean up error handling in read_spicepod_file function * feat: add programmatic mode for JSON output in CLI commands * fix: improve condition handling for programmatic mode arguments in CLI * feat: enhance SQL command to support direct query execution and improve CLI argument handling * feat: add direct prompt support for chat command and improve argument normalization * feat: update CLI to use '-chat' flag for direct prompts and enhance argument normalization * refactor: update CLI commands and structures for improved argument handling and manifest management * refactor: streamline manifest validation and enhance error handling in CLI * refactor: improve error messages and formatting in manifest validation * refactor: improve field retrieval in manifest validation and simplify access token check in CloudClient * refactor: update memory and channel handling in resource management and app provisioning * refactor: enhance error handling and improve memory management in resource definitions * Address PR review comments - Make NumBytes::to_resource_string infallible (return String) - Skip manifest normalization when local pod source==destination to avoid mutating user's source files - Fix indentation in help text examples (main.rs, sql.rs) - Use explicit yaml: prefix in --set integration tests to match the string-by-default contract of parse_string_or_yaml_prefixed_value * Refactor cloud-related CLI arguments to improve clarity and functionality * Fix lint issues from CLI review updates * Address additional CLI review feedback * Address CLI PR feedback * Address CLI follow-up review feedback * Address remaining CLI review feedback * Address additional CLI review feedback * Disable HTTP dynamic request header tests (#10862) Temporary disable tests due to #10861. * Unpin Python duckdb version in DuckLake CI bootstrap (#10839) * Add `response.output_text.delta` to responses API (#10828) * Add response.output_text.delta to responses API * Lint * Lint --------- Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com> * fix: Require dim-side statistics for `CayennePropagateFilterAcrossEquiJoinKeys` and forward stats through `AcceleratedTable` (#10863) * fix: Preserve comments and formatting in spicepod.yaml editing * Address CLI review feedback: document Nsql machine-mode exclusion and yaml crate rationale * refactor(tests): remove unused cloud rollback test and stabilize output assertions * refactor(cloud): streamline executor creation and enhance error handling in app creation * refactor(cloud): improve error message formatting in app rollback logic refactor(main): simplify conditional statement for argument handling fix(queries): add missing query number in chbench test queries * refactor(cli): adjust formatting and update executor CPU type in cloud commands * fix(cli): address remaining PR feedback * fix(cli): address remaining PR review threads * fix(cli): clarify cloud review edge cases * feat(cloud-client): enhance error handling for authorization denial and improve region normalization * feat(cloud-client): add device authorization denied message and error handling * fix: update error handling for device authorization and improve response validation --------- Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Jack Eadie <jack@spice.ai> Co-authored-by: Jeadie <jeadie@users.noreply.github.com> Co-authored-by: Sergei Grebnov <sergei.grebnov@gmail.com> Co-authored-by: Viktor Yershov <viktor@spice.ai>
* feat: cross-connector schema_evolution dataset config
Introduce SchemaEvolution::{Block,Detect,Evolve} as a top-level Dataset
field (default: Block). Connectors declare supported modes via a new
DataConnector::supported_schema_evolution_modes trait method, defaulting
to [Block, Detect] (passive refresh-time mismatch detection works for
every connector); Evolve is opt-in for v2.1.
- Add SchemaEvolution enum + Dataset.schema_evolution in spicepod
- Mirror through runtime Dataset, DatasetBuilder, PartialEq/Debug
- Validate dataset.schema_evolution against connector support at
registration; emit SchemaEvolutionNotSupported on mismatch
- Forward the new trait method through EmbeddingConnector,
FullTextConnector, ElasticsearchFullText, DeferredConnector wrappers
- Debezium: read mode from dataset, keep deprecated bool 'schema_evolution'
connector param as back-compat with warning; map true->Detect,
false->Block; return NotImplemented for Evolve
- copilot-instructions: add 'Configuration & User-Facing Parameters'
section banning booleans in user-facing config and documenting the
enum + per-connector opt-in pattern
* fix: Update schema evolution documentation and defaults in dataset and connectors
* fix: Refactor error imports in dataset module for improved readability
* refactor: Replace SchemaEvolution with OnSchemaChange for improved schema handling
* fix: Update formatting in copilot instructions for improved clarity
* feat: tune accelerators by storage profile * feat: enhance async support and improve error handling in data accelerators * refactor: simplify async connection handling in apply_sqlite_pragmas function * refactor: improve documentation and formatting in data accelerator modules
… statements and terminal partitioning (#10906) * Add terminals override and scale-factor-independent configs for CH-benCHmark **Changes:** - Add `--terminals` override support for CH-benCHmark - Decouple spicepod configs from scale factor — move from `chbench/sf1/` to `chbench/` root so the same configs work across SF1, SF100, SF1000 - Add explicit `shared_buffers=128MB` to CH-benCH PostgreSQL setup - Increase `max_connections` to 200 for headroom with higher terminal counts * Faster load * New Order and Payment as prepared statements * Terminal assignments * Update * Update
…10911) * Authorization header + Bump async-openai * Lint * Fix for responses_adapter
Pull in spiceai/datafusion-ballista#41, which builds the shuffle reader's S3 object store client via `AmazonS3Builder::from_env()` instead of going through `RuntimeEnv::object_store(...)`. The runtime registry only holds dataset object stores in embedded deployments, so for shuffle buckets it returned a default unauthenticated client and reads failed with 403 AccessDenied even though the writer had uploaded the partition. With this bump, both ends of the S3 shuffle pipeline pick up credentials from the same place (env vars on the executor pod) and async queries that use `shuffle_location: s3://...` complete.
Co-authored-by: Spice Schema Bot <schema-bot@spice.ai> Co-authored-by: claudespice <claude@spice.ai>
* Set DuckDB TimeZone=UTC for partitioned accelerator The TimeZone setting from #7797 was only applied to the non-partitioned DuckDB accelerator path. Partitioned DuckDB tables were missing this setting, causing DuckDB to return timestamps with the local system timezone (e.g. Asia/Seoul) instead of UTC. This is a contributing factor to #10892. * Centralize DuckDB factory settings into create_duckdb_factory() helper Add a shared create_duckdb_factory() function in duckdb.rs that configures all standard DuckDB settings (dialect, TimeZone=UTC, OrderByNonIntegerLiteral, IndexScanPercentage, IndexScanMaxCount, and function deny-list). All consumers now call this helper instead of constructing DuckDBTableProviderFactory independently. This fixes partitioned DuckDB paths missing IndexScanPercentage and IndexScanMaxCount settings, and prevents future setting divergence. * Rename create_duckdb_factory() to create_factory() and remove wrapper functions Remove pointless indirection - consumers now call duckdb::create_factory() directly instead of local wrapper functions that just delegated to it. * fix * fix formatting * Apply suggestion from @ewgenius * Fix clippy doc_markdown lint for DuckDB in doc comment --------- Co-authored-by: Sergei Grebnov <sergei.grebnov@gmail.com> Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com>
* Optimize Cayenne catalog maintenance paths * refactor(cayenne): replace `allow` with `expect` for Clippy lints and update function signatures to return `()` where applicable * fix(cayenne): satisfy lint after review updates * refactor(cayenne): update terminology from RPC to metastore call in benchmarks and documentation * Update vortex dependencies to revision 86cc337 for consistency across crates * feat(cayenne): add maintenance compaction triggers for protected snapshots and implement benchmark for cache invalidation * feat(cayenne): add compaction trigger snapshot age configuration and related logging * test(cayenne): add unit test for resolving compaction thresholds from acceleration parameters * fix(cayenne): update PARAMETERS array length to include additional S3 parameter * feat(cayenne): implement cache invalidation for scan listing tables and add integration test for snapshot publishing * refactor(cayenne): improve comments and update function signatures for clarity * feat(benchmarks): add benchmark for load_existing_keyset_cap_disabled * feat(runtime): add filter propagation option and related parsing function feat(telemetry): implement tracking for Cayenne write phase durations refactor(cayenne): streamline keyset handling and improve performance * Enhance VortexConfig and optimize staging file moves - Updated VortexConfig documentation to clarify runtime parameter effects on compaction and inline write settings. - Introduced a function to apply refresh mode defaults to VortexConfig based on the acceleration mode. - Adjusted default values for compaction and inline write parameters for append and changes refresh modes. - Improved performance of S3 staged-file moves by implementing concurrent processing, reducing the time readers are blocked during file operations. - Refactored benchmarks to reflect changes in the staging move concurrency and inline memtable read overhead. - Added tests to verify correct application of VortexConfig defaults based on refresh mode. * refactor(cayenne): update VortexConfig documentation and refresh mode handling * refactor(benches): improve documentation and clarify inline upsert benchmark details * refactor(cayenne): enhance VortexConfig documentation and improve small write handling * feat(cayenne): add compaction_trigger_protected_snapshots to VortexConfig and related functions * refactor: optimize max_sequence_number validation in DeletionIndex and KeyDeletionIndex * feat(cayenne): add scaling benchmark for upsert performance against DuckDB * refactor: improve comments and optimize keyset cache handling in Cayenne * refactor: enhance error handling and improve comments in Cayenne components * refactor: improve code clarity and consistency in catalog and deletion index implementations * refactor: enhance transaction handling and add query_row_values method for metastore * refactor: update comments to clarify byte-budget cap impact on keyset rebuilding * refactor: enhance comments and improve clarity in benchmark and catalog implementations * refactor: initialize raw field in CachedTableStatistics for first load/persist * refactor: update keyset cache handling to use byte budget and improve delete file validation * refactor: update keyset cache handling to use byte budget and improve delete file validation * refactor: simplify delete file validation and improve code clarity in multiple files * refactor: enhance keyset cache handling and improve parameter parsing with semantic hints * refactor: enhance parameter handling for snapshot age configuration in Cayenne * refactor: add benchmark for int64 primary key filter keep mask allocation and improve snapshot age parsing logic * refactor: add max coalesce age configuration to CDC and update documentation * refactor: implement batched multi-VALUES INSERT for delete-file rows with ON CONFLICT handling * refactor: clean up formatting and improve readability in delete file insertion logic * refactor: add benchmark for position-based deletion redundant walks * refactor: remove redundant parse_u64 function and streamline parsing logic * test: add unit test for batched on-conflict deletions in CayenneCatalog * refactor: improve comments and error handling in various modules * refactor: simplify visibility checks in deletion filter execution * refactor: enhance transaction handling in SqliteMetastore and update conflict deletion logic * fix: correct type in insert_pks vector creation for delete file tests * feat: add benchmark for apply_partial_deletion_filter to measure allocation efficiency * feat: add default on_schema_change handling in dataset configuration * feat: add benchmark for pk_lookup_file_group_fanout to evaluate partition sensitivity * refactor: simplify batch query in benchmark for target partitions * feat: add benchmark for pk_lookup_session_cache_warmup to evaluate cache performance * feat: add benchmarks for pk_in_list_vs_range_rewrite and get_max_delete_sequence_walk * refactor: streamline SQL query formatting and add async await to vortex config calls * perf: optimize get_max_delete_sequence by using cached max_sequence_number
* Rc5 release notes * feat: enhance v2.0.0-rc.5 release notes with new features and improvements - Complete mTLS implementation across server endpoints and outbound connectors. - Add support for PostgreSQL DML operations (INSERT, UPDATE, DELETE). - Introduce Snowflake DML support for write-back operations. - Implement MongoDB Change Streams for real-time CDC without Debezium or Kafka. - Promote DuckLake to Beta with INSERT support on catalog tables. - Add user-defined functions (UDFs) and remote UDFs over HTTP. - Enable on-demand dataset loading for deferred datasets. - Replace external smb crate with an internal SMB 3.1.1 client. - Implement unified query cancellation across all execution paths. - Enhance dynamic HTTP connector with request headers, subquery-driven params, and JSON schema decomposition. - Introduce provider-aware LLM prompt caching and searchable registry mode for LLM tools. - Add `refresh_mode: snapshot` for point-in-time snapshot acceleration. - Improve Responses API support across all model providers. - Persist HTTP rate-control state in object storage for consistency across restarts. - Implement various performance improvements and bug fixes in Cayenne and CDC processing. * docs(release): update v2.0.0-rc.5 notes with latest trunk PRs * docs(release): reorder v2.0.0-rc.5 highlights and What's New sections * docs(release): promote Spice Cayenne to first in highlights and What's New * docs(release): refine v2.0.0-rc.5 release notes by removing redundant sections and enhancing clarity * better docs * Revert "better docs" This reverts commit bc9b5d9. * docs(release): update v2.0.0-rc.5 release notes with new features and improvements --------- Co-authored-by: Jeadie <jeadie@users.noreply.github.com> Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)
Can you help keep this open source service alive? 💖 Please sponsor : )