Skip to content

[pull] trunk from spiceai:trunk #639

Merged
pull[bot] merged 2 commits into
TheRakeshPurohit:trunk from
spiceai:trunk
Feb 26, 2026

pull[bot] commented Feb 26, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)


peasee and others added 2 commits February 25, 2026 22:47
* feat: Return dataset error message in datasets API

* fix: Use errors for component status where available

* fix build

* fix build

* fix build

* fix build

* fix: handle error when converting desired discriminant to u8 in tests

* fix(api): align status openapi schema and stabilize datasets CSV

---------

Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com>
* `spidapter stdio --image-tag ... --channel ... ` and standardise spice cloud client (#9436)

* Standardise spice cloud client

Entire-Checkpoint: 5a6987dbc11e

* add image_tag to spidapter

Entire-Checkpoint: f5ce118fa528

* timeout errors

Entire-Checkpoint: 419914dcd6a9

* add image name to spidapter

Entire-Checkpoint: ffdc9b42b0e4

* update_channel

Entire-Checkpoint: 1b196b1fe2c7

* fix and lint

* fix bad merge

* fix build

* feat(spidapter): add Iceberg Glue catalog, DDL table creation, and teardown cleanup (#9443)

- Generate spicepod with Iceberg Glue catalog (AWS account 211125479522)
  as the default catalog override (name: spice, access: read_write_create)
- Implement create_tables RPC: generate CREATE TABLE IF NOT EXISTS DDL
  from Arrow schemas and execute via /v1/sql endpoint
- Implement teardown cleanup: DROP TABLE IF EXISTS for all created tables
  before deleting the SCP app
- Add Arrow-to-SQL type mapping for DDL generation
- Track cname and created table names in RunState for teardown
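
A minimal sketch of the DDL generation and teardown statements described above. `ColType` is a hypothetical stand-in for a small subset of Arrow data types (the real code presumably works with Arrow's own schema types), and both the SQL type names and the signature of `create_table_ddl` are assumptions made for illustration.

```rust
// Hypothetical stand-in for a few Arrow data types; not the arrow crate.
#[derive(Debug, Clone, PartialEq)]
enum ColType {
    Int32,
    Int64,
    Float64,
    Utf8,
    Boolean,
    TimestampMicros,
}

// Assumed Arrow-to-SQL mapping, chosen for illustration only.
fn sql_type(t: &ColType) -> &'static str {
    match t {
        ColType::Int32 => "INT",
        ColType::Int64 => "BIGINT",
        ColType::Float64 => "DOUBLE",
        ColType::Utf8 => "VARCHAR",
        ColType::Boolean => "BOOLEAN",
        ColType::TimestampMicros => "TIMESTAMP",
    }
}

// Builds an idempotent CREATE TABLE statement from (name, type) pairs.
fn create_table_ddl(table: &str, columns: &[(&str, ColType)]) -> String {
    let cols: Vec<String> = columns
        .iter()
        .map(|(name, ty)| format!("{} {}", name, sql_type(ty)))
        .collect();
    format!("CREATE TABLE IF NOT EXISTS {} ({})", table, cols.join(", "))
}

// Teardown counterpart: safe to run even if the table was never created.
fn drop_table_ddl(table: &str) -> String {
    format!("DROP TABLE IF EXISTS {}", table)
}
```

Keeping the list of created table names (as RunState does per the last bullet) is what lets teardown call something like `drop_table_ddl` once per table before the app is deleted.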

* feat: Add Iceberg delete support via equality delete files

* Spidapter: fix iceberg catalog configuration

2026-02-19T15:32:55.128370Z ERROR runtime::init::catalog: Failed to initialize catalog connector: Cannot setup the catalog spice (iceberg) with an invalid configuration. A Catalog Path is required for Iceberg in the format of: http://<host_and_port>/v1/namespaces/<namespace>. For details, visit: https://spiceai.org/docs/components/catalogs/iceberg#from Path must contain 'namespaces' segment
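
The error above says the catalog path must contain a `namespaces` segment. The check below is a rough sketch of that rule only; `namespace_from_catalog_path` is a hypothetical helper, not the runtime's actual validation.

```rust
// Returns the segment that follows `namespaces` in the path, if any.
// Hypothetical helper illustrating the format
// http://<host_and_port>/v1/namespaces/<namespace>.
fn namespace_from_catalog_path(path: &str) -> Option<&str> {
    let mut segments = path.split('/').filter(|s| !s.is_empty());
    while let Some(segment) = segments.next() {
        if segment == "namespaces" {
            // The namespace is whatever comes right after the marker segment.
            return segments.next();
        }
    }
    None
}
```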

* feat: Per-table acceleration via CREATE TABLE ... WITH ("acceleration.*") (#9446)

* feat: Per-table acceleration via CREATE TABLE ... WITH ("acceleration.*")

Implements per-table acceleration for Iceberg DDL tables using SQL syntax:

  CREATE TABLE orders (id INT) WITH (
    "acceleration.engine" = 'arrow',
    "acceleration.mode" = 'memory',
    "acceleration.refresh_check_interval" = '10s'
  );

Key changes:

SQL parsing layer:
- acceleration_options.rs: AccelerationOptionsStore + parse_acceleration_options()
- preprocess.rs: Strips WITH clause before DataFusion planning, stores options
- Keys use "acceleration." prefix (double-quoted for dot in SQL identifiers)
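
To make the parsing layer concrete, here is a simplified sketch of stripping the WITH clause and collecting the `acceleration.*` keys. The function names echo the ones listed above, but the signatures and the naive string handling (no commas or `=` inside values, clause at the end of the statement) are assumptions; the real implementation may well work on a parsed SQL AST instead.

```rust
use std::collections::BTreeMap;

// Splits `CREATE TABLE ... WITH (...)` into the statement without the
// clause and the clause body, if one is present at the end.
fn strip_with_clause(sql: &str) -> (String, Option<String>) {
    let stmt = sql.trim().trim_end_matches(';').trim_end();
    let marker = " WITH (";
    // ASCII upper-casing keeps byte offsets identical to `stmt`.
    let upper = stmt.to_ascii_uppercase();
    if let Some(idx) = upper.rfind(marker) {
        if let Some(body) = stmt[idx + marker.len()..].strip_suffix(')') {
            return (stmt[..idx].trim_end().to_string(), Some(body.trim().to_string()));
        }
    }
    (stmt.to_string(), None)
}

// Collects `"acceleration.<name>" = '<value>'` pairs, keyed by <name>.
fn parse_acceleration_options(with_body: &str) -> BTreeMap<String, String> {
    let mut options = BTreeMap::new();
    for pair in with_body.split(',') {
        if let Some((key, value)) = pair.split_once('=') {
            let key = key.trim().trim_matches('"');
            let value = value.trim().trim_matches('\'');
            if let Some(name) = key.strip_prefix("acceleration.") {
                options.insert(name.to_string(), value.to_string());
            }
        }
    }
    options
}
```

The stripped statement is what would be handed to the planner, while the option map would be stored for the analyzer rule to pick up later.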

Logical/physical plan threading:
- IcebergCreateTableNode carries Option<Acceleration>
- Analyzer rule retrieves options from shared store
- Planner passes Weak<DataFusion> to physical plans

AcceleratedTable creation:
- create_accelerated_iceberg_table() in physical_plans.rs
- Converts spicepod Acceleration → runtime Acceleration via TryFrom
- Creates accelerator engine table (Arrow/DuckDB/SQLite)
- Builds AcceleratedTable with refresh config
- Graceful fallback to raw Iceberg reads on failure

Runtime wiring:
- SharedDataFusionRef (Arc<OnceLock<Weak<DataFusion>>>) avoids circular deps
- Set via DataFusion::set_self_ref() after Arc::new()
- Extension planner and physical plans use Weak<DataFusion>
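
The self-reference pattern in this list can be sketched with the standard library alone. The `DataFusion` struct below is a stand-in with hypothetical fields, and `Planner` is an invented consumer; only the `Arc<OnceLock<Weak<...>>>` shape and the "set after `Arc::new()`" ordering come from the description above.

```rust
use std::sync::{Arc, OnceLock, Weak};

// Shared cell that is filled in once the owning Arc exists.
type SharedDataFusionRef = Arc<OnceLock<Weak<DataFusion>>>;

// Stand-in for the runtime type; the fields are hypothetical.
struct DataFusion {
    name: String,
    self_ref: SharedDataFusionRef,
}

impl DataFusion {
    fn new(name: &str) -> Arc<Self> {
        let df = Arc::new(DataFusion {
            name: name.to_string(),
            self_ref: Arc::new(OnceLock::new()),
        });
        // The weak pointer can only be created after Arc::new().
        DataFusion::set_self_ref(&df);
        df
    }

    fn set_self_ref(this: &Arc<Self>) {
        // `set` succeeds only once; a repeated call is ignored here.
        let _ = this.self_ref.set(Arc::downgrade(this));
    }
}

// A planner-like consumer holds the shared cell, never a strong Arc,
// so it cannot keep the runtime alive or form a reference cycle.
struct Planner {
    df: SharedDataFusionRef,
}

impl Planner {
    fn runtime_name(&self) -> Option<String> {
        let df = self.df.get()?.upgrade()?;
        Some(df.name.clone())
    }
}
```

Because consumers only upgrade the `Weak` when they need it, dropping the last strong `Arc` tears the runtime down and later lookups simply return `None`.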

Drop cleanup:
- IcebergDropTableExec deregisters table first (triggers Drop → abort handlers)
- Then drops from Iceberg catalog

13 unit tests for option parsing, preprocessing, and store lifecycle.

* feat: Update copyright year and enhance acceleration option handling in DDL

* feat: Add Iceberg delete support via equality delete files

* feat: Simplify store key assignment in preprocess_create_table_acceleration function

* Update spidapter for new system-adapter-protocol (#9442)

* refactor: Clean up formatting and error handling in Iceberg DDL and Spidapter code

* refactor: Simplify SQL type mapping for Arrow data types in create_table_ddl function

* refactor: Improve snapshot creation configuration handling for caching mode

---------

Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com>
Co-authored-by: Sergei Grebnov <sergei.grebnov@gmail.com>
Co-authored-by: Jack Eadie <jack@spice.ai>

* Fix cloud client construction and finalize lint-safe test updates

* refactor: simplify error handling in CloudClient and optimize SQL type mapping in stdio_server

* feat(spidapter): add cayenne acceleration to CREATE TABLE DDL (#9453)

Tables created by spidapter now include WITH clause options for
per-table acceleration: the cayenne engine in file mode, with a full
refresh every 1 second.

* fix: coerce nanosecond timestamps to microsecond for Iceberg v2 (#9454)

Iceberg v2 does not support timestamp_ns, but the DataFusion SQL parser maps
TIMESTAMP to Timestamp(Nanosecond, ...) by default. This coerces
nanosecond timestamp fields to microsecond precision before converting
the Arrow schema to Iceberg schema during CREATE TABLE.
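
A rough sketch of the coercion, using hypothetical `TimeUnit` and `FieldType` enums in place of the Arrow types so the example stays self-contained. Only nanosecond timestamps are rewritten; the timezone and every other type pass through unchanged.

```rust
// Hypothetical, reduced versions of Arrow's time unit and data type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TimeUnit {
    Millisecond,
    Microsecond,
    Nanosecond,
}

#[derive(Debug, Clone, PartialEq)]
enum FieldType {
    Int64,
    Utf8,
    Timestamp(TimeUnit, Option<String>),
}

// Downgrades nanosecond timestamps to microseconds, keeping the timezone.
fn coerce_for_iceberg_v2(ty: FieldType) -> FieldType {
    match ty {
        FieldType::Timestamp(TimeUnit::Nanosecond, tz) => {
            FieldType::Timestamp(TimeUnit::Microsecond, tz)
        }
        other => other,
    }
}

// Applies the coercion to every (name, type) pair of a schema.
fn coerce_schema(fields: Vec<(String, FieldType)>) -> Vec<(String, FieldType)> {
    fields
        .into_iter()
        .map(|(name, ty)| (name, coerce_for_iceberg_v2(ty)))
        .collect()
}
```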

* fix: lint fixes for spicebench branch

* fix: derive table location from namespace for Iceberg DDL CREATE TABLE (#9455)

AWS Glue catalogs require an explicit `location` when creating tables.
Previously, `TableCreation` was built without a location, causing Glue
to reject the request with "Cannot parse missing string: location".

This change:
- Fetches namespace properties via `catalog.get_namespace()` and looks
  for a `location` property to derive `{namespace_location}/{table_name}`
- Passes the derived location using `.location_opt()` on the builder
- Gracefully falls back to `None` when namespace has no location
  (e.g. local REST catalogs that auto-generate locations)
- Uses `TableReference::bare` for accelerated table dataset names,
  matching the convention used by the normal dataset acceleration path
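
The location derivation can be illustrated as follows. `derive_table_location` is a hypothetical helper operating on a plain property map; trimming a trailing slash is an extra assumption, not something the commit states.

```rust
use std::collections::HashMap;

// Derives `{namespace_location}/{table_name}` from namespace properties.
// Returns None when the namespace has no `location`, which leaves the
// catalog free to pick a location itself.
fn derive_table_location(
    namespace_props: &HashMap<String, String>,
    table_name: &str,
) -> Option<String> {
    namespace_props
        .get("location")
        // Assumption: avoid a double slash if the property ends with '/'.
        .map(|loc| format!("{}/{}", loc.trim_end_matches('/'), table_name))
}
```

The `Option` result lines up with the `.location_opt()` builder call mentioned above: `Some` for catalogs like Glue, `None` for the fallback case.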

* spidapter: disable tables acceleration

* feat: support dataset.time_column and dataset.time_format in CREATE TABLE WITH options (#9458)

Add support for 'dataset.time_column' and 'dataset.time_format' options in
CREATE TABLE ... WITH (...) DDL statements. These dataset-level options affect
acceleration behavior for append-mode refreshes by specifying which column
contains timestamps and what format they use.

Example:
  CREATE TABLE orders (id INT, created_at TIMESTAMP) WITH (
    "acceleration.engine" = 'arrow',
    "acceleration.refresh_mode" = 'append',
    "dataset.time_column" = 'created_at',
    "dataset.time_format" = 'timestamp'
  )

Changes:
- Broaden AccelerationOptionsStore to DdlOptionsStore, storing DdlTableOptions
  which bundles both acceleration and dataset options
- Add DatasetOptions struct with time_column and time_format fields
- Add parse_dataset_options() and parse_ddl_table_options() functions
- Support all TimeFormat variants: timestamp, timestamptz, unix_seconds,
  unix_millis, ISO8601, date
- Thread DatasetOptions through analyzer rule, logical nodes, planner, and
  physical plans
- Wire time_column and time_format into the Refresh config when creating
  accelerated Iceberg tables
- Rename preprocess function and cleanup helper to reflect broader scope
- Add comprehensive unit tests for parsing and preprocessing
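
A simplified sketch of parsing the two dataset options. The struct and function names follow the list above, but the signatures, the error type, and the case-insensitive matching of format names are assumptions for illustration.

```rust
// The TimeFormat variants listed in the commit message.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TimeFormat {
    Timestamp,
    Timestamptz,
    UnixSeconds,
    UnixMillis,
    Iso8601,
    Date,
}

// Assumption: format names are matched case-insensitively.
fn parse_time_format(value: &str) -> Result<TimeFormat, String> {
    match value.to_ascii_lowercase().as_str() {
        "timestamp" => Ok(TimeFormat::Timestamp),
        "timestamptz" => Ok(TimeFormat::Timestamptz),
        "unix_seconds" => Ok(TimeFormat::UnixSeconds),
        "unix_millis" => Ok(TimeFormat::UnixMillis),
        "iso8601" => Ok(TimeFormat::Iso8601),
        "date" => Ok(TimeFormat::Date),
        other => Err(format!("unsupported dataset.time_format: {other}")),
    }
}

#[derive(Debug, Default, PartialEq)]
struct DatasetOptions {
    time_column: Option<String>,
    time_format: Option<TimeFormat>,
}

// Picks the `dataset.*` keys out of already-unquoted (key, value) pairs
// and ignores everything else, such as `acceleration.*` keys.
fn parse_dataset_options(pairs: &[(&str, &str)]) -> Result<DatasetOptions, String> {
    let mut opts = DatasetOptions::default();
    for (key, value) in pairs {
        match *key {
            "dataset.time_column" => opts.time_column = Some(value.to_string()),
            "dataset.time_format" => opts.time_format = Some(parse_time_format(value)?),
            _ => {}
        }
    }
    Ok(opts)
}
```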

* [Spicebench] Add refresh processed records/bytes metric (#9459)

* Add refresh processed records/bytes metric

* Fix

* Fix lint

* refactor: simplify code formatting and remove unnecessary line breaks

* fix: update DatasetOptions default value in IcebergDdlAnalyzerRule

* fix: update Anthropic model version in AI UDF tests

* fix: route accelerated table inserts to federated source

When inserting into an accelerated table, writes now go to the federated
source instead of dual-writing to both the accelerator and federated
source. The acceleration refresh mechanism picks up the new data on its
next cycle.

The write_to_accelerator_only path (used when on_conflict is configured)
is preserved for backward compatibility.
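
The routing decision described above reduces to a small branch. The types here are hypothetical; the sketch only captures that inserts go to the federated source unless on_conflict is configured, in which case the accelerator-only path is kept.

```rust
#[derive(Debug, PartialEq)]
enum WriteTarget {
    // Default: write to the source; refresh later updates the accelerator.
    FederatedSource,
    // Preserved path for tables with on_conflict configured.
    AcceleratorOnly,
}

// Hypothetical, reduced view of an accelerated table's configuration.
struct AcceleratedTableConfig {
    on_conflict_configured: bool,
}

fn insert_target(cfg: &AcceleratedTableConfig) -> WriteTarget {
    if cfg.on_conflict_configured {
        WriteTarget::AcceleratorOnly
    } else {
        WriteTarget::FederatedSource
    }
}
```

One consequence of the default branch is that newly inserted rows become visible through the accelerator only after its next refresh cycle.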

* Revert "spidapter: disable tables acceleration #9456"

This reverts commit 9a4564d, reversing
changes made to 504c76b.

* fix: Set replicas to 4

* feat: add ETL sink mode support for Iceberg object store in spidapter

* Jeadie/26 02 19/metrics 2 (#9462)

* support GET /v1/apps/{}/metrics

Entire-Checkpoint: 1450f2e74e01

* fix

* fix into spidapter

* add ingestion

Entire-Checkpoint: fb63eb40697f

* make ingestion metrics great again

* fix metrics when not exist

* fix divide by 0

* update spidapter handler

Entire-Checkpoint: 36647632c2b9

* fix spidapter

Entire-Checkpoint: e41839760b98

* use correct from

* no AWS vars

* use spicepod crate in spidapter, enable hive partitioning

Entire-Checkpoint: cb023f99de09

* refactor: streamline spicepod imports and set default telemetry configuration

* fix: address PR review issues

- Fix get_app comparison bug: org == org -> app.org == org
- Pin system-adapter-protocol to full SHA (2153680f3c42bd66632fd3f180016c0d5a984d64)
- Fix clippy: use then_some instead of then(|| ...) in metrics
- Fix clippy: remove unnecessary Result from SetupConfig::from_metadata
- Update DatasetConfig in tests for new system-adapter-protocol fields
- Format code
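
Two of the review fixes are easy to illustrate in isolation. The `App` struct, the `get_app` signature, and the `ratio` helper are hypothetical; the point is the corrected comparison (`app.org == org` rather than the always-true `org == org`) and the `then_some` form that clippy prefers when the value needs no lazy closure.

```rust
#[derive(Debug, Clone, PartialEq)]
struct App {
    org: String,
    name: String,
}

// With `org == org` the org filter was a no-op, so an app from any org
// with a matching name could be returned. Comparing the app's own field
// restores the intended filter.
fn get_app<'a>(apps: &'a [App], org: &str, name: &str) -> Option<&'a App> {
    apps.iter().find(|app| app.org == org && app.name == name)
}

// `then_some(x)` instead of `then(|| x)`: the float division is cheap and
// cannot panic, so eager evaluation is fine. Returns None for a zero
// denominator.
fn ratio(numerator: u64, denominator: u64) -> Option<f64> {
    (denominator != 0).then_some(numerator as f64 / denominator as f64)
}
```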

* fix: handle early completion of write operation in data sink and drain remaining messages

* feat: Add Cayenne Catalog with DDL (#9473)

* support cayenne catalog with DDL

* update

* fix: Remove s3 one zone catalogs, force simple cayenne local catalog

* feat: Update spidapter to generate with Cayenne Catalog entries (#9475)

* chore: fmt

* chore: clippy

* fix: Disable Cayenne Catalog namespace replication to public and default (#9479)

* fix: Disable Cayenne Catalog namespace replication to public and default

* revert spidapter

* chore: fmt

* fix: Preserve schema in results cache for empty query results (#9484)

When a query returns 0 rows, the record batch stream may produce zero
RecordBatch items. Previously, CachedQueryResult::from_batches() and
new_raw() derived the schema from batches[0].schema(), falling back to
Schema::empty() when batches was empty — losing the real schema.

This caused FlightSQL ADBC clients to fail with 'inconsistent schema'
errors because GetFlightInfo reported the correct schema (from the
logical plan) while DoGet returned 0 fields (from the cache).

Fix: Accept an explicit schema parameter in from_batches() and new_raw()
so the correct schema is always preserved, and pass the stream/plan
schema from all call sites.
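
The difference between the old and new behaviour can be shown with stand-in types. `Schema`, `RecordBatch`, and the `from_batches_inferred` constructor are hypothetical simplifications; `from_batches` taking an explicit schema mirrors the fix described above.

```rust
// Reduced stand-ins for the Arrow schema and record batch types.
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    fields: Vec<String>,
}

#[derive(Debug, Clone)]
struct RecordBatch {
    schema: Schema,
}

#[derive(Debug)]
struct CachedQueryResult {
    schema: Schema,
    batches: Vec<RecordBatch>,
}

impl CachedQueryResult {
    // Old shape: the schema is inferred from the first batch, so an empty
    // result silently degrades to an empty schema.
    fn from_batches_inferred(batches: Vec<RecordBatch>) -> Self {
        let schema = batches
            .first()
            .map(|b| b.schema.clone())
            .unwrap_or(Schema { fields: vec![] });
        Self { schema, batches }
    }

    // Fixed shape: the caller passes the stream or plan schema explicitly,
    // so it survives even when there are zero batches.
    fn from_batches(schema: Schema, batches: Vec<RecordBatch>) -> Self {
        Self { schema, batches }
    }
}
```

With the explicit parameter, a cached zero-row result reports the same fields as the logical plan, which is what keeps the two FlightSQL calls consistent.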

Fixes #9481

* feat: spidapter supports setting up Cayenne catalog with ADBC sink (#9488)

* fix: spidapter build

* feat: Cayenne catalog supports CREATE SCHEMA (#9489)

* style: Refactor code formatting for improved readability across multiple files

* refactor: Update comments and improve schema references in multiple files

* feat: Enable write-only mode for internal accelerated table builder

---------

Co-authored-by: Jack Eadie <jack@spice.ai>
Co-authored-by: Phillip LeBlanc <phillip@spice.ai>
Co-authored-by: Phillip LeBlanc <phillip@leblanc.tech>
Co-authored-by: Sergei Grebnov <sergei.grebnov@gmail.com>
Co-authored-by: Viktor Yershov <viktor@spice.ai>
Co-authored-by: peasee <98815791+peasee@users.noreply.github.com>
pull[bot] locked and limited conversation to collaborators Feb 26, 2026
pull[bot] added the ⤵️ pull label Feb 26, 2026
pull[bot] merged commit 255a884 into TheRakeshPurohit:trunk Feb 26, 2026
1 of 9 checks passed

2 participants