[pull] trunk from spiceai:trunk by pull[bot] · Pull Request #639 · TheRakeshPurohit/spiceai

pull · 2026-02-26T03:06:15Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

* feat: Return dataset error message in datasets API
* fix: Use errors for component status where available
* fix build
* fix build
* fix build
* fix build
* fix: handle error when converting desired discriminant to u8 in tests
* fix(api): align status openapi schema and stabilize datasets CSV

---------

Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com>
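
The discriminant fix above follows a common Rust pattern: convert a wider integer to `u8` with `u8::try_from`, so out-of-range values become errors instead of being silently truncated by an `as u8` cast. The minimal sketch below also shows one way a datasets API row could carry an error message only when a component has failed. `ComponentStatus` (including its variants and discriminant values), `DatasetStatus`, and `dataset_status` are hypothetical names chosen for illustration; they are not Spice's actual types.

```rust
use std::convert::TryFrom;

/// Hypothetical component status; variant names and discriminant values are
/// assumptions for illustration, not Spice's actual definition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ComponentStatus {
    Initializing = 0,
    Ready = 1,
    Failed = 2,
}

impl ComponentStatus {
    /// Converts a desired discriminant (e.g. an i64 taken from a test case) into
    /// a status. `u8::try_from` rejects negative or too-large values instead of
    /// truncating them the way `as u8` would.
    pub fn from_discriminant(desired: i64) -> Result<ComponentStatus, String> {
        let d = u8::try_from(desired)
            .map_err(|e| format!("discriminant {} does not fit in u8: {}", desired, e))?;
        match d {
            0 => Ok(ComponentStatus::Initializing),
            1 => Ok(ComponentStatus::Ready),
            2 => Ok(ComponentStatus::Failed),
            other => Err(format!("unknown component status discriminant {}", other)),
        }
    }
}

/// Hypothetical row returned by a datasets API.
#[derive(Debug, PartialEq)]
pub struct DatasetStatus {
    pub name: String,
    pub status: ComponentStatus,
    /// Populated only when the component failed and an error was recorded.
    pub error: Option<String>,
}

/// Builds the API row, exposing the last recorded error only for failed components.
pub fn dataset_status(name: &str, status: ComponentStatus, last_error: Option<&str>) -> DatasetStatus {
    let error = match status {
        ComponentStatus::Failed => last_error.map(|e| e.to_string()),
        _ => None,
    };
    DatasetStatus { name: name.to_string(), status, error }
}
```

Returning `Result` from the conversion lets a test fail with a readable message rather than panicking or comparing against a wrapped-around value.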

* `spidapter stdio --image-tag ... --channel ...` and standardise spice cloud client (#9436)
* Standardise spice cloud client
  Entire-Checkpoint: 5a6987dbc11e
* add image_tag to spidapter
  Entire-Checkpoint: f5ce118fa528
* timeout errors
  Entire-Checkpoint: 419914dcd6a9
* add image name to spidapter
  Entire-Checkpoint: ffdc9b42b0e4
* update_channel
  Entire-Checkpoint: 1b196b1fe2c7
* fix and lint
* fix bad merge
* fix build
* feat(spidapter): add Iceberg Glue catalog, DDL table creation, and teardown cleanup (#9443)
  - Generate spicepod with Iceberg Glue catalog (AWS account 211125479522) as the default catalog override (name: spice, access: read_write_create)
  - Implement create_tables RPC: generate CREATE TABLE IF NOT EXISTS DDL from Arrow schemas and execute via /v1/sql endpoint
  - Implement teardown cleanup: DROP TABLE IF EXISTS for all created tables before deleting the SCP app
  - Add Arrow-to-SQL type mapping for DDL generation
  - Track cname and created table names in RunState for teardown
* feat: Add Iceberg delete support via equality delete files
* Spidapter: fix iceberg catalog configuration
  `2026-02-19T15:32:55.128370Z ERROR runtime::init::catalog: Failed to initialize catalog connector: Cannot setup the catalog spice (iceberg) with an invalid configuration. A Catalog Path is required for Iceberg in the format of: http://<host_and_port>/v1/namespaces/<namespace>. For details, visit: https://spiceai.org/docs/components/catalogs/iceberg#from Path must contain 'namespaces' segment`
* feat: Per-table acceleration via CREATE TABLE ... WITH ("acceleration.*") (#9446)
* feat: Per-table acceleration via CREATE TABLE ... WITH ("acceleration.*")
  Implements per-table acceleration for Iceberg DDL tables using SQL syntax:
  `CREATE TABLE orders (id INT) WITH ( "acceleration.engine" = 'arrow', "acceleration.mode" = 'memory', "acceleration.refresh_check_interval" = '10s' );`
  Key changes:
  SQL parsing layer:
  - acceleration_options.rs: AccelerationOptionsStore + parse_acceleration_options()
  - preprocess.rs: Strips WITH clause before DataFusion planning, stores options
  - Keys use "acceleration." prefix (double-quoted for dot in SQL identifiers)
  Logical/physical plan threading:
  - IcebergCreateTableNode carries `Option<Acceleration>`
  - Analyzer rule retrieves options from shared store
  - Planner passes `Weak<DataFusion>` to physical plans
  AcceleratedTable creation:
  - create_accelerated_iceberg_table() in physical_plans.rs
  - Converts spicepod Acceleration → runtime Acceleration via TryFrom
  - Creates accelerator engine table (Arrow/DuckDB/SQLite)
  - Builds AcceleratedTable with refresh config
  - Graceful fallback to raw Iceberg reads on failure
  Runtime wiring:
  - SharedDataFusionRef (`Arc<OnceLock<Weak<DataFusion>>>`) avoids circular deps
  - Set via `DataFusion::set_self_ref()` after `Arc::new()`
  - Extension planner and physical plans use `Weak<DataFusion>`
  Drop cleanup:
  - IcebergDropTableExec deregisters table first (triggers Drop → abort handlers)
  - Then drops from Iceberg catalog
  13 unit tests for option parsing, preprocessing, and store lifecycle.
* feat: Update copyright year and enhance acceleration option handling in DDL
* feat: Add Iceberg delete support via equality delete files
* feat: Simplify store key assignment in preprocess_create_table_acceleration function
* Update spidapter for new system-adapter-protocol (#9442)
* `spidapter stdio --image-tag ... --channel ...` and standardise spice cloud client (#9436)
* Standardise spice cloud client
  Entire-Checkpoint: 5a6987dbc11e
* add image_tag to spidapter
  Entire-Checkpoint: f5ce118fa528
* timeout errors
  Entire-Checkpoint: 419914dcd6a9
* add image name to spidapter
  Entire-Checkpoint: ffdc9b42b0e4
* update_channel
  Entire-Checkpoint: 1b196b1fe2c7
* fix and lint
* fix bad merge
* fix build
* refactor: Clean up formatting and error handling in Iceberg DDL and Spidapter code
* refactor: Simplify SQL type mapping for Arrow data types in create_table_ddl function
* feat: Per-table acceleration via CREATE TABLE ... WITH ("acceleration.*")
  Implements per-table acceleration for Iceberg DDL tables using SQL syntax:
  `CREATE TABLE orders (id INT) WITH ( "acceleration.engine" = 'arrow', "acceleration.mode" = 'memory', "acceleration.refresh_check_interval" = '10s' );`
  Key changes:
  SQL parsing layer:
  - acceleration_options.rs: AccelerationOptionsStore + parse_acceleration_options()
  - preprocess.rs: Strips WITH clause before DataFusion planning, stores options
  - Keys use "acceleration." prefix (double-quoted for dot in SQL identifiers)
  Logical/physical plan threading:
  - IcebergCreateTableNode carries `Option<Acceleration>`
  - Analyzer rule retrieves options from shared store
  - Planner passes `Weak<DataFusion>` to physical plans
  AcceleratedTable creation:
  - create_accelerated_iceberg_table() in physical_plans.rs
  - Converts spicepod Acceleration → runtime Acceleration via TryFrom
  - Creates accelerator engine table (Arrow/DuckDB/SQLite)
  - Builds AcceleratedTable with refresh config
  - Graceful fallback to raw Iceberg reads on failure
  Runtime wiring:
  - SharedDataFusionRef (`Arc<OnceLock<Weak<DataFusion>>>`) avoids circular deps
  - Set via `DataFusion::set_self_ref()` after `Arc::new()`
  - Extension planner and physical plans use `Weak<DataFusion>`
  Drop cleanup:
  - IcebergDropTableExec deregisters table first (triggers Drop → abort handlers)
  - Then drops from Iceberg catalog
  13 unit tests for option parsing, preprocessing, and store lifecycle.
* refactor: Improve snapshot creation configuration handling for caching mode

  ---------

  Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com>
  Co-authored-by: Sergei Grebnov <sergei.grebnov@gmail.com>
  Co-authored-by: Jack Eadie <jack@spice.ai>
* Fix cloud client construction and finalize lint-safe test updates
* refactor: simplify error handling in CloudClient and optimize SQL type mapping in stdio_server
* feat(spidapter): add cayenne acceleration to CREATE TABLE DDL (#9453)
  Tables created by spidapter now include WITH clause options for per-table acceleration using the cayenne engine with full refresh every 1 second, file mode.
* fix: coerce nanosecond timestamps to microsecond for Iceberg v2 (#9454)
  Iceberg v2 does not support timestamp_ns. DataFusion SQL parser maps TIMESTAMP to Timestamp(Nanosecond, ...) by default. This coerces nanosecond timestamp fields to microsecond precision before converting the Arrow schema to Iceberg schema during CREATE TABLE.
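
The `acceleration.*` handling described in #9446 (strip the `WITH` clause during preprocessing, store the options, let the analyzer rule retrieve them) can be sketched with the standard library alone. This is an illustration under assumptions, not the code in `acceleration_options.rs`: options are modeled as already-parsed `(key, value)` string pairs, the result is a plain `BTreeMap` keyed by the suffix after `acceleration.`, and the store is assumed to hand each table's options out once (`take` removes the entry). The function signature and the `insert`/`take` methods are hypothetical.

```rust
use std::collections::{BTreeMap, HashMap};
use std::sync::Mutex;

/// Options parsed from `acceleration.*` keys, keyed by the suffix after the prefix.
pub type AccelerationOptions = BTreeMap<String, String>;

/// Collects `acceleration.*` entries from a CREATE TABLE ... WITH (...) option
/// list. Returns None when no such key is present, so the table is created
/// without acceleration. Keys without the prefix are left for other parsers.
pub fn parse_acceleration_options(options: &[(&str, &str)]) -> Option<AccelerationOptions> {
    let parsed: AccelerationOptions = options
        .iter()
        .filter_map(|&(key, value)| {
            key.strip_prefix("acceleration.")
                .map(|suffix| (suffix.to_string(), value.to_string()))
        })
        .collect();
    if parsed.is_empty() {
        None
    } else {
        Some(parsed)
    }
}

/// Hypothetical stand-in for AccelerationOptionsStore: preprocessing inserts
/// options under the table name after stripping the WITH clause, and the
/// analyzer rule later takes them out again.
#[derive(Default)]
pub struct AccelerationOptionsStore {
    inner: Mutex<HashMap<String, AccelerationOptions>>,
}

impl AccelerationOptionsStore {
    pub fn insert(&self, table: &str, options: AccelerationOptions) {
        self.inner.lock().unwrap().insert(table.to_string(), options);
    }

    /// Removes and returns the options, so each table's entry is consumed once.
    pub fn take(&self, table: &str) -> Option<AccelerationOptions> {
        self.inner.lock().unwrap().remove(table)
    }
}
```

Keeping the store behind a `Mutex` lets the preprocessing step and the analyzer rule share it without passing the options through DataFusion's own plan types.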
* fix: lint fixes for spicebench branch
* fix: derive table location from namespace for Iceberg DDL CREATE TABLE (#9455)
  AWS Glue catalogs require an explicit `location` when creating tables. Previously, `TableCreation` was built without a location, causing Glue to reject the request with "Cannot parse missing string: location".
  This change:
  - Fetches namespace properties via `catalog.get_namespace()` and looks for a `location` property to derive `{namespace_location}/{table_name}`
  - Passes the derived location using `.location_opt()` on the builder
  - Gracefully falls back to `None` when namespace has no location (e.g. local REST catalogs that auto-generate locations)
  - Uses `TableReference::bare` for accelerated table dataset names, matching the convention used by the normal dataset acceleration path
* spidapter: disable tables acceleration
* feat: support dataset.time_column and dataset.time_format in CREATE TABLE WITH options (#9458)
  Add support for 'dataset.time_column' and 'dataset.time_format' options in CREATE TABLE ... WITH (...) DDL statements. These dataset-level options affect acceleration behavior for append-mode refreshes by specifying which column contains timestamps and what format they use.
  Example:
  `CREATE TABLE orders (id INT, created_at TIMESTAMP) WITH ( "acceleration.engine" = 'arrow', "acceleration.refresh_mode" = 'append', "dataset.time_column" = 'created_at', "dataset.time_format" = 'timestamp' )`
  Changes:
  - Broaden AccelerationOptionsStore to DdlOptionsStore, storing DdlTableOptions which bundles both acceleration and dataset options
  - Add DatasetOptions struct with time_column and time_format fields
  - Add parse_dataset_options() and parse_ddl_table_options() functions
  - Support all TimeFormat variants: timestamp, timestamptz, unix_seconds, unix_millis, ISO8601, date
  - Thread DatasetOptions through analyzer rule, logical nodes, planner, and physical plans
  - Wire time_column and time_format into the Refresh config when creating accelerated Iceberg tables
  - Rename preprocess function and cleanup helper to reflect broader scope
  - Add comprehensive unit tests for parsing and preprocessing
* [Spicebench] Add refresh processed records/bytes metric (#9459)
* Add refresh processed records/bytes metric
* Fix
* Fix lint
* refactor: simplify code formatting and remove unnecessary line breaks
* fix: update DatasetOptions default value in IcebergDdlAnalyzerRule
* fix: update Anthropic model version in AI UDF tests
* fix: route accelerated table inserts to federated source
  When inserting into an accelerated table, writes now go to the federated source instead of dual-writing to both the accelerator and federated source. The acceleration refresh mechanism picks up the new data on its next cycle. The write_to_accelerator_only path (used when on_conflict is configured) is preserved for backward compatibility.
* Revert "spidapter: disable tables acceleration #9456"
  This reverts commit 9a4564d, reversing changes made to 504c76b.
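
The `dataset.time_column` / `dataset.time_format` options from #9458 can be illustrated with a small parser. In this sketch the option list is assumed to arrive as `(key, value)` string pairs, keys outside the `dataset.` names are ignored (they would belong to the acceleration parser), and format names are matched case-insensitively against the six variants listed in the commit message. `TimeFormat::parse` and this `DatasetOptions` layout are hypothetical stand-ins, not Spice's definitions.

```rust
/// Hypothetical mirror of the TimeFormat variants listed in #9458.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TimeFormat {
    Timestamp,
    Timestamptz,
    UnixSeconds,
    UnixMillis,
    Iso8601,
    Date,
}

impl TimeFormat {
    /// Case-insensitive parse; the accepted spellings are an assumption.
    pub fn parse(value: &str) -> Result<TimeFormat, String> {
        match value.to_ascii_lowercase().as_str() {
            "timestamp" => Ok(TimeFormat::Timestamp),
            "timestamptz" => Ok(TimeFormat::Timestamptz),
            "unix_seconds" => Ok(TimeFormat::UnixSeconds),
            "unix_millis" => Ok(TimeFormat::UnixMillis),
            "iso8601" => Ok(TimeFormat::Iso8601),
            "date" => Ok(TimeFormat::Date),
            _ => Err(format!("unsupported dataset.time_format: {}", value)),
        }
    }
}

/// Hypothetical dataset-level options taken from CREATE TABLE ... WITH (...).
#[derive(Debug, Default, PartialEq)]
pub struct DatasetOptions {
    pub time_column: Option<String>,
    pub time_format: Option<TimeFormat>,
}

/// Picks out the `dataset.*` keys; `acceleration.*` and any other keys are
/// ignored here because they are assumed to be handled by a separate parser.
pub fn parse_dataset_options(options: &[(&str, &str)]) -> Result<DatasetOptions, String> {
    let mut parsed = DatasetOptions::default();
    for &(key, value) in options {
        match key {
            "dataset.time_column" => parsed.time_column = Some(value.to_string()),
            "dataset.time_format" => parsed.time_format = Some(TimeFormat::parse(value)?),
            _ => {}
        }
    }
    Ok(parsed)
}
```

Failing on an unknown `time_format` at DDL time surfaces a typo immediately instead of during the first append-mode refresh.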
* fix: Set replicas to 4
* feat: add ETL sink mode support for Iceberg object store in spidapter
* Jeadie/26 02 19/metrics 2 (#9462)
* support GET /v1/apps/{}/metrics
  Entire-Checkpoint: 1450f2e74e01
* fix
* fix into spidapter
* add ingestion
  Entire-Checkpoint: fb63eb40697f
* make ingestion metrics great again
* fix metrics when not exist
* fix divide by 0
* update spidapter handler
  Entire-Checkpoint: 36647632c2b9
* fix spidapter
  Entire-Checkpoint: e41839760b98
* use correct from
* no AWS vars
* use spicepod crate in spidapter, enable hive partitioning
  Entire-Checkpoint: cb023f99de09
* refactor: streamline spicepod imports and set default telemetry configuration
* fix: address PR review issues
  - Fix get_app comparison bug: org == org -> app.org == org
  - Pin system-adapter-protocol to full SHA (2153680f3c42bd66632fd3f180016c0d5a984d64)
  - Fix clippy: use then_some instead of then(|| ...) in metrics
  - Fix clippy: remove unnecessary Result from SetupConfig::from_metadata
  - Update DatasetConfig in tests for new system-adapter-protocol fields
  - Format code
* fix: handle early completion of write operation in data sink and drain remaining messages
* feat: Add Cayenne Catalog with DDL (#9473)
* support cayenne catalog with DDL
* update
* fix: Remove s3 one zone catalogs, force simple cayenne local catalog
* feat: Update spidapter to generate with Cayenne Catalog entries (#9475)
* chore: fmt
* chore: clippy
* fix: Disable Cayenne Catalog namespace replication to public and default (#9479)
* fix: Disable Cayenne Catalog namespace replication to public and default
* revert spidapter
* chore: fmt
* fix: Preserve schema in results cache for empty query results (#9484)
  When a query returns 0 rows, the record batch stream may produce zero RecordBatch items. Previously, CachedQueryResult::from_batches() and new_raw() derived the schema from batches[0].schema(), falling back to Schema::empty() when batches was empty, losing the real schema.
  This caused FlightSQL ADBC clients to fail with 'inconsistent schema' errors because GetFlightInfo reported the correct schema (from the logical plan) while DoGet returned 0 fields (from the cache).
  Fix: Accept an explicit schema parameter in from_batches() and new_raw() so the correct schema is always preserved, and pass the stream/plan schema from all call sites.
  Fixes #9481
* feat: spidapter supports setting up Cayenne catalog with ADBC sink (#9488)
* fix: spidapter build
* feat: Cayenne catalog supports CREATE SCHEMA (#9489)
* style: Refactor code formatting for improved readability across multiple files
* refactor: Update comments and improve schema references in multiple files
* feat: Enable write-only mode for internal accelerated table builder

---------

Co-authored-by: Jack Eadie <jack@spice.ai>
Co-authored-by: Phillip LeBlanc <phillip@spice.ai>
Co-authored-by: Phillip LeBlanc <phillip@leblanc.tech>
Co-authored-by: Sergei Grebnov <sergei.grebnov@gmail.com>
Co-authored-by: Viktor Yershov <viktor@spice.ai>
Co-authored-by: peasee <98815791+peasee@users.noreply.github.com>
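
The results-cache fix in #9484 comes down to where the schema is taken from. The sketch below uses tiny stand-ins for Arrow's `Schema` and `RecordBatch` (so it compiles without the `arrow` crate) to contrast inferring the schema from the first batch with passing the plan schema explicitly. The `from_batches(schema, batches)` signature and the `from_batches_inferred` helper are assumptions made for illustration, not the actual Spice code.

```rust
use std::sync::Arc;

/// Tiny stand-in for Arrow's Schema (field names only).
#[derive(Debug, PartialEq)]
pub struct Schema {
    pub fields: Vec<String>,
}

impl Schema {
    pub fn empty() -> Schema {
        Schema { fields: Vec::new() }
    }
}

/// Tiny stand-in for Arrow's RecordBatch.
pub struct RecordBatch {
    pub schema: Arc<Schema>,
    pub num_rows: usize,
}

/// Hypothetical cached result shaped after the description in #9484.
pub struct CachedQueryResult {
    pub schema: Arc<Schema>,
    pub batches: Vec<RecordBatch>,
}

impl CachedQueryResult {
    /// Previous behaviour, kept for contrast: the schema comes from the first
    /// batch and falls back to an empty schema when there are no batches.
    pub fn from_batches_inferred(batches: Vec<RecordBatch>) -> CachedQueryResult {
        let schema = batches
            .first()
            .map(|batch| Arc::clone(&batch.schema))
            .unwrap_or_else(|| Arc::new(Schema::empty()));
        CachedQueryResult { schema, batches }
    }

    /// Fixed behaviour: the caller passes the stream/plan schema, so an empty
    /// result still reports the real columns.
    pub fn from_batches(schema: Arc<Schema>, batches: Vec<RecordBatch>) -> CachedQueryResult {
        CachedQueryResult { schema, batches }
    }

    /// Total rows across all cached batches.
    pub fn num_rows(&self) -> usize {
        self.batches.iter().map(|batch| batch.num_rows).sum()
    }
}
```

With zero batches, the inferred variant falls back to an empty schema, which is the mismatch FlightSQL ADBC clients reported; the explicit variant keeps the plan's fields.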

peasee and others added 2 commits February 25, 2026 22:47

pull Bot locked and limited conversation to collaborators Feb 26, 2026

pull Bot added the ⤵️ pull label Feb 26, 2026

pull Bot merged commit 255a884 into TheRakeshPurohit:trunk Feb 26, 2026
1 of 9 checks passed

pull[bot] merged 2 commits into TheRakeshPurohit:trunk from spiceai:trunk

pull Bot commented Feb 26, 2026 (edited)

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants
