[pull] trunk from spiceai:trunk #639
Merged
* feat: Return dataset error message in datasets API
* fix: Use errors for component status where available
* fix build
* fix build
* fix build
* fix build
* fix: handle error when converting desired discriminant to u8 in tests
* fix(api): align status openapi schema and stabilize datasets CSV

---------

Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com>
* `spidapter stdio --image-tag ... --channel ...` and standardise spice cloud client (#9436)
* Standardise spice cloud client
  Entire-Checkpoint: 5a6987dbc11e
* add image_tag to spidapter
  Entire-Checkpoint: f5ce118fa528
* timeout errors
  Entire-Checkpoint: 419914dcd6a9
* add image name to spidapter
  Entire-Checkpoint: ffdc9b42b0e4
* update_channel
  Entire-Checkpoint: 1b196b1fe2c7
* fix and lint
* fix bad merge
* fix build
* feat(spidapter): add Iceberg Glue catalog, DDL table creation, and teardown cleanup (#9443)
  - Generate spicepod with Iceberg Glue catalog (AWS account 211125479522) as the default catalog override (name: spice, access: read_write_create)
  - Implement create_tables RPC: generate CREATE TABLE IF NOT EXISTS DDL from Arrow schemas and execute via /v1/sql endpoint
  - Implement teardown cleanup: DROP TABLE IF EXISTS for all created tables before deleting the SCP app
  - Add Arrow-to-SQL type mapping for DDL generation
  - Track cname and created table names in RunState for teardown
* feat: Add Iceberg delete support via equality delete files
* Spidapter: fix iceberg catalog configuration
  2026-02-19T15:32:55.128370Z ERROR runtime::init::catalog: Failed to initialize catalog connector: Cannot setup the catalog spice (iceberg) with an invalid configuration. A Catalog Path is required for Iceberg in the format of: http://<host_and_port>/v1/namespaces/<namespace>. For details, visit: https://spiceai.org/docs/components/catalogs/iceberg#from
  Path must contain 'namespaces' segment
* feat: Per-table acceleration via CREATE TABLE ... WITH ("acceleration.*") (#9446)
* feat: Per-table acceleration via CREATE TABLE ... WITH ("acceleration.*")
  Implements per-table acceleration for Iceberg DDL tables using SQL syntax:

      CREATE TABLE orders (id INT) WITH (
        "acceleration.engine" = 'arrow',
        "acceleration.mode" = 'memory',
        "acceleration.refresh_check_interval" = '10s'
      );

  Key changes:

  SQL parsing layer:
  - acceleration_options.rs: AccelerationOptionsStore + parse_acceleration_options()
  - preprocess.rs: Strips WITH clause before DataFusion planning, stores options
  - Keys use "acceleration." prefix (double-quoted for dot in SQL identifiers)

  Logical/physical plan threading:
  - IcebergCreateTableNode carries Option<Acceleration>
  - Analyzer rule retrieves options from shared store
  - Planner passes Weak<DataFusion> to physical plans

  AcceleratedTable creation:
  - create_accelerated_iceberg_table() in physical_plans.rs
  - Converts spicepod Acceleration → runtime Acceleration via TryFrom
  - Creates accelerator engine table (Arrow/DuckDB/SQLite)
  - Builds AcceleratedTable with refresh config
  - Graceful fallback to raw Iceberg reads on failure

  Runtime wiring:
  - SharedDataFusionRef (Arc<OnceLock<Weak<DataFusion>>>) avoids circular deps
  - Set via DataFusion::set_self_ref() after Arc::new()
  - Extension planner and physical plans use Weak<DataFusion>

  Drop cleanup:
  - IcebergDropTableExec deregisters table first (triggers Drop → abort handlers)
  - Then drops from Iceberg catalog

  13 unit tests for option parsing, preprocessing, and store lifecycle.
* feat: Update copyright year and enhance acceleration option handling in DDL
* feat: Add Iceberg delete support via equality delete files
* feat: Simplify store key assignment in preprocess_create_table_acceleration function
* Update spidapter for new system-adapter-protocol (#9442)
* `spidapter stdio --image-tag ... --channel ...` and standardise spice cloud client (#9436)
* Standardise spice cloud client
  Entire-Checkpoint: 5a6987dbc11e
* add image_tag to spidapter
  Entire-Checkpoint: f5ce118fa528
* timeout errors
  Entire-Checkpoint: 419914dcd6a9
* add image name to spidapter
  Entire-Checkpoint: ffdc9b42b0e4
* update_channel
  Entire-Checkpoint: 1b196b1fe2c7
* fix and lint
* fix bad merge
* fix build
* refactor: Clean up formatting and error handling in Iceberg DDL and Spidapter code
* refactor: Simplify SQL type mapping for Arrow data types in create_table_ddl function
* feat: Per-table acceleration via CREATE TABLE ... WITH ("acceleration.*")
* refactor: Improve snapshot creation configuration handling for caching mode

---------

Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com>
Co-authored-by: Sergei Grebnov <sergei.grebnov@gmail.com>
Co-authored-by: Jack Eadie <jack@spice.ai>

* Fix cloud client construction and finalize lint-safe test updates
* refactor: simplify error handling in CloudClient and optimize SQL type mapping in stdio_server
* feat(spidapter): add cayenne acceleration to CREATE TABLE DDL (#9453)
  Tables created by spidapter now include WITH clause options for per-table acceleration using the cayenne engine with full refresh every 1 second, file mode.
* fix: coerce nanosecond timestamps to microsecond for Iceberg v2 (#9454)
  Iceberg v2 does not support timestamp_ns. DataFusion SQL parser maps TIMESTAMP to Timestamp(Nanosecond, ...) by default. This coerces nanosecond timestamp fields to microsecond precision before converting the Arrow schema to Iceberg schema during CREATE TABLE.
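The nanosecond-to-microsecond coercion described in #9454 can be sketched roughly as follows. This is a minimal illustration only: `TimeUnit`, `DataType`, `Field`, and `coerce_timestamps_to_micros` are hypothetical stand-ins defined here so the example is self-contained, not the Arrow or Spice types the actual change operates on.

```rust
// Hypothetical, simplified stand-ins for Arrow's schema types (assumption:
// the real change works on Arrow `Field`/`DataType`, which are not used here).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum TimeUnit {
    Microsecond,
    Nanosecond,
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    // Time unit plus an optional timezone string.
    Timestamp(TimeUnit, Option<String>),
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
}

/// Returns a copy of `fields` in which every nanosecond timestamp is
/// rewritten to microsecond precision; the timezone and all other
/// field types are left unchanged.
#[allow(dead_code)]
fn coerce_timestamps_to_micros(fields: &[Field]) -> Vec<Field> {
    fields
        .iter()
        .map(|f| {
            let data_type = match &f.data_type {
                DataType::Timestamp(TimeUnit::Nanosecond, tz) => {
                    DataType::Timestamp(TimeUnit::Microsecond, tz.clone())
                }
                other => other.clone(),
            };
            Field {
                name: f.name.clone(),
                data_type,
            }
        })
        .collect()
}
```

Applying the rewrite to the schema before conversion, rather than rejecting nanosecond columns, keeps a plain `TIMESTAMP` column in DDL usable against an Iceberg v2 table.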
* fix: lint fixes for spicebench branch
* fix: derive table location from namespace for Iceberg DDL CREATE TABLE (#9455)
  AWS Glue catalogs require an explicit `location` when creating tables. Previously, `TableCreation` was built without a location, causing Glue to reject the request with "Cannot parse missing string: location".

  This change:
  - Fetches namespace properties via `catalog.get_namespace()` and looks for a `location` property to derive `{namespace_location}/{table_name}`
  - Passes the derived location using `.location_opt()` on the builder
  - Gracefully falls back to `None` when namespace has no location (e.g. local REST catalogs that auto-generate locations)
  - Uses `TableReference::bare` for accelerated table dataset names, matching the convention used by the normal dataset acceleration path
* spidapter: disable tables acceleration
* feat: support dataset.time_column and dataset.time_format in CREATE TABLE WITH options (#9458)
  Add support for 'dataset.time_column' and 'dataset.time_format' options in CREATE TABLE ... WITH (...) DDL statements. These dataset-level options affect acceleration behavior for append-mode refreshes by specifying which column contains timestamps and what format they use.

  Example:

      CREATE TABLE orders (id INT, created_at TIMESTAMP) WITH (
        "acceleration.engine" = 'arrow',
        "acceleration.refresh_mode" = 'append',
        "dataset.time_column" = 'created_at',
        "dataset.time_format" = 'timestamp'
      )

  Changes:
  - Broaden AccelerationOptionsStore to DdlOptionsStore, storing DdlTableOptions which bundles both acceleration and dataset options
  - Add DatasetOptions struct with time_column and time_format fields
  - Add parse_dataset_options() and parse_ddl_table_options() functions
  - Support all TimeFormat variants: timestamp, timestamptz, unix_seconds, unix_millis, ISO8601, date
  - Thread DatasetOptions through analyzer rule, logical nodes, planner, and physical plans
  - Wire time_column and time_format into the Refresh config when creating accelerated Iceberg tables
  - Rename preprocess function and cleanup helper to reflect broader scope
  - Add comprehensive unit tests for parsing and preprocessing
* [Spicebench] Add refresh processed records/bytes metric (#9459)
* Add refresh processed records/bytes metric
* Fix
* Fix lint
* refactor: simplify code formatting and remove unnecessary line breaks
* fix: update DatasetOptions default value in IcebergDdlAnalyzerRule
* fix: update Anthropic model version in AI UDF tests
* fix: route accelerated table inserts to federated source
  When inserting into an accelerated table, writes now go to the federated source instead of dual-writing to both the accelerator and federated source. The acceleration refresh mechanism picks up the new data on its next cycle. The write_to_accelerator_only path (used when on_conflict is configured) is preserved for backward compatibility.
* Revert "spidapter: disable tables acceleration #9456"
  This reverts commit 9a4564d, reversing changes made to 504c76b.
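The location fallback described in #9455 above amounts to an optional lookup followed by a path join. The sketch below assumes the namespace properties are available as a plain string map; `derive_table_location` is a hypothetical helper name, and trimming a trailing slash is an added assumption that the commit message does not mention.

```rust
use std::collections::HashMap;

/// Hypothetical helper: derives `{namespace_location}/{table_name}` from
/// namespace properties, or returns `None` when the namespace has no
/// `location` property (e.g. a catalog that auto-generates locations).
#[allow(dead_code)]
fn derive_table_location(
    namespace_props: &HashMap<String, String>,
    table_name: &str,
) -> Option<String> {
    namespace_props
        .get("location")
        // Assumption: strip a trailing '/' so the join never yields "//".
        .map(|loc| format!("{}/{}", loc.trim_end_matches('/'), table_name))
}
```

Returning `Option<String>` lets the caller hand the result straight to a builder method that accepts an optional location, which matches the `.location_opt()` usage named in the commit message.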
* fix: Set replicas to 4
* feat: add ETL sink mode support for Iceberg object store in spidapter
* Jeadie/26 02 19/metrics 2 (#9462)
* support GET /v1/apps/{}/metrics
  Entire-Checkpoint: 1450f2e74e01
* fix
* fix into spidapter
* add ingestion
  Entire-Checkpoint: fb63eb40697f
* make ingestion metrics great again
* fix metrics when not exist
* fix divide by 0
* update spidapter handler
  Entire-Checkpoint: 36647632c2b9
* fix spidapter
  Entire-Checkpoint: e41839760b98
* use correct from
* no AWS vars
* use spicepod crate in spidapter, enable hive partitioning
  Entire-Checkpoint: cb023f99de09
* refactor: streamline spicepod imports and set default telemetry configuration
* fix: address PR review issues
  - Fix get_app comparison bug: org == org -> app.org == org
  - Pin system-adapter-protocol to full SHA (2153680f3c42bd66632fd3f180016c0d5a984d64)
  - Fix clippy: use then_some instead of then(|| ...) in metrics
  - Fix clippy: remove unnecessary Result from SetupConfig::from_metadata
  - Update DatasetConfig in tests for new system-adapter-protocol fields
  - Format code
* fix: handle early completion of write operation in data sink and drain remaining messages
* feat: Add Cayenne Catalog with DDL (#9473)
* support cayenne catalog with DDL
* update
* fix: Remove s3 one zone catalogs, force simple cayenne local catalog
* feat: Update spidapter to generate with Cayenne Catalog entries (#9475)
* chore: fmt
* chore: clippy
* fix: Disable Cayenne Catalog namespace replication to public and default (#9479)
* fix: Disable Cayenne Catalog namespace replication to public and default
* revert spidapter
* chore: fmt
* fix: Preserve schema in results cache for empty query results (#9484)
  When a query returns 0 rows, the record batch stream may produce zero RecordBatch items. Previously, CachedQueryResult::from_batches() and new_raw() derived the schema from batches[0].schema(), falling back to Schema::empty() when batches was empty, losing the real schema.

  This caused FlightSQL ADBC clients to fail with 'inconsistent schema' errors because GetFlightInfo reported the correct schema (from the logical plan) while DoGet returned 0 fields (from the cache).

  Fix: Accept an explicit schema parameter in from_batches() and new_raw() so the correct schema is always preserved, and pass the stream/plan schema from all call sites.

  Fixes #9481
* feat: spidapter supports setting up Cayenne catalog with ADBC sink (#9488)
* fix: spidapter build
* feat: Cayenne catalog supports CREATE SCHEMA (#9489)
* style: Refactor code formatting for improved readability across multiple files
* refactor: Update comments and improve schema references in multiple files
* feat: Enable write-only mode for internal accelerated table builder

---------

Co-authored-by: Jack Eadie <jack@spice.ai>
Co-authored-by: Phillip LeBlanc <phillip@spice.ai>
Co-authored-by: Phillip LeBlanc <phillip@leblanc.tech>
Co-authored-by: Sergei Grebnov <sergei.grebnov@gmail.com>
Co-authored-by: Viktor Yershov <viktor@spice.ai>
Co-authored-by: peasee <98815791+peasee@users.noreply.github.com>
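The empty-result schema bug fixed in #9484 is easy to reproduce in miniature. The sketch below uses hypothetical, simplified `Schema`, `RecordBatch`, and `CachedQueryResult` types defined inline (not the Arrow or Spice runtime types), and `from_batches_inferred` is an invented name for the old behavior, contrasting schema inference from the first batch with an explicit schema parameter.

```rust
// Hypothetical, simplified stand-ins; only the shape of the fix is shown.
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    fields: Vec<String>,
}

impl Schema {
    fn empty() -> Self {
        Schema { fields: Vec::new() }
    }
}

#[allow(dead_code)]
struct RecordBatch {
    schema: Schema,
    num_rows: usize,
}

#[allow(dead_code)]
struct CachedQueryResult {
    schema: Schema,
    batches: Vec<RecordBatch>,
}

#[allow(dead_code)]
impl CachedQueryResult {
    /// Old behavior: infer the schema from the first batch. With zero
    /// batches this falls back to an empty schema and loses the real one.
    fn from_batches_inferred(batches: Vec<RecordBatch>) -> Self {
        let schema = batches
            .first()
            .map(|b| b.schema.clone())
            .unwrap_or_else(Schema::empty);
        Self { schema, batches }
    }

    /// Fixed behavior: the caller passes the stream/plan schema explicitly,
    /// so it survives even when the query returned no batches.
    fn from_batches(schema: Schema, batches: Vec<RecordBatch>) -> Self {
        Self { schema, batches }
    }
}
```

With an explicit schema, the cached result for a zero-row query reports the same fields as the logical plan, which is the consistency the FlightSQL clients depend on.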
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)
Can you help keep this open source service alive? 💖 Please sponsor : )