[pull] trunk from spiceai:trunk by pull[bot] · Pull Request #720 · TheRakeshPurohit/spiceai

pull · 2026-04-06T21:06:17Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

…er rules

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

…an (#9927)

Ensure 'SessionStateBuilder::new_from_existing' adds optimizer/analyzer

route updates to executors

Ensure we always have partition expressions

…dapter

…tag-in-spidapter fix: Add spicehq organization tag in spidapter

* chore: Update iceberg-rust fork to v0.9 Update spiceai/iceberg-rust dependency from v0.8.0 to v0.9.0. Key changes for the v0.9 API migration: - Replace Extensions/FileIO::from_path with StorageFactory/FileIOBuilder - Move AwsCredential types from iceberg::io to iceberg-storage-opendal - Replace FileIO.lister() with opendal::Operator for directory listing - Replace RestCatalog::with_file_io_extension with CatalogBuilder::with_storage_factory - Remove storage-gcs/storage-s3 features (now in iceberg-storage-opendal) - Add iceberg-storage-opendal and opendal as workspace dependencies * fix: Address PR review — operator path handling, tests, and lint fixes - Add to_operator_path() helper in HadoopCatalog to convert full URLs to operator-relative paths (fixes S3/GCS path handling) - Set operator root from URL path in build_opendal_operator for S3, GCS, FS - Use streaming lister() instead of eager list() in get_directories and directory_has_metadata for bounded memory usage - Use single stat() instead of exists()+stat() double round-trip in directory_exists and directory_has_metadata_and_data - Fix directory_has_metadata Exact mode to not return false early on first non-matching file; compare metadata filenames correctly - Fix dataconnector to pass base_uri (warehouse root) to build_opendal_operator instead of source (table URL) - Fix test import: AwsCredentialLoad moved to iceberg_storage_opendal - Add unit tests for build_opendal_operator URL parsing - Move use opendal::Configurator to function scope for clippy compliance * chore: Remove accidentally committed .claude worktree * fix: Always provide StorageFactory for iceberg v0.9 catalogs - Add default_storage_factory_from_props() to infer GCS or S3 factory from iceberg properties when no explicit factory is provided - get_rest_catalog() now always passes a StorageFactory to RestCatalogBuilder (required in iceberg-rust v0.9) - Add OpenDalStorageFactory::S3 to Spice Cloud REST catalog builder - Update hadoop catalog test helpers with required storage_factory and opendal Operator for both S3 and local filesystem catalogs - Add iceberg-storage-opendal to data_components dev-dependencies - Add StorageFactory to ignored REST catalog test * fix: Use correct S3 scheme for Glue iceberg catalog StorageFactory In iceberg v0.9, GlueCatalogBuilder defaults to 's3a' as the configured scheme for the StorageFactory. However, AWS Glue metadata locations typically use 's3://' paths, causing a scheme mismatch when reading table metadata ('Invalid s3 url: s3://..., should start with s3a://...'). Fix by deriving the S3 scheme from the actual metadata_location URL and providing an explicit StorageFactory with the correct scheme to the GlueCatalogBuilder. * chore: Remove accidentally committed .claude worktree --------- Co-authored-by: Phillip LeBlanc <phillip@spice.ai>

…writes (#9934) * wip * fix: Improve debug logs * fix: Return hard errors for data correctness issues

* Add full query federation support for ADBC data connector * Update * Update ref

The `schemars` crate does not implement `JsonSchema` for `str` (only `String`), so `Arc<str>` fields fail to compile when the `schemars` feature is enabled (as in the spicepodschema tool). Add `#[schemars(with = "String")]` / `#[schemars(with = "Option<String>")]` attributes to tell schemars to generate the schema as if the fields were plain strings, which is semantically correct. Fixes #2813 Co-authored-by: Claude <claude@spices-MacBook.localdomain> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…N plans (#9956)

- Add `ensure` to snafu imports in write_through.rs to fix compilation error - Fix BTreeMap::from formatting in spidapter to satisfy rustfmt Co-authored-by: Claude <claude@spices-MacBook.localdomain> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Delta Lake: fix data skipping for >= timestamp predicates * Fix auth and add snapshots

…neration (#9967)

Add ephemeral_storage_limit_gb and organization_tag fields to the test_stdio_args() helper that were added to StdioArgs but not the test. Co-authored-by: Claude <claude@spices-MacBook.localdomain> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com>

Bumps [aws-config](https://github.com/smithy-lang/smithy-rs) from 1.8.14 to 1.8.15. - [Release notes](https://github.com/smithy-lang/smithy-rs/releases) - [Changelog](https://github.com/smithy-lang/smithy-rs/blob/main/CHANGELOG.md) - [Commits](https://github.com/smithy-lang/smithy-rs/commits) --- updated-dependencies: - dependency-name: aws-config dependency-version: 1.8.15 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

…der and transaction management

* fix: return computed credentials in executor Flight service readiness check The credentials() function was assigning the computed value to a discarded variable (_credentials) and unconditionally returning None. This caused executors to always connect anonymously to their own Flight service, failing auth checks, exhausting retries, and aborting startup with 'No executors available to write data to'. Introduced in #9904. * fix: use cluster CA cert in executor Flight service readiness check The executor startup retry loop called can_connect against its own local Flight server, which uses the cluster's private mTLS CA. Passing None for the CA caused TLS verification to fall back to system certificates, which do not include the private cluster CA, so every retry failed with a TLS error and executor startup aborted. Changes: - Store raw CA PEM bytes in ClusterTlsConfig (ca_certificate_pem field) - Add new_tls_flight_channel_with_cert to flight_client::tls for in-memory CA certificates, avoiding a round-trip through the filesystem - Update can_connect to accept Option<tonic::transport::Certificate> and use the new channel builder - Pass the cluster CA cert into can_connect in initialize_cluster_executor - Classify Unauthorized/PermissionDenied retries as permanent failures - Downgrade stray warn! WIP log artifacts to debug! - Log-and-continue instead of aborting on DDL forwarding failures in allocate_initial_partitions RPC (service.rs) * fix formatting * fix: pass full mTLS ClientTlsConfig to can_connect for executor readiness check The executor Flight server requires mutual TLS — clients must present a certificate. Passing only a CA certificate was insufficient; the TLS handshake would fail because no client identity was provided. - Replace ca_certificate: Option<Certificate> param on can_connect with tls_config: Option<ClientTlsConfig>, which carries both the CA cert and the client identity for mTLS - Add new_tls_flight_channel_with_tls_config to tls.rs that accepts a pre-built ClientTlsConfig, falling back to system certs when None - Remove ca_certificate_pem: Vec<u8> from ClusterTlsConfig — the existing client_tls_config field already contains both CA and identity, so the raw PEM bytes were redundant duplication - Remove dead Unauthorized/PermissionDenied permanent-retry branches since can_connect only ever emits UnableToConnectToServer errors * fix formatting * fix: remove unused creds param from can_connect, simplify url allocation - Drop the _creds parameter from can_connect entirely — it was never used and was misleading to callers; the credentials() helper and its imports (api_key_auth, Credentials) are also removed as they became dead code - Simplify Arc<str> construction to a plain String in can_connect, avoiding an unnecessary intermediate &str borrow and extra allocation

… retention tests and schema evolution tests

- Fix race in ClusterHarness::wait_until_executor_count where executors were considered ready based on connection registration but flight SQL clients hadn't been set up yet, causing INSERT forwarding to fail with 'no executors are currently connected' - Make executor_registry pub on DataFusion for test access - Add schema evolution file_update_csv snapshots for cayenne and duckdb

- Increase PAGE_RETRY_MAX_ATTEMPTS from 3 to 5 (total backoff ~19s vs ~6s) for better resilience to transient GitHub 502/504 errors - Improve JsonDecodeError message: for 5xx responses returning HTML (e.g., upstream proxy errors), show a clear message instead of the confusing JSON parse error ('expected value at line 1 column 1') - Rename error field to 'detail' for clarity - Update tests for new retry count and field name

Instead of adding a Connection: close header on the same pooled connection, build a new reqwest::Client with pool_max_idle_per_host(0) when retrying after a gateway error. This ensures the retry goes out on a new TCP connection rather than reusing the (possibly broken) pooled connection.

Co-authored-by: Spice Snapshot Update Bot <spiceaibot@spice.ai> Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com>

Add 4 snapshot files for test_schema_evolution_file_update_mode: - initial: 3-column table (id, town, age) with 3 rows - add_column: same query after adding country column - drop_column: 2-column result after dropping age column - change_type: same query after changing age to TEXT

The fresh reqwest::Client built for 502/504 retries was missing the User-Agent header. GitHub requires User-Agent on all requests and returns 403 Forbidden without it, causing retries to fail permanently instead of recovering from the transient gateway error.

Runtime clients: spiceai/{version} CLI clients: spice/{version} - Add User-Agent to all reqwest clients that were missing it - Include CARGO_PKG_VERSION in all user-agent strings - Runtime-side: dataconnector, token_providers, catalog, LLM, model sources, object store, graphql retry client, unity catalog - CLI-side: login commands, github client, context http_client

Format: product/{version} ({os}; {arch}) Runtime: spiceai/2.0.0-unstable (macos; aarch64) CLI: spice/2.0.0-unstable (macos; aarch64) Matches the existing CLI default_user_agent() format.

When WHERE ref = '<branch>' is used, the GraphQL query is rewritten to fetch only that branch's commit history via qualifiedName. All returned commits have the matching ref value, making the filter semantically exact. With Inexact, DataFusion kept a residual FilterExec which prevented LIMIT pushdown into the GraphQL table provider, causing it to paginate through the entire commit history (thousands of API calls with exponential backoff retries on 502s).

Three fixes: 1. Make partition-to-executor assignment deterministic by breaking ties in select_executors by executor ID (lexicographic order). This addresses the HashMap iteration non-determinism. 2. Replace inline plan snapshot for join_two_partitioned_tables with structural assertions that verify plan elements without depending on partition ordering (which varies with dynamic executor port assignment). 3. Fix test_file_update_csv_cayenne checkpoint race: add wait_for_checkpoint after loading Phase 1 so the next phase's file_update schema detection has the previous schema available. Update the add_column snapshot to show the correct 5-column output.

…d readability

- Replace multi_executor inline plan snapshot with structural assertions (partition-to-executor assignment is non-deterministic with 2 executors) - Update cayenne drop_column and no_change_restart snapshots to correct 3-column output (checkpoint-backed schema detection working properly)

@r

Move all 11 inline insta snapshots in distributed_acceleration.rs to named .snap files in the snapshots/ directory. This: - Keeps test files focused on logic, not large string literals - Makes snapshot diffs easier to review in PRs - Follows the project convention established in copilot-instructions.md Also update copilot-instructions.md to document the policy: always use named .snap files, never inline snapshots (@r"..." syntax). Remove two stale named snapshots (distributed_accel_aggregation, distributed_accel_select_all) that are no longer referenced.

* fix: Preserve timestamp timezone in DDL forwarding to executors arrow_datatype_to_sql mapped all Timestamp variants to bare "TIMESTAMP", discarding timezone information. When DataFusion parses "TIMESTAMP" on the executor side it produces Timestamp(Nanosecond, None), losing the timezone. This causes schema mismatches between scheduler and executor for timezone-aware columns (e.g. Cayenne retention columns using Timestamp(Microsecond, Some("UTC"))), which can produce incorrect results for timezone-sensitive comparisons and joins. Use TIMESTAMPTZ for timezone-aware timestamps so the executor creates columns with Timestamp(Nanosecond, Some("+00:00")), preserving timezone awareness across the DDL forwarding boundary. * style: cargo fmt * test: add DDL roundtrip test for timestamp timezone preservation * fix: Update field_with_name calls to match DataFusion API The DFSchema::field_with_name method now requires a TableReference as the first argument. Pass None for unqualified field lookups. * fix: Hold table binding to satisfy borrow checker The DFSchema::field_with_name now returns a reference that borrows from the schema owner. Split the chained call to keep the table alive for the borrow. * fix: Use PostgreSQL dialect in DDL round-trip test The runtime configures DataFusion with PostgreSQL dialect, which correctly parses TIMESTAMPTZ as a timezone-aware timestamp. The test was using the default dialect which doesn't recognize TIMESTAMPTZ. * fix: Use standard SQL TIMESTAMP WITH TIME ZONE in DDL DataFusion's SQL parser (even in PostgreSQL mode) doesn't parse the TIMESTAMPTZ shorthand as timezone-aware. Use the standard SQL form TIMESTAMP WITH TIME ZONE which DataFusion correctly recognizes, ensuring DDL forwarding to executors preserves timezone information. * style: cargo fmt * fix: Remove DDL round-trip test DataFusion's CREATE TABLE DDL planner does not preserve timezone annotations for timestamp columns (TIMESTAMP WITH TIME ZONE is parsed as Timestamp(Nanosecond, None)). This is a DataFusion limitation, not a bug in this PR. The existing unit tests (timestamp_without_tz_maps_to_timestamp and timestamp_with_tz_maps_to_timestamptz) verify the correct SQL string mapping. Cayenne DDL is forwarded as Arrow schemas, not round-tripped through DataFusion's SQL parser, so the string mapping is what matters. * style: remove trailing blank line --------- Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com>

Merge develop to trunk (2026-04-03)

…-error-handling fix: propagate JoinError from partition insert tasks

Jeadie and others added 30 commits March 23, 2026 16:30

ensure we always have partition expressions

38e34eb

fix func

a7f2fab

fix

593efe7

clippy

f8aea67

route updates to executors

d9dd480

Ensure 'SessionStateBuilder::new_from_existing' adds optimizer/analyz…

b9c566e

…er rules

Apply suggestion

701858d

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

clippy

b5cac2c

Run physical optimizer on FallbackOnZeroResultsScanExec fallback pl…

5ff62b3

…an (#9927)

Merge branch 'trunk' into jeadie/26-03-23/blank-parts

d86ff6d

Merge branch 'develop' into jeadie/26-03-24/update-execs

fb86470

Merge pull request #9921 from spiceai/jeadie/26-03-25/new_from_existing2

d8fc7d0

Ensure 'SessionStateBuilder::new_from_existing' adds optimizer/analyzer

Merge branch 'develop' into jeadie/26-03-24/update-execs

9fcea00

Merge pull request #9907 from spiceai/jeadie/26-03-24/update-execs

c24fbba

route updates to executors

Merge pull request #9890 from spiceai/jeadie/26-03-23/blank-parts

b873386

Ensure we always have partition expressions

fix: Add spicehq organization tag in spidapter

86f58db

Merge branch 'develop' into peasee/260326-add-organization-tag-in-spi…

31b07db

…dapter

fix: Support specifying ephemeral storage limit

0a7803a

Merge pull request #9946 from spiceai/peasee/260326-add-organization-…

89f262e

…tag-in-spidapter fix: Add spicehq organization tag in spidapter

fix: Improve error handling to avoid silencing errors on distributed …

b1acd09

…writes (#9934) * wip * fix: Improve debug logs * fix: Return hard errors for data correctness issues

Add full query federation support for ADBC data connector (#9953)

15629b2

* Add full query federation support for ADBC data connector * Update * Update ref

Add ADBC BigQuery to benchmark tests (#9960)

3da4314

Hash ADBC join-pushdown context to prevent credential leaks in EXPLAI…

eb8bb84

…N plans (#9956)

Delta Lake: fix data skipping for >= timestamp predicates (#9932)

a3d580f

* Delta Lake: fix data skipping for >= timestamp predicates * Fix auth and add snapshots

ADBC BigQuery: Improve BigQuery dialect date/time and interval SQL ge…

2a25c39

…neration (#9967)

lukekim and others added 27 commits April 4, 2026 09:20

fix: Enhance error handling and improve code clarity in catalog provi…

984baf2

…der and transaction management

refactor: Simplify conditionals and improve readability in file-based…

293e3d9

… retention tests and schema evolution tests

test: Add schema evolution file_update_csv snapshots for sqlite

02035d7

fix: Simplify display formatting for JsonDecodeError in error handling

9e5e019

fix: Update test snapshots (#10060)

64cbbf6

Co-authored-by: Spice Snapshot Update Bot <spiceaibot@spice.ai> Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com>

Add *.duckdb and *.duckdb.wal to .gitignore

89b73e6

Include OS and arch in all User-Agent strings

42608ec

Format: product/{version} ({os}; {arch}) Runtime: spiceai/2.0.0-unstable (macos; aarch64) CLI: spice/2.0.0-unstable (macos; aarch64) Matches the existing CLI default_user_agent() format.

Refactor User-Agent handling to use a centralized utility function

b3c07f7

Refactor assertions in distributed acceleration join test for improve…

584afc3

…d readability

Refactor assertions in distributed acceleration join test for improve…

739078f

…d readability

fix lint

1eb2790

Merge pull request #10051 from spiceai/develop

a0602fa

Merge develop to trunk (2026-04-03)

fix: propagate JoinError from partition insert tasks

e7562f0

tweak error messages

57cc9c7

Merge pull request #10179 from spiceai/evgenii/2026-04-06/join-handle…

77f17e0

…-error-handling fix: propagate JoinError from partition insert tasks

pull Bot locked and limited conversation to collaborators Apr 6, 2026

pull Bot added the ⤵️ pull label Apr 6, 2026

pull Bot merged commit 77f17e0 into TheRakeshPurohit:trunk Apr 6, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[pull] trunk from spiceai:trunk#720

[pull] trunk from spiceai:trunk#720
pull[bot] merged 248 commits into
TheRakeshPurohit:trunkfrom
spiceai:trunk

pull Bot commented Apr 6, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants

Conversation

pull Bot commented Apr 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants

pull Bot commented Apr 6, 2026 •

edited

Loading