[pull] trunk from spiceai:trunk#720
Merged
Merged
Conversation
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Ensure 'SessionStateBuilder::new_from_existing' adds optimizer/analyzer
route updates to executors
Ensure we always have partition expressions
…tag-in-spidapter fix: Add spicehq organization tag in spidapter
* chore: Update iceberg-rust fork to v0.9
Update spiceai/iceberg-rust dependency from v0.8.0 to v0.9.0.
Key changes for the v0.9 API migration:
- Replace Extensions/FileIO::from_path with StorageFactory/FileIOBuilder
- Move AwsCredential types from iceberg::io to iceberg-storage-opendal
- Replace FileIO.lister() with opendal::Operator for directory listing
- Replace RestCatalog::with_file_io_extension with CatalogBuilder::with_storage_factory
- Remove storage-gcs/storage-s3 features (now in iceberg-storage-opendal)
- Add iceberg-storage-opendal and opendal as workspace dependencies
* fix: Address PR review — operator path handling, tests, and lint fixes
- Add to_operator_path() helper in HadoopCatalog to convert full URLs to
operator-relative paths (fixes S3/GCS path handling)
- Set operator root from URL path in build_opendal_operator for S3, GCS, FS
- Use streaming lister() instead of eager list() in get_directories and
directory_has_metadata for bounded memory usage
- Use single stat() instead of exists()+stat() double round-trip in
directory_exists and directory_has_metadata_and_data
- Fix directory_has_metadata Exact mode to not return false early on first
non-matching file; compare metadata filenames correctly
- Fix dataconnector to pass base_uri (warehouse root) to build_opendal_operator
instead of source (table URL)
- Fix test import: AwsCredentialLoad moved to iceberg_storage_opendal
- Add unit tests for build_opendal_operator URL parsing
- Move use opendal::Configurator to function scope for clippy compliance
* chore: Remove accidentally committed .claude worktree
* fix: Always provide StorageFactory for iceberg v0.9 catalogs
- Add default_storage_factory_from_props() to infer GCS or S3 factory from
iceberg properties when no explicit factory is provided
- get_rest_catalog() now always passes a StorageFactory to RestCatalogBuilder
(required in iceberg-rust v0.9)
- Add OpenDalStorageFactory::S3 to Spice Cloud REST catalog builder
- Update hadoop catalog test helpers with required storage_factory and
opendal Operator for both S3 and local filesystem catalogs
- Add iceberg-storage-opendal to data_components dev-dependencies
- Add StorageFactory to ignored REST catalog test
* fix: Use correct S3 scheme for Glue iceberg catalog StorageFactory
In iceberg v0.9, GlueCatalogBuilder defaults to 's3a' as the configured
scheme for the StorageFactory. However, AWS Glue metadata locations
typically use 's3://' paths, causing a scheme mismatch when reading
table metadata ('Invalid s3 url: s3://..., should start with s3a://...').
Fix by deriving the S3 scheme from the actual metadata_location URL and
providing an explicit StorageFactory with the correct scheme to the
GlueCatalogBuilder.
* chore: Remove accidentally committed .claude worktree
---------
Co-authored-by: Phillip LeBlanc <phillip@spice.ai>
…writes (#9934) * wip * fix: Improve debug logs * fix: Return hard errors for data correctness issues
* Add full query federation support for ADBC data connector * Update * Update ref
The `schemars` crate does not implement `JsonSchema` for `str` (only `String`), so `Arc<str>` fields fail to compile when the `schemars` feature is enabled (as in the spicepodschema tool). Add `#[schemars(with = "String")]` / `#[schemars(with = "Option<String>")]` attributes to tell schemars to generate the schema as if the fields were plain strings, which is semantically correct. Fixes #2813 Co-authored-by: Claude <claude@spices-MacBook.localdomain> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add `ensure` to snafu imports in write_through.rs to fix compilation error - Fix BTreeMap::from formatting in spidapter to satisfy rustfmt Co-authored-by: Claude <claude@spices-MacBook.localdomain> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Delta Lake: fix data skipping for >= timestamp predicates * Fix auth and add snapshots
Add ephemeral_storage_limit_gb and organization_tag fields to the test_stdio_args() helper that were added to StdioArgs but not the test. Co-authored-by: Claude <claude@spices-MacBook.localdomain> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com>
Bumps [aws-config](https://github.com/smithy-lang/smithy-rs) from 1.8.14 to 1.8.15. - [Release notes](https://github.com/smithy-lang/smithy-rs/releases) - [Changelog](https://github.com/smithy-lang/smithy-rs/blob/main/CHANGELOG.md) - [Commits](https://github.com/smithy-lang/smithy-rs/commits) --- updated-dependencies: - dependency-name: aws-config dependency-version: 1.8.15 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…der and transaction management
* fix: return computed credentials in executor Flight service readiness check The credentials() function was assigning the computed value to a discarded variable (_credentials) and unconditionally returning None. This caused executors to always connect anonymously to their own Flight service, failing auth checks, exhausting retries, and aborting startup with 'No executors available to write data to'. Introduced in #9904. * fix: use cluster CA cert in executor Flight service readiness check The executor startup retry loop called can_connect against its own local Flight server, which uses the cluster's private mTLS CA. Passing None for the CA caused TLS verification to fall back to system certificates, which do not include the private cluster CA, so every retry failed with a TLS error and executor startup aborted. Changes: - Store raw CA PEM bytes in ClusterTlsConfig (ca_certificate_pem field) - Add new_tls_flight_channel_with_cert to flight_client::tls for in-memory CA certificates, avoiding a round-trip through the filesystem - Update can_connect to accept Option<tonic::transport::Certificate> and use the new channel builder - Pass the cluster CA cert into can_connect in initialize_cluster_executor - Classify Unauthorized/PermissionDenied retries as permanent failures - Downgrade stray warn! WIP log artifacts to debug! - Log-and-continue instead of aborting on DDL forwarding failures in allocate_initial_partitions RPC (service.rs) * fix formatting * fix: pass full mTLS ClientTlsConfig to can_connect for executor readiness check The executor Flight server requires mutual TLS — clients must present a certificate. Passing only a CA certificate was insufficient; the TLS handshake would fail because no client identity was provided. - Replace ca_certificate: Option<Certificate> param on can_connect with tls_config: Option<ClientTlsConfig>, which carries both the CA cert and the client identity for mTLS - Add new_tls_flight_channel_with_tls_config to tls.rs that accepts a pre-built ClientTlsConfig, falling back to system certs when None - Remove ca_certificate_pem: Vec<u8> from ClusterTlsConfig — the existing client_tls_config field already contains both CA and identity, so the raw PEM bytes were redundant duplication - Remove dead Unauthorized/PermissionDenied permanent-retry branches since can_connect only ever emits UnableToConnectToServer errors * fix formatting * fix: remove unused creds param from can_connect, simplify url allocation - Drop the _creds parameter from can_connect entirely — it was never used and was misleading to callers; the credentials() helper and its imports (api_key_auth, Credentials) are also removed as they became dead code - Simplify Arc<str> construction to a plain String in can_connect, avoiding an unnecessary intermediate &str borrow and extra allocation
… retention tests and schema evolution tests
- Fix race in ClusterHarness::wait_until_executor_count where executors were considered ready based on connection registration but flight SQL clients hadn't been set up yet, causing INSERT forwarding to fail with 'no executors are currently connected' - Make executor_registry pub on DataFusion for test access - Add schema evolution file_update_csv snapshots for cayenne and duckdb
- Increase PAGE_RETRY_MAX_ATTEMPTS from 3 to 5 (total backoff ~19s vs ~6s)
for better resilience to transient GitHub 502/504 errors
- Improve JsonDecodeError message: for 5xx responses returning HTML (e.g.,
upstream proxy errors), show a clear message instead of the confusing
JSON parse error ('expected value at line 1 column 1')
- Rename error field to 'detail' for clarity
- Update tests for new retry count and field name
Instead of adding a Connection: close header on the same pooled connection, build a new reqwest::Client with pool_max_idle_per_host(0) when retrying after a gateway error. This ensures the retry goes out on a new TCP connection rather than reusing the (possibly broken) pooled connection.
Co-authored-by: Spice Snapshot Update Bot <spiceaibot@spice.ai> Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com>
Add 4 snapshot files for test_schema_evolution_file_update_mode: - initial: 3-column table (id, town, age) with 3 rows - add_column: same query after adding country column - drop_column: 2-column result after dropping age column - change_type: same query after changing age to TEXT
The fresh reqwest::Client built for 502/504 retries was missing the User-Agent header. GitHub requires User-Agent on all requests and returns 403 Forbidden without it, causing retries to fail permanently instead of recovering from the transient gateway error.
Runtime clients: spiceai/{version}
CLI clients: spice/{version}
- Add User-Agent to all reqwest clients that were missing it
- Include CARGO_PKG_VERSION in all user-agent strings
- Runtime-side: dataconnector, token_providers, catalog, LLM, model
sources, object store, graphql retry client, unity catalog
- CLI-side: login commands, github client, context http_client
Format: product/{version} ({os}; {arch})
Runtime: spiceai/2.0.0-unstable (macos; aarch64)
CLI: spice/2.0.0-unstable (macos; aarch64)
Matches the existing CLI default_user_agent() format.
When WHERE ref = '<branch>' is used, the GraphQL query is rewritten to fetch only that branch's commit history via qualifiedName. All returned commits have the matching ref value, making the filter semantically exact. With Inexact, DataFusion kept a residual FilterExec which prevented LIMIT pushdown into the GraphQL table provider, causing it to paginate through the entire commit history (thousands of API calls with exponential backoff retries on 502s).
Three fixes: 1. Make partition-to-executor assignment deterministic by breaking ties in select_executors by executor ID (lexicographic order). This addresses the HashMap iteration non-determinism. 2. Replace inline plan snapshot for join_two_partitioned_tables with structural assertions that verify plan elements without depending on partition ordering (which varies with dynamic executor port assignment). 3. Fix test_file_update_csv_cayenne checkpoint race: add wait_for_checkpoint after loading Phase 1 so the next phase's file_update schema detection has the previous schema available. Update the add_column snapshot to show the correct 5-column output.
- Replace multi_executor inline plan snapshot with structural assertions (partition-to-executor assignment is non-deterministic with 2 executors) - Update cayenne drop_column and no_change_restart snapshots to correct 3-column output (checkpoint-backed schema detection working properly)
Move all 11 inline insta snapshots in distributed_acceleration.rs to named .snap files in the snapshots/ directory. This: - Keeps test files focused on logic, not large string literals - Makes snapshot diffs easier to review in PRs - Follows the project convention established in copilot-instructions.md Also update copilot-instructions.md to document the policy: always use named .snap files, never inline snapshots (@r"..." syntax). Remove two stale named snapshots (distributed_accel_aggregation, distributed_accel_select_all) that are no longer referenced.
* fix: Preserve timestamp timezone in DDL forwarding to executors
arrow_datatype_to_sql mapped all Timestamp variants to bare
"TIMESTAMP", discarding timezone information. When DataFusion parses
"TIMESTAMP" on the executor side it produces Timestamp(Nanosecond,
None), losing the timezone. This causes schema mismatches between
scheduler and executor for timezone-aware columns (e.g. Cayenne
retention columns using Timestamp(Microsecond, Some("UTC"))), which
can produce incorrect results for timezone-sensitive comparisons and
joins.
Use TIMESTAMPTZ for timezone-aware timestamps so the executor creates
columns with Timestamp(Nanosecond, Some("+00:00")), preserving
timezone awareness across the DDL forwarding boundary.
* style: cargo fmt
* test: add DDL roundtrip test for timestamp timezone preservation
* fix: Update field_with_name calls to match DataFusion API
The DFSchema::field_with_name method now requires a TableReference
as the first argument. Pass None for unqualified field lookups.
* fix: Hold table binding to satisfy borrow checker
The DFSchema::field_with_name now returns a reference that borrows
from the schema owner. Split the chained call to keep the table
alive for the borrow.
* fix: Use PostgreSQL dialect in DDL round-trip test
The runtime configures DataFusion with PostgreSQL dialect, which
correctly parses TIMESTAMPTZ as a timezone-aware timestamp. The test
was using the default dialect which doesn't recognize TIMESTAMPTZ.
* fix: Use standard SQL TIMESTAMP WITH TIME ZONE in DDL
DataFusion's SQL parser (even in PostgreSQL mode) doesn't parse the
TIMESTAMPTZ shorthand as timezone-aware. Use the standard SQL form
TIMESTAMP WITH TIME ZONE which DataFusion correctly recognizes,
ensuring DDL forwarding to executors preserves timezone information.
* style: cargo fmt
* fix: Remove DDL round-trip test
DataFusion's CREATE TABLE DDL planner does not preserve timezone
annotations for timestamp columns (TIMESTAMP WITH TIME ZONE is
parsed as Timestamp(Nanosecond, None)). This is a DataFusion
limitation, not a bug in this PR.
The existing unit tests (timestamp_without_tz_maps_to_timestamp and
timestamp_with_tz_maps_to_timestamptz) verify the correct SQL string
mapping. Cayenne DDL is forwarded as Arrow schemas, not round-tripped
through DataFusion's SQL parser, so the string mapping is what matters.
* style: remove trailing blank line
---------
Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com>
Merge develop to trunk (2026-04-03)
…-error-handling fix: propagate JoinError from partition insert tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)
Can you help keep this open source service alive? 💖 Please sponsor : )