Skip to content

[pull] trunk from spiceai:trunk#720

Merged
pull[bot] merged 248 commits into
TheRakeshPurohit:trunkfrom
spiceai:trunk
Apr 6, 2026
Merged

[pull] trunk from spiceai:trunk#720
pull[bot] merged 248 commits into
TheRakeshPurohit:trunkfrom
spiceai:trunk

Conversation

@pull
Copy link
Copy Markdown

@pull pull Bot commented Apr 6, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

Jeadie and others added 30 commits March 23, 2026 16:30
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Ensure 'SessionStateBuilder::new_from_existing' adds optimizer/analyzer
Ensure we always have partition expressions
…tag-in-spidapter

fix: Add spicehq organization tag in spidapter
* chore: Update iceberg-rust fork to v0.9

Update spiceai/iceberg-rust dependency from v0.8.0 to v0.9.0.

Key changes for the v0.9 API migration:
- Replace Extensions/FileIO::from_path with StorageFactory/FileIOBuilder
- Move AwsCredential types from iceberg::io to iceberg-storage-opendal
- Replace FileIO.lister() with opendal::Operator for directory listing
- Replace RestCatalog::with_file_io_extension with CatalogBuilder::with_storage_factory
- Remove storage-gcs/storage-s3 features (now in iceberg-storage-opendal)
- Add iceberg-storage-opendal and opendal as workspace dependencies

* fix: Address PR review — operator path handling, tests, and lint fixes

- Add to_operator_path() helper in HadoopCatalog to convert full URLs to
  operator-relative paths (fixes S3/GCS path handling)
- Set operator root from URL path in build_opendal_operator for S3, GCS, FS
- Use streaming lister() instead of eager list() in get_directories and
  directory_has_metadata for bounded memory usage
- Use single stat() instead of exists()+stat() double round-trip in
  directory_exists and directory_has_metadata_and_data
- Fix directory_has_metadata Exact mode to not return false early on first
  non-matching file; compare metadata filenames correctly
- Fix dataconnector to pass base_uri (warehouse root) to build_opendal_operator
  instead of source (table URL)
- Fix test import: AwsCredentialLoad moved to iceberg_storage_opendal
- Add unit tests for build_opendal_operator URL parsing
- Move use opendal::Configurator to function scope for clippy compliance

* chore: Remove accidentally committed .claude worktree

* fix: Always provide StorageFactory for iceberg v0.9 catalogs

- Add default_storage_factory_from_props() to infer GCS or S3 factory from
  iceberg properties when no explicit factory is provided
- get_rest_catalog() now always passes a StorageFactory to RestCatalogBuilder
  (required in iceberg-rust v0.9)
- Add OpenDalStorageFactory::S3 to Spice Cloud REST catalog builder
- Update hadoop catalog test helpers with required storage_factory and
  opendal Operator for both S3 and local filesystem catalogs
- Add iceberg-storage-opendal to data_components dev-dependencies
- Add StorageFactory to ignored REST catalog test

* fix: Use correct S3 scheme for Glue iceberg catalog StorageFactory

In iceberg v0.9, GlueCatalogBuilder defaults to 's3a' as the configured
scheme for the StorageFactory. However, AWS Glue metadata locations
typically use 's3://' paths, causing a scheme mismatch when reading
table metadata ('Invalid s3 url: s3://..., should start with s3a://...').

Fix by deriving the S3 scheme from the actual metadata_location URL and
providing an explicit StorageFactory with the correct scheme to the
GlueCatalogBuilder.

* chore: Remove accidentally committed .claude worktree

---------

Co-authored-by: Phillip LeBlanc <phillip@spice.ai>
…writes (#9934)

* wip

* fix: Improve debug logs

* fix: Return hard errors for data correctness issues
* Add full query federation support for ADBC data connector

* Update

* Update ref
The `schemars` crate does not implement `JsonSchema` for `str` (only
`String`), so `Arc<str>` fields fail to compile when the `schemars`
feature is enabled (as in the spicepodschema tool). Add
`#[schemars(with = "String")]` / `#[schemars(with = "Option<String>")]`
attributes to tell schemars to generate the schema as if the fields were
plain strings, which is semantically correct.

Fixes #2813

Co-authored-by: Claude <claude@spices-MacBook.localdomain>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add `ensure` to snafu imports in write_through.rs to fix compilation error
- Fix BTreeMap::from formatting in spidapter to satisfy rustfmt

Co-authored-by: Claude <claude@spices-MacBook.localdomain>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Delta Lake: fix data skipping for >= timestamp predicates

* Fix auth and add snapshots
Add ephemeral_storage_limit_gb and organization_tag fields to the
test_stdio_args() helper that were added to StdioArgs but not the test.

Co-authored-by: Claude <claude@spices-MacBook.localdomain>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com>
Bumps [aws-config](https://github.com/smithy-lang/smithy-rs) from 1.8.14 to 1.8.15.
- [Release notes](https://github.com/smithy-lang/smithy-rs/releases)
- [Changelog](https://github.com/smithy-lang/smithy-rs/blob/main/CHANGELOG.md)
- [Commits](https://github.com/smithy-lang/smithy-rs/commits)

---
updated-dependencies:
- dependency-name: aws-config
  dependency-version: 1.8.15
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
lukekim and others added 27 commits April 4, 2026 09:20
* fix: return computed credentials in executor Flight service readiness check

The credentials() function was assigning the computed value to a
discarded variable (_credentials) and unconditionally returning None.
This caused executors to always connect anonymously to their own Flight
service, failing auth checks, exhausting retries, and aborting startup
with 'No executors available to write data to'.

Introduced in #9904.

* fix: use cluster CA cert in executor Flight service readiness check

The executor startup retry loop called can_connect against its own local
Flight server, which uses the cluster's private mTLS CA. Passing None for
the CA caused TLS verification to fall back to system certificates, which
do not include the private cluster CA, so every retry failed with a TLS
error and executor startup aborted.

Changes:
- Store raw CA PEM bytes in ClusterTlsConfig (ca_certificate_pem field)
- Add new_tls_flight_channel_with_cert to flight_client::tls for in-memory
  CA certificates, avoiding a round-trip through the filesystem
- Update can_connect to accept Option<tonic::transport::Certificate> and
  use the new channel builder
- Pass the cluster CA cert into can_connect in initialize_cluster_executor
- Classify Unauthorized/PermissionDenied retries as permanent failures
- Downgrade stray warn! WIP log artifacts to debug!
- Log-and-continue instead of aborting on DDL forwarding failures in
  allocate_initial_partitions RPC (service.rs)

* fix formatting

* fix: pass full mTLS ClientTlsConfig to can_connect for executor readiness check

The executor Flight server requires mutual TLS — clients must present a
certificate. Passing only a CA certificate was insufficient; the TLS
handshake would fail because no client identity was provided.

- Replace ca_certificate: Option<Certificate> param on can_connect with
  tls_config: Option<ClientTlsConfig>, which carries both the CA cert and
  the client identity for mTLS
- Add new_tls_flight_channel_with_tls_config to tls.rs that accepts a
  pre-built ClientTlsConfig, falling back to system certs when None
- Remove ca_certificate_pem: Vec<u8> from ClusterTlsConfig — the existing
  client_tls_config field already contains both CA and identity, so the
  raw PEM bytes were redundant duplication
- Remove dead Unauthorized/PermissionDenied permanent-retry branches since
  can_connect only ever emits UnableToConnectToServer errors

* fix formatting

* fix: remove unused creds param from can_connect, simplify url allocation

- Drop the _creds parameter from can_connect entirely — it was never used
  and was misleading to callers; the credentials() helper and its imports
  (api_key_auth, Credentials) are also removed as they became dead code
- Simplify Arc<str> construction to a plain String in can_connect, avoiding
  an unnecessary intermediate &str borrow and extra allocation
- Fix race in ClusterHarness::wait_until_executor_count where executors
  were considered ready based on connection registration but flight SQL
  clients hadn't been set up yet, causing INSERT forwarding to fail with
  'no executors are currently connected'
- Make executor_registry pub on DataFusion for test access
- Add schema evolution file_update_csv snapshots for cayenne and duckdb
- Increase PAGE_RETRY_MAX_ATTEMPTS from 3 to 5 (total backoff ~19s vs ~6s)
  for better resilience to transient GitHub 502/504 errors
- Improve JsonDecodeError message: for 5xx responses returning HTML (e.g.,
  upstream proxy errors), show a clear message instead of the confusing
  JSON parse error ('expected value at line 1 column 1')
- Rename error field to 'detail' for clarity
- Update tests for new retry count and field name
Instead of adding a Connection: close header on the same pooled
connection, build a new reqwest::Client with pool_max_idle_per_host(0)
when retrying after a gateway error. This ensures the retry goes out
on a new TCP connection rather than reusing the (possibly broken)
pooled connection.
Co-authored-by: Spice Snapshot Update Bot <spiceaibot@spice.ai>
Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com>
Add 4 snapshot files for test_schema_evolution_file_update_mode:
- initial: 3-column table (id, town, age) with 3 rows
- add_column: same query after adding country column
- drop_column: 2-column result after dropping age column
- change_type: same query after changing age to TEXT
The fresh reqwest::Client built for 502/504 retries was missing the
User-Agent header. GitHub requires User-Agent on all requests and
returns 403 Forbidden without it, causing retries to fail permanently
instead of recovering from the transient gateway error.
Runtime clients: spiceai/{version}
CLI clients: spice/{version}

- Add User-Agent to all reqwest clients that were missing it
- Include CARGO_PKG_VERSION in all user-agent strings
- Runtime-side: dataconnector, token_providers, catalog, LLM, model
  sources, object store, graphql retry client, unity catalog
- CLI-side: login commands, github client, context http_client
Format: product/{version} ({os}; {arch})
Runtime: spiceai/2.0.0-unstable (macos; aarch64)
CLI: spice/2.0.0-unstable (macos; aarch64)

Matches the existing CLI default_user_agent() format.
When WHERE ref = '<branch>' is used, the GraphQL query is rewritten to
fetch only that branch's commit history via qualifiedName. All returned
commits have the matching ref value, making the filter semantically
exact. With Inexact, DataFusion kept a residual FilterExec which
prevented LIMIT pushdown into the GraphQL table provider, causing it
to paginate through the entire commit history (thousands of API calls
with exponential backoff retries on 502s).
Three fixes:

1. Make partition-to-executor assignment deterministic by breaking ties
   in select_executors by executor ID (lexicographic order). This
   addresses the HashMap iteration non-determinism.

2. Replace inline plan snapshot for join_two_partitioned_tables with
   structural assertions that verify plan elements without depending on
   partition ordering (which varies with dynamic executor port assignment).

3. Fix test_file_update_csv_cayenne checkpoint race: add
   wait_for_checkpoint after loading Phase 1 so the next phase's
   file_update schema detection has the previous schema available.
   Update the add_column snapshot to show the correct 5-column output.
- Replace multi_executor inline plan snapshot with structural assertions
  (partition-to-executor assignment is non-deterministic with 2 executors)
- Update cayenne drop_column and no_change_restart snapshots to correct
  3-column output (checkpoint-backed schema detection working properly)
Move all 11 inline insta snapshots in distributed_acceleration.rs to
named .snap files in the snapshots/ directory. This:
- Keeps test files focused on logic, not large string literals
- Makes snapshot diffs easier to review in PRs
- Follows the project convention established in copilot-instructions.md

Also update copilot-instructions.md to document the policy:
always use named .snap files, never inline snapshots (@r"..." syntax).

Remove two stale named snapshots (distributed_accel_aggregation,
distributed_accel_select_all) that are no longer referenced.
* fix: Preserve timestamp timezone in DDL forwarding to executors

arrow_datatype_to_sql mapped all Timestamp variants to bare
"TIMESTAMP", discarding timezone information. When DataFusion parses
"TIMESTAMP" on the executor side it produces Timestamp(Nanosecond,
None), losing the timezone. This causes schema mismatches between
scheduler and executor for timezone-aware columns (e.g. Cayenne
retention columns using Timestamp(Microsecond, Some("UTC"))), which
can produce incorrect results for timezone-sensitive comparisons and
joins.

Use TIMESTAMPTZ for timezone-aware timestamps so the executor creates
columns with Timestamp(Nanosecond, Some("+00:00")), preserving
timezone awareness across the DDL forwarding boundary.

* style: cargo fmt

* test: add DDL roundtrip test for timestamp timezone preservation

* fix: Update field_with_name calls to match DataFusion API

The DFSchema::field_with_name method now requires a TableReference
as the first argument. Pass None for unqualified field lookups.

* fix: Hold table binding to satisfy borrow checker

The DFSchema::field_with_name now returns a reference that borrows
from the schema owner. Split the chained call to keep the table
alive for the borrow.

* fix: Use PostgreSQL dialect in DDL round-trip test

The runtime configures DataFusion with PostgreSQL dialect, which
correctly parses TIMESTAMPTZ as a timezone-aware timestamp. The test
was using the default dialect which doesn't recognize TIMESTAMPTZ.

* fix: Use standard SQL TIMESTAMP WITH TIME ZONE in DDL

DataFusion's SQL parser (even in PostgreSQL mode) doesn't parse the
TIMESTAMPTZ shorthand as timezone-aware. Use the standard SQL form
TIMESTAMP WITH TIME ZONE which DataFusion correctly recognizes,
ensuring DDL forwarding to executors preserves timezone information.

* style: cargo fmt

* fix: Remove DDL round-trip test

DataFusion's CREATE TABLE DDL planner does not preserve timezone
annotations for timestamp columns (TIMESTAMP WITH TIME ZONE is
parsed as Timestamp(Nanosecond, None)). This is a DataFusion
limitation, not a bug in this PR.

The existing unit tests (timestamp_without_tz_maps_to_timestamp and
timestamp_with_tz_maps_to_timestamptz) verify the correct SQL string
mapping. Cayenne DDL is forwarded as Arrow schemas, not round-tripped
through DataFusion's SQL parser, so the string mapping is what matters.

* style: remove trailing blank line

---------

Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com>
Merge develop to trunk (2026-04-03)
…-error-handling

fix: propagate JoinError from partition insert tasks
@pull pull Bot locked and limited conversation to collaborators Apr 6, 2026
@pull pull Bot added the ⤵️ pull label Apr 6, 2026
@pull pull Bot merged commit 77f17e0 into TheRakeshPurohit:trunk Apr 6, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants