[pull] trunk from spiceai:trunk#777
Merged
Merged
Conversation
…10528) * fix: deny unsupported functions from filter pushdown in UseSource mode In ZeroResultsAction::UseSource, supports_filters_pushdown was returning Inexact for all filters unconditionally, bypassing the deny_spice_specific_functions() check. This caused json_get_str (and other Spice/JSON UDFs) to be pushed down to DuckDB as raw SQL predicates, which DuckDB cannot execute. Apply the same deny-list check as ReturnEmpty: filters containing unsupported functions return Unsupported so DataFusion evaluates them locally after the scan, while plain filters continue to be pushed down to the accelerator. * fix: split filters in scan() for UseSource mode instead of marking as Unsupported Keep supports_filters_pushdown returning Inexact for all filters in UseSource mode so FallbackOnZeroResultsScanExec receives the full predicate set for correct fallback decisions. Inside scan(), partition filters using deny_spice_specific_functions() and apply denied filters locally via wrap_with_filter before the zero-results check, rather than pushing them into the accelerator SQL. * fix: delegate update and delete_from in IndexedTableProvider and EmbeddingTable (#10520) * fix: delegate update and delete_from in IndexedTableProvider and EmbeddingTable Both wrappers delegated insert_into to the underlying provider but omitted update and delete_from, causing those operations to fall through to DataFusion's default "not supported" error even when the underlying table (e.g. DuckDB, Cayenne) supports them. Since these wrappers form the outermost layer of the provider chain (IndexedTableProvider → EmbeddingTable → AcceleratedTable), the missing delegations blocked UPDATE and DELETE for all indexed or embedding-augmented tables. * style: apply cargo fmt * feat(devx): make config errors, CLI, and REPL lead users to success (#10489) * feat(devx): make config errors, CLI, and REPL lead users to success Plumbs existing infrastructure (ParameterSpec docs fields, util::levenshtein, the generated spicepod JSON schema) into the user-facing surfaces so a typo in a param, a missing secret reference, or an unknown engine produces an actionable error instead of a silent drop or a generic "missing required parameter". CLI & init - `spice init` writes a `yaml-language-server: $schema=...` directive and prints next-step hints; creation message now prints regardless of log level. - Version banner moves to stderr and only prints on an interactive stderr. - Log filter honours RUST_LOG when `-v` is not passed. - 403 responses route to a new `PermissionDenied` variant; Unauthorized message points at `spice login` / `SPICE_API_KEY`. New command: `spice validate [path]` - Loads a spicepod without starting the runtime; prints component counts on success and a structured failure on parse/reference errors. Same codepath as the existing `tools/spicepod-validator`. Parameter + connector errors - Unknown parameter names now emit a Levenshtein-based "did you mean?" hint using the in-tree `util::levenshtein` helper. - "Missing required parameter" errors include the spec's description, first example, and help_link (docs URL) when provided. - UnknownEngine / UnknownDataConnector / UnknownCatalogConnector / UnknownSecretStore messages list the valid values (and for connectors, include a Levenshtein suggestion) plus a docs URL. - Added help_link / examples / one_of to high-value connector params: postgres, mysql, s3, kafka, duckdb, snowflake, databricks. - Improved wording on Snowflake/Postgres/Kafka bare driver errors. REPL - Welcome banner shows concrete examples; `help` / `?` lists meta-commands, SQL shortcuts, and nql usage. - New shims for `show schemas` / `show databases` / `describe <table>` (both bare and schema-qualified names). - `.clear` screen-wipe skipped when not on a TTY. Colour output - `ansi_colors::Painted` now honours NO_COLOR / FORCE_COLOR and falls back to `stdout.is_terminal()`; internal tests force it on via cfg(test). - Replaced raw ANSI escapes in `dataset configure` with Color helpers so piped output and CI logs stay clean. Secret resolution - Log messages on secret-ref lookup failure now include the full `${ store:KEY }` form, the owning param string, the list of configured stores, and a docs URL. Store-init failure gets the same treatment. Startup summary - After dispatching dataset loads, runtime logs a count of dispatched/failed. A detached task emits a follow-up one-liner at 30s with ready/errored/ initializing counts (aborts on shutdown). * fix(devx): address Copilot review comments and cargo fmt - REPL `.clear`: gate on `stdout().is_terminal()` directly instead of the colour heuristic. NO_COLOR should not suppress clear-screen on a TTY, and FORCE_COLOR should not push it into a pipe. - REPL `describe <table>`: match the keyword case-insensitively but keep the identifier token in its original case so quoted/mixed-case table names work, and escape single quotes (SQL `''`) before interpolating into the information_schema lookup. - `UnknownSecretStore`: build the available-stores list from cfg (adds `scheduler_rpc`, drops `keyring` / `aws_secrets_manager` when the feature is not enabled) via a new `known_secret_stores()` helper. - Apply `cargo fmt --all` (was failing the Rust Lint check). * fix(devx): avoid "failed"/"error" in healthy-startup INFO logs Quickstart CI asserts that spice.log from a clean start contains no `(failed|error)` tokens (case-insensitive) — that's its smoke-test for "nothing went wrong". The new startup summary broke this: INFO Loading datasets: 2 dispatched, 0 failed accelerator init (of 2 total). INFO Dataset load summary (after 30s): 3/3 ready, 0 errored, 0 still initializing. Both lines fire on every healthy start, and both contain banned tokens. Reword both so a healthy run is clean: - `N failed accelerator init` → `N skipped at accelerator init` - `M errored` → `M unhealthy` Per-dataset failures are still logged at WARN by `load_dataset`, so the smoke-test still catches real problems. * fix(devx): address second round of Copilot review comments Code fixes: - `runtime/src/init/dataset.rs`: skip the 30s follow-up sampler task entirely when there are no datasets — no point waiting just to log nothing. - `crates/repl/src/lib.rs`: `describe schema."MyTable";` now strips quotes from each identifier segment independently, so the generated `information_schema.columns` filter uses the unquoted table name. Testability refactor: - Extract the pure Levenshtein-scoring logic from `suggest_connector` into `dataconnector::closest_name(typo, candidates)`, reused by `catalogconnector::suggest_catalog_connector`. This keeps the async registry lookup and the scoring concerns separate so the scoring can be unit-tested directly. Tests added: - `runtime-parameters`: `closest_param_suggestion` close-typo match, distant-typo miss, and deprecated-spec skip; `describe_missing_parameter` empty/description-only/full-suffix/example-only. - `runtime/dataconnector::tests`: `closest_name` one-char typo, case insensitivity, distant returns None, empty candidates, and short-name threshold boundary. - `bin/spice/src/commands/validate.rs`: file path, directory containing spicepod, missing path, and invalid YAML — using `tempfile`. * fix(devx): cargo fmt validate.rs * fix(devx): address third round of Copilot review comments - `ansi_colors`: `Painted` now carries a `Target` (Stdout/Stderr). The TTY check is cached per stream, so a redirected stderr stays clean when stdout is a TTY (and vice versa). New `Color::paint_err()` returns a Painted tagged for stderr; `Color::paint()` stays stdout-targeted as before. Added 2 unit tests. - `spice validate`: the "Invalid:" prefix on the stderr failure path now uses `paint_err()` so it doesn't inject escapes into a redirected stderr just because stdout is a TTY. - `runtime-secrets::get_store_secret`: allocate the configured-stores list lazily inside the two error branches that actually consume it, instead of on every substitution. `inject_secrets` calls this in a tight loop, and the success path no longer pays for an allocation it discards. - `runtime::dataconnector::closest_name` doc comment: corrected the example from the parameter-typo case (`postgress_host` → `postgres_host`) to the connector-name case this helper actually handles (`postgress` → `postgres`). - `connector-mysql::sslmode`: dropped the overly-restrictive `one_of` that rejected modes supported by the underlying driver, and rewrote the `sslrootcert` description to stop referencing the Postgres-only verify-ca/verify-full mode names. * fix(ansi_colors): add #[must_use] to colors_enabled[_for] Workspace lint level is pedantic, which treats must_use_candidate as an error. Adding the attribute satisfies clippy without suppressing the lint. * fix(devx): address fourth round of Copilot review comments - `runtime-parameters::describe_missing_parameter`: rewrite as a `Vec<String>` joined with `. ` separators + a `. ` prefix. Previously an example-only spec produced a dangling leading space (`Missing required parameter: pg_host Example: ...`). Now every non-empty subset reads as a proper trailing sentence. Also strips an author-supplied trailing period from `description` so we don't concatenate `..`, and adds `description_without_trailing_period` / `help_link_only` tests. - `bin/spice/src/commands/validate.rs`: doc comment on `load_pod` now accurately describes the `Err(_)` branch as "metadata fails (not-found, permission-denied, …)", matching what the code actually does — falling through to `Spicepod::load_exact` lets the open-file step surface the real underlying I/O error with the path attached. * refactor: improve comments and logging for clarity and consistency * refactor: streamline warning log for unsupported parameters * fix(runtime): improve logging message for dataset loading summary * fix(validate): improve error handling for missing and invalid YAML files * feat(parameters): add support for case-insensitive option in parameter specifications * test(kafka): add unit test for uppercase values in security protocol * fix(rerank): defer execution to RerankExec, enable filters and projection pushdown (#10514) * fix(rerank): defer execution to RerankExec, add filter/projection pushdown * Add tests * Make tests more robust * feat(rerank): add task_history tracing to rerank UDTF (#10515) * feat(rerank): add task_history tracing to rerank UDTF * Update * Update snapshot * fix: use NEG_INFINITY for NULL document rows in rerank UDTF (#10512) * fix: use NEG_INFINITY for NULL document rows in rerank UDTF NULL document rows were assigned rerank_score = 0.0 when bypassed by the reranker. This causes NULL rows to sort above valid documents when a reranker (e.g. Cohere cross-encoder, BYO HTTP endpoint) returns negative scores, corrupting the ranking output. Use f32::NEG_INFINITY instead so NULL rows always sort last in DESC order regardless of the reranker's score range. * style: rustfmt the new null_rows_sort_below_negative_scores test --------- Co-authored-by: claudespice <claude@spice.ai> Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com> * fix: handle missing input in caching and ReturnEmpty modes for accelerator scan * fix: enhance filter handling for accelerator scans and add validation for support length * fix: simplify filter handling for accelerator scans by removing unnecessary reapplication logic * fix: streamline filter handling in tests by removing unnecessary clones * Update crates/runtime/src/accelerated_table/mod.rs Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix: don't push limit to accelerator when local filters are applied in UseSource mode When filters are split and some are applied locally via wrap_with_filter, pushing the limit down to the accelerator can truncate rows before local filtering runs, producing incorrect results. Only push limit when all predicates are forwarded to the accelerator. --------- Co-authored-by: claudespice <claude@spice.ai> Co-authored-by: Luke Kim <80174+lukekim@users.noreply.github.com> Co-authored-by: Sergei Grebnov <sergei.grebnov@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* feat(secrets): add Azure Key Vault secret store
New `azure_keyvault` secret store with per-key caching, single-flight fetch
coalescing, init-time credential pre-flight, and explicit auth-method
selection (service principal, managed identity, workload identity, Azure
CLI, or auto-detect). Selector accepts bare vault names or full URLs;
`endpoint` param supports sovereign clouds.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(secrets): tighten secure-string handling across all stores
Audit fixes so every store keeps secret bytes inside `SecretString` (with
zeroize-on-drop) for their full in-memory lifetime, rather than round-tripping
through plain `String` allocations that are freed without scrubbing.
- `lib.rs` aggregator: replace `get_store_secret -> Option<String>` with
`lookup_for_injection -> Option<SecretString>` so `inject_secrets` splices
bytes via `expose_secret()` directly into the (preallocated) builder
instead of minting a separate plaintext `String` per substitution.
- `kubernetes`: service-account bearer token is now `Option<SecretString>`;
the internal `get_secret` returns `HashMap<String, SecretString>` so every
value carries its own zeroize-on-drop. Base64 decode uses
`Zeroizing<Vec<u8>>` for the intermediate buffer.
- `aws_secrets_manager`: cache stores `HashMap<String, SecretString>`
instead of `HashMap<String, String>`; JSON parser wraps each scalar as
`SecretString` at parse time.
- `azure_keyvault`: `client_secret` is now `SecretString` on both config
and store; `AzureKeyVaultConfig` carries a manual `Debug` impl that
redacts it so a rogue `{:?}` in a panic/log cannot leak a service-
principal secret. New test covers the redaction.
- `scheduler_rpc`: trait return type is `Result<SecretString, String>`
(was `Result<String, String>`) so implementors cannot forget to wrap;
the real gRPC impl in `runtime::cluster` wraps at the earliest point it
owns the plaintext.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(secrets): redact aws_secrets_manager credentials in Debug + SecretString
Found by a second audit pass. `AwsSecretsManagerConfig` derived `Debug`
with a plain `Option<String>` for `secret_access_key` and `session_token`,
so any `{:?}` print would surface the static AWS credentials — the same
failure class that prompted the azure_keyvault fix.
- Config + store now hold `secret_access_key` and `session_token` as
`Option<SecretString>`; `access_key_id` stays `Option<String>` since the
key id is not itself sensitive and is useful in debug output.
- Manual `Debug` impl redacts both secret fields while keeping the rest of
the struct visible.
- `build_aws_config` takes `Option<SecretString>` and calls
`expose_secret().to_string()` exactly once — at the AWS SDK boundary —
before handing the plaintext to `aws_credential_types::Credentials`.
- New `config_debug_redacts_static_credentials` unit test guards against
regression.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(secrets): address azure_keyvault review comments
Copilot's review on #10496 flagged eight real issues — fixed all of them.
Concurrency in `AzureKeyVault::load()`:
- No longer hold the `inflight` tokio `Mutex` across any `.await`. The
under-mutex cache recheck (which awaited the cache `RwLock`) is gone;
the loser path re-checks cache after dropping the mutex.
- Loser path uses `Notified::enable()` to pre-register for the wakeup
before the final cache check, closing the classic lost-wakeup window
where a winner could fire `notify_waiters()` between mutex release and
first poll.
- Winner now publishes cache BEFORE releasing the inflight entry, so a
fresh task that arrives after cleanup cache-hits on its fast path
rather than starting a duplicate fetch. `inflight.remove()` and
`notify_waiters()` run inside the same critical section so a new
loser cannot subscribe to a now-silent `Notify`.
Auth validation:
- `from_params` normalizes empty-string params to `None` (shared
`non_empty` helper) so `client_secret: ""` in a spicepod can't flip
default into service-principal mode with a blank secret.
- `effective_auth_method` becomes a free function shared by init and
lookup; requires `client_secret` to be `Some(non-empty)` before
upgrading `Default` to `ServicePrincipal`.
- `from_config` validates against the *effective* auth method so
`auth_method: default` + `client_secret` set + missing `tenant_id`
or `client_id` fails fast with `MissingAuthParams`.
Docs vs implementation:
- Narrowed `auth_method: default` docs + `AuthMethod::Default` comment
+ module header to match reality: service principal if
`client_secret` is set, else `DeveloperToolsCredential`. Managed /
workload identity are explicitly opt-in — noted why (noisy IMDS /
federated-token failure modes).
- Rewrote `tenant_id` / `client_id` ParameterSpec descriptions so
"required for workload_identity" (which was false) becomes
"optional, resolved from env when omitted".
Test env-var safety:
- Added `static ENV_LOCK: tokio::sync::Mutex<()>` in the tests module;
every env-mutating test now acquires it via `lock_env().await` up
front. Serializes the seven `std::env::set_var`/`remove_var` call
sites across parallel test runs, upholding the 2024-edition
unsafe contract.
New unit tests cover: empty-string normalization, empty-secret
effective-method fallback, and effective-method validation at
`from_config` time. 65 tests pass (up from 62).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* style(secrets): apply cargo fmt
Formatting-only pass over the files touched by the recent audit +
review-fix commits to satisfy `make lint` / `cargo fmt --check` on CI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(secrets): satisfy repo's pedantic clippy gate
CI's `make lint` runs clippy with `-Dclippy::pedantic` + several extra
dense rules (`unwrap_used`, `clone_on_ref_ptr`, `allow_attributes`, etc.),
which my earlier `cargo clippy --all-targets -D warnings` did not cover.
Fixing the 7 lib errors + 4 tests errors it surfaced, all in
azure_keyvault + the aws_secrets_manager JSON parser tests:
- `auth_method` param lookup: `map(..).unwrap_or(Default)` →
`map_or(Default, ..)`
- `init()`: merged `None` / `Some(Ok(_))` match arms with identical bodies
- `load()` loser path: dropped the redundant trailing `continue`
(the match is the last expression in the loop body)
- `fetch_one`: `Ok(None) => continue` → `Ok(None) => {}` (loop advances
by itself)
- `build_credential`: dropped `async` — no `.await` inside
- Removed the unused `key_vault_scope()` helper and the `KEY_VAULT_SCOPE`
constant it hid behind `#[allow(dead_code)]`; the SDK supplies the
scope on its own.
- Doc backticks around `GetSecret`, `client_secret`, `tenant_id`,
`client_id`, and `ClientSecretCredential` in module + test doc
comments.
- Test rewrites: `.map(|s| s.expose_secret())` →
`.map(ExposeSecret::expose_secret)`; `match ... { Ok(_) => panic!() }`
→ `let Err(err) = ... else { panic!() }`.
65 unit + 5 AWS live + 4 Azure live + 1 doctest still pass; scoped
pedantic clippy (`--lib --tests`) now clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(secrets): address review comments on feature gating + URL validation
Two Copilot review comments on the azure_keyvault PR:
Feature gating (#10496 thread 1):
- `azure-keyvault` was in `runtime-secrets`'s `default` feature set, so
any workspace crate depending on `runtime-secrets` without
`default-features = false` (e.g. `runtime-acceleration`) pulled
`azure_core` / `azure_identity` / `azure_security_keyvault_secrets`
into builds that don't need them. Removed it from `default` and wired
it through the established per-binary pattern:
- `runtime/Cargo.toml`: new `azure-keyvault` feature forwarding to
`runtime-secrets/azure-keyvault`, sibling to `aws-secrets-manager`.
- `bin/spiced/Cargo.toml`: new `azure-keyvault` feature forwarding
to `runtime/azure-keyvault`, sibling to `aws-secrets-manager`.
URL validation (#10496 thread 2):
- `resolve_vault_url` / `validate_https` only checked the scheme prefix,
so `https://` or `https://vault/some/path` used to pass validation and
fail later inside the Azure SDK. Replaced with `url::Url`-based parsing
in a new `validate_vault_url` helper that rejects:
- non-URL strings (`https://`, `https:///`, etc.)
- missing host
- any path, query, fragment, or embedded userinfo (Key Vault URLs are
host-only — anything else is user error)
- non-https schemes (keeping the `http://` plaintext-specific message)
- Custom ports now round-trip correctly (e.g. `:8443` for localstack).
- Added five new unit tests covering the rejection paths + port
preservation; pulled `url` in as a feature-gated optional dep under
`azure-keyvault`.
70 unit + 5 AWS live + 4 Azure live + 1 doctest pass; pedantic clippy
(`--lib --tests`) clean against the CI recipe.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(secrets): whitespace-param normalization + deterministic cache test
Two Copilot review comments on the azure_keyvault PR:
Whitespace-only params (thread 2):
- `from_params` normalized `""` → `None` but treated `" "`,
`"\t\n"`, etc. as present, which would pass validation and later
fail inside the Azure SDK with a less actionable error.
`non_empty` now trims surrounding whitespace before the emptiness
check. Trim is surface-only: non-empty values land in the stored
config stripped of leading/trailing whitespace, which also fixes
the common "I pasted from my terminal and got a trailing newline"
failure. Two new unit tests.
Timing-based live test (thread 1):
- `live_repeated_lookups_are_served_from_cache` asserted the second
lookup was ≥10× faster than the first — prone to flaking under CI
network jitter. Replaced with a deterministic cache-counter check:
`AzureKeyVault` now carries always-on `AtomicUsize` counters for
hits / misses and a `#[doc(hidden)] pub fn cache_stats()` accessor
(not `#[cfg(test)]` because integration tests compile the library
as a regular dependency without the `test` cfg). The live test
snapshots counters around the second call and asserts
`hits += 1, misses += 0` — no wall-clock, no jitter window.
Thread 3 (scheduler_rpc trait signature change being semver-breaking)
is answered in-line on GitHub: workspace version is `2.0.0-unstable`,
only in-tree implementor was updated in the same commit, release
notes will call out the `SecretString` upgrade when this lands.
72 unit + 5 AWS live + 4 Azure live + 1 doctest pass; pedantic clippy
clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(ci): retrigger pr workflow after stuck Rust Lint job
The previous run (24902935014) had Rust Lint hang for 80+ minutes on
the compile step. `gh run cancel` didn't propagate to the runner, so
`gh run rerun --failed` was refused with "workflow is already running."
An empty commit triggers a fresh pr workflow run, and the workflow's
concurrency group (cancel-in-progress: true) will finalize the stuck
run when the new one starts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(secrets): address PR review comments
* fix(secrets): address latest PR review
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot <copilot@github.com>
* Fix clickbench - accelerated/spicecloud-arrow.yaml * Fix clickbench - accelerated/s3[parquet]-turso[file].yaml fix: Add refresh_sql to clickbench s3[parquet]-turso[file] to prevent OOM #10533 Turso (libSQL) uses a single transaction for overwrite_all, which causes memory exhaustion when ingesting the full clickbench dataset (~100M rows). This matches the existing pattern used by sqlite[file] and sqlite[memory] * Benchmarks: Increase timeout for turso w/ indexes configuration * Fix Build ADBC BigQuery driver step * fix(datafusion-federation): handle EXISTS/NOT EXISTS subqueries in federation analyzer * Improve * Increase timeout for Databricks SQL Warehouse (ensure enough time for cluster to start) * Update * Fix Oracle catalog configuration params * Add oracle catalog override * MSSQL: fix `substr` unparsing * Make mssql substr_to_substring more robust * Update * Improve * Add missing catalog overrides * Fix lint
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)
Can you help keep this open source service alive? 💖 Please sponsor : )