[pull] trunk from spiceai:trunk#829
Merged
Merged
Conversation
* fix: show pushed-down `limit` in `tree` EXPLAIN of `DataSourceExec` Upstream DataFusion's `FileScanConfig::fmt_as` already emits `limit=N` for the default/indent format but skips it in `TreeRender`, so a scan that received a fetch limit looked identical to one without in `EXPLAIN FORMAT TREE`. Add `LimitDisplayDataSource`, a transparent `DataSource` wrapper that appends `limit=N` to the tree output when the inner source reports a fetch, and register `DataSourceTreeDisplayOptimizer` to apply the wrap after limit pushdown. `as_any` passes through so downcasts to `FileScanConfig` keep working. Fixes #10602 https://claude.ai/code/session_01AXt4ypyW8znHu1bR5KMjUJ * chore: update copyright header to 2026 Match convention used in other files created in 2026. https://claude.ai/code/session_01AXt4ypyW8znHu1bR5KMjUJ * test(retention): poll for retention SQL to apply before snapshotting The retention task ticks on a 200ms background interval, so the row count after `refresh_table().await` is racy — on slow runners the immediate query observes the unfiltered rows and the snapshot fails. Replace the bare query with a `wait_for_retention_applied` helper that polls every 100ms (10 s budget) until the count matches the snapshot's expected row count, mirroring the explicit sleep used in the unit-test version of retention. https://claude.ai/code/session_01AXt4ypyW8znHu1bR5KMjUJ * test(retention): bound retention poll loop by `timeout_at` Address review feedback on the polling helper: - A slow query could exceed the advertised `timeout` since the deadline was only checked between attempts. - The sleep was not capped by the deadline, so the loop could sleep past it before noticing. Wrap the entire loop in `tokio::time::timeout_at(deadline, …)`. The runtime cancels both the in-flight query and the inter-attempt sleep when the deadline elapses, guaranteeing the function returns within `timeout`. https://claude.ai/code/session_01AXt4ypyW8znHu1bR5KMjUJ * ci(graceful-shutdown-append): tolerate transient parquet-rename races `simulate_append_data.sh` republishes `.spice/service_data.parquet` every 3s by writing a `.tmp` file and `mv -f`-ing it into place. A refresh tick (running on a 1s interval) that observes the rename mid-flight reads a half-published file and the DuckDB reader emits a TProtocolException, which the refresh task logs at WARN and then retries successfully on the next tick. The `Check logs` step's `grep -iE "(failed|error)"` is a case-insensitive substring match, so it treats the transient WARN as a hard failure — even though the runtime handled it correctly and the workload ran to completion. Filter that specific WARN (and the chained `Invalid Error: TProtocolException` line) out of the grep so the step only trips on genuine errors. Scoped to `graceful_shutdown_test_append` since it's the only job that runs against the appending-data simulator. https://claude.ai/code/session_01AXt4ypyW8znHu1bR5KMjUJ * test(limit-display): assert `limit=` lands on its own TreeRender line Strengthen the unit test to verify the TreeRender contract — each `key=value` on its own newline-terminated line — by asserting that `limit=2` appears with a leading `\n` and is not glued to a sibling field. This guards against accidentally regressing the format if a future DataSource impl forgets the trailing `writeln!`. https://claude.ai/code/session_01AXt4ypyW8znHu1bR5KMjUJ * bench(limit-display): measure `DataSourceTreeDisplayOptimizer` cost per query Add a criterion microbenchmark sweeping plan width (1, 10, 50, 100 `DataSourceExec` nodes via `UnionExec`) and the cold (no fetch limit on any scan) vs worst-case (every scan has a fetch limit) paths. The former is the hot path on virtually all queries — every node short- circuits via `Transformed::no` after one `downcast_ref` + a single `fetch().is_none()` check. The latter exercises the wrapping path where each node also pays for an `Arc::clone` of the inner source plus a `LimitDisplayDataSource` allocation and a `with_data_source` swap. Lets reviewers and future maintainers verify the always-on cost stays within a small fraction of typical query planning time. https://claude.ai/code/session_01AXt4ypyW8znHu1bR5KMjUJ * Lint * Lint * Fix test * Fix --------- Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Viktor Yershov <viktor@spice.ai>
…#10757) * docs: add product security audit report Workspace-wide audit of auth, secrets, HTTP/Flight surface, AI/LLM tooling, data connectors, and the container/build pipeline. Two critical findings (non-constant-time API key compare; SSRF in remote UDF endpoint validation), four high (MySQL preferred-mode TLS bypass, Vault tls_skip_verify on remote hosts, AWS default credential chain unverified at init, no auth-failure rate limiting), and a set of medium/low hardening items. Each finding cites file:line and a fix. https://claude.ai/code/session_01Xu4b9TS1UvGu4H1a7NxHKj * fix(security): address critical findings from security audit (C1, C2) C1 — runtime-auth API key allowlist iteration The per-comparison code was already constant-time: ApiKey's PartialEq<str> uses subtle::ct_eq, so per-byte timing does not leak key bytes. However, HTTP, Flight basic, Flight bearer, and gRPC paths all used find / any to search the configured key list, which short-circuits on the first matching key. That leaks the *position* of the matching key (1 comparison vs N) and the existence of any match. Replace find / any with a lookup helper that iterates over every configured key on every call without short-circuiting; total verify time is O(N) regardless of which key (if any) matched. Add tests covering matches at first / middle / last positions and empty-credential rejection on the Flight basic-auth path. C2 — remote UDF endpoint SSRF parse_endpoint previously validated only the URL scheme; any operator- or LLM-influenced UDF declaration could target loopback, RFC1918, or the cloud metadata service via 169.254.169.254 — a practical IAM credential exfiltration path on any cloud deployment. parse_endpoint now classifies the host: * IPv4 literals are rejected if loopback, RFC1918, link-local (covers AWS / GCP / Azure IMDS), multicast, broadcast, unspecified, or carrier-grade NAT (100.64.0.0/10). * IPv6 literals are rejected if loopback, unspecified, multicast, link-local (fe80::/10), unique-local (fc00::/7, covers fd00:ec2::254), or IPv4-mapped onto any of the above ranges. * A short list of conventional metadata hostnames (localhost, metadata, metadata.google.internal) is also blocked. Arbitrary hostnames are still allowed; resolving DNS at builder time introduces a TOCTOU/DNS-rebinding hole, so connect-time enforcement is tracked as follow-up work. * A new allow_private_endpoints: true param opts back in for trusted on-host services. Documented in the file header. Tests cover the default rejection across 12 representative private addresses (loopback, RFC1918, IMDS, IPv6 ULA, IPv4-mapped IPv6, etc.), the explicit opt-in path, public-address acceptance, and an invalid-opt-in-value rejection. Existing round-trip tests that bind a mock HTTP server on 127.0.0.1 are routed through a new with_loopback_optin helper so they keep working. Also update docs/security_audit.md to reflect what was fixed and correct the earlier claim that the per-comparison code was not constant-time. https://claude.ai/code/session_01Xu4b9TS1UvGu4H1a7NxHKj * docs: remove security audit report The audit report was a working artifact; the actionable items live in the code changes (constant-time-position API key lookup, remote-UDF SSRF allowlist) plus their tests. Remaining findings can be tracked as issues rather than as in-tree documentation. https://claude.ai/code/session_01Xu4b9TS1UvGu4H1a7NxHKj * fix: address security PR review feedback * fix: improve security checks for remote UDF endpoints and refactor IP validation logic * fix: resolve ownership issues in IP validation logic for disallowed addresses * fix: use CIDR allowlist for remote UDF endpoints * fix: address final security review comments * fix(security): reject pickle model weights by default; enforce RUSTSEC advisories in CI Two follow-ups motivated by CVE-2026-7482 (Ollama heap OOB read in GGUF tensor parsing). The CVE itself does not affect this codebase — Ollama is not a dependency — but the bug class (malformed binary blob handed to a complex parser) does apply to several places spiceai consumes. 1. Pickle weight rejection -------------------------------------------------------------------- The .pt / .pth / .ckpt / .bin extensions are conventionally Python pickle in PyTorch's ecosystem, and pickle deserialization is RCE by design. The local-LLM file pipeline previously accepted whatever the operator pointed at and handed it straight to mistralrs's loader. Operators commonly download model weights from third-party Hugging Face repos, so a malicious .bin in such a repo would have run arbitrary Rust process code at load time. Add `llms::chat::reject_unsafe_weight_formats` and call it from `runtime::model::chat::file` before invoking `create_local_model`. The check refuses pickle-class extensions by default and accepts a new `params.trust_pickle: true` opt-in for operators who control the source. The check is case-insensitive and covers the four conventional pickle extensions. Five unit tests cover rejection, acceptance of safe formats, opt-in, case-insensitivity, and extensionless paths. 2. Enforce RUSTSEC advisories in CI -------------------------------------------------------------------- The `[advisories]` block in deny.toml was schema-pre-v2 and CI only ran `cargo deny check licenses`. Upgrade the block to `version = 2` (cargo-deny v2 default semantics: vulnerability / unmaintained / unsound / notice all fail; yanked already explicitly denied), and add a second step in `license_check.yml` that runs `cargo deny check advisories`. This intentionally turns on enforcement without pre-populating `ignore` — the workspace currently has known findings (yanked crates, unmaintained warnings, and a small number of actual CVEs in transitive deps) that the team will want to triage explicitly rather than silently bypass. https://claude.ai/code/session_01Xu4b9TS1UvGu4H1a7NxHKj * fix: tighten remote UDF endpoint allowlist handling * fix: correct typo in comment regarding PyTorch pickle file extensions * fix: update dependencies in Cargo.lock and enhance ApiKeyAuth implementation * fix: ungate Path import so reject_unsafe_weight_formats builds without local_llm reject_unsafe_weight_formats and the in-module tests use Path / PathBuf unconditionally, but the imports were gated behind feature = "local_llm". The runtime integration test build (no local_llm feature) failed with E0425 "cannot find type `Path` in this scope". Ungate Path, gate PathBuf to local_llm or test so it isn't unused in lib builds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: downgrade getrandom dependency from 0.4.1 to 0.3.4 * fix: filter remote UDF DNS results * fix: update cargo-deny installation command and improve IP address checks for non-public endpoints * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * fix: specify cargo-deny version and improve error messages in remote functions * fix: update global IP address check to allow 192.0.0.0/24 range --------- Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* Remote permanent errors from DynamoDB Streams * Address feedback * Lint * Fix max backoff * Apply suggestions from code review Co-authored-by: Phillip LeBlanc <879445+phillipleblanc@users.noreply.github.com> --------- Co-authored-by: Phillip LeBlanc <879445+phillipleblanc@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)
Can you help keep this open source service alive? 💖 Please sponsor : )