[pull] trunk from spiceai:trunk by pull[bot] · Pull Request #763 · TheRakeshPurohit/spiceai

pull · 2026-04-21T21:06:17Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)


In distributed Ballista mode, decoded ParquetSource plans on executors lose their per-scan parquet_file_reader_factory during proto round-trip and fall back to runtime_env().object_store(url). The delta_lake connector did not implement DataConnector::register_object_stores, so executors built a default S3 store with no region configured, surfacing as 'Received redirect without LOCATION' errors against buckets outside us-east-1.

Mirror the databricks connector's implementation: retain the connector Parameters on DeltaLake, and in register_object_stores parse the dataset path as the storage URL, encode the AWS/Azure/GCS subset of params via Parameters::storage_registry_params() into the URL fragment, and call runtime_env.object_store(&listing_url) so SpiceObjectStoreRegistry builds and caches a properly-configured store on each executor.
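The fragment-encoding step described above can be illustrated with a minimal, std-only sketch. It is not the connector's code: the helper names `with_storage_fragment` and `percent_encode`, the `key=value&key=value` fragment layout, and the `aws_region` key are all assumptions made for illustration. The real format produced by Parameters::storage_registry_params() is not shown in this PR.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: percent-encode everything except RFC 3986
/// unreserved characters, so values are safe inside a URL fragment.
fn percent_encode(s: &str) -> String {
    let mut out = String::new();
    for b in s.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{b:02X}")),
        }
    }
    out
}

/// Hypothetical helper: attach storage parameters to a dataset URL as a
/// `key=value&key=value` fragment (layout assumed, not taken from the PR).
/// A BTreeMap keeps the key order stable, so the same parameters always
/// yield the same URL string and therefore the same cache key.
fn with_storage_fragment(base_url: &str, params: &BTreeMap<&str, &str>) -> String {
    if params.is_empty() {
        return base_url.to_string();
    }
    let fragment = params
        .iter()
        .map(|(k, v)| format!("{}={}", percent_encode(k), percent_encode(v)))
        .collect::<Vec<_>>()
        .join("&");
    format!("{base_url}#{fragment}")
}
```

Under these assumptions, a registry that receives the resulting URL can rebuild a region-aware store from the fragment alone, without access to the connector instance that created the plan.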

* Revert "Revert DF-native DML (#10114)"

  This reverts commit bd75c67.
* Lint
* Lint
* Address copilot comments
* Add tests
* Fix
* Lint
* Update distributed Cayenne catalog snapshots for sort pushdown into FlightSQL

  Query plans changed: ORDER BY is now pushed into FlightSqlExec SQL queries instead of a separate SortExec node, and SubqueryAlias wraps TableScan in the logical plan. Update 5 affected insta snapshots to match new output.

…#10434)

* ci: run Build and Test on spiceai-macos; split install jobs by profile

  - Move build-test to spiceai-macos runners with spiceio setup + cleanup and log upload, matching lint-rust.
  - Split make install variants into two jobs: build-install-dev (make install-dev) and build-install-release (make install, install-cli, install-runtime), each on spiceai-macos with spiceio setup + cleanup.
* ci: gate spiceio log upload and snapshot push on relevant_changes

  Addresses Copilot review comments on #10434: avoid noisy/empty artifacts and stale log uploads by gating unconditional always() steps on relevant_changes and a non-empty setup-spiceio pid.
* ci: merge lint-rust into build-test; one job per profile

  Since lint and build-test both run on spiceai-macos under the lint profile, consolidate them into a single 'Lint, Build, and Test' job. Workflow now has three Rust jobs, one per profile:
  - build-test (lint profile): make lint-rust + build-cli + nextest + testoperator
  - build-install-dev (dev profile): make install-dev
  - build-install-release (release profile): make install / install-cli / install-runtime
* Revert "ci: merge lint-rust into build-test; one job per profile"

  This reverts commit 0ecad35.

* Improve search UDTFs: text_search, vector_search, rrf

  text_search:
  - Use Tantivy QueryParser as primary path to honor operators (AND/OR/NOT), phrases ("exact match"), field-scoped queries (title:foo) and boosts (term^2). Falls back to bag-of-words OR clause on parser errors.
  - Fix pagination loop bug that issued empty queries after index exhaustion: decrement remaining_limit by actual hit count and short-circuit on partial pages.
  - Fix multi-index column selection: pick the FTS index containing the requested column rather than an arbitrary pop().
  - Filter spice.parameter_name passthrough literals in parse_args so RRF named args (e.g. rank_weight) don't confuse text_search.
  - "Did you mean?" suggestions and column listings on column-not-found errors.

  vector_search:
  - Column suggestions on missing indexed column via Levenshtein.

  rrf:
  - New 'limit' named arg. Propagates to subqueries with a 4x candidate-pool multiplier to preserve recall; post-fuse .limit caps final output exactly.
  - Fail fast on invalid recency_decay instead of silently dropping it.
  - Accept bare identifier for time_column/join_key named args, e.g. `time_column => picked_at` in addition to `time_column => 'picked_at'`.
  - Limit propagated through distributed serialization (proto + codec).

  Tests:
  - Unit tests for text_search (parse_args filter, column helper, suggestions, levenshtein).
  - Unit tests for vector_search (closest_column, levenshtein).
  - Unit tests for rrf (limit named arg, identifier named args, serialization).

  Verified: 1041 runtime lib tests + 22 search-crate tests pass.
* feat: add distance metric support to vector search UDTFs
* fix: format distance metric mapping for better readability
* fix: update error messages for clarity in SQL query execution failures
* fix: optimize cloning in TextSearchTableFunc and improve error handling in ReciprocalRankFusionArgs
* fix: add backticks to DataFusion in rrf doc comment for clippy

  Fixes clippy::doc_markdown lint on test-only doc comment.
* fix: enhance argument parsing in UDTFs to support named parameters and improve error handling
* fix: format code for consistency in TextSearchTableFunc
* fix: enhance named argument handling in text_search UDTF for optional fields
* fix: simplify column extraction logic in TextSearchTableFunc
* fix: update constraints in SearchQueryProvider to maintain PK indices based on include_score
* fix: streamline limit and include_score handling in UDTFs for improved clarity
* fix: address UDTF review comments

  - vector_search: fail fast on out-of-range `limit` (match text_search)
  - vector_search: reject `distance_metric => 'dot'` at parse time instead of silently constructing args that fail with NotImplemented later
  - vector_search: sort indexed-column lists so error messages and Levenshtein suggestions are deterministic across runs
  - rrf: drop synthetic `__spice_rrf_row_id` after the secondary-sort step when no user join_key was provided, so it doesn't leak into the user-visible output schema of `rrf(...)`
* Lint
* fix: address PR review feedback on search UDTFs

  - Replace duplicate Levenshtein implementations in FTS and embeddings UDTFs with util::levenshtein::distance.
  - Defer all_indexed_fields collection in text_search to error/disambig paths so the happy query path skips it.
  - Fix include_score named-arg override: keep the positional default as None and apply Some(true) only after named-arg merge so include_score => false isn't silently dropped.
  - Drop the secondary join-key sort in RRF that was added purely for test determinism.
  - Drop "dot" from distance_metric docs in provider.rs and spice.proto to match runtime behavior (only cosine and l2 are supported).
  - Move FieldMetadata/BTreeMap imports out of vector_search to_expr function body.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: address copilot review on error kind and distance_metric

  - Use DataFusionError::Plan in TextSearchTableFuncArgs::column for invalid user arguments to match the rest of the module's planning errors and keep the error classification consistent upstream.
  - Fail fast in vector_search when distance_metric is set against an index-backed provider (S3 vectors, Elasticsearch, chunked) instead of silently using the index's configured metric.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: apply rustfmt to new include_score test helper

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs: update UdtfSource::VectorSearch signature to include distance_metric

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix: update snapshot to match DataFusionError::Plan classification

  The Execution→Plan change in TextSearchTableFuncArgs::column surfaces as "Error during planning:" in the HTTP error message.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Fix distance
* Lint
* Lint
* lint: fix cast_possible_truncation in duckdb test helper

---------

Co-authored-by: Viktor Yershov <viktor@spice.ai>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
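
The rrf 'limit' behavior described in the commit above (over-fetch candidates from each subquery, then cap the fused output exactly) can be sketched with plain reciprocal rank fusion. This is an illustrative sketch only, not the Spice implementation: the function names, the use of string IDs as join keys, and the smoothing constant 60 are assumptions. Only the 4x multiplier comes from the commit message.

```rust
use std::collections::HashMap;

/// Candidate-pool multiplier named in the commit message.
const CANDIDATE_MULTIPLIER: usize = 4;
/// Assumption: the conventional RRF smoothing constant; the PR does not state it.
const RRF_K: f64 = 60.0;

/// How many rows each subquery is asked for, given the user-facing limit.
fn candidate_limit(limit: usize) -> usize {
    limit.saturating_mul(CANDIDATE_MULTIPLIER)
}

/// Fuse several ranked ID lists (best first) with reciprocal rank fusion.
/// Each list contributes 1 / (RRF_K + rank) per ID, with rank starting at 1.
/// Only the first `candidate_limit(limit)` rows of each list are considered,
/// and the fused result is truncated to exactly `limit` rows.
fn rrf_fuse(rankings: &[Vec<&str>], limit: usize) -> Vec<(String, f64)> {
    let pool = candidate_limit(limit);
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, id) in ranking.iter().take(pool).enumerate() {
            *scores.entry(*id).or_insert(0.0) += 1.0 / (RRF_K + (idx + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(id, score)| (id.to_string(), score))
        .collect();
    // Highest fused score first; ties are left in arbitrary order.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1));
    fused.truncate(limit);
    fused
}
```

Over-fetching matters because an ID ranked just outside the top `limit` in one list can still win after fusion if it ranks highly in another. Applying the cap only after fusing keeps such rows in play while still returning exactly `limit` results.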

…sformers layouts (#10444)

* fix(model2vec): Improve robustness of model loading for sentence-transformers layouts
* Update sha

phillipleblanc and others added 5 commits April 21, 2026 14:57

fix(model2vec): Improve robustness of model loading for sentence-tran…

e9768bf


pull Bot locked and limited conversation to collaborators Apr 21, 2026

pull Bot added the ⤵️ pull label Apr 21, 2026

pull Bot merged commit e9768bf into TheRakeshPurohit:trunk Apr 21, 2026
1 of 11 checks passed

pull[bot] merged 5 commits into TheRakeshPurohit:trunk from spiceai:trunk


4 participants
