Vendor VSS source as files instead of a submodule (fix Windows build) #17
Merged
duckdb-vss has its own nested `duckdb`/`extension-ci-tools` git submodules. Cargo recursively checks out submodules of the duckdb-rs git dependency, so the vss *submodule* dragged in a full nested duckdb checkout whose deeply-nested Swift example paths exceed Windows' MAX_PATH (260), breaking the Windows build (`path too long ... class=Filesystem`). Replace the extension/vss/upstream submodule with a committed copy of duckdb-vss@b833341c's src/ (vss_extension.cpp + hnsw/*.cpp + header-only usearch/fp16/simsimd headers + LICENSEs). No submodule => nothing for Cargo to recurse into => Windows builds. Update vss_config.py paths (upstream/src -> src) and SKILL.md (vendored copy, not a submodule).
Pull request overview
This pull request removes the extension/vss/upstream git submodule and instead vendors the DuckDB VSS (HNSW) extension source directly under extension/vss/src/ to avoid Cargo recursively fetching nested submodules that break Windows builds due to MAX_PATH limitations.
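The path-length failure described above can be checked for locally before it reaches Windows CI. The following is a minimal sketch, not part of this repository's tooling: the function name `find_long_paths` and the `prefix_len` parameter are hypothetical, and the caller is assumed to know roughly how long the checkout directory's own path will be.

```python
import os

WINDOWS_MAX_PATH = 260  # classic Win32 limit; it counts the terminating NUL


def find_long_paths(root, prefix_len=0, limit=WINDOWS_MAX_PATH):
    """Return relative paths under `root` that would not fit within `limit`.

    `prefix_len` is the assumed length of the directory the tree will be
    checked out into (including its trailing separator), for example a
    Cargo git checkout directory on the Windows build machine.
    """
    too_long = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            # The usable length is limit - 1 because of the terminating NUL.
            if prefix_len + len(rel) > limit - 1:
                too_long.append(rel)
    return sorted(too_long)
```

Run against a tree with and without the nested submodule checked out, a check like this would show whether the offending paths come only from the nested `duckdb` checkout.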
Changes:
- Replace the VSS `upstream` submodule with a committed copy of `duckdb-vss`'s `src/` tree (including header-only deps).
- Update `extension/vss/vss_config.py` to point to the vendored `extension/vss/src` paths.
- Update `extension/vss/SKILL.md` to document why vendoring is required and how to upgrade.
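The path change in `vss_config.py` might look roughly like the sketch below. This is an assumption, not the file's actual contents: it supposes the config follows the `include_directories` / `source_files` convention used by DuckDB's other in-tree extension config files, and the names `VSS_SRC` and `_sources` are hypothetical. The source file names are taken from the per-file table in this review.

```python
import os

# Assumed base directory of the vendored sources
# (previously extension/vss/upstream/src, per this PR).
VSS_SRC = "extension/vss/src"

# Source files as listed in this PR's per-file summary.
_sources = [
    "vss_extension.cpp",
    "hnsw/hnsw_index.cpp",
    "hnsw/hnsw_index_macros.cpp",
    "hnsw/hnsw_index_physical_create.cpp",
    "hnsw/hnsw_index_plan.cpp",
    "hnsw/hnsw_index_pragmas.cpp",
    "hnsw/hnsw_index_scan.cpp",
    "hnsw/hnsw_optimize_expr.cpp",
    "hnsw/hnsw_optimize_join.cpp",
    "hnsw/hnsw_optimize_scan.cpp",
    "hnsw/hnsw_optimize_topk.cpp",
    "hnsw/hnsw_topk_operator.cpp",
]

# Convert forward-slash paths to the platform's separator.
include_directories = [os.path.sep.join(f"{VSS_SRC}/include".split("/"))]
source_files = [os.path.sep.join(f"{VSS_SRC}/{s}".split("/")) for s in _sources]
```

Keeping the base directory in one variable means a future layout change touches a single line, which is the kind of edit this PR makes.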
Reviewed changes
Copilot reviewed 38 out of 39 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| .gitmodules | Removes VSS submodule entry. |
| extension/vss/vss_config.py | Updates packaging paths to use vendored extension/vss/src. |
| extension/vss/SKILL.md | Documents the vendored approach + upgrade procedure. |
| extension/vss/src/vss_extension.cpp | Vendored VSS extension entrypoint/registration. |
| extension/vss/src/hnsw/CMakeLists.txt | Vendored build-source list for HNSW sources. |
| extension/vss/src/hnsw/hnsw_index.cpp | Vendored HNSW index implementation (usearch-backed). |
| extension/vss/src/hnsw/hnsw_index_macros.cpp | Vendored SQL macros registration for VSS helpers. |
| extension/vss/src/hnsw/hnsw_index_physical_create.cpp | Vendored physical operator to build HNSW indexes. |
| extension/vss/src/hnsw/hnsw_index_physical_create.hpp | Vendored header for physical create-index operator. |
| extension/vss/src/hnsw/hnsw_index_plan.cpp | Vendored index planning for CREATE INDEX (HNSW). |
| extension/vss/src/hnsw/hnsw_index_pragmas.cpp | Vendored pragmas (compact/info) for HNSW indexes. |
| extension/vss/src/hnsw/hnsw_index_scan.cpp | Vendored index scan table function implementation. |
| extension/vss/src/hnsw/hnsw_optimize_expr.cpp | Vendored expression optimizer rules for distance funcs. |
| extension/vss/src/hnsw/hnsw_optimize_join.cpp | Vendored join optimizer for HNSW-based join rewrite. |
| extension/vss/src/hnsw/hnsw_optimize_scan.cpp | Vendored TopN→index-scan optimizer rewrite. |
| extension/vss/src/hnsw/hnsw_optimize_topk.cpp | Vendored min_by→index-scan/top-k optimizer rewrite. |
| extension/vss/src/hnsw/hnsw_topk_operator.cpp | Vendored stub TopK operator registration hook. |
| extension/vss/src/include/aggregate_function_matcher.hpp | Vendored matcher helper header for optimizer rules. |
| extension/vss/src/include/vss_extension.hpp | Vendored extension class declaration. |
| extension/vss/src/include/hnsw/hnsw.hpp | Vendored HNSW module registration header. |
| extension/vss/src/include/hnsw/hnsw_index.hpp | Vendored HNSW index type header. |
| extension/vss/src/include/hnsw/hnsw_index_scan.hpp | Vendored HNSW index scan bind/function header. |
| extension/vss/src/include/fp16/LICENSE | Vendored fp16 license. |
| extension/vss/src/include/fp16/bitcasts.h | Vendored fp16 header-only dependency. |
| extension/vss/src/include/fp16/fp16.h | Vendored fp16 header-only dependency. |
| extension/vss/src/include/usearch/LICENSE | Vendored usearch license. |
| extension/vss/src/include/usearch/duckdb_usearch.hpp | Vendored DuckDB wrapper for usearch config macros. |
| extension/vss/src/include/usearch/index.hpp | Vendored usearch header-only dependency. |
| extension/vss/src/include/usearch/index_dense.hpp | Vendored usearch header-only dependency. |
| extension/vss/src/include/usearch/index_plugins.hpp | Vendored usearch header-only dependency. |
| extension/vss/src/include/simsimd/LICENSE | Vendored simsimd license. |
| extension/vss/src/include/simsimd/types.h | Vendored simsimd header-only dependency. |
| extension/vss/src/include/simsimd/binary.h | Vendored simsimd header-only dependency. |
| extension/vss/src/include/simsimd/geospatial.h | Vendored simsimd header-only dependency. |
| extension/vss/src/include/simsimd/probability.h | Vendored simsimd header-only dependency. |
| extension/vss/src/include/simsimd/dot.h | Vendored simsimd header-only dependency. |
| extension/vss/src/include/simsimd/simsimd.h | Vendored simsimd header-only dependency. |
| extension/vss/src/include/simsimd/spatial.h | Vendored simsimd header-only dependency. |
phillipleblanc added a commit to spiceai/duckdb-rs that referenced this pull request on Jun 3, 2026
Point the duckdb-sources submodule at spiceai/duckdb#17 (01dbffdd), which vendors the vss extension as committed files (extension/vss/src) instead of a git submodule. Cargo recursively checks out submodules of the duckdb-rs git dependency, so the vss submodule pulled in duckdb-vss's own nested `duckdb` checkout, whose deep Swift example path exceeded Windows' MAX_PATH and broke the Windows build. A vendored copy has nothing for Cargo to recurse into. Regenerated duckdb.tar.gz (vss sources now under extension/vss/src; DUCKDB_VERSION still v1.5.3; vss still statically linked). Updated SKILL.md for the vendored layout.
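Since the fix relies on there being no submodule under `extension/vss` for Cargo to recurse into, a small guard could catch a regression. The sketch below is illustrative only: the helper names are hypothetical, and it reads `.gitmodules` content with a simple regular expression rather than a full git-config parser.

```python
import re


def submodule_paths(gitmodules_text):
    """Extract every `path = ...` value from .gitmodules content."""
    pattern = re.compile(r"^\s*path\s*=\s*(.+)$", re.MULTILINE)
    return [m.group(1).strip() for m in pattern.finditer(gitmodules_text)]


def submodules_under(gitmodules_text, prefix):
    """Return submodule paths located at or below `prefix`."""
    prefix = prefix.rstrip("/") + "/"
    return [
        p for p in submodule_paths(gitmodules_text)
        if (p.rstrip("/") + "/").startswith(prefix)
    ]


# Hypothetical .gitmodules content resembling the state before this PR.
before = '''
[submodule "extension/vss/upstream"]
	path = extension/vss/upstream
	url = https://example.invalid/duckdb-vss.git
'''

print(submodules_under(before, "extension/vss"))
```

A CI step could fail when `submodules_under(...)` returns a non-empty list for `extension/vss`.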
Summary
Replaces the `extension/vss/upstream` git submodule (→ duckdb-vss) with a committed copy of duckdb-vss's `src/`. Fixes the Windows build break introduced by the static-VSS work (#16).
Why
duckdb-vss has its own nested `duckdb` and `extension-ci-tools` submodules. Cargo recursively checks out all submodules of the duckdb-rs git dependency, so the vss submodule dragged in a full nested `duckdb` checkout, whose deeply-nested Swift example path exceeds Windows' `MAX_PATH` (260 chars) and fails with `path too long ... class=Filesystem`.
Linux/macOS were unaffected, so it only surfaced on the Windows CI build. A vendored copy has no submodules for Cargo to recurse into.
Change
- Removed the `extension/vss/upstream` submodule and its `.gitmodules` entry.
- Added `extension/vss/src/`: a committed copy of `duckdb/duckdb-vss@b833341c`'s `src/` (vss_extension.cpp, hnsw/*.cpp, and the header-only usearch/fp16/simsimd headers + their LICENSEs). Same ABI-matched commit as before.
- `vss_config.py`: paths `extension/vss/upstream/src` → `extension/vss/src`.
- `SKILL.md`: documents the vendored approach and why it must stay vendored (the Windows MAX_PATH reason); upgrade procedure now re-vendors the source instead of bumping a submodule.
The generated `duckdb.tar.gz` content is unchanged (same vss source files); only how they're stored in the tree changes. This is step 1 of the re-chain (duckdb-rs → table-providers → spiceai duckdb#11107 follow).
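Re-vendoring on upgrade, as opposed to bumping a submodule pointer, amounts to replacing the committed directory with a fresh copy of upstream's `src/`. The sketch below is a hypothetical illustration of that step, not the procedure documented in `SKILL.md`: it assumes a duckdb-vss checkout at the desired commit already exists on disk, and the function name `revendor` is invented here.

```python
import shutil
from pathlib import Path


def revendor(upstream_checkout, vendored_dir):
    """Replace `vendored_dir` with a copy of `<upstream_checkout>/src`.

    Returns the sorted list of copied files, relative to `vendored_dir`.
    """
    src = Path(upstream_checkout) / "src"
    dst = Path(vendored_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"no src/ directory in {upstream_checkout}")
    if dst.exists():
        # Remove the old copy so files deleted upstream do not linger.
        shutil.rmtree(dst)
    shutil.copytree(src, dst)
    return sorted(p.relative_to(dst).as_posix() for p in dst.rglob("*") if p.is_file())
```

Copying only `src/` leaves upstream's own nested submodules behind, which is what keeps the vendored tree free of anything for Cargo to recurse into.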