Commit 97f21d0
Update Rust crate lance-encoding to v6 (vortex-data#7974)
This PR contains the following updates:
| Package | Type | Update | Change |
|---|---|---|---|
| [lance-encoding](https://redirect.github.com/lance-format/lance) | dependencies | major | `4` → `6` |
---
> [!WARNING]
> Some dependencies could not be looked up. Check the [Dependency Dashboard](..vortex-data/issues/357) for more information.
---
### Release Notes
<details>
<summary>lance-format/lance (lance-encoding)</summary>
### [`v6.0.0`](https://redirect.github.com/lance-format/lance/releases/tag/v6.0.0)
[Compare Source](https://redirect.github.com/lance-format/lance/compare/v4.0.1...v6.0.0)
#### What's Changed
##### Breaking Changes 🛠
- refactor!: vendor the tokenizer stack into lance by
[@​Xuanwo](https://redirect.github.com/Xuanwo) in
[#​6512](https://redirect.github.com/lance-format/lance/pull/6512)
- perf!: run scheduler initialize eagerly in async read\_tasks by
[@​westonpace](https://redirect.github.com/westonpace) in
[#​6710](https://redirect.github.com/lance-format/lance/pull/6710)
##### New Features 🎉
- feat: support segmented inverted index build and search by
[@​Xuanwo](https://redirect.github.com/Xuanwo) in
[#​6305](https://redirect.github.com/lance-format/lance/pull/6305)
- feat(dictionary-namespace): support table related operation by
[@​zhangyue19921010](https://redirect.github.com/zhangyue19921010)
in
[#​6308](https://redirect.github.com/lance-format/lance/pull/6308)
- feat: clean up transaction files on failed commits by
[@​wjones127](https://redirect.github.com/wjones127) in
[#​6319](https://redirect.github.com/lance-format/lance/pull/6319)
- feat: add planned blob reads with source-level coalescing by
[@​Xuanwo](https://redirect.github.com/Xuanwo) in
[#​6352](https://redirect.github.com/lance-format/lance/pull/6352)
- refactor: use exact base-scoped store bindings by
[@​Xuanwo](https://redirect.github.com/Xuanwo) in
[#​6422](https://redirect.github.com/lance-format/lance/pull/6422)
- feat: wire batch\_size\_bytes to Python and public Rust API by
[@​westonpace](https://redirect.github.com/westonpace) in
[#​6428](https://redirect.github.com/lance-format/lance/pull/6428)
- feat(vector): add partition search parallelism by
[@​BubbleCal](https://redirect.github.com/BubbleCal) in
[#​6475](https://redirect.github.com/lance-format/lance/pull/6475)
- feat(index): support float16 and float64 in IVF\_FLAT by
[@​BubbleCal](https://redirect.github.com/BubbleCal) in
[#​6476](https://redirect.github.com/lance-format/lance/pull/6476)
- feat: batch chopping fallback for filtered read by
[@​westonpace](https://redirect.github.com/westonpace) in
[#​6482](https://redirect.github.com/lance-format/lance/pull/6482)
- feat(java): add Dataset.sample() API by
[@​beinan](https://redirect.github.com/beinan) in
[#​6500](https://redirect.github.com/lance-format/lance/pull/6500)
- feat: add ANN proto codecs and extract table\_identifier module by
[@​LuQQiu](https://redirect.github.com/LuQQiu) in
[#​6503](https://redirect.github.com/lance-format/lance/pull/6503)
- feat: add configurable blob v2 pack file size by
[@​hamersaw](https://redirect.github.com/hamersaw) in
[#​6508](https://redirect.github.com/lance-format/lance/pull/6508)
- feat: expose has\_stable\_row\_ids property on LanceDataset by
[@​pengw0048](https://redirect.github.com/pengw0048) in
[#​6531](https://redirect.github.com/lance-format/lance/pull/6531)
- feat: expose base scoped store bindings to python by
[@​zhangyue19921010](https://redirect.github.com/zhangyue19921010)
in
[#​6547](https://redirect.github.com/lance-format/lance/pull/6547)
- feat: support zonemap index segments by
[@​beinan](https://redirect.github.com/beinan) in
[#​6593](https://redirect.github.com/lance-format/lance/pull/6593)
- feat: update lance-namespace to 0.7.2 and align namespace declared
table lifecycle by
[@​jackye1995](https://redirect.github.com/jackye1995) in
[#​6608](https://redirect.github.com/lance-format/lance/pull/6608)
- feat: generalize dynamic object store credentials by
[@​jackye1995](https://redirect.github.com/jackye1995) in
[#​6609](https://redirect.github.com/lance-format/lance/pull/6609)
- feat: add ANNIvfPartitionExecProto by
[@​LuQQiu](https://redirect.github.com/LuQQiu) in
[#​6612](https://redirect.github.com/lance-format/lance/pull/6612)
- feat: add prefilter\_type to ANNIvfSubIndexExecProto by
[@​LuQQiu](https://redirect.github.com/LuQQiu) in
[#​6613](https://redirect.github.com/lance-format/lance/pull/6613)
- feat: replace Azure SDK and google-cloud-auth with direct reqwest for
credential vending by
[@​jackye1995](https://redirect.github.com/jackye1995) in
[#​6617](https://redirect.github.com/lance-format/lance/pull/6617)
- feat(io): bypass backpressure for io\_buffer\_size=0 and 2.0 indirect
I/O by [@​westonpace](https://redirect.github.com/westonpace) in
[#​6627](https://redirect.github.com/lance-format/lance/pull/6627)
##### Bug Fixes 🐛
- fix: warn and clamp LANCE\_INITIAL\_UPLOAD\_SIZE instead of panicking
by [@​LuciferYang](https://redirect.github.com/LuciferYang) in
[#​6389](https://redirect.github.com/lance-format/lance/pull/6389)
- fix: keep delete-by-source fast path with scalar indexes by
[@​Xuanwo](https://redirect.github.com/Xuanwo) in
[#​6435](https://redirect.github.com/lance-format/lance/pull/6435)
- fix: include column\_metadatas and column\_infos in
CachedFileMetadata::DeepSizeOf by
[@​jiaoew1991](https://redirect.github.com/jiaoew1991) in
[#​6480](https://redirect.github.com/lance-format/lance/pull/6480)
- fix(index): preserve fts prewarm position codec by
[@​BubbleCal](https://redirect.github.com/BubbleCal) in
[#​6485](https://redirect.github.com/lance-format/lance/pull/6485)
- fix: handle FlatBin quantization in optimize\_vector\_indices\_v2 by
[@​jackye1995](https://redirect.github.com/jackye1995) in
[#​6488](https://redirect.github.com/lance-format/lance/pull/6488)
- fix: use logical OR instead of bitwise OR in conflict resolver by
[@​dentiny](https://redirect.github.com/dentiny) in
[#​6492](https://redirect.github.com/lance-format/lance/pull/6492)
- fix: add dir\_listing\_to\_manifest\_migration\_enabled flag to avoid
extra object store calls by
[@​jackye1995](https://redirect.github.com/jackye1995) in
[#​6507](https://redirect.github.com/lance-format/lance/pull/6507)
- fix: prevent arithmetic overflow in U64Segment encoding selection for
sparse/extreme row id ranges by
[@​ivscheianu](https://redirect.github.com/ivscheianu) in
[#​6516](https://redirect.github.com/lance-format/lance/pull/6516)
- fix: bump jieba-rs to 0.9.0 to fix build-no-lock CI by
[@​westonpace](https://redirect.github.com/westonpace) in
[#​6518](https://redirect.github.com/lance-format/lance/pull/6518)
- fix: blob projection schema compatibility by
[@​Xuanwo](https://redirect.github.com/Xuanwo) in
[#​6521](https://redirect.github.com/lance-format/lance/pull/6521)
- fix(namespace): serialize manifest mutations by
[@​Xuanwo](https://redirect.github.com/Xuanwo) in
[#​6525](https://redirect.github.com/lance-format/lance/pull/6525)
- fix: missing bumpversion entry for lance-tokenizer by
[@​Xuanwo](https://redirect.github.com/Xuanwo) in
[#​6526](https://redirect.github.com/lance-format/lance/pull/6526)
- fix: scale default memory pool size by partition count by
[@​westonpace](https://redirect.github.com/westonpace) in
[#​6562](https://redirect.github.com/lance-format/lance/pull/6562)
- fix: apply fragment bitmap allow-list to index search results by
[@​westonpace](https://redirect.github.com/westonpace) in
[#​6563](https://redirect.github.com/lance-format/lance/pull/6563)
- fix: hard cap batch size in merge\_insert to prevent sort failures by
[@​westonpace](https://redirect.github.com/westonpace) in
[#​6564](https://redirect.github.com/lance-format/lance/pull/6564)
- fix: index type try\_from miss RTree and BLOOMFILTER by
[@​wojiaodoubao](https://redirect.github.com/wojiaodoubao) in
[#​6568](https://redirect.github.com/lance-format/lance/pull/6568)
- fix(namespace): align error handling with namespace spec by
[@​jackye1995](https://redirect.github.com/jackye1995) in
[#​6575](https://redirect.github.com/lance-format/lance/pull/6575)
- fix: reject Rewrite vs CreateIndex when FRI groups straddle bitmap by
[@​wjones127](https://redirect.github.com/wjones127) in
[#​6610](https://redirect.github.com/lance-format/lance/pull/6610)
- fix(json): detect float64-stored numbers in json type extraction by
[@​dentiny](https://redirect.github.com/dentiny) in
[#​6622](https://redirect.github.com/lance-format/lance/pull/6622)
- fix: respect LANCE\_DEFAULT\_IO\_BUFFER\_SIZE if it has been set by
[@​westonpace](https://redirect.github.com/westonpace) in
[#​6636](https://redirect.github.com/lance-format/lance/pull/6636)
- fix: make vector distance schema nullable by
[@​BubbleCal](https://redirect.github.com/BubbleCal) in
[#​6649](https://redirect.github.com/lance-format/lance/pull/6649)
##### Documentation 📚
- docs: tighten python environment workflow guidance by
[@​Xuanwo](https://redirect.github.com/Xuanwo) in
[#​6520](https://redirect.github.com/lance-format/lance/pull/6520)
- docs: fix broken intra-doc link in DatasetPreFilter by
[@​LuciferYang](https://redirect.github.com/LuciferYang) in
[#​6579](https://redirect.github.com/lance-format/lance/pull/6579)
- docs: correct repetition level example in encoding docs by
[@​BubbleCal](https://redirect.github.com/BubbleCal) in
[#​6585](https://redirect.github.com/lance-format/lance/pull/6585)
##### Performance Improvements 🚀
- perf: intern DataFile fields/column\_indices to reduce manifest memory
by [@​beinan](https://redirect.github.com/beinan) in
[#​6477](https://redirect.github.com/lance-format/lance/pull/6477)
- perf: intern RowDatasetVersionMeta inline bytes to reduce manifest
memory by [@​beinan](https://redirect.github.com/beinan) in
[#​6499](https://redirect.github.com/lance-format/lance/pull/6499)
- perf: add SIMD-accelerated u8 dot product for SQ distance by
[@​justinrmiller](https://redirect.github.com/justinrmiller) in
[#​6506](https://redirect.github.com/lance-format/lance/pull/6506)
- perf: add SIMD kernels for bf16 distance functions by
[@​justinrmiller](https://redirect.github.com/justinrmiller) in
[#​6510](https://redirect.github.com/lance-format/lance/pull/6510)
- perf: submit I/O requests eagerly in FullZipScheduler by
[@​hushengquan](https://redirect.github.com/hushengquan) in
[#​6513](https://redirect.github.com/lance-format/lance/pull/6513)
- perf: add SIMD-accelerated u8 L2 and cosine distance kernels by
[@​justinrmiller](https://redirect.github.com/justinrmiller) in
[#​6517](https://redirect.github.com/lance-format/lance/pull/6517)
- perf: speed up RaBitQ 4-bit LUT distance on ARM by 16x by
[@​justinrmiller](https://redirect.github.com/justinrmiller) in
[#​6537](https://redirect.github.com/lance-format/lance/pull/6537)
- perf: add explicit SIMD types and distance kernels for f64 by
[@​justinrmiller](https://redirect.github.com/justinrmiller) in
[#​6540](https://redirect.github.com/lance-format/lance/pull/6540)
- perf: don't spawn the scheduling on a separate thread for small reads
by [@​westonpace](https://redirect.github.com/westonpace) in
[#​6637](https://redirect.github.com/lance-format/lance/pull/6637)
- perf: avoid materializing RoaringBitmap::full() in fragment allow-list
by [@​wkalt](https://redirect.github.com/wkalt) in
[#​6664](https://redirect.github.com/lance-format/lance/pull/6664)
- perf: revert inline scheduling by
[@​westonpace](https://redirect.github.com/westonpace) in
[#​6709](https://redirect.github.com/lance-format/lance/pull/6709)
**Full Changelog**:
<lance-format/lance@release-root/6.0.0-beta.N...v6.0.0>
</details>
---
### Configuration
📅 **Schedule**: (UTC)
- Branch creation
- Between 12:00 AM and 03:59 AM, only on Monday (`* 0-3 * * 1`)
- Automerge
- At any time (no schedule defined)
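As a rough illustration of the branch-creation window above, the cron expression `* 0-3 * * 1` can be read as "any minute, hours 0 through 3, any day of month, any month, Monday". The sketch below encodes only that reading. The `Weekday` enum and `in_branch_creation_window` function are hypothetical names introduced for illustration, not part of Renovate or this repository, and the caller is assumed to supply the hour and weekday already converted to UTC.

```rust
/// Day of the week, numbered as in cron's day-of-week field (Sunday = 0).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Weekday {
    Sunday = 0,
    Monday = 1,
    Tuesday = 2,
    Wednesday = 3,
    Thursday = 4,
    Friday = 5,
    Saturday = 6,
}

/// Hypothetical helper: returns true when a UTC hour and weekday fall inside
/// the window described by `* 0-3 * * 1`.
pub fn in_branch_creation_window(hour_utc: u8, weekday: Weekday) -> bool {
    // Hour field `0-3` covers 00:00 through 03:59; the minute field is `*`.
    let hour_matches = hour_utc <= 3;
    // Day-of-week field `1` is Monday; day-of-month and month are both `*`.
    let day_matches = weekday as u8 == 1;
    hour_matches && day_matches
}
```

Under this reading, 03:59 UTC on a Monday is inside the window, while 04:00 UTC on the same day is not.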
🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.
♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.
🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.
---
- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box
---
This PR was generated by [Mend Renovate](https://mend.io/renovate/).
View the [repository job
log](https://developer.mend.io/github/vortex-data/vortex).
---------
Signed-off-by: Robert Kruszewski <github@robertk.io>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Robert Kruszewski <github@robertk.io>
1 parent c573cef commit 97f21d0
2 files changed
Lines changed: 660 additions & 1701 deletions