4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -33,3 +33,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

- **LanceDB user_id migration hardening**
Startup and migration logic now include cross-process file locking, legacy `-1` orphan marker remapping to reserved int64 sentinel values, zero-progress loop protection, and shared embeddings-table listing utilities to avoid API-compat drift.

- **Chunk multi-page metadata and document page stats (breaking)**
Chunking now preserves multi-page origins via `metadata["spanning_pages"]` (a sorted list of 1-based page numbers) and derives `page_number` from the first page when missing. Parse records also persist document-level page statistics (`page_count`, `page_numbers`) in `params_json`.
**Impact on existing data:** Previously generated chunks stored only a single `page_number` and dropped cross-page information. Full page coverage for those chunks cannot be reconstructed retroactively without re-parsing and re-chunking the original documents. Until those documents are reprocessed, page-based views of legacy data may undercount pages or show a chunk only on its representative page.
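
To make the chunk metadata change concrete, here is a minimal Python sketch of the behavior this entry describes. It is an illustration under stated assumptions, not the project's implementation: the helper names `normalize_chunk_pages` and `document_page_stats` are hypothetical, and treating `page_count` as the number of distinct pages seen is an assumption. Only the keys `spanning_pages`, `page_number`, `page_count`, and `page_numbers` come from the changelog text.

```python
from typing import Any, Dict, Iterable, List


def normalize_chunk_pages(
    metadata: Dict[str, Any], source_pages: Iterable[int]
) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of `metadata` with page fields filled in.

    `source_pages` holds the 1-based page numbers the chunk's text came from.
    """
    # Deduplicate and sort, keeping only valid 1-based page numbers.
    pages: List[int] = sorted({int(p) for p in source_pages if int(p) >= 1})
    result = dict(metadata)
    if pages:
        result["spanning_pages"] = pages
        # Derive page_number from the first page only when it is missing.
        if result.get("page_number") is None:
            result["page_number"] = pages[0]
    return result


def document_page_stats(page_numbers: Iterable[int]) -> Dict[str, Any]:
    """Hypothetical helper: document-level stats of the kind stored in params_json.

    Assumption: page_count is the number of distinct pages observed.
    """
    unique = sorted({int(p) for p in page_numbers if int(p) >= 1})
    return {"page_count": len(unique), "page_numbers": unique}


if __name__ == "__main__":
    chunk = normalize_chunk_pages({"text": "spans a page break"}, [3, 2, 3])
    print(chunk["spanning_pages"], chunk["page_number"])
    print(document_page_stats([1, 2, 2, 3]))
```

A legacy chunk that carries only `page_number` has no source page list to pass to such a helper, which is why the missing coverage can only be restored by reprocessing the original document.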