Commit 998ca18

beinan and Beinan Wang authored
Add configurable V1 blob encoding for large payload columns (#49)
## Summary

- Add `blob_columns: HashSet<String>` to `ContextStoreOptions` to let users opt columns into Lance V1 blob encoding
- Valid columns: `text_payload` and `binary_payload`. Blob-encoded columns store data in out-of-line buffers for efficient storage of large/unpredictable content
- `batch_to_records()` auto-detects column types from the batch schema, preserving backward compatibility with existing non-blob datasets
- Python bindings expose `blob_columns` parameter on `Context.create()`

## Usage

```rust
// Rust
let options = ContextStoreOptions {
    blob_columns: HashSet::from(["binary_payload".into(), "text_payload".into()]),
    ..Default::default()
};
let store = ContextStore::open_with_options(uri, options).await?;
```

```python
# Python
ctx = Context.create("./store", blob_columns=["binary_payload", "text_payload"])
```

## Test plan

- [x] `test_blob_binary_payload`: roundtrip with blob-encoded binary_payload
- [x] `test_blob_text_payload`: roundtrip with blob-encoded text_payload (LargeUtf8 → LargeBinary)
- [x] `test_blob_both_columns`: both columns blob-encoded simultaneously
- [x] `test_no_blob_default`: default (no blob) schema unchanged
- [x] `test_blob_schema_metadata`: verifies `lance-encoding:blob` metadata on fields
- [x] `test_blob_invalid_column_name`: rejects unknown column names
- [x] `test_batch_to_records_autodetects_text_type`: auto-detection works for both LargeUtf8 and LargeBinary batches
- [x] All 15 tests pass (`cargo test -p lance-context-core`)
- [x] Python bindings compile (`cargo check -p lance-context-python`)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Beinan Wang <beinanwang@microsoft.com>
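
For readers who want a feel for the "rejects unknown column names" behavior listed in the test plan, here is a minimal standalone sketch using only the standard library. The helper `validate_blob_columns`, the constant `BLOB_CAPABLE_COLUMNS`, and the error text are hypothetical names chosen for illustration; the diff itself is not shown here, so the actual validation in `lance-context-core` may be structured differently.

```rust
use std::collections::HashSet;

/// The two columns the commit message lists as valid for blob encoding.
/// (Hypothetical constant; the real crate may store this differently.)
const BLOB_CAPABLE_COLUMNS: [&str; 2] = ["text_payload", "binary_payload"];

/// Hypothetical helper: accept the set only if every requested column is
/// one of the blob-capable columns, otherwise report the offending names.
fn validate_blob_columns(blob_columns: &HashSet<String>) -> Result<(), String> {
    // Collect every requested name that is not blob-capable.
    let mut unknown: Vec<&str> = blob_columns
        .iter()
        .map(String::as_str)
        .filter(|name| !BLOB_CAPABLE_COLUMNS.contains(name))
        .collect();

    // HashSet iteration order is unspecified, so sort for a stable message.
    unknown.sort_unstable();

    if unknown.is_empty() {
        Ok(())
    } else {
        Err(format!(
            "unsupported blob column(s): {}; expected a subset of {:?}",
            unknown.join(", "),
            BLOB_CAPABLE_COLUMNS
        ))
    }
}
```

Under these assumptions, an empty set (the default, no blob encoding) and any subset of the two payload columns pass, while a name such as `"embedding"` produces an error. Validating up front, before the schema is built, keeps a typo in a column name from silently leaving that column non-blob-encoded.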
1 parent c12ef77 commit 998ca18

2 files changed

Lines changed: 327 additions & 20 deletions
