Commit 998ca18
Add configurable V1 blob encoding for large payload columns (#49)
## Summary
- Add `blob_columns: HashSet<String>` to `ContextStoreOptions` to let
users opt columns into Lance V1 blob encoding
- Valid columns are `text_payload` and `binary_payload`; any other name
is rejected. Blob-encoded columns store their data in out-of-line
buffers, which suits large or unpredictably sized content
- `batch_to_records()` auto-detects column types from the batch schema,
preserving backward compatibility with existing non-blob datasets
- Python bindings expose `blob_columns` parameter on `Context.create()`
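
The column-name restriction described above can be illustrated with a minimal sketch. It uses only the standard library, and `BLOB_CAPABLE_COLUMNS` and `validate_blob_columns` are hypothetical names chosen for illustration, not identifiers confirmed by this commit. The real check may live elsewhere and return a different error type.

```rust
use std::collections::HashSet;

/// Columns that may be opted into blob encoding, per the summary above.
const BLOB_CAPABLE_COLUMNS: [&str; 2] = ["text_payload", "binary_payload"];

/// Hypothetical helper (not the commit's actual code): accepts a set of
/// requested blob columns and rejects any name outside the allowed list.
fn validate_blob_columns(blob_columns: &HashSet<String>) -> Result<(), String> {
    // Collect the names that are not blob-capable.
    let mut unknown: Vec<&str> = blob_columns
        .iter()
        .map(String::as_str)
        .filter(|name| !BLOB_CAPABLE_COLUMNS.contains(name))
        .collect();
    // Sort so the reported name is deterministic when several are invalid.
    unknown.sort_unstable();
    match unknown.first() {
        None => Ok(()),
        Some(name) => Err(format!("unsupported blob column: {name}")),
    }
}
```

An empty set passes validation, which matches the default of no blob-encoded columns.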
## Usage
```rust
// Rust
use std::collections::HashSet;

let options = ContextStoreOptions {
    blob_columns: HashSet::from(["binary_payload".into(), "text_payload".into()]),
    ..Default::default()
};
// `uri` is the dataset location; call this from an async context.
let store = ContextStore::open_with_options(uri, options).await?;
```
```python
# Python
ctx = Context.create("./store", blob_columns=["binary_payload", "text_payload"])
```
## Test plan
- [x] `test_blob_binary_payload` — roundtrip with blob-encoded
binary_payload
- [x] `test_blob_text_payload` — roundtrip with blob-encoded
text_payload (LargeUtf8 → LargeBinary)
- [x] `test_blob_both_columns` — both columns blob-encoded
simultaneously
- [x] `test_no_blob_default` — default (no blob) schema unchanged
- [x] `test_blob_schema_metadata` — verifies `lance-encoding:blob`
metadata on fields
- [x] `test_blob_invalid_column_name` — rejects unknown column names
- [x] `test_batch_to_records_autodetects_text_type` — auto-detection
works for both LargeUtf8 and LargeBinary batches
- [x] All 15 tests pass (`cargo test -p lance-context-core`)
- [x] Python bindings compile (`cargo check -p lance-context-python`)
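
The auto-detection exercised by `test_batch_to_records_autodetects_text_type` can be pictured roughly as follows. This is a simplified sketch and not the actual `batch_to_records()` code: `PayloadType`, `PayloadValue`, and `read_text_payload` are hypothetical stand-ins for the Arrow types and arrays involved, and the sketch assumes blob-encoded text is stored as UTF-8 bytes.

```rust
/// Simplified stand-in for the two Arrow types `text_payload` may have.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PayloadType {
    LargeUtf8,   // default, non-blob schema
    LargeBinary, // blob-encoded schema
}

/// Simplified stand-in for one cell of the `text_payload` column.
#[derive(Debug, Clone, PartialEq)]
enum PayloadValue {
    Text(String),
    Bytes(Vec<u8>),
}

/// Hypothetical decoder: chooses the read path from the column's declared
/// type instead of from the store options, so datasets written without
/// blob encoding remain readable.
fn read_text_payload(column_type: PayloadType, value: &PayloadValue) -> Result<String, String> {
    match (column_type, value) {
        (PayloadType::LargeUtf8, PayloadValue::Text(s)) => Ok(s.clone()),
        (PayloadType::LargeBinary, PayloadValue::Bytes(b)) => {
            // Assumption: blob-encoded text holds UTF-8 bytes.
            String::from_utf8(b.clone()).map_err(|e| format!("invalid UTF-8 in blob: {e}"))
        }
        (ty, _) => Err(format!("value does not match column type {ty:?}")),
    }
}
```

Keying the decision on the schema rather than on `blob_columns` means a reader does not need to know how a dataset was originally created.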
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Beinan Wang <beinanwang@microsoft.com>

1 parent c12ef77 commit 998ca18
2 files changed
Lines changed: 327 additions & 20 deletions