Commit 07dbdb4
feat: add lance_dataset_write for create/append/overwrite from ArrowArrayStream (#16)
## Summary
- Adds `lance_dataset_write(uri, schema, stream, mode, storage_opts,
out_dataset)` — writes an `ArrowArrayStream` into a Lance dataset with a
committed manifest
- `LanceWriteMode` covers `CREATE` / `APPEND` / `OVERWRITE`
- Optional `out_dataset` hands back an open `LanceDataset*` at the new
version so callers don't need to reopen
- Matching `lance::Dataset::write(...)` static method in `lance.hpp`
with full RAII (`StreamGuard` + `SchemaGuard`)
## Motivation
Until now the C/C++ path only produced uncommitted fragment files (#5).
`lance_dataset_write` closes the primary write path and unblocks the
rest of Phase 3 (delete, update, merge-insert, schema evolution), which
all need a way to create a dataset first.
## FFI contract notes
- `mode` parameter is `int32_t` (not `LanceWriteMode`) on the wire —
defends against `-fshort-enums` ABI mismatch. Validated in Rust via
`LanceWriteMode::from_raw` before any unsafe enum construction.
- Stream is consumed via `ArrowArrayStreamReader::from_raw` **before**
uri/schema NULL checks, so the "consumed on every return path" contract
holds for every error branch — verified by
`test_dataset_write_releases_stream_on_every_error_path` (drop-counter
on the boxed reader).
- Schema is read by shared reference; the function does NOT call
`schema->release`. Caller (or C++ `SchemaGuard`) retains ownership.
Documented in the header.
- `*out_dataset` is written only on success; error paths leave it
untouched. Verified by sentinel-pointer test.
- The C++ wrapper constructs `SchemaGuard` BEFORE calling `get_schema`,
so a non-conforming producer that partially populates the schema before
reporting failure still has its `release` fired on unwind. `StreamGuard`
covers `std::bad_alloc` during `kv` construction and is `disarm()`ed
right before the C call, which consumes the stream on every return path.
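
The `int32_t`-on-the-wire note above can be illustrated with a minimal Rust sketch. It is not the crate's actual code: the enum name `WriteMode`, its discriminant values, and the helper `validate_mode` are assumptions made for illustration. The point is only that the raw integer is matched against known values, so no enum is ever built from an unchecked bit pattern.

```rust
/// Hypothetical mirror of the C-side write mode enum.
/// The discriminant values are assumed, not taken from the real header.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(i32)]
pub enum WriteMode {
    Create = 0,
    Append = 1,
    Overwrite = 2,
}

impl WriteMode {
    /// Map a raw wire value to a variant using a plain match.
    /// Unknown values yield `None` instead of an invalid enum.
    pub fn from_raw(raw: i32) -> Option<Self> {
        match raw {
            0 => Some(Self::Create),
            1 => Some(Self::Append),
            2 => Some(Self::Overwrite),
            _ => None,
        }
    }
}

/// Hypothetical entry-point helper: reject bad modes before doing any work.
pub fn validate_mode(raw: i32) -> Result<WriteMode, String> {
    WriteMode::from_raw(raw).ok_or_else(|| format!("invalid write mode: {raw}"))
}
```

Because the parameter is a fixed-width integer, a C caller compiled with `-fshort-enums` and the Rust side still agree on its size; the range check happens entirely in safe code.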
## Test plan
- `cargo test` — 87 integration tests, 13 covering the writer
(CREATE/APPEND/OVERWRITE happy paths, OVERWRITE on a missing path,
CREATE on an existing path, declared/append schema mismatches, empty
stream, NULL args, invalid mode, `out_dataset` propagation,
stream-release-on-every-error-path, out_dataset-untouched-on-error)
- `cargo clippy --all-targets -- -D warnings` clean
- `cargo fmt --check` clean
- `cargo test --test compile_and_run_test -- --ignored` — C and C++
scan→write round-trips pass
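
The drop-counter and sentinel-pointer checks listed above might take roughly the following shape. This is a sketch with stand-in types, not the repository's tests: `MockStream`, `MockDataset`, `write_sketch`, `run_case`, the status codes, and the NULL-stream handling are all hypothetical. It only models the ordering the commit describes: take ownership of the stream first, validate afterwards, and write the out-pointer last.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Stand-in for the boxed stream reader; bumps a shared counter on drop.
pub struct MockStream {
    drops: Rc<Cell<usize>>,
}

impl Drop for MockStream {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Stand-in for an open dataset handle.
pub struct MockDataset {
    pub version: u64,
}

/// Hypothetical writer. Returns 0 on success and -1 on error (assumed codes).
pub unsafe fn write_sketch(
    uri: Option<&str>,
    stream: *mut MockStream,
    mode: i32,
    out_dataset: *mut *mut MockDataset,
) -> i32 {
    if stream.is_null() {
        return -1; // nothing to consume; this branch is an assumption
    }
    // Take ownership first, so every later return path drops the stream.
    let _reader = unsafe { Box::from_raw(stream) };
    // Validation happens only after the stream is owned.
    if uri.is_none() || !(0..=2).contains(&mode) {
        return -1; // out_dataset is deliberately not touched here
    }
    // Pretend the write succeeded; hand back a handle only now.
    if !out_dataset.is_null() {
        unsafe { *out_dataset = Box::into_raw(Box::new(MockDataset { version: 1 })) };
    }
    0
}

/// Runs one call and reports (status, stream drops, out pointer untouched).
pub fn run_case(uri: Option<&str>, mode: i32) -> (i32, usize, bool) {
    let drops = Rc::new(Cell::new(0));
    let stream = Box::into_raw(Box::new(MockStream { drops: Rc::clone(&drops) }));
    // Sentinel: a valid address the callee must not overwrite on error.
    let mut placeholder = MockDataset { version: 0 };
    let sentinel: *mut MockDataset = &mut placeholder;
    let mut out = sentinel;
    let status = unsafe { write_sketch(uri, stream, mode, &mut out) };
    let untouched = out == sentinel;
    if !untouched {
        // Success path: reclaim the returned handle so the sketch does not leak.
        drop(unsafe { Box::from_raw(out) });
    }
    (status, drops.get(), untouched)
}
```

With these stand-ins, an error case such as `run_case(None, 0)` should report exactly one drop and an untouched sentinel, while a success case reports one drop and a replaced pointer.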
Closes #14.
1 parent a663d53, commit 07dbdb4
8 files changed
Lines changed: 1219 additions & 12 deletions
File tree
- include
- src
- tests
- cpp