Implement support for duckdb_arrow_scan for ingesting data via Arrow#488
Implement support for duckdb_arrow_scan for ingesting data via Arrow#488phillipleblanc wants to merge 1 commit into
duckdb_arrow_scan for ingesting data via Arrow#488Conversation
#18) * Implement support for `duckdb_arrow_scan` for ingesting data via Arrow * Map the NulError
|
Hello! Thanks a lot for the PR! Im about to cut v1.2.2 so in order to expedite getting this merged I forked this off into #489 with formatting/clippy fixes |
|
Actually, ASAN reports memory leaks related to the test. I've done some digging and asked around the office from people with more arrow-knowledge, and my impression is that the I guess these functions are deprecated for a reason. I don't know exactly when we will get around to replace them with something better, but we're looking into trying to fix what we think is causing this issue in particular - and if we do I'll check against this PR again. |
Thanks for checking! I'll convert this PR to draft until the memory leak issue is fixed and/or a new C API is available that we can use. |
|
duckdb/duckdb#18246 lays the groundwork for a new Arrow C API. |
|
Looks like the new C API will be coming out in DuckDB v1.4. I'll close this PR since we'll need a new one to integrate with the new C API. |
Adds support for calling the duckdb_arrow_scan when passed an FFI_ArrowArrayStream from the Arrow library that implements the https://arrow.apache.org/docs/format/CStreamInterface.html
This matches behavior that exists in the Go DuckDB library: https://github.com/marcboeker/go-duckdb/blob/c607539e645091c1e3f600f929229d8baa7166e4/arrow.go#L251
This also provides an alternative to the
arrow-vtabmodule that is not zero-copy in all cases for Arrow -> DuckDB buffers, whereas the native C++ DuckDB code has better support for zero-copy Arrow -> DuckDB buffers.