Description
I'm interested in querying in-memory Arrow data using DuckDB.
I first looked into the nanoarrow community extension, but it reads from the IPC format via files, not in-memory Arrow data.
Then I looked into the VTabArrow feature of duckdb-rs. One thing I find lacking in the API is that it takes a single RecordBatch and not a Vec<RecordBatch>. I have to concat all my batches into a single one, which might be relatively expensive.
The issue I want to report here is the crashes I get when building a VIEW of the RecordBatch. Here's a repro snippet:
fn example_record_batch() -> arrow_array::RecordBatch {
    arrow_array::record_batch!(
        ("id", Int64, [1, 2, 3, 4]),
        ("name", Utf8, ["apple", "banana", "cherry", "date"]),
        ("is_odd", Boolean, [true, false, true, false])
    ).unwrap()
}

fn main() {
    let batch = example_record_batch();
    let quack = duckdb::Connection::open_in_memory().unwrap();
    quack.register_table_function::<duckdb::vtab::arrow::ArrowVTab>("arrow").unwrap();

    // 1. Directly reading from arrow().
    let read1 = quack
        .prepare("SELECT * FROM arrow(?, ?)")
        .unwrap()
        .query_arrow(duckdb::vtab::arrow_recordbatch_to_query_params(batch.clone()))
        .unwrap()
        .collect::<Vec<_>>();
    assert_eq!(vec![batch.clone()], read1);

    // 2. Creating a table (= copying the data).
    quack.execute(
        "CREATE TABLE test1 AS SELECT * FROM arrow(?, ?)",
        duckdb::vtab::arrow_recordbatch_to_query_params(batch.clone()),
    ).unwrap();
    let read2 = quack
        .prepare("SELECT * FROM test1")
        .unwrap()
        .query_arrow([])
        .unwrap()
        .collect::<Vec<_>>();
    assert_eq!(vec![batch.clone()], read2);

    // 3. Creating a view.
    let [array, schema] = duckdb::vtab::arrow_recordbatch_to_query_params(batch.clone());
    quack.execute(
        &format!("CREATE VIEW test2 AS SELECT * FROM arrow({}::UBIGINT, {}::UBIGINT)", array, schema),
        [],
    ).unwrap();
    println!("Crash incoming!");
    let read3 = quack
        .prepare("SELECT * FROM test2")
        .unwrap()
        .query_arrow([])
        .unwrap()
        .collect::<Vec<_>>();
    assert_eq!(vec![batch.clone()], read3);
}

(1) and (2) are fine. But with (3) I get a crash, with one of 3 possible messages:
- Plain segfault (core dumped)
- free(): double free detected in tcache 2, then a crash
- thread 'main' (109399) panicked at /home/bru/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-schema-56.2.0/src/ffi.rs:269:14: The external API has a non-utf8 as format: Utf8Error { valid_up_to: 1, error_len: Some(1) }
I don't know if views are supposed to be supported with VTab in general, much less with ArrowVTab. Rejecting the create view query would be nice (anything's better than segfaulting really!).