Segfault on CREATE VIEW using ArrowVTab #648

@brunal

I'm interested in querying in-memory Arrow data using DuckDB.

I first looked into the nanoarrow community extension, but it reads from the IPC format via files, not in-memory Arrow data.

Then I looked into the ArrowVTab feature of duckdb-rs. One thing I find lacking in the API is that it takes a single RecordBatch rather than a Vec<RecordBatch>, so I have to concatenate all my batches into one first, which can be relatively expensive.

The issue I want to report here is the crash I get when building a VIEW over the RecordBatch. Here's a repro snippet:

fn example_record_batch() -> arrow_array::RecordBatch {
    arrow_array::record_batch!(
        ("id", Int64, [1, 2, 3, 4]),
        ("name", Utf8, ["apple", "banana", "cherry", "date"]),
        ("is_odd", Boolean, [true, false, true, false])
    ).unwrap()
}

fn main() {
    let batch = example_record_batch();
    let quack = duckdb::Connection::open_in_memory().unwrap();
    quack.register_table_function::<duckdb::vtab::arrow::ArrowVTab>("arrow").unwrap();

    // 1. Directly reading from arrow().
    let read1 = quack
        .prepare("SELECT * FROM arrow(?, ?)")
        .unwrap()
        .query_arrow(duckdb::vtab::arrow_recordbatch_to_query_params(batch.clone()))
        .unwrap()
        .collect::<Vec<_>>();
    assert_eq!(vec![batch.clone()], read1);

    // 2. Creating a table (= copying the data).
    quack.execute(
        "CREATE TABLE test1 AS SELECT * FROM arrow(?, ?)",
        duckdb::vtab::arrow_recordbatch_to_query_params(batch.clone()),
    ).unwrap();

    let read2 = quack
        .prepare("SELECT * FROM test1")
        .unwrap()
        .query_arrow([])
        .unwrap()
        .collect::<Vec<_>>();
    assert_eq!(vec![batch.clone()], read2);

    // 3. Creating a view.
    let [array, schema] = duckdb::vtab::arrow_recordbatch_to_query_params(batch.clone());
    quack.execute(
        &format!("CREATE VIEW test2 AS SELECT * FROM arrow({}::UBIGINT, {}::UBIGINT)", array, schema),
        [],
    ).unwrap();

    println!("Crash incoming!");

    let read3 = quack
        .prepare("SELECT * FROM test2")
        .unwrap()
        .query_arrow([])
        .unwrap()
        .collect::<Vec<_>>();
    assert_eq!(vec![batch.clone()], read3);
}

(1) and (2) are fine. But (3) crashes, with one of three possible outcomes:

  • Plain segfault (core dumped)
  • free(): double free detected in tcache 2 then crash
  • thread 'main' (109399) panicked at /home/bru/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-schema-56.2.0/src/ffi.rs:269:14: The external API has a non-utf8 as format: Utf8Error { valid_up_to: 1, error_len: Some(1) }.

I don't know if views are supposed to be supported with VTab in general, much less with ArrowVTab. Rejecting the create view query would be nice (anything's better than segfaulting really!).
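The cause is not confirmed in this report. One assumption consistent with the double-free message is that the two addresses embedded in the view's SQL are single-use: whoever reads them first takes ownership of the exported data, and a later read of the same addresses touches freed memory. The sketch below models only that idea in plain Rust, with no DuckDB or Arrow involved. `Exported` and `HandleTable` are hypothetical names invented for illustration, not duckdb-rs types. It shows how a second use of the same address can be rejected with an error instead of crashing.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for data exported by address (not a real Arrow FFI struct).
struct Exported {
    rows: usize,
}

/// Hypothetical registry of addresses that are still safe to import.
#[derive(Default)]
struct HandleTable {
    live: HashSet<usize>,
}

impl HandleTable {
    /// Leak the value to the heap and hand out its address as a plain integer,
    /// similar in spirit to passing a pointer through SQL as a UBIGINT.
    fn export(&mut self, value: Exported) -> usize {
        let addr = Box::into_raw(Box::new(value)) as usize;
        self.live.insert(addr);
        addr
    }

    /// Take ownership back exactly once. A second call with the same address
    /// returns an error instead of freeing the allocation twice.
    fn import(&mut self, addr: usize) -> Result<Exported, String> {
        if !self.live.remove(&addr) {
            return Err(format!(
                "address {addr:#x} was already consumed or never exported"
            ));
        }
        // SAFETY: `addr` came from `Box::into_raw` in `export` and was still
        // registered as live, so it has not been reclaimed yet.
        let boxed = unsafe { Box::from_raw(addr as *mut Exported) };
        Ok(*boxed)
    }
}
```

With this bookkeeping, the first `import(addr)` returns the value and the second returns `Err`. If the assumption holds, that second call is the analogue of querying the view after its addresses were already consumed.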
